Crowdfunding_ETL

Built an ETL pipeline using Python, Pandas, Python dictionary methods, and regular expressions to extract and transform the data. Created four CSV files and used them to create an ERD and a table schema. Finally, loaded the CSV data into a Postgres database. The project involves extracting data from multiple sources, cleaning and transforming it in Jupyter Notebook with the pandas, numpy, and datetime packages, and loading the cleaned data into a relational database using pgAdmin.

Extract, Transform and Load the Crowdfunding Data

Let's work with the "Crowdfunding" data files: extract the data completely, transform it appropriately, and load it into a Postgres database through pgAdmin.

Introduction

Current trends in economic funding and investment show rapid growth in crowdfunding. Crowdfunding works by collecting small sums of capital from many people to fund up-and-coming business ideas and projects. Crowdfunding campaigns have proven successful at raising funds without upfront fees.

This ETL process for the 'Crowdfunding' data breaks into two deliverables.

Deliverable 1: Extract & Transform Using Jupyter Notebook.

Deliverable 2: Load the data to Postgres Database.

Deliverable 1: Extract & Transform Using Jupyter Notebook

1.1 Prerequisites

Before you begin, ensure you have the following installed:

Python 3.6 or higher
NumPy
json (part of the Python standard library, no separate install needed)
Pandas (for data analysis)

1.2 Data Sources

We read the source data from two files, 'crowdfunding.xlsx' and 'contacts.xlsx', using Pandas.

1.3 Create the Category and Subcategory DataFrames

Extract and transform the crowdfunding.xlsx data to create a 'crowdfunding_info_df' DataFrame.

Split each "category & sub-category" column value into "category" and "subcategory".

To create the category and subcategory identification numbers, use a list comprehension to add the "cat" string or the "subcat" string to each number in the category or the subcategory array, respectively.

Create the category DataFrame as 'category_df' and the subcategory DataFrame as 'subcategory_df'.

Show the top five rows of the 'category_df' DataFrame.

Show the top five rows of the 'subcategory_df' DataFrame.
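The steps in this section can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are made up, the "/" separator is an assumption about how the "category & sub-category" values are formatted, and the output column names 'category_id' and 'subcategory_id' are taken from the data model described later in this README.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the real values come from crowdfunding.xlsx.
crowdfunding_info_df = pd.DataFrame({
    "category & sub-category": [
        "food/food trucks", "music/rock", "music/jazz", "food/food trucks",
    ]
})

# Split each value into "category" and "subcategory" (assumes "/" separates them).
crowdfunding_info_df[["category", "subcategory"]] = (
    crowdfunding_info_df["category & sub-category"].str.split("/", n=1, expand=True)
)

# Unique names, in order of first appearance.
categories = crowdfunding_info_df["category"].unique()
subcategories = crowdfunding_info_df["subcategory"].unique()

# Number the unique values starting from 1.
category_ids = np.arange(1, len(categories) + 1)
subcategory_ids = np.arange(1, len(subcategories) + 1)

# List comprehensions prepend "cat" / "subcat" to each number.
cat_ids = [f"cat{i}" for i in category_ids]
subcat_ids = [f"subcat{i}" for i in subcategory_ids]

category_df = pd.DataFrame({"category_id": cat_ids, "category": categories})
subcategory_df = pd.DataFrame({"subcategory_id": subcat_ids, "subcategory": subcategories})

print(category_df.head())
print(subcategory_df.head())
```

Generating the IDs from the unique values keeps each category and subcategory on exactly one row, which is what lets these columns serve as primary keys later.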

1.4 Create the Campaign DataFrame

Create a copy of the 'crowdfunding_info_df' to transform the crowdfunding.xlsx data.

Rename specific columns and set their appropriate data types for the 'campaign_df' DataFrame.

Convert the 'launched_date' and 'end_date' columns to UTC datetime format.

Drop the unwanted columns from the campaign DataFrame.

Confirm the number of columns remaining in the DataFrame, then export the campaign DataFrame as campaign.csv.
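A hedged sketch of these campaign steps is given here. Apart from 'launched_date' and 'end_date', every column name in it is an assumption made for illustration (the README does not list the source columns), the rows are made up, and the date columns are assumed to hold Unix timestamps in seconds.

```python
import pandas as pd

# Illustrative rows; the column names are assumptions, not the real file layout.
crowdfunding_info_df = pd.DataFrame({
    "cf_id": [1, 2],
    "blurb": ["Example campaign A", "Example campaign B"],
    "goal": [100, 1400],
    "pledged": [0, 14560],
    "launched_at": [1581573600, 1611554400],  # assumed Unix seconds
    "deadline": [1614578400, 1621918800],     # assumed Unix seconds
    "staff_pick": [False, False],
    "spotlight": [False, True],
})

# Work on a copy so the original DataFrame stays untouched.
campaign_df = crowdfunding_info_df.copy()

# Rename columns and set data types.
campaign_df = campaign_df.rename(columns={
    "blurb": "description",
    "launched_at": "launched_date",
    "deadline": "end_date",
})
campaign_df = campaign_df.astype({"goal": float, "pledged": float})

# Convert the timestamp columns to timezone-aware UTC datetimes.
for col in ["launched_date", "end_date"]:
    campaign_df[col] = pd.to_datetime(campaign_df[col], unit="s", utc=True)

# Drop columns that are not needed (hypothetical choice of columns).
campaign_df = campaign_df.drop(columns=["staff_pick", "spotlight"])

# Confirm the column count before exporting.
print(len(campaign_df.columns))

# Uncomment to write the file:
# campaign_df.to_csv("campaign.csv", index=False)
```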

1.5 Create the Contacts DataFrame (Using Python Dictionaries)

Extract and transform the data from the 'contacts.xlsx' Excel file.

Iterate through the 'contact_info_df' DataFrame to get the data values of all rows in a list.

Create a 'new_contact_info_df' DataFrame for contacts data.

Split each "name" column value into a first and last name, and place each in a new column.

Reorder the columns and display the first ten rows of the new DataFrame, 'new_contact_info_df'.
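One way the dictionary-based approach might look is sketched here. It assumes that each row of 'contact_info_df' holds a single JSON-formatted string with "contact_id", "name", and "email" keys; that layout, the 'contact_info' column name, and the sample rows are assumptions made for illustration.

```python
import json
import pandas as pd

# Made-up rows; assumes one JSON-formatted string per row.
contact_info_df = pd.DataFrame({
    "contact_info": [
        '{"contact_id": 1001, "name": "Ada Lovelace", "email": "ada@example.com"}',
        '{"contact_id": 1002, "name": "Alan Turing", "email": "alan@example.com"}',
    ]
})

# Iterate through the rows, convert each one to a dictionary,
# and collect the dictionary values in a list.
dict_values = []
for _, row in contact_info_df.iterrows():
    record = json.loads(row.iloc[0])
    dict_values.append(list(record.values()))

# Build the new DataFrame from the collected values.
new_contact_info_df = pd.DataFrame(dict_values, columns=["contact_id", "name", "email"])

# Split "name" into first and last name, then drop the original column.
new_contact_info_df[["first_name", "last_name"]] = (
    new_contact_info_df["name"].str.split(" ", n=1, expand=True)
)
new_contact_info_df = new_contact_info_df.drop(columns=["name"])

# Reorder the columns and show the first ten rows.
new_contact_info_df = new_contact_info_df[["contact_id", "first_name", "last_name", "email"]]
print(new_contact_info_df.head(10))
```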

1.6 Create the Contacts DataFrame (Using Regular Expressions)

Extract and transform the data from the 'contacts.xlsx' Excel file into the 'regex_contact_info_df' DataFrame.

Extract the "contact_id", "name", and "email" columns by using regular expressions. The contact ID is a four-digit number.

Split each "name" column value into a first and a last name, and place each in a new column.

Display the first ten rows of the resulting 'new_regex_contact_info' DataFrame.
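The regular-expression variant could be sketched like this. As before, the one-JSON-string-per-row layout, the 'contact_info' column name, and the sample rows are assumptions, and the patterns are illustrative rather than the ones used in the notebook.

```python
import pandas as pd

# Made-up rows; assumes one JSON-formatted string per row.
regex_contact_info_df = pd.DataFrame({
    "contact_info": [
        '{"contact_id": 1001, "name": "Ada Lovelace", "email": "ada@example.com"}',
        '{"contact_id": 1002, "name": "Alan Turing", "email": "alan@example.com"}',
    ]
})

raw = regex_contact_info_df["contact_info"]

# Extract the four-digit contact ID and convert it to an integer.
regex_contact_info_df["contact_id"] = (
    raw.str.extract(r'"contact_id":\s*(\d{4})', expand=False).astype(int)
)

# Extract the quoted name and email values.
regex_contact_info_df["name"] = raw.str.extract(r'"name":\s*"([^"]+)"', expand=False)
regex_contact_info_df["email"] = raw.str.extract(r'"email":\s*"([^"]+)"', expand=False)

# Split "name" into first and last name.
regex_contact_info_df[["first_name", "last_name"]] = (
    regex_contact_info_df["name"].str.split(" ", n=1, expand=True)
)

# Keep only the cleaned columns, in the final order.
regex_contact_info_df = regex_contact_info_df[
    ["contact_id", "first_name", "last_name", "email"]
]
print(regex_contact_info_df.head(10))
```

Both approaches should yield the same four columns, so either result can be exported as the contacts CSV file.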

Deliverable 2: Load the Data to Postgres Database

2.1 Prerequisites

PostgreSQL database
pgAdmin

2.2 Data Modeling

Inspect the four CSV files, and then sketch an Entity Relationship Diagram of the tables.

To create the sketch, we used QuickDBD, a convenient tool for designing ERDs.

Here is how the database model was prepared.

While inspecting the data files, we found dependencies between the tables with respect to their columns.

We set 'contact_id' as the primary key of the 'contacts' table. In the 'campaign' table it acts as a foreign key, because the 'campaign' data depends on the 'contacts' data. Given this dependency, we set a many-to-one relationship between them.

Likewise, the 'campaign' table depends on the 'category' and 'subcategory' tables, with 'category_id' and 'subcategory_id' as foreign keys. Because their values are unique, the same fields are the primary keys of the 'category' and 'subcategory' tables.

2.3 Data Engineering

Create the Database and Table Schema

Create the 'crowdfunding_db' database in pgAdmin using SQL.

Create a table schema for each of the four CSV files.

Specify the data types, primary keys, foreign keys, and other constraints.

Create the tables in the correct order to handle the foreign keys.
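The table-creation order can be illustrated with the sketch that follows. It is written in Python so it can be run without a database server: the DDL strings use standard SQL that PostgreSQL also accepts, and the built-in sqlite3 module serves only as a stand-in to check that the statements parse and run in order. In the project itself the SQL is executed in pgAdmin. The key columns come from the data model described in this README; all other column names and every data type are assumptions.

```python
import sqlite3

# Parent tables come first so the foreign keys in "campaign" have targets.
# Non-key columns and all data types are assumptions for illustration.
SCHEMA_STATEMENTS = [
    """CREATE TABLE contacts (
        contact_id INT PRIMARY KEY,
        first_name VARCHAR(50) NOT NULL,
        last_name VARCHAR(50) NOT NULL,
        email VARCHAR(100) NOT NULL
    )""",
    """CREATE TABLE category (
        category_id VARCHAR(10) PRIMARY KEY,
        category VARCHAR(50) NOT NULL
    )""",
    """CREATE TABLE subcategory (
        subcategory_id VARCHAR(10) PRIMARY KEY,
        subcategory VARCHAR(50) NOT NULL
    )""",
    """CREATE TABLE campaign (
        cf_id INT PRIMARY KEY,
        contact_id INT NOT NULL,
        goal NUMERIC(12, 2) NOT NULL,
        pledged NUMERIC(12, 2) NOT NULL,
        launched_date DATE NOT NULL,
        end_date DATE NOT NULL,
        category_id VARCHAR(10) NOT NULL,
        subcategory_id VARCHAR(10) NOT NULL,
        FOREIGN KEY (contact_id) REFERENCES contacts (contact_id),
        FOREIGN KEY (category_id) REFERENCES category (category_id),
        FOREIGN KEY (subcategory_id) REFERENCES subcategory (subcategory_id)
    )""",
]

# In-memory SQLite database, used here only to validate the statements.
conn = sqlite3.connect(":memory:")
for statement in SCHEMA_STATEMENTS:
    conn.execute(statement)

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
conn.close()
```

Because 'campaign' references the other three tables, it is created last. The CSV files would be imported in the same order for the same reason.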

Import Data Files

We can see that the import completed successfully for all four files.

Tables Data

We then wrote 'select' queries to fetch the data from their respective tables.

Tables data at a glance:

Show the first 10 rows of the 'contacts' table.

Show the first 10 rows of the 'category' table.

Show the first 10 rows of the 'subcategory' table.

Show the first 10 rows of the 'campaign' table.
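The preview queries themselves are not reproduced in this README. A plausible form, assuming a plain 'SELECT ... LIMIT 10' per table, can be generated with a short Python snippet and pasted into the pgAdmin query tool:

```python
# Table names come from this README; the query shape is an assumption.
tables = ["contacts", "category", "subcategory", "campaign"]

# One preview query per table, limited to the first 10 rows.
preview_queries = {table: f"SELECT * FROM {table} LIMIT 10;" for table in tables}

for query in preview_queries.values():
    print(query)
```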

Authors

Jalees Moeen GitHub

Maira Syed GitHub
