- Create an automated ETL pipeline
- Extract data from multiple sources
- Clean and transform the data automatically using Pandas and regular expressions
- Load the cleaned data into PostgreSQL
- Write a Python script that performs all three ETL steps on the Wikipedia and Kaggle datasets
This project extracts, transforms, and loads movie data into a database. Data is extracted from Wikipedia and Kaggle, transformed by cleaning and joining the relevant datasets, and finally loaded as one cleaned dataset into a SQL database.
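A minimal sketch of such a script is shown below, assuming the Wikipedia scrape arrives as JSON and the Kaggle data as CSV; the file paths, column names (`release_date`, `imdb_id`), table name, and connection string are placeholders rather than the project's actual values:

```python
import pandas as pd
from sqlalchemy import create_engine


def extract_transform_load(wiki_json_path, kaggle_csv_path, db_string):
    """Run all three ETL steps on the Wikipedia and Kaggle movie data."""
    # Extract: assume the Wikipedia scrape is JSON and the Kaggle data is CSV.
    wiki_df = pd.read_json(wiki_json_path)
    kaggle_df = pd.read_csv(kaggle_csv_path, low_memory=False)

    # Transform: pull a four-digit release year out of a free-text date
    # column with a regular expression, then join the two sources on a
    # shared IMDb id (these column names are assumptions).
    wiki_df["release_year"] = wiki_df["release_date"].astype(str).str.extract(
        r"(\d{4})", expand=False
    )
    movies_df = wiki_df.merge(kaggle_df, on="imdb_id", suffixes=("_wiki", "_kaggle"))

    # Load: write the cleaned, joined dataset into PostgreSQL.
    engine = create_engine(db_string)
    movies_df.to_sql(name="movies", con=engine, if_exists="replace", index=False)
```

Using `if_exists="replace"` keeps automated reruns idempotent: each run rebuilds the table from scratch instead of appending duplicate rows.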
A few assumptions and implementation notes:
- Due to the large amount of data, only the first five and last five records were inspected; the rest of the data is assumed to be intact and usable (see the first sketch after this list).
- There are no hidden duplicate entries beyond those removed by dropping exact duplicates.
- Rows dropped because of NA values are assumed to be no more valuable than any other rows; the hope is that no important entries are lost this way.
- Try/except blocks were used to keep the ETL script automated, handling errors without halting the pipeline or producing corrupt data (see the second sketch below).
- The database connection, verified through pgAdmin, is functional.
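For illustration, the spot-check and cleanup behind the first three assumptions might look like the following, with `movies_df` standing in for the combined dataset:

```python
import pandas as pd

# Spot-check only the head and tail of the (large) combined dataset,
# assuming the records in between are intact and usable.
print(pd.concat([movies_df.head(), movies_df.tail()]))

# Drop exact duplicate rows; hidden near-duplicates are assumed absent.
movies_df = movies_df.drop_duplicates()

# Drop rows with missing values, treating every row as equally
# expendable; important entries could be lost here, which is accepted.
movies_df = movies_df.dropna()
```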
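And a sketch of the error handling and connection setup, with placeholder credentials and database name:

```python
from sqlalchemy import create_engine

# Placeholder connection string; in pgAdmin this corresponds to a local
# server on port 5432 with a database named movie_data.
db_string = "postgresql://postgres:password@localhost:5432/movie_data"
engine = create_engine(db_string)

try:
    # Wrap the load step so an error is reported instead of halting
    # the automated pipeline.
    movies_df.to_sql(name="movies", con=engine, if_exists="replace", index=False)
except Exception as exc:
    # Report the failure and let the automated script continue.
    print(f"Load step failed: {exc}")
```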