A solo continuation of the ETL pipeline project during the Data Engineering course at Northcoders

ajschofield/ETL-Project

ETL-Project

Python Azure Terraform PostgreSQL GitHub Actions

For the original ToteSys project, please see here.

This project implements a serverless data processing platform that extracts data from an operational database, archives it in a data lake, and transforms it for loading into an OLAP data warehouse. It is designed to be reliable, scalable and fully automated.

The platform performs the following key functions:

  • Extracts data from a PostgreSQL database at regular intervals
  • Stores raw data in a data lake for archival purposes
  • Transforms the data to conform to a star schema optimised for analytical queries
  • Loads the transformed data into a cloud-based data warehouse
  • Ensures data consistency, with a maximum delay of 30 minutes from source to warehouse
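
The steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function names, the `last_updated` column, the staff and sales fields, and the sample rows are all hypothetical, chosen only to show an incremental extract, a split into a dimension table and a fact table, and a check against the 30-minute freshness target.

```python
from datetime import datetime, timedelta, timezone

# Freshness target taken from the list above.
MAX_DELAY = timedelta(minutes=30)


def rows_changed_since(rows, last_run):
    """Incremental extract: keep only rows updated after the previous run."""
    return [row for row in rows if row["last_updated"] > last_run]


def to_star_schema(sales_rows):
    """Split flat rows into a staff dimension table and a sales fact table."""
    dim_staff = {}
    fact_sales = []
    for row in sales_rows:
        # Each staff member appears once in the dimension, keyed by staff_id.
        dim_staff[row["staff_id"]] = {
            "staff_id": row["staff_id"],
            "staff_name": row["staff_name"],
        }
        # The fact row keeps the measure plus a foreign key to the dimension.
        fact_sales.append(
            {
                "sales_id": row["sales_id"],
                "staff_id": row["staff_id"],
                "units_sold": row["units_sold"],
            }
        )
    return list(dim_staff.values()), fact_sales


def is_fresh(source_updated_at, loaded_at):
    """True if the row reached the warehouse within the freshness target."""
    return loaded_at - source_updated_at <= MAX_DELAY


# Made-up sample data standing in for rows read from the operational database.
now = datetime(2024, 9, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"sales_id": 1, "staff_id": 10, "staff_name": "Ada", "units_sold": 3,
     "last_updated": now - timedelta(minutes=50)},
    {"sales_id": 2, "staff_id": 10, "staff_name": "Ada", "units_sold": 5,
     "last_updated": now - timedelta(minutes=10)},
    {"sales_id": 3, "staff_id": 11, "staff_name": "Grace", "units_sold": 2,
     "last_updated": now - timedelta(minutes=5)},
]

changed = rows_changed_since(rows, last_run=now - timedelta(minutes=30))
dim_staff, fact_sales = to_star_schema(changed)
```

In a real deployment the extract would query PostgreSQL and the raw rows would be written to the data lake before transformation; those steps are omitted here to keep the sketch self-contained.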

The original solution used Amazon Web Services. This solo iteration uses Azure instead, which requires rewriting both the Terraform configuration and the Python code.

The deadline for completion is the end of September.

Original Contributors

Below are the contributors to the original ToteSys project.

  • Ellie Symonds (ellsymonds)
  • Lianmei Manon-og (lian-manonog)
  • Tolu Ajibade (T-Aji)
  • Joslin Rashleigh (HastarTara)
  • Anzelika Belotelova (bulve-ad)
  • Alex Schofield (ajschofield)
