For the original ToteSys project, please see here.
This project implements a serverless data processing platform that extracts data from an operational database, archives it in a data lake, and transforms it for loading into an OLAP data warehouse. It is designed to be reliable, scalable and fully automated.
This platform includes the following key functions:
- Extracts data from a PostgreSQL database at regular intervals
- Stores raw data in a data lake for archival purposes
- Transforms the data to conform to a star schema optimised for analytical queries
- Loads the transformed data into a cloud-based data warehouse
- Ensures data consistency, with a maximum delay of 30 minutes from source to warehouse
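The transform step and the 30-minute freshness requirement can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table and column names (`staff_id`, `staff_name`, `sales_order_id`, `units_sold`) and both function names are hypothetical, and the data is held in plain Python dictionaries rather than read from PostgreSQL or a data lake.

```python
from datetime import datetime, timedelta, timezone

# Maximum allowed source-to-warehouse delay, taken from the requirement above.
MAX_DELAY = timedelta(minutes=30)


def to_star_schema(raw_rows):
    """Split flat operational rows into a dimension table and a fact table.

    All column names here are hypothetical examples.
    """
    dim_staff = {}
    fact_sales = []
    for row in raw_rows:
        # Dimension: one record per staff_id, holding descriptive attributes.
        dim_staff[row["staff_id"]] = {
            "staff_id": row["staff_id"],
            "staff_name": row["staff_name"],
        }
        # Fact: the measure plus a foreign key pointing at the dimension.
        fact_sales.append(
            {
                "sales_order_id": row["sales_order_id"],
                "staff_id": row["staff_id"],
                "units_sold": row["units_sold"],
            }
        )
    return list(dim_staff.values()), fact_sales


def is_fresh(source_updated_at, loaded_at, max_delay=MAX_DELAY):
    """Return True if the row reached the warehouse within max_delay."""
    return loaded_at - source_updated_at <= max_delay


if __name__ == "__main__":
    raw = [
        {"sales_order_id": 1, "staff_id": 10, "staff_name": "Ada", "units_sold": 5},
        {"sales_order_id": 2, "staff_id": 10, "staff_name": "Ada", "units_sold": 3},
    ]
    dims, facts = to_star_schema(raw)
    print(dims)
    print(facts)

    updated = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(is_fresh(updated, updated + timedelta(minutes=29)))
```

Keeping descriptive attributes in the dimension and only keys and measures in the fact table is what lets analytical queries aggregate over the fact table and join out to dimensions as needed.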
The original solution ran on Amazon Web Services. This solo iteration targets Azure instead, which requires rewriting both the Terraform configuration and the Python code.
The deadline for completion is the end of September.
The contributors to the original ToteSys project are listed below.
- Ellie Symonds
- Lianmei Manon-og
- Tolu Ajibade
- Joslin Rashleigh
- Anzelika Belotelova
- Alex Schofield