- This repo contains projects relating to data engineering concepts
- More detail on specific concepts can be found in the Intro to Basics folder
- Linux and Shell Scripting
- This project applies my Linux and shell scripting skills to a fictional scenario as a Linux developer at a top tech company
- Building Data Pipelines with Airflow
- Apache Airflow is an open-source workflow orchestration tool that lets you build, schedule, and run workflows
- This project collects data available in different formats and consolidates it into a single file
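The consolidation step can be illustrated with a minimal sketch in plain Python, without Airflow itself. It assumes two hypothetical input formats (CSV and JSON) and hypothetical column names (`id`, `city`); the actual project may use different formats and fields.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return [dict(item) for item in json.loads(text)]


def consolidate(row_groups, fieldnames):
    """Write all rows to one CSV string, keeping only the given columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for rows in row_groups:
        for row in rows:
            # Columns missing from a source are left empty; extra ones are dropped.
            writer.writerow({name: row.get(name, "") for name in fieldnames})
    return out.getvalue()


# Hypothetical sample inputs, inlined so the sketch is self-contained.
csv_text = "id,city\n1,Paris\n2,Lima\n"
json_text = '[{"id": 3, "city": "Oslo", "note": "extra"}]'

combined = consolidate(
    [rows_from_csv(csv_text), rows_from_json(json_text)],
    ["id", "city"],
)
print(combined)
```

In an Airflow pipeline, each of these functions would typically become its own task so that extraction and consolidation can be retried independently.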
- Building Data Pipelines with Kafka
- Apache Kafka is a widely used open-source event streaming platform
- This project creates a data pipeline that uses Kafka to collect streaming data and load it into a database
- Building Data Pipelines with Shell
- Create shell scripts to extract, transform, and load data
- Create and populate a PostgreSQL table
- Data Warehousing with Postgres
- Apply my knowledge and skills to design a data warehouse with fact and dimension tables and load data into it
- Write aggregation queries using the CUBE and ROLLUP grouping options and create materialized query tables (materialized views)
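To show what a ROLLUP query returns, here is a small Python sketch that emulates `GROUP BY ROLLUP(country, year)` on in-memory rows. The `rollup` helper, the `sales` data, and the column names are hypothetical and not taken from the project; SQL `NULL` in subtotal rows is represented as `None`.

```python
from collections import defaultdict


def rollup(rows, keys, measure):
    """Sum `measure` for every prefix of `keys`, like GROUP BY ROLLUP(keys)."""
    totals = defaultdict(int)
    for row in rows:
        values = tuple(row[k] for k in keys)
        # ROLLUP(a, b) groups by (a, b), then (a), then (): each prefix of the key list.
        for n in range(len(keys), -1, -1):
            group = values[:n] + (None,) * (len(keys) - n)
            totals[group] += row[measure]
    return dict(totals)


# Hypothetical fact rows.
sales = [
    {"country": "US", "year": 2020, "amount": 10},
    {"country": "US", "year": 2021, "amount": 20},
    {"country": "CA", "year": 2020, "amount": 5},
]

result = rollup(sales, ["country", "year"], "amount")
for group, total in result.items():
    print(group, total)
```

CUBE differs in that it aggregates over every subset of the grouping columns rather than only the prefixes, so it would also produce per-year totals across all countries.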
- NoSQL with MongoDB, Cassandra and IBM Cloudant
- This project applies my skills with several NoSQL databases to move and analyze data
- Move data from one type of database to another and run basic queries on various databases
- Data Engineering and Machine Learning with Spark
- Use Apache Spark for Data Engineering and Machine Learning
- Create a Spark application end-to-end that includes ETL and model training