nabilraihann/ELT-Pipeline-Bike-Store

ELT Pipeline Bike Store with Airflow, Airbyte, dbt, and Snowflake

This repository contains the implementation of an ELT (Extract, Load, Transform) pipeline for a Bike Store dataset using modern data tools. The pipeline integrates Airbyte for data extraction, dbt for data transformation, Airflow for orchestration, and Snowflake as the data warehouse.

Overview

The purpose of this project is to demonstrate a complete ELT pipeline setup, where:

  • Airbyte extracts data from various sources and loads it into Snowflake.
  • dbt (Data Build Tool) transforms the raw data into a usable format for analytics.
  • Airflow orchestrates the data workflow, ensuring that tasks are executed in the correct order and on schedule.
  • Snowflake acts as the centralized data warehouse where all data is stored and queried.

[Figure: ELT pipeline design]

Airflow DAG

[Figure: Airflow DAG]

Task Definition

  • airbyte-sync-bike-store = Triggers a sync in Airbyte, which extracts data from the Bike Store OLTP database and loads it into the Snowflake data warehouse.
  • dbt-test = Runs the dbt test command in the dbt project. It tests the data models for validity and accuracy so that issues are caught before the transformations are applied.
  • dbt-snapshot = Runs the dbt snapshot command in the dbt project. It captures the current state of the data so that historical changes can be tracked over time.
  • dbt-run = Runs the dbt run command in the dbt project. It executes the transformation models defined in dbt to prepare the data for analysis.
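As a rough illustration of what the three dbt tasks execute, the sketch below builds the shell command for each one. The project path and the helper name dbt_command are assumptions made for this example, not taken from the repository; --project-dir is the standard dbt CLI option for pointing at a project directory.

```python
import shlex

# Hypothetical location of the dbt project; the README does not give a path.
DBT_PROJECT_DIR = "/opt/airflow/dbt/bike_store"

# The dbt subcommands used by the tasks described above.
DBT_SUBCOMMANDS = ("test", "snapshot", "run")


def dbt_command(subcommand, project_dir=DBT_PROJECT_DIR):
    """Return the argument list for one dbt task (illustrative helper)."""
    if subcommand not in DBT_SUBCOMMANDS:
        raise ValueError(f"unsupported dbt subcommand: {subcommand}")
    return ["dbt", subcommand, "--project-dir", project_dir]


# Map each task name to a shell-safe command string.
commands = {
    f"dbt-{name}": shlex.join(dbt_command(name)) for name in DBT_SUBCOMMANDS
}

for task_id, command in commands.items():
    print(f"{task_id}: {command}")
```

A string like these could be handed to whichever Airflow operator the DAG uses to run shell commands. How the repository actually wires this up is not shown here.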

Task Dependencies

  • airbyte_sync >> [dbt_snapshot, dbt_test]: airbyte_sync must complete successfully before either dbt_snapshot or dbt_test can start.
  • dbt_snapshot >> dbt_run << dbt_test: dbt_run starts only after both dbt_snapshot and dbt_test have completed. Because dbt_snapshot and dbt_test do not depend on each other, they can run in parallel.
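The ordering these dependencies imply can be checked without Airflow. The sketch below models the same four tasks with Python's standard graphlib module and groups them into stages, where the tasks in one stage have no dependency on each other. The function name execution_stages is made up for this example. This is a plain-Python model of the graph, not the repository's DAG file.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it waits on (its upstream tasks).
upstream = {
    "airbyte_sync": set(),
    "dbt_snapshot": {"airbyte_sync"},
    "dbt_test": {"airbyte_sync"},
    "dbt_run": {"dbt_snapshot", "dbt_test"},
}


def execution_stages(graph):
    """Group tasks into stages; tasks within a stage can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Every task whose upstream tasks are all done is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(upstream)
print(stages)
# [['airbyte_sync'], ['dbt_snapshot', 'dbt_test'], ['dbt_run']]
```

The middle stage holds both dbt_snapshot and dbt_test, which matches the parallel branch described above.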

dbt DAG
