This repo contains the code for an end-to-end distributed deep learning pipeline.
The process happens in 7 steps (minimal, illustrative sketches of the steps follow the list):
- Real-time streaming or batch data is captured with Debezium via change data capture (connector registration sketched below).
- The captured change stream or batch data is pushed to Apache Kafka topics through Kafka Connect.
- Apache Flink performs the ETL operations on the Kafka streams (PyFlink sketch below).
- Predictions for the streaming/batch data come from models deployed with TensorFlow Serving on Docker (REST client sketch below).
- Frequently accessed data is cached with RocksDB (caching sketch below).
- Once the required predictions are made, all the data is pushed into Apache Druid, where further processing takes place (query sketch below).
- The enriched data in Druid can then power personalized predictions, cancellation-probability estimates, time-series forecasting, and similar analytics.
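The first two steps are typically wired together by registering a Debezium source connector with Kafka Connect, which then publishes change events to Kafka topics. Below is a minimal sketch, assuming a MySQL source database and a Kafka Connect worker on `localhost:8083`; the hostnames, credentials, and table names are placeholders, not values from this repo, and exact property names depend on the Debezium version (older releases use `database.server.name` instead of `topic.prefix`).

```python
import json
import requests  # pip install requests

# Hypothetical Debezium MySQL source connector config (Debezium 2.x naming).
# Hostnames, credentials, and table names are placeholders for illustration.
connector = {
    "name": "bookings-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "topic.prefix": "dbserver1",            # topics become dbserver1.<db>.<table>
        "table.include.list": "app.bookings",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.app",
    },
}

# Register the connector via the Kafka Connect REST API.
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```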
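The Flink ETL step is not spelled out here; one plausible shape is a PyFlink Table API job that reads the Debezium topic from Kafka, cleans and aggregates it, and writes the result back to Kafka for downstream scoring and Druid ingestion. The table schemas, topic names, and broker address below are assumptions for illustration (the repo may equally use the DataStream API or Java).

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming Table API environment.
# Note: the Flink Kafka SQL connector JAR must be on the job's classpath.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: change events produced by Debezium (schema/topic/broker are placeholders).
t_env.execute_sql("""
    CREATE TABLE bookings (
        booking_id STRING,
        user_id    STRING,
        amount     DOUBLE,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'dbserver1.app.bookings',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'debezium-json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Sink: per-user aggregates as an upsert stream (accepts updates from GROUP BY).
t_env.execute_sql("""
    CREATE TABLE bookings_per_user (
        user_id      STRING,
        total_amount DOUBLE,
        PRIMARY KEY (user_id) NOT ENFORCED
    ) WITH (
        'connector' = 'upsert-kafka',
        'topic' = 'bookings_per_user',
        'properties.bootstrap.servers' = 'kafka:9092',
        'key.format' = 'json',
        'value.format' = 'json'
    )
""")

# A simple ETL step: drop bad rows and aggregate per user.
t_env.execute_sql("""
    INSERT INTO bookings_per_user
    SELECT user_id, SUM(amount) AS total_amount
    FROM bookings
    WHERE amount > 0
    GROUP BY user_id
""")
```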
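Getting predictions from TensorFlow Serving usually means calling its REST predict endpoint from the streaming job or a small client. A minimal sketch, assuming a model exported and served by the `tensorflow/serving` Docker image on the default REST port 8501; the model name `booking_model` and the feature vector are placeholders.

```python
import json
import requests  # pip install requests

# TensorFlow Serving REST API: POST /v1/models/<name>:predict
# The model name and feature layout are placeholders for illustration.
SERVING_URL = "http://localhost:8501/v1/models/booking_model:predict"

def predict(feature_rows):
    """Send one or more feature rows to TF Serving and return its predictions."""
    payload = {"instances": feature_rows}
    resp = requests.post(SERVING_URL, data=json.dumps(payload), timeout=5)
    resp.raise_for_status()
    return resp.json()["predictions"]

if __name__ == "__main__":
    print(predict([[0.3, 1.0, 42.0]]))  # dummy feature vector
```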
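The RocksDB caching step is also left open here: it could refer to Flink's embedded RocksDB state backend, or to a standalone local key-value cache for features and predictions. A sketch of the latter using the `python-rocksdb` bindings; the database path, keys, and value layout are invented for illustration.

```python
import json
import rocksdb  # pip install python-rocksdb

# Open (or create) a local RocksDB instance used as an embedded cache.
db = rocksdb.DB("prediction_cache.db", rocksdb.Options(create_if_missing=True))

def cache_prediction(entity_id: str, prediction: dict) -> None:
    """Store a prediction keyed by entity id (key/value layout is illustrative)."""
    db.put(entity_id.encode(), json.dumps(prediction).encode())

def get_cached_prediction(entity_id: str):
    """Return the cached prediction, or None on a cache miss."""
    raw = db.get(entity_id.encode())
    return json.loads(raw) if raw is not None else None

cache_prediction("user-42", {"cancellation_probability": 0.17})
print(get_cached_prediction("user-42"))
```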
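For the last two steps, data typically lands in Druid through a Kafka ingestion supervisor and is then queried with Druid SQL for analytics such as cancellation probabilities or forecasting inputs. A minimal query sketch against Druid's SQL endpoint (router on the default port 8888); the datasource and column names are placeholders.

```python
import requests  # pip install requests

# Druid SQL over HTTP: POST /druid/v2/sql on the router (default port 8888).
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Datasource and column names are placeholders for illustration.
query = """
    SELECT user_id, AVG(cancellation_probability) AS avg_cancel_prob
    FROM bookings_predictions
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
    GROUP BY user_id
    ORDER BY avg_cancel_prob DESC
    LIMIT 10
"""

resp = requests.post(DRUID_SQL_URL, json={"query": query}, timeout=30)
resp.raise_for_status()
for row in resp.json():
    print(row)
```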
Made with ❤️ by Praneet Pabolu