- Acquire and ingest real-time events from an external API using Google Cloud Pub/Sub
- Pipeline using the google-cloud client library
- Pipeline using Apache Beam
- Acquire and ingest static files
- Store raw data in BigQuery
- Create a new BigQuery table using a transformation (handle real-time?)
- Data visualization using Google Cloud tools (Data Studio?) and/or a web app (App Engine)
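As a sketch of the first step, a minimal publisher that serializes API records and pushes them to a Pub/Sub topic. The project, topic, and record shapes here are placeholders, not taken from `publish_api_data.py`:

```python
import json


def encode_event(record: dict) -> bytes:
    """Pub/Sub message bodies are bytes: serialize one API record as UTF-8 JSON."""
    return json.dumps(record).encode("utf-8")


def publish_events(project_id: str, topic_id: str, records) -> None:
    """Publish each record to the topic; requires google-cloud-pubsub."""
    from google.cloud import pubsub_v1  # deferred so encode_event stays stdlib-only

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    futures = [publisher.publish(topic_path, encode_event(r)) for r in records]
    for f in futures:
        f.result()  # block until each message is accepted by the service
```

Blocking on `f.result()` keeps the example simple; a real publisher would batch messages and handle retries.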
(This image is a property of Google)
Cloud Pub/Sub -> Apache Beam + Cloud Dataflow -> BigQuery
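The flow above can be sketched as a streaming Beam pipeline reading from Pub/Sub and appending rows to BigQuery. The topic and table names are placeholders, and the schema/parsing logic is an assumption rather than the contents of `pipeline_streaming_beam.py`:

```python
import json


def parse_message(message: bytes) -> dict:
    """Decode a Pub/Sub message body into a dict usable as a BigQuery row."""
    return json.loads(message.decode("utf-8"))


def run(argv=None):
    """Build and run the streaming pipeline; requires apache-beam[gcp]."""
    import apache_beam as beam  # deferred so parse_message stays stdlib-only
    from apache_beam.options.pipeline_options import PipelineOptions

    opts = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=opts) as p:
        (p
         | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
             topic="projects/my-project/topics/my-topic")  # placeholder topic
         | "Parse" >> beam.Map(parse_message)
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
             "my-project:my_dataset.raw_events",  # placeholder table
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

With `--runner DataflowRunner` (plus `--project`, `--temp_location`, `--staging_location`) the same pipeline executes on Cloud Dataflow instead of locally.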
This runs on a single Compute Engine instance, but it can also run in a properly configured local environment.
First, you MUST modify set_env_encrypted.sh to match your environment.
sudo sh install-deps.sh
source ./set_env_encrypted.sh
python3 publish_api_data.py
python3 pipeline_streaming.py --streaming
**OR**
python3 pipeline_streaming.py --project $PROJECT --temp_location $BUCKET/tmp --staging_location $BUCKET/staging --streaming
To run the Apache Beam version of the pipeline:
python3 pipeline_streaming_beam.py --streaming
python3 pipeline_streaming_beam.py --project $PROJECT --temp_location $BUCKET/tmp --staging_location $BUCKET/staging --streaming
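For the later step of creating a new BigQuery table from the raw data (listed above), one option is a CREATE-TABLE-AS-SELECT job. This is a sketch under assumed table names, and handling real-time updates would additionally need a scheduled query or a streaming transform:

```python
def build_ctas_query(source: str, dest: str) -> str:
    """Build a CREATE-TABLE-AS-SELECT statement deriving a new table from the raw one."""
    return (
        f"CREATE OR REPLACE TABLE `{dest}` AS "
        f"SELECT * FROM `{source}`"  # replace * with the actual transformation
    )


def create_derived_table(source: str, dest: str) -> None:
    """Run the transformation job; requires google-cloud-bigquery."""
    from google.cloud import bigquery  # deferred so build_ctas_query stays stdlib-only

    client = bigquery.Client()
    client.query(build_ctas_query(source, dest)).result()  # wait for the job to finish
```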