
Car traffic acquisition and analysis in real-time using Google Cloud Platform


nQuery512/gcp-realtime-API-pipeline


Realtime Google Cloud Pipeline with and without Apache Beam

  • Acquire and ingest real-time events from an external API using Google Cloud Pub/Sub
  • Pipeline using the gcloud library
  • Pipeline using Apache Beam
  • Acquire and ingest static files
  • Store raw data in BigQuery
  • Create a new BigQuery table from a transformation (can it handle real-time?)
  • Data visualization using Google Cloud tools (Data Studio?) and/or a web app (App Engine)

Realtime pipeline schema (This image is a property of Google)

Ingestion pipeline description

Cloud Pub/Sub -> Apache Beam + Cloud Dataflow -> BigQuery

This setup uses one Compute Engine instance, but it can also run in a properly configured local environment.
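The three stages above can be sketched with the Apache Beam Python SDK. This is a minimal illustration, not the code in `pipeline_streaming_beam.py`: it assumes each Pub/Sub message is a UTF-8 JSON object whose keys match the columns of an existing BigQuery table, and the `subscription` and `table` arguments and the `speed` field are hypothetical.

```python
import json


def parse_event(message: bytes) -> dict:
    """Decode one Pub/Sub payload (assumed UTF-8 JSON) into a BigQuery row dict."""
    return json.loads(message.decode("utf-8"))


def run(subscription: str, table: str, pipeline_args=None) -> None:
    """Read from Pub/Sub, parse each message, and append it to BigQuery.

    subscription: "projects/<project>/subscriptions/<name>" (hypothetical value)
    table: "<project>:<dataset>.<table>", assumed to exist already
    """
    # Imported lazily so parse_event can be reused without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Keeping the parsing step in a plain function makes it possible to unit-test it without a runner, for example `parse_event(b'{"speed": 42}')`.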

How to run

First, you MUST modify set_env_encrypted.sh to match your environment.

Prerequisites

sudo sh install-deps.sh
source ./set_env_encrypted.sh
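Before launching anything, it can help to confirm that the environment was actually loaded. The sketch below assumes that `set_env_encrypted.sh` exports `PROJECT` and `BUCKET`, since the Dataflow commands in this README reference `$PROJECT` and `$BUCKET`; the script may define other variables that are not checked here.

```python
import os

# Assumption: these are the variables the run commands rely on.
REQUIRED_VARS = ("PROJECT", "BUCKET")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]
```

Calling `missing_env_vars()` in the same shell session after `source ./set_env_encrypted.sh` should return an empty list; any name it returns still needs to be exported.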
 

Publisher

python3 publish_api_data.py 
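A publisher of this kind typically serializes each API record and sends it to a Pub/Sub topic. The following is a rough sketch using the `google-cloud-pubsub` client, not the contents of `publish_api_data.py`: the JSON encoding, the `project_id` and `topic_id` parameters, and the shape of the events are assumptions.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize one API record to UTF-8 JSON bytes, the payload type Pub/Sub expects."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_events(project_id: str, topic_id: str, events) -> int:
    """Publish every event to the topic and return how many were sent."""
    # Imported lazily; requires the google-cloud-pubsub package.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    futures = [publisher.publish(topic_path, data=encode_event(e)) for e in events]
    # Block until the service has accepted every message.
    for future in futures:
        future.result()
    return len(futures)
```

Sorting the keys keeps the payload deterministic, which makes the encoding step easy to test in isolation.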

Pipeline without Apache Beam

Pub/Sub -> BigQuery pipeline in local mode (without Dataflow)

python3 pipeline_streaming.py --streaming

** OR **

Pub/Sub -> BigQuery pipeline with Dataflow

python3 pipeline_streaming.py --project $PROJECT --temp_location $BUCKET/tmp --staging_location $BUCKET/staging --streaming
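For comparison with the Beam version, a pipeline built directly on the Google Cloud client libraries usually amounts to a subscriber callback that streams each message into BigQuery. This is only a sketch of that idea, assuming JSON payloads and an existing table; it is not taken from `pipeline_streaming.py`, and `project_id`, `subscription_id`, and `table_id` are hypothetical parameters.

```python
import json


def make_callback(insert_rows, table_id):
    """Build a Pub/Sub callback that writes each message to BigQuery.

    insert_rows is any callable shaped like
    bigquery.Client.insert_rows_json(table, rows), which returns a list of errors.
    """
    def callback(message):
        row = json.loads(message.data.decode("utf-8"))
        errors = insert_rows(table_id, [row])
        if errors:
            message.nack()  # let Pub/Sub redeliver the message later
        else:
            message.ack()
    return callback


def run(project_id, subscription_id, table_id):
    # Imported lazily; requires google-cloud-pubsub and google-cloud-bigquery.
    from google.cloud import bigquery, pubsub_v1

    client = bigquery.Client(project=project_id)
    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)
    callback = make_callback(client.insert_rows_json, table_id)
    future = subscriber.subscribe(path, callback=callback)
    with subscriber:
        future.result()  # blocks until the stream is cancelled or fails
```

Passing the insert function in as an argument lets the acknowledge/retry logic be exercised with a stub instead of a live BigQuery client.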

Pipeline using Apache Beam

Pub/Sub -> BigQuery pipeline in local mode (without Dataflow)

python3 pipeline_streaming_beam.py --streaming

** OR **

Pub/Sub -> BigQuery pipeline with Dataflow

python3 pipeline_streaming_beam.py --project $PROJECT --temp_location $BUCKET/tmp --staging_location $BUCKET/staging --streaming
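Flags such as `--project`, `--temp_location`, `--staging_location`, and `--streaming` are standard Beam pipeline options. A common pattern in Beam scripts is to parse the script's own flags with `argparse` and forward everything else to `PipelineOptions`. The sketch below shows that split under the assumption that the script follows this pattern; `--input_subscription` is a hypothetical script-level flag, not one this repository is known to define.

```python
import argparse


def split_args(argv):
    """Separate script-level flags from the ones meant for the Beam runner."""
    # allow_abbrev=False prevents runner flags from being matched as
    # abbreviations of script flags.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--input_subscription", default=None)  # hypothetical flag
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args
```

The second return value can then be passed unchanged to `PipelineOptions(pipeline_args)`, so the same script accepts both the local and the Dataflow invocation.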
