A Helm chart that deploys a set of CDC replication connectors to build a data lake from a set of distributed databases.
The deployment integrates the following components:
- Metabase Data Reporting
- Strimzi Kafka Broker
- Zalando PostgreSQL Data Warehouse
- Strimzi Kafka Connect cluster
- Strimzi Kafka Schema Registry
- Strimzi Kafka Connect PostgreSQL sink
- Strimzi Kafka Connect PostgreSQL sources
helm repo add dataplane https://nephelaiio.github.io/helm-dataplane/
helm repo update
helm install dataplane dataplane/dataplane
The following example values replicate the pagila database:
metabase:
  admin:
    email: metabase@nephelai.io
  ingress:
    enabled: true
    className: nginx-private
    hostName: metabase.nephelai.io
cdc:
  postgres:
    - hostname: pagilahost
      connector: pagila-connector
      id: pagila
      dbname: pagila
      exclude:
        - "public.staff"
      partitions:
        - source: "public.payment.*"
          sink: "payment"
strimzi:
  connect:
    secret: "metabase-pagila-db"
  kafka:
    storage:
      class: standard
  zookeeper:
    storage:
      class: standard
zalando:
  metabase:
    class: standard
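Assuming the example values are saved to a file, a minimal sketch of passing them to Helm could look like the following. The helper name `install_dataplane`, the release name `dataplane`, and the default file name `values.yaml` are illustrative assumptions, not names required by the chart.

```shell
# Hypothetical helper: install the chart, or upgrade it if already installed,
# using a custom values file. The release name "dataplane" is an assumption.
install_dataplane() {
  values_file="${1:-values.yaml}"
  helm upgrade --install dataplane dataplane/dataplane \
    --values "$values_file"
}
```

Calling `install_dataplane values.yaml` applies the file; `helm upgrade --install` is used here so the same command works for both the first install and later changes.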
The following fields are immutable. Changing them leaves an orphaned replication slot on the source database, which keeps retaining WAL and eventually exhausts disk space:
cdc.postgres[*].id
cdc.postgres[*].partitions[*].sink
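Since an orphaned slot is what leads to space exhaustion, it may help to inspect the slots on the source database before and after touching these fields. The following is a minimal sketch, assuming `psql` access to the source database and PostgreSQL 10 or later. The helper names are hypothetical, the default host and database come from the pagila example, and none of this is provided by the chart itself.

```shell
# Query listing each replication slot and the amount of WAL it is holding back.
# pg_replication_slots, pg_wal_lsn_diff and pg_current_wal_lsn exist in PostgreSQL 10+.
SLOT_QUERY="SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;"

# Hypothetical helper: list slots on a source database.
# Usage: list_slots [host] [dbname]
list_slots() {
  psql --host "${1:-pagilahost}" --dbname "${2:-pagila}" --command "$SLOT_QUERY"
}

# Hypothetical helper: drop a slot once it is confirmed to be orphaned.
# Usage: drop_slot <slot_name> [host] [dbname]
drop_slot() {
  psql --host "${2:-pagilahost}" --dbname "${3:-pagila}" \
    --command "SELECT pg_drop_replication_slot('$1');"
}
```

A slot that shows `active = f` and a growing `retained_wal` value is a likely candidate for an orphan; dropping a slot that a connector still uses would break replication, so `drop_slot` should only be run after checking.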
Pending work, in order of priority:
- Create python package for maintenance operations
- Create and publish Topic Reroute transform
- Add support for MySQL sources
- Add monitoring for Kafka topics
- Add Opendistro deployment
The chart depends on the following cluster-level components being deployed in the target cluster:
- Strimzi Kafka controller
- Zalando Postgres controller
- Nginx Ingress controller
- Storage class with ReclaimPolicy=Retain
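The prerequisites listed above can be checked before installing. The sketch below assumes `kubectl` is configured for the target cluster; the helper names are hypothetical, and the CRD filter relies on the `strimzi.io` and `zalan.do` API group suffixes used by those operators.

```shell
# List storage classes with their reclaim policy; the chart expects one set to Retain.
list_storage_classes() {
  kubectl get storageclass \
    --output custom-columns=NAME:.metadata.name,RECLAIM:.reclaimPolicy
}

# List installed CRDs that belong to the Strimzi and Zalando operators.
# An empty result suggests the corresponding controller is not deployed.
list_operator_crds() {
  kubectl get crd --output name | grep -E 'strimzi\.io|zalan\.do'
}
```

Running `list_storage_classes` and `list_operator_crds` gives a quick view of whether a `Retain` storage class and both operators are present; it does not verify the ingress controller.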
In the testing environment, cluster dependencies are provisioned with the nephelaiio.k8s role.
Tests run with Molecule against a local cluster in GitHub Actions. To reproduce them locally against the latest supported cluster version, run:
make test