Transition Recap to a metadata gateway

After flailing around a bit, I think I've figured out how to split Recap up. I have created [Twister](https://twister.dev/), which is a Java project that converts Avro/Proto to and from Java POJOs. Separately, I have created the [Recap Type Spec](https://recap.build), which defines how Recap models schemas.

This repo will now become the metadata gateway portion of Recap. It will wrap metadata sources (data catalogs, DB information_schemas, data lake catalogs, schema registries, etc.) in a single shared API. This should allow data engineers and infrastructure developers to build software that works with an organization's stack, whether they use Datahub, Buf.build, Confluent schema registry, Amundsen, Marquez, or all of the above.

Step one is an MVP that has:

1. Confluent schema registry + Avro support
2. SQLAlchemy support

The Recap type spec is used as the schema API.

Future work includes adding more integrations, hardening the existing integrations (which are laughably incomplete), and adding support for lineage using OpenLineage's API as the common format. I might also add write support (not just read), so you can write metadata without worrying about which catalog it's going to.
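
To make the "single shared API" idea concrete, here is a minimal sketch of a gateway that routes schema lookups to different kinds of sources and returns them in one common shape. Everything in it is an assumption made for illustration, not code from this commit: the names `Field`, `MetadataSource`, `AvroRegistrySource`, `SqliteSource`, and `Gateway` are hypothetical, the type names are simplified stand-ins rather than the Recap type spec, a plain dict stands in for the Confluent schema registry, and the standard-library `sqlite3` module stands in for SQLAlchemy.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Protocol

# Simplified type names for illustration only; NOT the Recap type spec.
AVRO_TYPES = {"string": "string", "int": "int32", "long": "int64"}
SQLITE_TYPES = {"TEXT": "string", "INTEGER": "int64"}


@dataclass(frozen=True)
class Field:
    """One column or record field in the common schema format."""
    name: str
    type: str


class MetadataSource(Protocol):
    """Hypothetical interface every wrapped metadata source implements."""
    def schema(self, name: str) -> list[Field]: ...


class AvroRegistrySource:
    """Stand-in for a schema registry: maps subject name to Avro schema JSON."""

    def __init__(self, subjects: dict[str, str]):
        self._subjects = subjects

    def schema(self, name: str) -> list[Field]:
        avro = json.loads(self._subjects[name])
        fields = []
        for f in avro["fields"]:
            t = f["type"]
            # Only primitive type names are mapped; unions etc. become "unknown".
            mapped = AVRO_TYPES.get(t, "unknown") if isinstance(t, str) else "unknown"
            fields.append(Field(f["name"], mapped))
        return fields


class SqliteSource:
    """Stand-in for a database source: reads column metadata from SQLite."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def schema(self, name: str) -> list[Field]:
        rows = self._conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (name,)
        ).fetchall()
        return [Field(col, SQLITE_TYPES.get(decl.upper(), "unknown")) for col, decl in rows]


class Gateway:
    """Routes a path like 'db/users' to the source registered under 'db'."""

    def __init__(self):
        self._sources: dict[str, MetadataSource] = {}

    def register(self, prefix: str, source: MetadataSource) -> None:
        self._sources[prefix] = source

    def schema(self, path: str) -> list[Field]:
        prefix, _, name = path.partition("/")
        return self._sources[prefix].schema(name)
```

With a SQLite table `users (id INTEGER, email TEXT)` registered under `db` and an equivalent Avro record registered under `registry`, a caller could ask the gateway for `db/users` or `registry/users-value` and get the same list of `Field` objects without knowing which system answered.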
gabledata · Jun 1, 2023 · 159e009
1 parent eff394d
Showing 30 changed files with 1,694 additions and 1,921 deletions.
diff --git a/pdm.lock b/pdm.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -74,6 +74,9 @@ style = [
     "pylint>=2.16.1",
     "pyright>=1.1.293",
 ]
+kafka = [
+    "confluent-kafka>=2.1.1",
+]
 
 [tool.isort]
 profile = "black"

diff --git a/recap/catalog/client.py b/recap/catalog/client.py
diff --git a/recap/catalog/crawler.py b/recap/catalog/crawler.py
diff --git a/recap/catalog/server.py b/recap/catalog/server.py
diff --git a/recap/catalog/storage.py b/recap/catalog/storage.py
diff --git a/recap/cli.py b/recap/cli.py