Provide the option for Materialized Views for Silver - not just Streaming Tables #119

Open
mweirath opened this issue Nov 11, 2024 · 1 comment

Comments

@mweirath

We have run into one issue going into Silver. It looks like our only option right now is streaming tables. This has caused some challenges because our users would like to use Time Travel on Silver tables, and we also have some technical use cases that would benefit from it. While Time Travel appears to work on a SQL Warehouse, it isn't supported on a cluster/Spark.

In reviewing the code and documentation, it appears the choice between a materialized view (which should support Time Travel) and a streaming table is based on how you read the underlying table. I think this code in dataflow_pipeline might be our culprit, since it always uses the "readStream.table" function. Is there any way we might be able to create a non-streaming option for Silver?

    def get_silver_schema(self):
        """Get Silver table Schema."""
        silver_dataflow_spec: SilverDataflowSpec = self.dataflowSpec
        source_database = silver_dataflow_spec.sourceDetails["database"]
        source_table = silver_dataflow_spec.sourceDetails["table"]
        select_exp = silver_dataflow_spec.selectExp
        where_clause = silver_dataflow_spec.whereClause
        raw_delta_table_stream = self.spark.readStream.table(
            f"{source_database}.{source_table}"
        ).selectExpr(*select_exp) if self.uc_enabled else self.spark.readStream.load(
            path=silver_dataflow_spec.sourceDetails["path"],
            format="delta"
        ).selectExpr(*select_exp)
        raw_delta_table_stream = self.__apply_where_clause(where_clause, raw_delta_table_stream)
        return raw_delta_table_stream.schema
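
To make the request concrete, here is a rough sketch of what a non-streaming option could look like. It is not dlt-meta code: the function name `read_silver_source` and the `streaming` flag are hypothetical, and the only Spark calls it relies on are `spark.read` / `spark.readStream` with `.table(...)`, `.format("delta").load(...)`, and `.selectExpr(...)`. The assumption is that a batch read here would let the resulting Silver dataset be defined as a materialized view instead of a streaming table.

```python
def read_silver_source(spark, source_details, select_exp,
                       uc_enabled=True, streaming=True):
    """Read the Silver source as a streaming or a batch DataFrame.

    `streaming` is a hypothetical switch (not an existing dlt-meta option):
    True keeps the current readStream behaviour, False uses a batch read.
    """
    # Pick the streaming or the batch reader; both expose table() and load().
    reader = spark.readStream if streaming else spark.read
    if uc_enabled:
        # Unity Catalog: address the source by database.table name.
        df = reader.table(
            f"{source_details['database']}.{source_details['table']}"
        )
    else:
        # Otherwise: load the Delta table from its storage path.
        df = reader.format("delta").load(source_details["path"])
    return df.selectExpr(*select_exp)
```

With something like this, `get_silver_schema` could call `read_silver_source(self.spark, silver_dataflow_spec.sourceDetails, select_exp, self.uc_enabled, streaming=False)` for specs that opt out of streaming. How that opt-out would be expressed in the dataflow spec is left open here.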

@ravi-databricks
Contributor

dlt-meta follows the medallion architecture, so bronze and silver are streaming tables and gold can be MVs. Once SQL support comes to dlt-meta, we can think about adding MVs.
