
Upgrade schema-ddl to 0.19.4 #1230

Merged: 2 commits into develop from new-migration, Sep 5, 2023

Conversation

@oguzhanunlu (Member) commented on Apr 4, 2023:

  • batch transformer is to be tested

@voropaevp (Contributor) left a comment:

Nice work! It turned out smaller than I expected.

It would be nice to have Processing.spec changes for batch and stream, as they are the closest thing we have to an integration test. (A hypothetical shape for such a spec is sketched below.)
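Purely for illustration, a minimal sketch of what that coverage could look like, assuming the specs2 style used elsewhere in the project; the helpers and fixtures below are stand-ins so the example is self-contained, not the repo's actual test API:

import org.specs2.mutable.Specification

// Illustrative only: runBatch/runStream stand in for the real
// transformer entry points, and sampleEvents for real event fixtures.
class ProcessingSpec extends Specification {

  def runBatch(events: List[String]): List[String]  = events.map(_.trim)
  def runStream(events: List[String]): List[String] = events.map(_.trim)
  val sampleEvents: List[String] = List("event-1", "event-2")

  "transformer" should {
    "produce the same shredded output for batch and stream" in {
      runBatch(sampleEvents) must beEqualTo(runStream(sampleEvents))
    }
  }
}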

@oguzhanunlu changed the title from "Use new ShredModel in transformer" to "Use new schema migration logic in transformer & loader" on May 30, 2023
@oguzhanunlu changed the title from "Use new schema migration logic in transformer & loader" to "Upgrade schema-ddl to 0.19.0" on Jun 2, 2023
@oguzhanunlu force-pushed the new-migration branch 2 times, most recently from eecc449 to 2314134, on June 6, 2023 14:56
@oguzhanunlu changed the title from "Upgrade schema-ddl to 0.19.0" to "Upgrade schema-ddl to 0.19.4" on Jul 6, 2023
@oguzhanunlu (Member, Author) commented:

Rebased on top of develop, hence the force push. Fixed table creation in migration, which broke after splitting statements; tests are still in progress.

@istreeter (Contributor) left a comment:

This is looking good, @oguzhanunlu! I need to see your changes to schema-ddl before I can finish the review.

Also pointing out that #1287 will need rebasing onto your latest changes in this PR.


package object transformation {

private val Formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

type PropertiesKey = (SchemaListKey, StorageTime)
type PropertiesCache[F[_]] = LruMap[F, PropertiesKey, Properties]
type ShredModelCache[F[_]] = LruMap[F, SchemaKey, ShredModel]
@istreeter (Contributor) commented on this diff:

In the previous version we used StorageTime as part of the key for the cache. We did this so that the cache entry expires in sync with when the iglu-scala-client cache expires. Is it possible to keep StorageTime as part of the cache key in the new version?
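For reference, a minimal sketch of what keeping StorageTime in the key could look like, mirroring the existing PropertiesKey alias above (ShredModelKey is a hypothetical name, not code from this PR):

// Hypothetical: key the cache on (SchemaKey, StorageTime) so entries
// expire in sync with the iglu-scala-client cache, as PropertiesKey does.
type ShredModelKey = (SchemaKey, StorageTime)
type ShredModelCache[F[_]] = LruMap[F, ShredModelKey, ShredModel]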

@oguzhanunlu (Member, Author) replied:

Good point, just pushed a new commit.

@@ -191,7 +246,7 @@ object Redshift {
       | ACCEPTINVCHARS
       | $frCompression""".stripMargin
     case ShreddedType.Tabular(_) =>
-      sql"""COPY $frTableName FROM '$frPath'
+      sql"""COPY $frTableName ($frColumns) FROM '$frPath'
@istreeter (Contributor) commented on this diff:

❤️

val frPath = Fragment.const0(shreddedType.getLoadPath)
val frCredentials = loadAuthMethodFragment(loadAuthMethod, storage.roleArn)
val frRegion = Fragment.const0(region.name)
val frMaxError = Fragment.const0(maxError.toString)
val frCompression = getCompressionFormat(compression)
val extraCols = ShredModelEntry.extraCols.map(_._1.replaceAll(""""""", ""))
@istreeter (Contributor) commented on this diff:

I guess you have local changes to schema-ddl which you have not pushed to GitHub yet? I think I need to see those changes before I comment on this section.

@oguzhanunlu (Member, Author) replied:

You're right, I just pushed a new commit at snowplow/schema-ddl#193.

-  columns: List[String]
+  wideColumns: List[String],
+  shredModels: Map[SchemaKey, MergeRedshiftSchemasResult],
+  disableMigration: List[SchemaCriterion]
@istreeter (Contributor) commented on this diff:

Here you copy disableMigration from the config into the DataDiscovery case class. You do this so that you can use it in the Migration object.

Did you consider instead simply passing the relevant config down into the Migration object, without adding it to the DataDiscovery class?

I don't know if that would look neater or not. Ultimately we should do whichever implementation looks neatest.

I suggest the alternative because DataDiscovery kinda represents information we discover dynamically from the message queue or from Iglu. And disableMigration does not seem to fit that description.

But I will leave it to your judgement whether your implementation is neater than my suggestion. I haven't thought through the details or the impact on the code.
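For illustration, a rough sketch of that alternative, with the config value threaded into the Migration entry point instead of stored on DataDiscovery; the signature is hypothetical, not the PR's actual code:

// Inside the Migration object. Hypothetical signature: disableMigration
// arrives straight from the config, so DataDiscovery keeps only the
// dynamically discovered information.
def build[F[_]](
  discovery: DataDiscovery,
  disableMigration: List[SchemaCriterion]
): F[Migration] = ???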

@oguzhanunlu (Member, Author) replied:

getLoadStatements also uses disableMigration to decide which table name should be used, but I agree with your sentiment; I just pushed a new commit.

@istreeter (Contributor) left a comment:

🎉

@@ -62,7 +63,8 @@ object Databricks {

   override def initQuery[F[_]: DAO: Monad]: F[Unit] = Monad[F].unit

-  override def createTable(schemas: SchemaList): Block = Block(Nil, Nil, Entity.Table(tgt.schema, schemas.latest.schemaKey))
+  override def createTable(shredModel: ShredModel): Block =
+    Block(Nil, Nil, Entity.Table(tgt.schema, shredModel.schemaKey, shredModel.tableName))
@istreeter (Contributor) commented on this diff:

Am I right that this createTable should never get called for Databricks? Or for Snowflake?

If yes, I think I'd prefer to see it implemented as:

override def createTable(shredModel: ShredModel): Block =
  throw new IllegalStateException("createTable should never be called for Databricks")

Otherwise, I find it confusing that it returns an object using shredModel.tableName.

@oguzhanunlu merged commit e73eca3 into develop on Sep 5, 2023
3 checks passed
@oguzhanunlu deleted the new-migration branch on September 5, 2023 13:14