Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: UUID incremental replication keys cannot be compared with UUID-strings coming from the state #2754

Closed
1 task
edgarrmondragon opened this issue Nov 11, 2024 · 4 comments
Labels
Milestone

Comments

@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented Nov 11, 2024

Singer SDK Version

0.41.0

Is this a regression?

  • Yes

Python Version

NA

Bug scope

Taps (catalog, state, etc.)

Operating System

NA

Description

When a SQL tap defines a UUID column as a replication key, the step where the value coming from the record and the value coming from the state dictionary are compared to increment the bookmark crashes because the >= operation is not supported between a uuid.UUID1 instance and a string.

We can fix by simply adding a branch to

def to_json_compatible(val: t.Any) -> t.Any: # noqa: ANN401
"""Return as string if datetime. JSON does not support proper datetime types."""
if isinstance(val, (datetime.datetime,)):
# Make naive datetimes UTC
return (val.replace(tzinfo=UTC) if val.tzinfo is None else val).isoformat("T")
return val

that checks for an instance of uuid.UUID and returns the stringified value. The resulting string-to-string comparison should be safe down to the millisecond.

Note

This may be only be an issue in a very narrow set of use cases, in particular where the experimental sortable UUID Version 72 is enabled via database extensions, such as pg_uuidv73.

Related:

Link to Slack/Linen

No response

Footnotes

  1. https://docs.python.org/3/library/uuid.html#uuid.UUID

  2. https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#name-uuid-version-7

  3. https://github.com/craigpastro/pg_uuidv7

@edgarrmondragon
Copy link
Collaborator Author

Added this to the v0.42 milestone to indicate that we could backport it to a v0.42.1 if we get around to this quickly.

@edgarrmondragon
Copy link
Collaborator Author

@nikzavada in particular let me know if this

The resulting string-to-string comparison should be safe down to the millisecond.

is accurate or if there's a risk of losing data by incrementing state when the underlying timestamp hasn't really moved.

@edgarrmondragon edgarrmondragon removed their assignment Nov 11, 2024
@nikzavada
Copy link
Contributor

@edgarrmondragon as we use the >= operator we should be safe even when the timestamp stays the same. The string-to-string comparison feels like a good solution here.

Thank you!

@edgarrmondragon
Copy link
Collaborator Author

Closed by #2755

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants