Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle reserved words #51

Merged
merged 3 commits into from
Jun 14, 2023
Merged

fix: handle reserved words #51

merged 3 commits into from
Jun 14, 2023

Conversation

pnadolny13
Copy link
Contributor

@pnadolny13 pnadolny13 commented Jun 9, 2023

Closes #48

Sqlalchemy handles some of the quoting for reserved words but we also build sql manually so those dont get handled. Also when doing column diffing like to decided whether a column exists or not we need to use uppercase column names because we created them as uppercase with quotes so case matters.

I found a bug with the COPY logic when the stream has schema updates. Logged it in #53 and draft PR in #52. Once that is fixed we can add a schema change to the test records with no key properties so we assert alters work with reserved words when no key properties are being used.

@pnadolny13 pnadolny13 marked this pull request as ready for review June 11, 2023 20:30
@kgpayne
Copy link
Collaborator

kgpayne commented Jun 14, 2023

@pnadolny13 happy with this as-is, but it would be good to review this point in a future issue:

Also when doing column diffing like to decided whether a column exists or not we need to use uppercase column names because we created them as uppercase with quotes so case matters.

For downstream analytics use it would be great to create in a case-insensitive way, even if that means heavier-handed name conforming. We dialled back the default conforming in the SDK, so it'd be up to us in target-snowflake to determine how much to do. As a first pass, I'd look do as much as the PPW variant (assuming they also don't quote column names on creation) 😅

@pnadolny13
Copy link
Contributor Author

@kgpayne yeah 100%. I'm not sure how it should be handled in the general case but these changes mimic what the PPW variant does and what I think is reasonable in the snowflake case.

This is my summary, while its fresh in my head:

I have a stream that has a property select, its lowercase and is a reserved word. In PPW I get a column in snowflake as "SELECT" with the quotes because its a reserved word and fully upper cased. I also get the same behavior now from this variant. Also in both variants any casing is removed for non-reserved words and the output is unquoted uppercase columns in snowflake, I dont know if the PPW variant is explicitly doing this but I know snowflake will uppercase them for you if you dont have quotes around them. The challenge with this variant thats different than PPW is that sqlalchemy looks like it lower cases all column names when it returns the metadata of a table but quoted names keep their casing. So when we upper case normal unquoted columns they get returned to us via sqlalchemy as lower case, but for quoted columns they get returned as upper case. Its just tricky to do accurate diffing with that behavior unless we think its safe to do the "lazy" approach and lower column names on both sides then compare, I dont necessarily see a reason why not to do this but there might be edge cases.

I've used products in the past that keep the exact casing of the source when writing to snowflake meaning lowercase properties would stay lowercase in the snowflake table within quotes. This was a huge pain to query, even the snowflake console had issues querying those columns using a preview back then (probably resolved by now).

@pnadolny13 pnadolny13 merged commit acf4b07 into main Jun 14, 2023
@pnadolny13 pnadolny13 deleted the fix_reserved_words branch June 14, 2023 14:26
@pnadolny13 pnadolny13 mentioned this pull request Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

bug: reserved words cause compilation error
2 participants