bug: Source postgres json & jsonb columns loading to snowflake as varchar true when using meltanolabs tap-postgres and target-snowflake #274

Open
kyle-foerster opened this issue Oct 16, 2024 · 4 comments

kyle-foerster commented Oct 16, 2024

Target Version

3.4.2

Python Version

3.9

PostgreSQL Version

16.4

Operating System

meltano/meltano:v3.4.2-python3.9 docker image

Description

Steps to fully reproduce can be found in this public repo. The general configuration in this repo matches our production environment, where we first noticed this behavior.

After swapping from the transferwise variants of the tap-postgres and target-snowflake plugins to their respective meltanolabs variants, we noticed that any source column of the json or jsonb data type is loaded as varchar true in the target.

Source table was created with:

create table mix_data (
	id integer primary key,
	empty_data jsonb,
	data json
);

Example source data in Postgres: (screenshot, 2024-10-16, 3:39:40 PM)

Example target data in Snowflake: (screenshot, 2024-10-16, 3:40:53 PM)

Quick matrix of what was tested. Only the combination of the meltanolabs variants of both tap-postgres and target-snowflake resulted in malformed data.

| tap-postgres variant | target-snowflake variant | Replicate correctly? |
|---|---|---|
| transferwise | transferwise | Yes |
| transferwise | meltanolabs | Yes |
| meltanolabs | transferwise | Yes |
| meltanolabs | meltanolabs | No |

Looking at the schemas in the actual messages produced from the dummy data in the repo linked above, they look largely the same and match what I'd expect. I'm not sure whether including the boolean data type in the meltanolabs variant is somehow causing the values to be read incorrectly.

The schema from the meltanolabs tap-postgres:

"properties": {
            "id": {
                "type": ["integer"]
            },
            "empty_data": {
                "type": ["string","number","integer","array","object","boolean","null"]
            },
            "data": {
                "type": ["string","number","integer","array","object","boolean","null"]
            }
        },
        "type": "object",
        "required": ["id"]

Schema from the transferwise tap-postgres:

{
        "type": "object",
        "properties": {
            "id": {
                "type": ["integer"],
                "minimum": -2147483648,
                "maximum": 2147483647
            },
            "empty_data": {
                "type": ["null","object","array"]
            },
            "data": {
                "type": ["null","object","array"]
            }
        },
        "definitions": {
            "sdc_recursive_integer_array": {
                "type": ["null","object","array"],
                "items": {"$ref": "#/definitions/sdc_recursive_integer_array"}
            },
            "sdc_recursive_number_array": {
                "type": ["null","object","array"],
                "items": {"$ref": "#/definitions/sdc_recursive_number_array"}
            },
            "sdc_recursive_string_array": {
                "type": ["null","object","array"],
                "items": {"$ref": "#/definitions/sdc_recursive_string_array"}
            },
            "sdc_recursive_boolean_array": {
                "type": ["null","object","array"],
                "items": {"$ref": "#/definitions/sdc_recursive_boolean_array"}
            },
            "sdc_recursive_timestamp_array": {
                "type": ["null","object","array"],
                "format": "date-time",
                "items": {"$ref": "#/definitions/sdc_recursive_timestamp_array"}
            },
            "sdc_recursive_object_array": {
                "type": ["null","object","array"],
                "items": {"$ref": "#/definitions/sdc_recursive_object_array"}
            }
        }
    }
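
To make the difference between the two schemas concrete, here is a minimal Python sketch that compares the `type` lists each tap emits for the json/jsonb columns. The type lists are copied from the schemas above; the names `extra_types`, `MELTANOLABS_TYPES`, and `TRANSFERWISE_TYPES` are made up for illustration. This is not how either target resolves column types, only a way to see what differs.

```python
# Type lists copied from the two schemas above for the json/jsonb columns.
MELTANOLABS_TYPES = ["string", "number", "integer", "array", "object", "boolean", "null"]
TRANSFERWISE_TYPES = ["null", "object", "array"]


def extra_types(candidate, baseline):
    """Return the types present in `candidate` but absent from `baseline`, sorted."""
    return sorted(set(candidate) - set(baseline))


if __name__ == "__main__":
    # Types only the meltanolabs tap declares for these columns.
    print(extra_types(MELTANOLABS_TYPES, TRANSFERWISE_TYPES))
    # Types only the transferwise tap declares for these columns.
    print(extra_types(TRANSFERWISE_TYPES, MELTANOLABS_TYPES))
```

The first call lists the scalar types (`boolean`, `integer`, `number`, `string`) that only the meltanolabs schema allows; the second returns an empty list. So the meltanolabs schema is a strict superset, and a target has to choose a column type for a much wider union. Whether that is what triggers the behavior described here is not something this sketch confirms.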

Link to Slack/Linen

https://meltano.slack.com/archives/C069CQNHDNF/p1728658002710249

kyle-foerster added the bug label Oct 16, 2024
edgarrmondragon transferred this issue from MeltanoLabs/target-postgres Oct 16, 2024
edgarrmondragon (Member) commented Oct 16, 2024

Hi @kyle-foerster, I've transferred this issue since you mention you're using target-snowflake.

Which version of tap-postgres are you using? i.e. what's the pip_url for that plugin?

kyle-foerster (Author) commented

Ah sorry about that, thanks for transferring.

What we're using matches what's in the example bug repo.

target-snowflake is using meltanolabs-target-snowflake
tap-postgres is using git+https://github.com/MeltanoLabs/tap-postgres.git

edgarrmondragon (Member) commented Oct 18, 2024

Hey @kyle-foerster, thanks for the details!

This does sound like a relatively serious bug. I think I've narrowed it down to meltano/sdk#2726 upstream.

I'll try to come up with a failing test and a patch next week. If a fix upstream is hard to implement without introducing breaking changes, I'll then try a fix in this target.

edgarrmondragon moved this from Todo to In Progress in MeltanoLabs Overview Oct 22, 2024
edgarrmondragon (Member) commented

I just published 0.13.0b1 to PyPI, which should address this.
