This repository has been archived by the owner on Sep 23, 2024. It is now read-only.
Integers with NULLs can lead to rounding when using the Parquet file format #404
Labels
bug
Describe the bug
In certain circumstances, when a column is an INT on the source and contains a NULL value in at least one row, there is an implicit conversion from INT to FLOAT, which leads to rounding.
This is specific to Target Snowflake using the Parquet format for loading data into Snowflake. That path builds a Pandas DataFrame, and the default Pandas integer dtype (int64) does not support NULL values. To work around this, Pandas automatically converts any integer column that contains NULLs to the float64 dtype, so integers too large to be represented exactly as a float (magnitude above 2^53) are rounded. The issue does not occur if the nullable Int64 dtype is used, but the default conversion to a Pandas DataFrame uses the basic integer dtype, which doesn't support NULLs.
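The conversion described above can be reproduced outside of target-snowflake. This is only a minimal sketch: the `records` list and the `id` column are made up for illustration and are not taken from the target's code.

```python
import pandas as pd

# Hypothetical flattened records: one large integer and one NULL
# in the same column (names and values are illustrative only).
records = [{"id": 9007199254740993}, {"id": None}]

# Default construction: the NULL forces the column to float64.
df = pd.DataFrame(data=records)

print(df["id"].dtype)          # float64
print(int(df["id"].iloc[0]))   # 9007199254740992, not ...993
```

The value 9007199254740993 is 2^53 + 1, which has no exact float64 representation, so it comes back rounded to 2^53.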
A fix for this is to pass a dtype hint to the conversion so that every column is created as an object column, which prevents any conversion. This does not seem to affect target-snowflake's ability to land data correctly.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Rounding should not occur; the actual integer value should be replicated from the source unmodified.
Your environment
Additional context
The issue is in this line:
pipelinewise-target-snowflake/target_snowflake/file_formats/parquet.py
Line 69 in c0806f0
Before Change:
return pandas.DataFrame(data=flattened_records)
After Change:
return pandas.DataFrame(data=flattened_records, dtype='object')
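A quick way to check the effect of the change in isolation, assuming `flattened_records` is a list of dicts (the sample data here is hypothetical and stands in for whatever the target actually passes):

```python
import pandas as pd

# Hypothetical stand-in for the records passed to the DataFrame constructor.
flattened_records = [{"id": 9007199254740993}, {"id": None}]

# With dtype='object' Pandas keeps the original Python objects
# instead of inferring a numeric dtype for the column.
df = pd.DataFrame(data=flattened_records, dtype="object")

print(df["id"].dtype)     # object
print(df["id"].iloc[0])   # 9007199254740993
print(df["id"].iloc[1])   # None
```

The integer stays a Python int and the NULL stays None, so nothing is rounded before the data is written out.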