Parquet files cannot be read and loaded into the proper `ParquetDataset` object when using `make_batch_reader`. This is caused by the deprecated parameter `validate_schema=False`, which was removed in pyarrow v15.0.0.
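At its core, this is a keyword-argument mismatch. Assuming pyarrow 15's `ParquetDataset` neither declares `validate_schema` nor accepts arbitrary `**kwargs`, a call that still passes `validate_schema=False` fails with Python's usual `TypeError` for an unexpected keyword argument. One generic way to tolerate such a removal is to drop keyword arguments that the callee no longer declares. The sketch below uses a hypothetical helper, `call_with_supported_kwargs`, which is not part of petastorm or pyarrow. It is demonstrated on stand-in functions rather than on pyarrow itself.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments its signature does not declare.

    Hypothetical helper: if func accepts **kwargs, everything is passed through.
    Positional-only parameters cannot be passed by keyword, so they are not matched.
    """
    params = inspect.signature(func).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return func(*args, **kwargs)
    accepted = {p.name for p in params if p.kind is not inspect.Parameter.POSITIONAL_ONLY}
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    return func(*args, **supported)


# Stand-ins for an "old" and a "new" API; these are NOT pyarrow functions.
def old_dataset(path, validate_schema=True):
    return ("old", path, validate_schema)


def new_dataset(path, *, filters=None):
    return ("new", path, filters)


print(call_with_supported_kwargs(old_dataset, "data", validate_schema=False))  # ('old', 'data', False)
print(call_with_supported_kwargs(new_dataset, "data", validate_schema=False))  # ('new', 'data', None)
```

Silently dropping a flag means the newer library falls back to its own default behavior, so this is only a stopgap. The proper fix belongs at petastorm's call site.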
It seems the `validate_schema` argument was removed in the pyarrow update.
I worked around this issue by pinning `petastorm==0.12.1` and `pyarrow==10.0.1`.
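If you rely on this pin, a startup check can make an accidental pyarrow upgrade fail with a readable message instead of an error from deep inside petastorm. This is a minimal sketch. The helper names are hypothetical, and the version threshold of 15 is an assumption taken from this report (`validate_schema` removed in pyarrow 15.0.0).

```python
def pyarrow_major(version: str) -> int:
    """Major version from a string like '10.0.1' or '15.0.0'."""
    return int(version.split(".")[0])


def supports_validate_schema(pyarrow_version: str) -> bool:
    # Assumption from this report: validate_schema was removed in pyarrow 15.0.0.
    return pyarrow_major(pyarrow_version) < 15


def require_compatible_pyarrow(pyarrow_version: str) -> None:
    """Raise a clear error when the installed pyarrow is too new for this workaround."""
    if not supports_validate_schema(pyarrow_version):
        raise RuntimeError(
            f"pyarrow {pyarrow_version} no longer accepts validate_schema; "
            "pin an older pyarrow such as 10.0.1 (with petastorm 0.12.1)"
        )


require_compatible_pyarrow("10.0.1")  # passes silently
```

In practice, you would call `require_compatible_pyarrow(pyarrow.__version__)` once at startup, before creating any reader.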
With petastorm's `Reader` and `make_reader`, the data loads successfully.
However, a deprecation warning is displayed because of the older versions, which isn't pretty.
I hope someone can share a cleaner solution that works with the latest versions.
Here is my code:
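```python
import s3fs
from petastorm import make_reader
from petastorm.reader import Reader
fs = s3fs.S3FileSystem(key="ACCESS_KEY", secret="SECRET_KEY", endpoint_url="ENDPOINT")
with make_reader(dataset_url="s3a://YOUR/DATA/PATH", filesystem=fs) as reader:  # context manager
    for row in reader: ...  # consume rows here
# or, using the lower-level Reader class directly:
reader = Reader(dataset_path="s3a://YOUR/DATA/PATH")
```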
Expected behavior
The dataset is loaded properly into the `ParquetDataset` object so that it can be consumed downstream.