I would like to read the result set of a query as streams of Arrow record batches. QueryResultFormat.ARROW serializes the data in Arrow stream format, but I don't see a way to read those streams directly. Could this be exposed for direct reading, similar to the ArrowStreamLoader in gosnowflake? See https://github.com/snowflakedb/gosnowflake/blob/master/connection.go#L577
What is the current behavior?
The Arrow format is used to serialize data from a result set, but it doesn't seem to be exposed for direct reading.
What is the desired behavior?
Read the serialized Arrow streams directly.
How would this improve snowflake-jdbc?
By consuming Arrow data directly, entire batches can be read at once instead of one scalar value at a time, which should improve performance.
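As a rough illustration of the request, the sketch below shows one possible shape for such an interface. Every name in it (`ArrowBatchStream`, `ArrowStreamResult`, `getBatchStreams`, `openStream`) is hypothetical and does not exist in snowflake-jdbc today, and the in-memory implementation only stands in for downloaded result chunks. It uses the Java standard library alone, so it shows the access pattern rather than any Arrow decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ArrowStreamSketch {

    /** Hypothetical: one result chunk, exposed as raw Arrow IPC stream bytes. */
    interface ArrowBatchStream {
        InputStream openStream() throws IOException;
    }

    /** Hypothetical: what an Arrow-aware result set could expose to callers. */
    interface ArrowStreamResult {
        List<ArrowBatchStream> getBatchStreams();
    }

    /** In-memory stand-in for downloaded chunks; this is not real driver code. */
    static ArrowStreamResult inMemory(List<byte[]> chunks) {
        List<ArrowBatchStream> streams = new ArrayList<>();
        for (byte[] chunk : chunks) {
            // Each call to openStream() returns a fresh stream over the chunk.
            streams.add(() -> new ByteArrayInputStream(chunk));
        }
        return () -> streams;
    }

    /** Drains every stream; a real caller would pass each one to an Arrow reader. */
    static long countBytes(ArrowStreamResult result) throws IOException {
        long total = 0;
        for (ArrowBatchStream batch : result.getBatchStreams()) {
            try (InputStream in = batch.openStream()) {
                total += in.readAllBytes().length;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ArrowStreamResult result =
            inMemory(List.of(new byte[] {1, 2, 3}, new byte[] {4, 5}));
        System.out.println(result.getBatchStreams().size() + " streams, "
            + countBytes(result) + " bytes");
    }
}
```

With an interface along these lines, a caller could hand each stream to an Arrow reader and process whole record batches, without going through per-value JDBC getters.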
We're interested in this as well. We're currently converting the JDBC results back to Arrow for a variety of JDBC connectors, but we'd love to keep the data in Arrow format the entire time if possible.
sfc-gh-dszmolka changed the title from "Allow reading Arrow record batch streams from result set" to "SNOW-840018 Allow reading Arrow record batch streams from result set" on Apr 26, 2024
References, Other Background
A similar interface is provided in gosnowflake: https://github.com/snowflakedb/gosnowflake/blob/master/connection.go#L577
The Arrow ADBC driver currently uses it and shows good performance gains: https://github.com/apache/arrow-adbc/blob/main/go/adbc/driver/snowflake/record_reader.go#L242
In snowflake-jdbc, the Arrow stream is read here: https://github.com/snowflakedb/snowflake-jdbc/blob/master/src/main/java/net/snowflake/client/jdbc/SnowflakeChunkDownloader.java#L892