The future of qs #103

Open
traversc opened this issue Sep 28, 2024 · 4 comments

@traversc
Collaborator

traversc commented Sep 28, 2024

I plan to deprecate the qs package in the future. There is a replacement available, qs2, on CRAN and GitHub (1).

There are two reasons:

  • New CRAN enforcement regarding certain internal functions that qs relies on (2). Since qs handles serialization for both data and internal objects, maintaining proper serialization has become difficult without broader access to these now-restricted functions.

  • qs was first released in 2019. Since then, there have been numerous changes and improvements to the internals of R, and therefore to its serialization of internal objects. As a result, R updates have sometimes caused qs to break in unexpected ways. Those breaks cause disruption and have been time-consuming to fix.

The new qs2 package addresses these issues.

It uses only approved API functions and is designed to be more future-proof. The package has two new formats:

  • The qs2 format uses R's built-in serialization but improves upon it with better file I/O, zstd compression, byte shuffling and multithreading. This is a good 80/20 solution and doesn't require any update to ensure it works in the future.

  • The qdata format, a spiritual successor to qs, features its own serialization for data only (vectors, data frames, lists, matrices, attributes). It outperforms the qs and qs2 formats, especially with multithreading (3). I also plan to add limited cross-compatibility with Python later on.
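
For readers unfamiliar with byte shuffling, here is a minimal sketch of the general idea in Python. It is not the qs2 implementation: the function names are hypothetical, and the standard-library `zlib` stands in for zstd so the example has no dependencies. The shuffle regroups the bytes of fixed-width elements so that bytes at the same position in each element sit next to each other, which often gives the compressor longer repetitive runs.

```python
import struct
import zlib


def shuffle_bytes(raw: bytes, width: int) -> bytes:
    """Group byte 0 of every element together, then byte 1, and so on."""
    if len(raw) % width != 0:
        raise ValueError("buffer length must be a multiple of the element width")
    return b"".join(raw[i::width] for i in range(width))


def unshuffle_bytes(shuffled: bytes, width: int) -> bytes:
    """Invert shuffle_bytes, restoring the original element layout."""
    if len(shuffled) % width != 0:
        raise ValueError("buffer length must be a multiple of the element width")
    n = len(shuffled) // width
    out = bytearray(len(shuffled))
    for i in range(width):
        out[i::width] = shuffled[i * n:(i + 1) * n]
    return bytes(out)


# Illustrative data (an assumption, not from the benchmark): 10,000 slowly
# varying 8-byte doubles, stored little-endian.
values = [float(100_000 + k) for k in range(10_000)]
raw = struct.pack(f"<{len(values)}d", *values)

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffle_bytes(raw, 8)))
print("compressed without shuffle:", plain_size)
print("compressed with shuffle:   ", shuffled_size)

# The transform is lossless: unshuffling recovers the original bytes.
restored = unshuffle_bytes(shuffle_bytes(raw, 8), 8)
```

How much the shuffle helps depends on the data. It tends to help numeric vectors whose high-order bytes change slowly and does little for text.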

Thanks to everyone who used the qs package over the years and I hope qs2 will be a worthy successor!

(3) Benchmarks (4.5 GB mixed numeric/text data)

Single-threaded

| Algorithm | Compression | Save Time (s) | Read Time (s) |
|---|---|---|---|
| qs2 | 7.96 | 13.4 | 50.4 |
| qdata | 8.45 | 10.5 | 34.8 |
| base::serialize | 1.1 | 8.87 | 51.4 |
| saveRDS | 8.68 | 107 | 63.7 |
| fst | 2.59 | 5.09 | 46.3 |
| parquet | 8.29 | 20.3 | 38.4 |
| qs (legacy) | 7.97 | 9.13 | 48.1 |

Multi-threaded (8 threads)

| Algorithm | Compression | Save Time (s) | Read Time (s) |
|---|---|---|---|
| qs2 | 7.96 | 3.79 | 48.1 |
| qdata | 8.45 | 1.98 | 33.1 |
| fst | 2.59 | 5.05 | 46.6 |
| parquet | 8.29 | 20.2 | 37.0 |
| qs (legacy) | 7.97 | 3.21 | 52.0 |

traversc pinned this issue Sep 28, 2024

@SebKrantz

SebKrantz commented Oct 8, 2024

Thanks @traversc for this note. Your decision is very respectable, although I would urge you to keep qs on CRAN as long as possible. Many packages have not moved, or cannot move, into full API compliance. Major packages including data.table would have great difficulty doing that. I don't see CRAN maintainers starting to crack down on non-compliant packages, especially packages that are highly depended upon such as qs.

@traversc
Collaborator Author

traversc commented Oct 8, 2024

Thanks @SebKrantz. For now, CRAN isn't forcing the issue. I hope data.table gets the official support they need.

@SebKrantz

@traversc perhaps one more note here: the CRAN policy suggests that maintainers may be forced to fix all issues only around major x.y.0 releases. So it should be possible to keep qs going with minor updates.

@traversc
Collaborator Author

@SebKrantz Are you referring to this part?

Maintainers will be asked to update packages which show any warnings or significant notes, especially at around the time of a new x.y.0 release. Packages which are not updated are liable to be archived.

R 4.5 is scheduled for spring, which is hopefully enough time to gracefully deprecate everything.
