solver: add sqlcachestorage as an alternative to the bolt db #5246

Draft: wants to merge 1 commit into base: master

jsternberg (Collaborator)

This creates an SQL cache storage that stores the cache key index in an sqlite3 database instead of in bolt. At the moment, it is only minimally functional and is meant to test the feasibility of this approach. The sqlite database can also be configured for more efficient transactions.

This can be used as an alternative to the bolt db storage and could potentially serve as distributed cache key storage for another implementation in the future.

jsternberg (Collaborator, Author)

I thought I'd experiment with this idea because I had noticed that the cachedb followed a relational pattern. I haven't done any profiling yet, but I noted a few things while working on this PR.

  1. SQLite has a WAL (write-ahead log) journaling mode that could boost performance. We could test the feasibility of keeping the WAL on temporary storage versus the native disk.
  2. With SQL, it's also possible to move the cache index off the local disk entirely, although we would probably want to batch transactions to some degree.
  3. The _backlinks and _byresult buckets from bolt weren't needed, since they were just custom index implementations. This implementation creates an index instead and lets the database maintain it.
  4. In my opinion, the logic for accessing the database is generally simpler, because the SQL statements that query and modify it are mostly self-explanatory.
  5. I think some additional changes could be made to solver/cachemanager.go to do some of the logic directly in the database rather than through Walk calls. Walk calls aren't a big deal when the cachedb is local, but each one incurs a network round trip if we want to make it easy to use a client/server model for the database. In particular, I believe getIDFromDeps could be rewritten as an SQL query.

Comment on lines +8 to +9
#BUILDKIT_DEBUG: 1
CGO_ENABLED: 1
Member

For a CGO-free mode, you could look at https://pkg.go.dev/modernc.org/sqlite

(Collaborator, Author)

I think we can consider this if we need a CGO-free mode. For now, we still have the bolt database, so we can simply say "this feature only exists with cgo". I took a look at that project, and the C library seems to be faster in most places.

I updated the PR to include clearer messages for when cgo is disabled, and to only enable the feature on linux and darwin. That restriction shouldn't matter much, since I don't think you can run buildkit on darwin or windows anyway.

Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>