OPFS read transactions without (blocking) locks #152
-
This approach has a really interesting possibility: selective page cache invalidation. SQLite's built-in pager cache keeps a set of recently accessed database pages in memory to improve performance. When SQLite detects that another connection has changed the database, the pager cache is invalidated and emptied. If the detected change was small, a lot of cached pages may be discarded unnecessarily. If we disable the SQLite cache and move the cache into the VFS, we can be much more precise. Our connection is notified of the pages changed by each transaction. As part of processing that notification, those pages can be removed from the cache if present. That would be a surgical cache update, removing only newly invalid pages and no others. That should be a performance boost under concurrency, especially when transactions are small compared to the actively accessed part of the database.
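The surgical invalidation described above can be sketched as a small VFS-level page cache. The class and method names here are illustrative, not part of any actual wa-sqlite API:

```javascript
// Minimal sketch of a VFS-level page cache with selective invalidation.
// Names (PageCache, invalidate) are assumptions for illustration.
class PageCache {
  constructor() {
    this.pages = new Map(); // pageIndex -> page data (e.g. Uint8Array)
  }

  get(pageIndex) {
    return this.pages.get(pageIndex);
  }

  put(pageIndex, data) {
    this.pages.set(pageIndex, data);
  }

  // Called when another connection commits. Instead of emptying the whole
  // cache (as SQLite's built-in pager cache does), drop only the pages that
  // the committing transaction actually changed.
  invalidate(changedPageIndices) {
    for (const pageIndex of changedPageIndices) {
      this.pages.delete(pageIndex);
    }
  }
}
```

Untouched pages stay cached across other connections' commits, which is where the concurrency win comes from.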
-
I have implemented this (without a VFS cache) in the dev branch and it seems to work well. Demo is here, benchmarks here. Note that this uses the proposed OPFS access handle modes that are currently only in Chrome. In my implementation, read transactions do use a lock after all, but they release it lazily, so it is rare that they need to reacquire it. The only time that happens is when another connection performs a VACUUM, which the VFS recognizes and uses to write an un-permuted file that can be exported from OPFS. This VFS is faster under concurrent access than anything else. Except during a VACUUM, multiple readers and a writer don't block each other (multiple writers must still take turns, one at a time). Readers rarely need to acquire a lock at all, and don't access IndexedDB. Writers do need to access IndexedDB, but those writes are non-blocking. Here are contention test results with 3 readers and 1 writer, (using
For comparison, see this post (the equivalent runs for IDBBatchAtomicVFS and FLOOR are at the bottom). OPFSPermutedVFS is comfortably in front under this load.
-
What's that you say? OPFS use doesn't require locks because only one synchronous access handle on a file can be open at any given time? That's about to change. There's a proposal to allow multiple concurrent access handle usage (which I already discussed here), and it's enabled by default in Chrome 121 which just reached the stable channel (ChromeOS in a few more days).
So on Chrome at least we'll be able to read and write the same OPFS file from many contexts at once, and that means some sort of synchronization will be required to avoid corrupting a database. WebLocks are an obvious way to implement the SQLite VFS locking methods, and doing that straightforwardly works fine...but what if there were a way to make it harder and more complicated? Wouldn't that be more fun? 😀
I have an idea for such a way where database readers never need to block. In the straightforward approach, readers are blocked while a writer makes changes. WAL relaxes that, but readers still need to block for checkpointing and no one has succeeded in using multiway WAL on a browser (I'm working on an alternative).
My idea here is to store database pages in an OPFS file in arbitrary order (i.e. not in page index order like a normal SQLite database file), and keep the mapping of page index to file offset in IndexedDB. Each IndexedDB entry would record one committed transaction: the pages it wrote, and the file offset where each page landed.
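The original example entry was not preserved here, so the following is a hypothetical shape with assumed field names, consistent with the mapping described above:

```javascript
// Hypothetical shape of one IndexedDB entry (field names are assumptions).
// Each entry records a committed transaction and where its pages live in
// the permuted OPFS file.
const exampleEntry = {
  txId: 42,                 // monotonically increasing transaction id (the key)
  pages: [                  // pages written by this transaction
    { pageIndex: 3, fileOffset: 81920 },
    { pageIndex: 7, fileOffset: 86016 },
  ],
  digest: '<sha-256 of the page contents>', // lets a reader verify the pages
};
```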
On database open, the VFS would read the page map from IndexedDB, verify the digest of pages belonging to transactions added since the last time someone checked, update the last transaction verified, and keep a Map of pageIndex to fileOffset and transactionId in memory. This is the only time a reader would need to access IndexedDB.
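The open-time step of folding the transaction log into an in-memory Map can be sketched as a pure function (names and entry shape are assumptions; digest verification is omitted):

```javascript
// Sketch: build the in-memory page map from the IndexedDB transaction log.
// Entries are assumed sorted by ascending txId, so later transactions
// override earlier ones for the same page index.
function buildPageMap(entries) {
  const pageMap = new Map(); // pageIndex -> { fileOffset, txId }
  for (const { txId, pages } of entries) {
    for (const { pageIndex, fileOffset } of pages) {
      pageMap.set(pageIndex, { fileOffset, txId });
    }
  }
  return pageMap;
}
```

After this one pass, all reads resolve page index to file offset from memory, with no further IndexedDB access.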
The reader will acquire a shared WebLock whose name encodes the last transaction it knows about, e.g. "opfs:/myDatabase.db[42]". This lock will be used as a flag, not as a lock, to let writers know which pages in the database file can't be overwritten with new transactions yet.
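The lock-as-flag scheme might look like this; the helper names are assumptions, and the name format follows the "opfs:/myDatabase.db[42]" example above:

```javascript
// Sketch of the lock-as-flag naming scheme. The lock name encodes the last
// transaction the reader has incorporated into its page map.
function lockName(databasePath, txId) {
  return `${databasePath}[${txId}]`;
}

function parseLockTxId(name) {
  const match = /\[(\d+)\]$/.exec(name);
  return match ? Number(match[1]) : null;
}

// In the browser, a reader would hold a shared lock under that name.
// Not called here (navigator.locks is browser-only); the lock is held
// until the untilReleased promise settles.
async function flagCurrentTx(databasePath, txId, untilReleased) {
  await navigator.locks.request(
    lockName(databasePath, txId),
    { mode: 'shared' },
    () => untilReleased);
}
```

Because the lock is shared and never contended for its own sake, holding it costs readers nothing; it only advertises their position to writers via `navigator.locks.query()`.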
When a writer commits a new transaction, it updates the pages in IndexedDB and uses BroadcastChannel to notify the other connections of the new transaction id and the pages it changed.
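The original list of message fields was not preserved here, so this is a hypothetical shape consistent with what the other connections need (the new transaction id plus its page index to file offset mapping):

```javascript
// Hypothetical commit notification (field names are assumptions).
function makeCommitMessage(txId, pages) {
  return {
    type: 'commit',
    txId,   // the new transaction id
    pages,  // [{ pageIndex, fileOffset }] for each page written
  };
}

// A writer would publish it on a channel named for the database.
// BroadcastChannel is a standard API, available in browsers and recent Node.
function broadcastCommit(databasePath, message) {
  const channel = new BroadcastChannel(databasePath);
  channel.postMessage(message);
  channel.close();
}
```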
When other connections receive the broadcast message, they update their in-memory Map and their shared WebLock flag. If there is a local transaction in progress when the message is received, the update is deferred until the transaction completes.
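The receive-side deferral can be sketched as follows (class and method names are illustrative; the WebLock flag update is browser-only and only noted in a comment):

```javascript
// Sketch: a connection applies commit messages immediately, unless a local
// transaction is in progress, in which case they queue until it completes.
class Connection {
  constructor() {
    this.pageMap = new Map();   // pageIndex -> { fileOffset, txId }
    this.inTransaction = false;
    this.pending = [];          // commit messages deferred mid-transaction
  }

  onCommitMessage(message) {
    if (this.inTransaction) {
      this.pending.push(message); // defer: keep a stable view for SQLite
    } else {
      this.applyCommit(message);
    }
  }

  applyCommit({ txId, pages }) {
    for (const { pageIndex, fileOffset } of pages) {
      this.pageMap.set(pageIndex, { fileOffset, txId });
    }
    // Here the connection would also move its shared WebLock flag to the
    // name encoding txId (navigator.locks, omitted in this sketch).
  }

  endTransaction() {
    this.inTransaction = false;
    for (const message of this.pending) this.applyCommit(message);
    this.pending.length = 0;
  }
}
```

Deferring keeps the in-progress transaction reading a consistent snapshot, since its page map is only advanced between transactions.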
Under this scheme, readers never need to wait. I'm guessing this might be the fastest multiway read performance possible, assuming storage is on an SSD so seek times are low.
So what are the tradeoffs/drawbacks here?
An addendum about writes: Write transactions need to be sequenced - only one at a time - so a lock will be needed. The writer will then need to: check IndexedDB to confirm that its view of the database is current (and if not, return SQLITE_BUSY); query WebLocks to determine the oldest transaction any context still considers current, and adjust the set of available page slots accordingly; write the new transaction pages; write a new entry to IndexedDB (without waiting, if relaxed durability is acceptable); and send the BroadcastChannel message.
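The writer-side bookkeeping from those steps can be sketched with pure functions (names are assumptions; the exclusive WebLock, IndexedDB writes, and broadcast are omitted). `oldestTxInUse` consumes a `navigator.locks.query()`-style snapshot, and `reclaimableSlots` decides which file slots a new transaction may overwrite:

```javascript
// Given a navigator.locks.query()-shaped snapshot, find the oldest
// transaction any context still advertises via its flag lock. Returns
// Infinity when no flag locks are held (no readers to protect).
function oldestTxInUse(lockSnapshot, databasePath) {
  let oldest = Infinity;
  for (const lock of [...lockSnapshot.held, ...lockSnapshot.pending]) {
    const match = lock.name.startsWith(databasePath) &&
        /\[(\d+)\]$/.exec(lock.name);
    if (match) oldest = Math.min(oldest, Number(match[1]));
  }
  return oldest;
}

// A superseded slot is only safe to reuse once every context has seen the
// superseding transaction, i.e. supersededByTx <= oldestTx. A reader at
// oldestTx already resolves that page to its newer offset.
function reclaimableSlots(supersededSlots, oldestTx) {
  // supersededSlots: [{ fileOffset, supersededByTx }]
  return supersededSlots
      .filter(slot => slot.supersededByTx <= oldestTx)
      .map(slot => slot.fileOffset);
}
```

This is what makes the lazily held reader locks matter to the writer: a stale flag lock pins old slots and makes the file grow instead of being corrupted.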