OPFS read transactions without (blocking) locks #152

rhashimoto · 2024-01-25T20:17:21Z

rhashimoto
Jan 25, 2024
Maintainer

What's that you say? OPFS use doesn't require locks because only one synchronous access handle on a file can be open at any given time? That's about to change. There's a proposal to allow multiple concurrent access handle usage (which I already discussed here), and it's enabled by default in Chrome 121 which just reached the stable channel (ChromeOS in a few more days).

So on Chrome at least we'll be able to read and write the same OPFS file from many contexts at once, and that means some sort of synchronization will be required to avoid corrupting a database. WebLocks are an obvious way to implement the SQLite VFS locking methods, and doing that straightforwardly works fine...but what if there were a way to make it harder and more complicated? Wouldn't that be more fun? 😀

I have an idea for such a way where database readers never need to block. In the straightforward approach, readers are blocked while a writer makes changes. WAL relaxes that, but readers still need to block for checkpointing and no one has succeeded in using multiway WAL on a browser (I'm working on an alternative).

My idea here is to store database pages in an OPFS file in arbitrary order (i.e. not in page index order like a normal SQLite database file), and keep the mapping of page index to file offset in IndexedDB. Entries in IndexedDB would look like this:

{
  pageIndex: number, // This is the IndexedDB key
  fileOffset: number,
  transactionId, number,
  digest: Uint32Array // Can be skipped if writers sync on commit
}

On database open, the VFS would read the page map from IndexedDB, verify the digest of pages belonging to transactions added since the last time someone checked, update the last transaction verified, and keep a Map of pageIndex to fileOffset and transactionId in memory. This is the only time a reader would need to access IndexedDB.

The reader will acquire a shared WebLock whose name encodes the last transaction it knows about, e.g. "opfs:/myDatabase.db[42]". This lock will be used as a flag, not as a lock, to let writers know which pages in the database file can't be overwritten with new transactions yet.

When a writer commits a new transaction, it updates the pages in IndexedDB and uses BroadcastChannel to send a message with:

{
  transactionId: number,
  pages: [pageIndex: number, fileOffset: number][]
}

When other connections receive the broadcast message, they update their in-memory Map and their shared WebLock flag. If there is a local transaction in progress when the message is received, the update is deferred until the transaction completes.

Under this scheme, readers never need to wait. I'm guessing this might be the fastest multiway read performance possible, assuming storage is on an SSD so seek times are low.

So what are the tradeoffs/drawbacks here?

Slower initialization because of reading the page map.
It isn't synchronous - at the moment a read transaction starts, there may already be new write transactions in the database depending on how quickly BroadcastMessage propagates and whether the main thread is released for the listener to execute.
It doesn't have filesystem transparency; the OPFS file can't be exported and directly read as a SQLite database. The database file is permuted so it can't be used without the mapping. The SQLite file format does allow for per-page application data; perhaps this could be used to insert page validity and ordering for an external converter.
Poor performance on mechanical disk storage.
If a read transaction remains open for a long time, write transactions eventually will not be able to overwrite obsolete pages and the file may grow large. A mechanism to reclaim this space would probably need exclusive access.
More SQLITE_BUSY retries from write contention. The asynchronous properties of this approach means that the state a connection sees when it begins a transaction will not always be current. That can be just fine for read transactions, depending on the application, but a write transaction cannot proceed unless the state is current. In those cases, the VFS will return SQLITE_BUSY.

An addendum about writes: Write transactions need to be sequenced - only one at a time - so a lock will be needed. The writer will then need to check IndexedDB to confirm that it's view of the database is current (and if not, return SQLITE_BUSY), query WebLocks to determine the oldest transaction any context considers current and adjust the set of available page slots accordingly, write the new transaction pages, write a new entry to IndexedDB (without waiting if relaxed durability is acceptable), and send the BroadcastChannel message.

rhashimoto · 2024-02-10T04:09:21Z

rhashimoto
Feb 10, 2024
Maintainer Author

This approach has a really interesting possibility: selective page cache invalidation. SQLite's built-in pager cache keeps a set of recently accessed database pages in memory to improve performance. When SQLite detects that another connection has changed the database, the pager cache is invalidated and emptied. If the detected change was small, a lot of cached pages may be discarded unnecessarily.

If we disable the SQLite cache and move the cache into the VFS, we can be much more precise. Our connection is notified of the pages changed by each transaction. As part of processing that notification, those pages can be removed from the cache if present. That would be a surgical cache update, removing only newly invalid pages and no others. That should be a performance boost under concurrency, especially when transactions are small compared to the actively accessed part of the database.

0 replies

rhashimoto · 2024-05-19T20:19:57Z

rhashimoto
May 19, 2024
Maintainer Author

I have implemented this (without a VFS cache) in the dev branch and it seems to work well. Demo is here, benchmarks here. Note that this uses the proposed OPFS access handle modes that are only in Chrome right now.

In my implementation read transactions do use a lock after all, but they release them lazily so it is rare that they need to reacquire them. The only times that happens are when another connection performs a VACUUM, which the VFS recognizes and writes an un-permuted file that can be exported from OPFS.

This VFS is faster under concurrent access than anything else. Except for VACUUM, multiple readers and a writer don't block each other (access for multiple writers must be one at a time). Readers rarely even need to acquire a lock at all, and don't access IndexedDB. Writers do need to access IndexedDB, but writes are non-blocking.

Here are contention test results with 3 readers and 1 writer, (using PRAGMA synchronous=normal to trade durability for performance):

[11:28:45.604] build: asyncify
[11:28:45.604] config: OPFSPermutedVFS
[11:28:45.604] nWriters: 1
[11:28:45.604] nReaders: 3
[11:28:45.604] nSeconds: 1
[11:28:48.071] launch workers
[11:28:48.202] start
[11:28:49.202] worker 1 reader 1388 iterations
[11:28:49.202] worker 3 reader 1394 iterations
[11:28:49.202] worker 2 reader 1378 iterations
[11:28:49.202] worker 0 writer 816 iterations
[11:28:49.202] complete

For comparison, see this post (the equivalent runs for IDBBatchAtomicVFS and FLOOR are at the bottom). OPFSPermutedVFS is comfortably in front with this load.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OPFS read transactions without (blocking) locks #152

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

OPFS read transactions without (blocking) locks #152

rhashimoto Jan 25, 2024 Maintainer

Replies: 2 comments

rhashimoto Feb 10, 2024 Maintainer Author

rhashimoto May 19, 2024 Maintainer Author

rhashimoto
Jan 25, 2024
Maintainer

rhashimoto
Feb 10, 2024
Maintainer Author

rhashimoto
May 19, 2024
Maintainer Author