Use concurrent (aka "lockless") hash map in the AtomTable #2553
Comments
Implementing this requires
Do the same for |
Possibilities:
Each seems to have limitations and usability issues... See below for some commentary. Starting with the simplest one, even if it is not the fastest, is probably the best/safest idea. |
No. See issue #2758. Almost all the parallelism problems are due to the CPU implementation of atomic increment. There could be some speedup from a lock-free |
What is actually needed is a drop-in replacement for Comments about the concurrent hash maps, mentioned above.
|
FIXED in #2902. The multimap design was insane. |
In
After the cleanup of #2907 (and the four prior pull reqs) experimenting with alternatives is now possible. |
Confusion reigns: Recompiled both Double-check these results: using |
Some For the locked version: the lock in
and The unlocked versions show no lock contention. Conclude: it is not lock contention that is slowing us down. Latest theory: the smart pointers and other assorted atomics in the code are used so frequently that they are causing cache-line contention for the locks. (CPUs implement atomics inside of cache lines.) Bingo! That's exactly it! Things parallelize nicely on modern CPUs! See issue #2758 for a general discussion. |
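The cache-line theory above can be checked outside the AtomSpace with a small experiment. The following is only a sketch, not code from this project: `count_shared`, `count_padded`, `time_ms` and `PaddedCounter` are hypothetical names, and the 64-byte cache line is an assumption about the target CPU. It compares many threads incrementing one shared atomic against each thread incrementing its own padded atomic.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// One counter per cache line; 64 bytes is an assumption about the target CPU.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Every thread increments the same atomic, so its cache line is shared by all cores.
std::uint64_t count_shared(unsigned nthreads, std::uint64_t iters) {
    assert(nthreads > 0);
    std::atomic<std::uint64_t> shared{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&shared, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                shared.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return shared.load();
}

// Each thread increments its own padded atomic; the totals are summed at the end.
std::uint64_t count_padded(unsigned nthreads, std::uint64_t iters) {
    assert(nthreads > 0);
    std::vector<PaddedCounter> counters(nthreads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&counters, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}

// Wall-clock milliseconds taken by one call to fn.
double time_ms(const std::function<void()>& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Both functions return the same total, so timing them with `time_ms` at the same thread and iteration counts isolates the cost of sharing one cache line. The result depends on the CPU, core count and compiler flags, so no numbers are given here.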
The AtomTable uses a mutex to guard access to the TypeIndex. This mutex could be mostly avoided by using a concurrent hash map.
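For reference, the locked pattern being discussed looks roughly like the sketch below. This is an illustration under stated assumptions, not the actual AtomTable or TypeIndex code: `LockedTypeIndex`, `Type` and `AtomId` are hypothetical stand-ins, and a plain `std::unordered_multimap` guarded by one `std::mutex` is used as the simplest possible baseline.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

using Type = unsigned short;   // hypothetical stand-in for an atom type code
using AtomId = unsigned long;  // hypothetical stand-in for a handle to an atom

// A type index where every operation takes the same mutex.
class LockedTypeIndex {
public:
    void insert(Type t, AtomId a) {
        std::lock_guard<std::mutex> lck(_mtx);
        _idx.emplace(t, a);
    }

    // Remove one (type, atom) entry; returns true if it was present.
    bool erase(Type t, AtomId a) {
        std::lock_guard<std::mutex> lck(_mtx);
        auto range = _idx.equal_range(t);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == a) {
                _idx.erase(it);
                return true;
            }
        }
        return false;
    }

    // Copy out all atoms of the given type while holding the lock.
    std::vector<AtomId> get(Type t) const {
        std::lock_guard<std::mutex> lck(_mtx);
        std::vector<AtomId> out;
        auto range = _idx.equal_range(t);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lck(_mtx);
        return _idx.size();
    }

private:
    mutable std::mutex _mtx;
    std::unordered_multimap<Type, AtomId> _idx;
};
```

A concurrent map would have to offer the same three operations (insert, erase of a single entry, lookup by type) without the single mutex, which is why thread-safe erasure matters for any replacement.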
AtomTable Status: After exploring this for a while, there's a problem. It seems like what AtomTable/TypeIndex really requires is a concurrent unordered multimap with thread-safe erasure. There are 5 or 6 packages that provide concurrent hash maps, but only one provides a concurrent multimap: Intel TBB. Unfortunately, it does NOT provide a thread-safe erase, so it seems like a dead end. There's still one possibility: use a concurrent hash map, and store concurrent linked lists as the value. Yikes!

Fixed in #2907

Atom Status: The Atom (Atom.cc) stores Values in std::map. It does NOT use a hash table, in order to keep Atoms as small as possible. Although it could be replaced by a lockless hash table, this risks blowing up RAM usage. So: how bad would this be? How would this change the size of an Atom? Atom also uses an ordinary std::set for the incoming set. Same as before: we want a tree here, not a hash, to keep the Atom as small as possible.

Overall status: This issue primarily affects users with highly threaded workloads on modern multi-core (more than 8 cores) machines. It appears that the mutexes are NOT the primary bottleneck; instead, it's the CPU implementation of atomic increment. See issue #2758 for discussion. Conclude: a lockfree find() in a hashset could speed things up for truly demanding users, but we don't have users like that.

Lockfree comments: Notes below review the available options. We need two things: a lockfree hash set (for TypeIndex), and a lockfree tree (i.e. something smaller, less RAM intensive than a hash set, for the Incoming set). Current options are thin, approximating zero: most of them are hard to use or fail to meet requirements. Lockfree theory looks strong; drop-in replacements for std::set and std::unordered_set are missing (fb folly comes close). Documentation and benchmarks are lacking or inadequate. This is still the bleeding edge.