Add a best effort heap-wide object property slot cache #1289
Conversation
The current object property part layout is good for linearly scanning keys (which are kept in a packed array), but when a slot cache or hash part lookup succeeds the layout is not ideal because multiple cache lines get fetched. For example, for a successful slot cache lookup the matched slot's key (needed to validate the cached slot), value, and attributes all live in separate arrays, so each is likely to be on a different cache line.
Depending on cache pressure this may or may not have a significant impact. For hot paths the lines usually stay in the cache so this is not a big issue for the most part, but it's still not ideal. For desktop targets it might be better to place the key, value, and attributes into a single record structure and make the property table an array of such records. This is not the most memory efficient approach but would be the most cache friendly when a lookup is done using a hash table, the slot cache, etc., instead of a linear key scan. For low memory targets the packed structure with no padding overhead should still be available, because the padding losses add up and are significant when working with very low amounts of RAM. Added a bullet point to #1196 for this question.
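To make the layout trade-off concrete, here is a minimal C sketch of the two candidate layouts. All type and field names are illustrative only and do not match Duktape's actual internals:

```c
#include <stdint.h>

typedef struct duk_hstring duk_hstring;  /* opaque interned key string */

/* Packed layout: three parallel arrays.  Minimal padding waste, which
 * suits low memory targets, but reading the key, value, and attributes
 * of slot i touches three separate allocations and therefore typically
 * three different cache lines. */
typedef struct {
    duk_hstring **keys;  /* packed key array, scanned linearly */
    double *values;      /* value array (placeholder value type) */
    uint8_t *attrs;      /* attribute flags array */
} packed_props;

/* Record layout: one array of records.  A lookup that already knows the
 * slot index (via the slot cache or a hash part) usually touches just
 * one cache line, at the cost of padding inside each record. */
typedef struct {
    duk_hstring *key;
    double value;        /* placeholder for the engine's value type */
    uint8_t attrs;       /* compiler padding follows: the memory cost */
} prop_record;

typedef struct {
    prop_record *entries;
    uint32_t used;
} record_props;
```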
Some thoughts on whether a slot cache could replace hash tables in general.

The main reason that wouldn't work well with the pull request as it is now is that only properties that actually get accessed get cached. So if a large object has N properties (say 1000) and they all get accessed in sequence, each access will involve one linear scan followed by caching. The linear scans will take 1000 * 500 = 500k key comparisons in total, 500 per lookup on average.

That could be avoided as follows: when doing a linear scan, insert all scanned keys into the slot cache. For example, if a scan found the desired key at index 591, the keys at indices 0-590 would also be inserted into the slot cache (see the sketch below). This would eliminate some of the scans (but would still perform poorly if the properties were accessed in increasing property slot index order). The downside is that a lot of entries also get overwritten; to work well, the slot cache must be large enough to make this a non-issue in practice.

A related approach: all newly written keys would always be inserted into the slot cache, and when an object is resized, the compacted key list would be inserted into the slot cache during the resize. This would work well if the slot cache is larger than the effective working set of the properties being manipulated, so that on average slot cache entries wouldn't get continually overwritten, which would again lead to too many linear scans. The slot cache could be relatively large in practice because the memory saved by avoiding hash tables could be allocated towards the slot cache instead; the slot cache might need a dynamic size to work well.

Even a few linear scans would be a problem for a huge object (say 1M properties) that is constructed once and then looked up a large number of times; a linear scan would cost 500k key comparisons on average for such an object. So it might be possible to avoid an explicit hash table for medium size objects (say < 1000 properties) by using a much larger slot cache, paid for by the memory freed by not allocating hash tables for many objects, perhaps with a dynamically sized slot cache. But for extremely large objects this would still not work very well; objects with > 1000 properties are not very common but still occur from time to time, e.g. in temporary tracking data structures in algorithms.
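Here is a minimal sketch of the "cache every scanned key" idea above, assuming a tiny direct-mapped cache. All names (obj, slotcache_put, etc.) are hypothetical and do not correspond to Duktape's actual code:

```c
#include <stdint.h>

typedef struct duk_hstring duk_hstring;  /* opaque interned key string */

typedef struct {
    duk_hstring **keys;  /* packed key array */
    uint32_t e_used;     /* number of used entries */
} obj;

#define SLOTCACHE_SIZE 256u  /* assumed small power-of-two cache */

typedef struct {
    const obj *o;
    const duk_hstring *k;
    uint32_t slot;
} slotcache_entry;

static slotcache_entry slotcache[SLOTCACHE_SIZE];

/* Best effort insert: entries are simply overwritten on collision. */
static void slotcache_put(const obj *o, const duk_hstring *k, uint32_t slot) {
    uintptr_t h = ((uintptr_t) o ^ (uintptr_t) k) >> 4;  /* crude pointer mix */
    slotcache_entry *e = &slotcache[h & (SLOTCACHE_SIZE - 1u)];
    e->o = o;
    e->k = k;
    e->slot = slot;
}

/* Linear scan that primes the cache with every key it walks past, not
 * just the one being looked up: a later lookup of any already scanned
 * key can then hit the cache instead of rescanning. */
static int32_t lookup_with_priming(obj *o, duk_hstring *key) {
    uint32_t i;
    for (i = 0; i < o->e_used; i++) {
        slotcache_put(o, o->keys[i], i);
        if (o->keys[i] == key) {  /* interned strings: pointer compare */
            return (int32_t) i;
        }
    }
    return -1;  /* not found */
}
```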
The reason why it'd be nice to eliminate per-object hash tables is to remove hash table management from the object model. A dynamically sized property slot cache would avoid the upfront cost and react to actual property read/write patterns. So far I don't know how to achieve that without any per-object state, because a shared structure always experiences some entry overwrites, which cause linear key scans, and that breaks down as a viable model for very large (think 1M properties) objects. This is a shame because the hash table for 1M properties would be around 8MB in size, and a slot cache of that size can hold quite a lot of property slot indices.
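As a thought experiment on the dynamic sizing point, a purely hypothetical growth heuristic could count linear-scan fallbacks and grow the cache when they become frequent. Growing is simple to reason about because entries are only hints that get validated anyway, so the grown table can just start out zeroed. None of these names exist in Duktape; this only sketches the kind of per-heap state involved:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t *slots;   /* slot hints, size entries, validated on use */
    uint32_t size;     /* current size, power of two */
    uint32_t misses;   /* linear-scan fallbacks since last check */
    uint32_t lookups;  /* lookups since last check */
} dyn_slotcache;

/* Grow the cache when more than ~25% of recent lookups fell back to a
 * linear scan, up to some cap.  Nothing needs to be rehashed: stale or
 * zeroed hints are harmless because every hint is validated against the
 * object's actual key array before use. */
static void dyn_slotcache_maybe_grow(dyn_slotcache *c, uint32_t max_size) {
    if (c->lookups < 1024u) {
        return;  /* need a reasonable sample first */
    }
    if (c->misses * 4u > c->lookups && c->size < max_size) {
        uint32_t nsize = c->size * 2u;
        uint32_t *nslots = (uint32_t *) calloc(nsize, sizeof(uint32_t));
        if (nslots != NULL) {
            free(c->slots);
            c->slots = nslots;
            c->size = nsize;
        }
    }
    c->misses = 0u;
    c->lookups = 0u;
}
```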
Add a heap-wide property slot cache, keyed by the `duk_hobject *` and a key string hash.

Property lookup changes:
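To illustrate what the lookup change could look like, here is a minimal, hypothetical sketch of a heap-wide best effort slot cache keyed by the object pointer and the key string hash; the actual patch's cache layout, hash mix, and validation details may differ:

```c
#include <stdint.h>

typedef struct duk_hstring { uint32_t hash; } duk_hstring;  /* stand-in */

typedef struct duk_hobject {
    duk_hstring **keys;  /* packed key array */
    uint32_t e_used;     /* number of used entries */
} duk_hobject;

#define SLOTCACHE_SIZE 1024u  /* assumed power of two */

/* One slot hint per bucket; a stale or colliding entry is harmless
 * because the hint is always validated before use. */
static uint32_t slotcache[SLOTCACHE_SIZE];

static uint32_t slotcache_index(const duk_hobject *obj, const duk_hstring *key) {
    /* Mix the object pointer and the key string hash (assumed mix). */
    uint32_t h = (uint32_t) ((uintptr_t) obj >> 4) ^ key->hash;
    return h & (SLOTCACHE_SIZE - 1u);
}

/* Lookup: try the cached slot hint first and validate it by comparing
 * the key actually stored at that slot; on a miss, fall back to the
 * linear scan and refresh the cache entry. */
static int32_t prop_lookup(duk_hobject *obj, duk_hstring *key) {
    uint32_t idx = slotcache_index(obj, key);
    uint32_t hint = slotcache[idx];
    uint32_t i;

    if (hint < obj->e_used && obj->keys[hint] == key) {
        return (int32_t) hint;  /* cache hit, no scan needed */
    }
    for (i = 0; i < obj->e_used; i++) {
        if (obj->keys[i] == key) {
            slotcache[idx] = i;  /* remember for next time */
            return (int32_t) i;
        }
    }
    return -1;  /* not found */
}
```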
The upsides of this approach are that:
There are downsides too:
See also: #1284 (comment).
Tasks: