`Array#map` is vulnerable to concurrent mutation #15161

straight-shoota · 2024-11-05T14:12:58Z

Array#map(&) yields each element to transform it in the block. If the block performs mutating operations on the array, it can lead to invalid memory access or corrupted data.
The same applies to Array#map_with_index, and I mean both when I refer to #map.

This is discussed as an example in https://forum.crystal-lang.org/t/safety-guarantees-of-stdlib-data-structures/7364

A simple reproduction of an example with data corruption:

ary = [1, 2, 3]

ary.map do |e|
  ary.pop if e == 1
  e
end # => [1, 2, 0]

The cause is that Array#map expects size to be static during its runtime. While this is generally a reasonable expectation, it's impossible to exclude that the block mutates the array's size.

Array#map is an optimized override. The base implementation of Enumerable#map does not have this bug because it does not assume any size. It grows the buffer on demand.

The easiest solution is to drop the implementation of Array#map in order to fall back to Enumerable#map. We can still optimize a bit by using the current size as size hint because usually it's gonna be stable.

The text was updated successfully, but these errors were encountered:

yxhuvud · 2024-11-05T17:01:59Z

Actually modifying an array while iterating it has very few if any actual use cases (or the issues would have shown up a lot earlier). Could it be considered to instead simply forbid modifying the array during iteration? I feel it would be a shame to pay a performance price for something that isn't really how people normally use the thing.

One way to implement that would be to use an rw_lock, though perhaps with a variant that raises if it fails to grab the write lock immediately.

Upside: Would cost a lot less performance as it is only done once per iteration
Downside: would use a tiny bit more memory.
Downside: Would disallow actually doing code like that. Not much of a downside.
Downside: The cost of element deletion would be slightly higher.

Thoughts?

straight-shoota · 2024-11-05T17:18:51Z

This is definitely worth considering.
But it's hard to see a net positive effect on performance because the scope is much larger than affecting just #map.

Such a protection must cover all operation that can potentially invalidate the buffer range. That's most mutating methods on Array except those that only perform substitution. And probably these methods are used magnitudes more often than #map.

ysbaddaden · 2024-11-05T19:41:18Z

As outlined by @BlobCodes, Java throws a ConcurrentModificationException exception when it detects a concurrent iterate/mutate situation (no guarantee: it's a best effort).

crysbot · 2024-11-05T19:51:35Z

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/safety-guarantees-of-stdlib-data-structures/7364/17

straight-shoota · 2024-11-05T19:58:30Z

The mention of ConcurrentModificationException has brought me to this previous thread: https://forum.crystal-lang.org/t/raise-if-containers-accessed-while-being-changed/3937

straight-shoota · 2024-11-05T20:15:13Z

Golang detects concurrent modification on map. It does so using a single bit flag. Interestingly it's not even atomic, so one should call it "best effort" as well I suppose.

https://github.com/golang/go/blob/635c2dce04259f2c84aeac543f0305b3e7c8ed7b/src/internal/runtime/maps/map.go#L231-L235

straight-shoota · 2024-11-06T10:08:51Z

Considering that both Go and Java have a protection mechanism to detect concurrent modifications in generic data structures, the trade-off seems to be worthwile.

A major benefit is of course that it also triggers at least for some types of parallel modifications. So it would have a relevant effect thread safety, to error gracefully instead of ending up with corrupted data or a segfault.

straight-shoota · 2024-11-06T12:04:13Z

Digging a bit into the mechanisms in Go and Java it turns out their implementations - and apparently purposes - are quite different.

Go uses a boolean flag which indicates whether a modification is currently happening. It's set via XOR which according to the doc comment should make it likely that if two fibers both try to mutate simultaneously, both notice the collision.
Detection is not a safe guarantee though, just a high probability. I'm not sure why it's not an atomic, but probably to avoid the overhead and the XOR solution is considered good enough.

This protects the integrity of the data structure itself, but not any assumptions about structural stability as would be necessary for the #map use case.

Java on the other hand uses a variable modCount to count structural modifications. It bumps every time the structure of the collection changes (e.g. items added or deleted). Iterators over the collection hold a reference to the modCounter at its creation time and compare it to the current value on each iteration step. A difference means the collection was modified since the iterator's creation and thus it becomes invalid and errors upon access. This mechanism would also fit for the #map use case.

I suppose this could also be used to detect simultaneous modifications, at least after the fact: Before a method does a structural modification, it takes a snapshot of the current modCount. If it differs after applying the change, there has been concurrent modification and the data can be corrupted. It's unclear what should happen then, though. The damage is done and "just" raising an exception won't prevent other fibers from seeing an invalid state. The only safe course of action is to exit the program🤷
According to Java documentation modCount is only used to ensure iterators are in sync.

It might be possible to combine both approaches into a single variable, a modification counter where one bit indicates whether someone is currently writing. 🤔

oprypin · 2024-11-06T12:33:35Z

Thanks for the description of approaches in Go and Java. They sound nice.

I think the approaches that detect concurrent modification after the fact are just fine.
Their goal isn't to prevent any bug from happening even once, it's to prevent them from being unnoticed and shipped to users.
The expected action isn't to try to recover from the error, it's to go back to the source code and rework it.

bcardiff · 2024-11-16T03:01:59Z

At some point I was trying to make Array MT safe without locks in master...bcardiff:crystal:mt/safe-array#diff-a12e3cba63cd83b9e746cd9eab9d85f5ddebd05b696133cd494aa83174a48d16R55 . The idea was to decouple the array buffer from array itself. Each operation like #map would work on the initial buffer. Concurrent modifications would essentially be modifying potentially different buffers.

That would prevent crashing but is a silent error which as discussed are hard to track. Maybe we can add a modCount to check when there was indeed a concurrent modification and raise. That way there is no memory corruption and we have an explicit signal of where something wrong is happening.

The linked code also includes some GC integrations I think I based from Java. To instruct a bit more precisely the GC regarding pointers in buffers. Not the main aspect, but in case someone checks that code. Also it was done before some refactors of the Array internal to make #shift more performant and ... well it causes lots of conflicts and priorities moved me away from those refactors.

If I were to do the same again I would try to leave out the GC performance, and the introduction of Array::Buffer is essentially another Array-like structure that grows by doing realloc and does not have pointer redirection (like String that is size + payload inlined). Then the gist is refactoring Array to use a buffer instead of a pointer and updating to a new buffer in some operations (but maybe with modCount check). The Array::Buffer could be given a nicer name like Vector and have a lower level container. (I should have written this 5 years ago somewhere!)

RX14 · 2024-11-16T11:44:35Z

I think the goal should include that using array concurrently should never cause an unsafe buffer access (freed buffer space, uninitialized buffer space, anything that could be conceivably be attacker controlled, etc.) but also try and flag occurrences to code authors like go/java. The former could possibly be true already, but I'm not sure.

ysbaddaden · 2024-11-18T10:40:28Z

I feel like we're starting to agree on something: avoid exploitable segfaults + best effort to loudly detect errors. Are we?

@RX14 We currently use GC.realloc so we invalidate the previous pointer 💣

We could replace it with a manual realloc (malloc/memcpy) and let the GC collect the old pointer. Since the buffer can only ever grow (never shrink) then iterations should be safe AFAIK...

There still remains the issue that @bcardiff mentioned: we may remove entries in parallel, in which case we clear the memory (to avoid leaking memory) then we may iterate a null pointer 💣

Maybe that's a good argument towards a semi-precise GC that would know which bounds of the buffer should be scanned (bonus effect: no need to clear anymore) 🤔

RX14 · 2024-11-18T10:51:37Z

I think accessing cleared memory cast to i.e. a safe class type is unlikely to be safe, so I believe both behaviours should be prevented. For now, i think iteration should do a bounds check for each step while iterating. I don't think the performance impact will be that high as long as the array stays uncontested (bounds check data cacheline stays valid in L1)

HertzDevil · 2024-11-18T10:54:55Z

We could replace it with a manual realloc (malloc/memcpy)

This might actually happen due to an entirely unrelated issue, unless GC_realloc respects arbitrary existing alignment requirements

straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection labels Nov 5, 2024

straight-shoota linked a pull request Nov 5, 2024 that will close this issue

Fix protect Array#map against concurrent mutation #15162

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Array#map` is vulnerable to concurrent mutation #15161

`Array#map` is vulnerable to concurrent mutation #15161

straight-shoota commented Nov 5, 2024

yxhuvud commented Nov 5, 2024 •

edited

Loading

straight-shoota commented Nov 5, 2024

ysbaddaden commented Nov 5, 2024 •

edited

Loading

crysbot commented Nov 5, 2024

straight-shoota commented Nov 5, 2024

straight-shoota commented Nov 5, 2024

straight-shoota commented Nov 6, 2024

straight-shoota commented Nov 6, 2024 •

edited

Loading

oprypin commented Nov 6, 2024

bcardiff commented Nov 16, 2024

RX14 commented Nov 16, 2024 •

edited

Loading

ysbaddaden commented Nov 18, 2024

RX14 commented Nov 18, 2024 •

edited

Loading

HertzDevil commented Nov 18, 2024

Array#map is vulnerable to concurrent mutation #15161

Array#map is vulnerable to concurrent mutation #15161

Comments

straight-shoota commented Nov 5, 2024

yxhuvud commented Nov 5, 2024 • edited Loading

straight-shoota commented Nov 5, 2024

ysbaddaden commented Nov 5, 2024 • edited Loading

crysbot commented Nov 5, 2024

straight-shoota commented Nov 5, 2024

straight-shoota commented Nov 5, 2024

straight-shoota commented Nov 6, 2024

straight-shoota commented Nov 6, 2024 • edited Loading

oprypin commented Nov 6, 2024

bcardiff commented Nov 16, 2024

RX14 commented Nov 16, 2024 • edited Loading

ysbaddaden commented Nov 18, 2024

RX14 commented Nov 18, 2024 • edited Loading

HertzDevil commented Nov 18, 2024

`Array#map` is vulnerable to concurrent mutation #15161

`Array#map` is vulnerable to concurrent mutation #15161

yxhuvud commented Nov 5, 2024 •

edited

Loading

ysbaddaden commented Nov 5, 2024 •

edited

Loading

straight-shoota commented Nov 6, 2024 •

edited

Loading

RX14 commented Nov 16, 2024 •

edited

Loading

RX14 commented Nov 18, 2024 •

edited

Loading