Cache housekeeper can panic (seen after 50M+ insertions) #267
Thank you for reporting the issue!

> ... attempt to add with overflow',
> ... /moka-0.10.0/src/common/frequency_sketch.rs:183:9

The panic points at moka/blob/v0.10.0/src/common/frequency_sketch.rs#L180-L183:

```rust
180: fn index_of(&self, hash: u64, depth: u8) -> usize {
181:     let i = depth as usize;
182:     let mut hash = hash.wrapping_add(SEED[i]).wrapping_mul(SEED[i]);
183:     hash += hash >> 32;
```

I think line 183 should be changed to:

```rust
183:     hash = hash.wrapping_add(hash >> 32);
```

so that a wrapping (modular) addition is performed even in debug builds (where overflow checks are enabled). Let me double-check and apply the change. (Duplicate of #113)
Thanks again for reporting it. I fixed it via #272. I will publish v0.11.1 to crates.io soon with the fix.

Here are some details about the bug: Moka cache calculates the hash value of each key and feeds it to the frequency sketch, and only certain hash values make the addition at line 183 overflow. It is very rare to hit the bug, because such hash values make up only a tiny fraction of the 64-bit hash space.
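As a back-of-the-envelope check (a sketch; the counts are taken from the hash-finder output shown later in this comment):

```rust
fn main() {
    // Totals reported by the hash-finder run below:
    // 4 threads x 2^32 attempts each, 16 panicking hashes found.
    let attempts: u64 = 4 * 2u64.pow(32);
    let hashes_found: u64 = 16;

    // Roughly one random hash in 2^30 triggers the overflow.
    println!("once in {} attempts", attempts / hashes_found); // 1073741824
}
```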
The reason that your script does not hit the bug in every run is that the hash values for the same keys are different in every run. The following code snippet (wrapped in `fn main` so it compiles on its own) shows the hash values for the same key in different runs:

```rust
use std::hash::{BuildHasher, Hash, Hasher};

fn main() {
    let key = (0, 0);

    // Moka cache uses `RandomState` as the default `BuildHasher`. When a cache is
    // created, a `RandomState` instance is created with a random seed.
    //
    // Here, let's create a `RandomState` instance directly, without creating a cache.
    let hash_builder = std::collections::hash_map::RandomState::new();

    // Calculate the hash value of the key, just like a cache does.
    let mut hasher = hash_builder.build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    // Print the hash value. Since `RandomState` is seeded by a random value, the
    // hash value is different in each run.
    println!("key: {key:?}, hash: {hash}");
}
```

```console
$ cargo run
key: (0, 0), hash: 6658444597857261670
$ cargo run
key: (0, 0), hash: 6848010251626297664
$ cargo run
key: (0, 0), hash: 8403121289877648902
```

Finally, here is the code of the simple test (`hash-finder`) I used to trigger the bug:

```rust
// Cargo.toml
// ---------------------------------------------
// [dependencies]
// getrandom = "0.2.9"
// once_cell = "1.17.1"
// rand = { version = "0.8.5", features = ["small_rng"] }
//
// # Enable overflow checks in release builds.
// [profile.release]
// overflow-checks = true
// ---------------------------------------------
use std::num::NonZeroUsize;
use rand::rngs::SmallRng;
use rand::{Rng, SeedableRng};
// Please download this file having the bug:
// https://raw.githubusercontent.com/moka-rs/moka/v0.11.0/src/common/frequency_sketch.rs
//
// and place it into your `src` directory.
//
mod frequency_sketch;
fn main() {
    let num_cores =
        std::thread::available_parallelism().unwrap_or(NonZeroUsize::new(1).unwrap());
    let num_threads = num_cores.get() / 2;

    // Spawn threads to search for hashes that trigger a panic.
    let handles = (0..num_threads)
        .map(|thread_num| {
            std::thread::Builder::new()
                .name(format!("thread-{}", thread_num))
                .spawn(move || find_hashes(thread_num))
                .unwrap()
        })
        .collect::<Vec<_>>();

    // Wait for the threads to finish and collect their results.
    let results = handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .collect::<Vec<_>>();

    let attempts = results.iter().map(|(attempts, _)| *attempts).sum::<u64>();
    let mut hashes = results
        .iter()
        .flat_map(|(_, hashes)| hashes)
        .collect::<Vec<_>>();
    hashes.sort_unstable();
    hashes.dedup();

    println!("-------------------------------------");
    println!(
        "number of threads: {}, total attempts: {}, hashes found: {}, chances: once in {:.2} attempts",
        num_threads,
        attempts,
        hashes.len(),
        attempts as f64 / hashes.len() as f64
    );
    for hash in hashes {
        println!("{hash}");
    }
}

fn find_hashes(thread_num: usize) -> (u64, Vec<u64>) {
    const NUM_ATTEMPTS: u64 = 2u64.pow(32);

    // The hashes triggering a panic.
    let mut hashes_found = vec![];
    let mut rng = SmallRng::from_entropy();
    let mut sketch = frequency_sketch::FrequencySketch::default();

    // The capacity does not matter for this bug.
    sketch.ensure_capacity(1024);

    for _ in 0..NUM_ATTEMPTS {
        // Generate a random hash and check if it triggers a panic.
        let hash = rng.gen();
        match std::panic::catch_unwind(|| sketch.frequency(hash)) {
            Ok(_) => (),
            Err(_) => {
                println!("{} - FOUND: hash: {}", thread_num, hash);
                hashes_found.push(hash);
            }
        }
    }

    (NUM_ATTEMPTS, hashes_found)
}
```

and the output:

```console
## In this program, release mode will panic too because the overflow
## checks are enabled in Cargo.toml.
$ cargo run --release
...
thread 'thread-3' panicked at 'attempt to add with overflow', src/frequency_sketch.rs:185:9
3 - FOUND: hash: 6545676238230191093
thread 'thread-3' panicked at 'attempt to add with overflow', src/frequency_sketch.rs:185:9
3 - FOUND: hash: 15599614373087607955
...
-------------------------------------
number of threads: 4, total attempts: 17179869184, hashes found: 16, chances: once in 1073741824.00 attempts
175235467969503931
350323424204063419
2393337716848854001
2607510736887440732
3614626907659264335
4624000782985340440
4651654716317629850
6545676238230191093
8541891011146845666
8980550827530989485
9619494623239763742
11220422092290893381
13296693810304296595
15599614373087607955
15665336588250767017
15687268636519650785
```
I hope I can look into your original issue, but it will be very difficult without access trace data. A cache has many moving parts, and at least the following have a major impact on size-based evictions:

moka employs the "aggregated victims" strategy described in the following paper:

Maybe the strategy does not match your workload, or there is a bug in the implementation; I'm not sure. The access trace data will help us understand what happened in your workload. There is no need to include the real key and value of each entry; the hash value of the key, the access time, and the weighted size will be enough. If you can share it, I will be happy to look into it.
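For illustration only, here is a minimal sketch of such a trace record. The struct and function names are hypothetical, not a moka API, and it assumes the cache's own `BuildHasher` is reused so the recorded hashes match the cache's:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// One access-trace record: no real keys or values, just the key's
/// hash, the access time, and the entry's weighted size.
struct TraceRecord {
    key_hash: u64,
    access_time_ms: u128,
    weighted_size: u32,
}

/// Builds a record for one cache access. `hash_builder` should be the
/// same `BuildHasher` the cache uses, so the hashes line up.
fn record_access<K: Hash, S: BuildHasher>(
    key: &K,
    weighted_size: u32,
    hash_builder: &S,
) -> TraceRecord {
    let mut hasher = hash_builder.build_hasher();
    key.hash(&mut hasher);
    TraceRecord {
        key_hash: hasher.finish(),
        access_time_ms: SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis(),
        weighted_size,
    }
}

fn main() {
    let hash_builder = RandomState::new();
    let record = record_access(&("user", 42), 128, &hash_builder);
    println!(
        "{},{},{}",
        record.key_hash, record.access_time_ms, record.weighted_size
    );
}
```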
Thanks - that's an interesting and satisfyingly simple explanation! I'll see if I can find time to look into the other issue. I never narrowed it down to moka; it could also have been a bug in our code, or something interesting about the test scenario. If I find anything concrete enough to report, I'll get back to you. Thanks for your help!
I wrote a script to try to expose a different issue I thought might be caused by moka, and encountered an unhandled panic. It happened somewhere between 54.8M and 54.9M insertions. I was asynchronously inserting items from a bunch of tasks to try to replicate the behaviour of our real-world system.
The panic I saw was:
Script that caused the above panic:
I expect this is fairly rare. I've run the script again since, and it got past 70M inserts before hitting the issue. I'm not invested in this being fixed, so it's up to you what you want to do with it. Figured it was worth reporting, though.
If you're interested, I can share more context on the original cache issue I was investigating when I found this (cache capacity slowly reducing as more and more size evictions happen). We dropped our investigation and changed our approach to save time, so I don't think I have enough evidence to raise an issue and confidently point the finger at moka, although at the time we suspected the issue was in this library.