You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A tricky issue in message-passing based work-requesting/work-stealing is how to efficiently select victims, especially with a large number of cores.
The initial intuition is to use a bitset, but an uint64 can only address up to 64 cores.
What if we have a 8-socket Xeon Platinum 9282 with 56 cores - 112 threads each
hence a total of 896 threads?
A 896-bit bitset would be huge, and we don't really want to have this be a compile-time parameter.
The current solution in Weave is unsatisfactory, we send to the victims an address to our own set of tried/untried victims. This works with shared-memory but not in a distributed computing setting.
The following recurrence generates all random numbers mod m without repetition:
Xₙ₊₁ = aXₙ+c (mod m)
if and only if (Hull-Dobell theorem):
c and m are coprime
a-1 is divisible by all prime factors of m
a-1 is divisible by 4 if m is divisible by 4
The only state to send in steal request would be a and c. Xₙ is the victim ID and m is either the number of threads or for ease of implementation the next power of 2.
iteratorpseudoRandomPermutation(randomSeed: uint32, maxExclusive: int32): int32=## Create (low-quality) pseudo-random permutations for [0, max)# Design considerations and randomness constraint for work-stealing, see docs/random_permutations.md## Linear Congruential Generator: https://en.wikipedia.org/wiki/Linear_congruential_generator## Xₙ₊₁ = aXₙ+c (mod m) generates all random number mod m without repetition# if and only if (Hull-Dobell theorem):# 1. c and m are coprime# 2. a-1 is divisible by all prime factors of m# 3. a-1 is divisible by 4 if m is divisible by 4## Alternative 1. By choosing a=1, all conditions are easy to reach.## The randomness quality is not important besides distributing potential contention,# i.e. randomly trying thread i, then i+1, then i+n-1 (mod n) is good enough.## Assuming 6 threads, co-primes are [1, 5], which means the following permutations# assuming we start with victim 0:# - [0, 1, 2, 3, 4, 5]# - [0, 5, 4, 3, 2, 1]# While we don't care much about randoness quality, it's a bit disappointing.## Alternative 2. We can choose m to be the next power of 2, meaning all odd integers are co-primes,# consequently:# - we don't need a GCD to find the coprimes# - we don't need to cache coprimes, removing a cache-miss potential# - a != 1, so we now have a multiplicative factor, which makes output more "random looking".# n and (m-1) <=> n mod m, if m is a power of 2let maxExclusive =cast[uint32](maxExclusive)
let M = maxExclusive.nextPowerOfTwo_vartime()
let c = (randomSeed and ((M shr1) -1)) *2+1# c odd and c ∈ [0, M)let a = (randomSeed and ((M shr2) -1)) *4+1# a-1 divisible by 2 (all prime factors of m) and by 4 if m divisible by 4let mask = M-1# for mod Mlet start = randomSeed and mask
var x = start
whiletrue:
if x < maxExclusive:
yieldcast[int32](x)
x = (a*x + c) and mask # ax + c (mod M), with M power of 2if x == start:
break
The text was updated successfully, but these errors were encountered:
You are genius! I use congruential generators for many different use cases but this use case I never thought of. So simple yet so efficient. Congratulations!
A tricky issue in message-passing based work-requesting/work-stealing is how to efficiently select victims, especially with a large number of cores.
hence a total of 896 threads?
A 896-bit bitset would be huge, and we don't really want to have this be a compile-time parameter.
Instead we can use a Linear Congruential Generator: https://en.wikipedia.org/wiki/Linear_congruential_generator
The following recurrence generates all random numbers mod m without repetition:
Xₙ₊₁ = aXₙ+c (mod m)
if and only if (Hull-Dobell theorem):
The only state to send in steal request would be
a
andc
.Xₙ
is the victim ID andm
is either the number of threads or for ease of implementation the next power of 2.Nim code: https://github.com/mratsim/constantine/blob/1dfbb8bd4f2fefc4059f4aefb040f0b3ba2918cb/constantine/platforms/threadpool/threadpool.nim#L84-L128
The text was updated successfully, but these errors were encountered: