Disaggregating the RocksDB block cache does not actually seem very useful when the SSD is already fast.
However, you may find the RDMA-related code useful if you are new to RDMA.
- 1. a simple remote memory allocator
- 2. copy and tidy up the LRUCache code to create RMLRUCache; remove the secondary cache logic for convenience
- 3. a simple RDMA server or interface for convenient fetch and store operations.
- a server handling control messages and cm events
- a client that has one qp to read/write remote memory
- unit tests for the above
- 4. embed the rdma interface into rocksdb
- 5. implement the remote memory logic for LRUCache.
- LRUHandle: modify its fields to support the operations below
- rm_lru: implement the rm_lru-related methods (the simplest lru)
- eviction.
- evict a local block to remote memory if local memory is exceeded
- evict a remote block if total memory is exceeded
- shard remote memory so that each shard controls its own rm (otherwise, an allocation may fail because the memory has been fragmented by other shards)
- fetch if remote
- statistics about the remote memory
- number of hits in rm and in lm
- time spent on hits in rm and in lm (or the rm overhead)
- stats of lm/rm usage (through `GetMapProperty`)
- try to treat remote memory as blocks, i.e., allocate only one block for each cache block (this assumes that a block always fits in the cache)
- support async read/write
- modify rdma_transport to support async read/write (ignore potential race)
- modify rm to support async ops
- AsyncRequest holding either a buffer that receives the remote value or a pointer to the buffer that will be sent
- modify DLRUCache to use async ops
- do the transfer outside the mutex
- invoke `wait` upon using the DLRUHandle (e.g., Lookup), and free if necessary.
- modify rm to support rdma_transport pool (avoid contention)
- overlap rdma read/write as much as possible
- overlap read/write exchange in Lookup
- local BlockBasedMemoryAllocator
- a basic usable allocator (with custom deleter)
- shard the memory region to avoid lock contention
- register local BlockBasedMemoryAllocator for RDMA
- directly read/write to avoid copy
- sync version
- async version
- support configuration of using `d_lru_cache` or normal `lru_cache`
- support configuration of using `rm_ratio`
- modify the value generator to derive each value from its key, so that correctness is easier to verify.
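
The eviction, fetch, and hit-count items in the list above can be illustrated with a minimal sketch. It is not the code in this repository: `TwoTierLru` and all of its members are hypothetical names, "remote memory" is simulated by a second in-process list instead of an RDMA region, and sizes are simply value lengths in bytes. It assumes a single thread, so there is no mutex, sharding, or async transfer.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical two-tier LRU. Both tiers are plain in-process lists here;
// a real implementation would move the bytes with RDMA read/write.
class TwoTierLru {
 public:
  TwoTierLru(size_t local_capacity, size_t total_capacity)
      : local_cap_(local_capacity), total_cap_(total_capacity) {}

  void Insert(const std::string& key, const std::string& value) {
    Erase(key);  // replace any previous entry for this key
    local_.push_front(Entry{key, value});
    index_[key] = Slot{true, local_.begin()};
    local_bytes_ += value.size();
    Rebalance();
  }

  // Returns true and fills *value on a hit. A remote hit "fetches" the
  // entry, i.e. promotes it to the front of the local tier.
  bool Lookup(const std::string& key, std::string* value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    Slot& slot = it->second;
    const size_t size = slot.pos->value.size();
    if (slot.is_local) {
      ++local_hits_;
      local_.splice(local_.begin(), local_, slot.pos);
    } else {
      ++remote_hits_;
      remote_bytes_ -= size;
      local_bytes_ += size;
      local_.splice(local_.begin(), remote_, slot.pos);  // iterator stays valid
      slot.is_local = true;
    }
    *value = slot.pos->value;  // copy before Rebalance may evict anything
    Rebalance();
    return true;
  }

  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    const size_t size = it->second.pos->value.size();
    if (it->second.is_local) {
      local_bytes_ -= size;
      local_.erase(it->second.pos);
    } else {
      remote_bytes_ -= size;
      remote_.erase(it->second.pos);
    }
    index_.erase(it);
  }

  size_t local_bytes() const { return local_bytes_; }
  size_t remote_bytes() const { return remote_bytes_; }
  size_t local_hits() const { return local_hits_; }
  size_t remote_hits() const { return remote_hits_; }

 private:
  struct Entry { std::string key; std::string value; };
  using List = std::list<Entry>;
  struct Slot { bool is_local; List::iterator pos; };

  void Rebalance() {
    // Step 1: demote least-recently-used local entries to the remote tier
    // while local memory is exceeded.
    while (local_bytes_ > local_cap_ && !local_.empty()) {
      auto victim = std::prev(local_.end());
      const size_t size = victim->value.size();
      remote_.splice(remote_.begin(), local_, victim);
      index_[victim->key].is_local = false;
      local_bytes_ -= size;
      remote_bytes_ += size;
    }
    // Step 2: drop least-recently-used remote entries while total memory
    // (local + remote) is exceeded.
    while (local_bytes_ + remote_bytes_ > total_cap_ && !remote_.empty()) {
      auto victim = std::prev(remote_.end());
      remote_bytes_ -= victim->value.size();
      index_.erase(victim->key);
      remote_.erase(victim);
    }
  }

  size_t local_cap_, total_cap_;
  size_t local_bytes_ = 0, remote_bytes_ = 0;
  size_t local_hits_ = 0, remote_hits_ = 0;
  List local_, remote_;  // front = most recently used
  std::unordered_map<std::string, Slot> index_;
};
```

For example, with `TwoTierLru cache(8, 16)` and 4-byte values, inserting a third key demotes the oldest entry to the remote tier, and a later `Lookup` of that key counts as a remote hit and brings it back to the local tier. In the real cache the demote and promote steps would be the places where the RDMA write and read happen.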