This is an implementation in OCaml of ART. Adaptive Radix Tree is like a
simple Hashtbl
with order:
# let tree = Art.make () ;;
# Art.insert tree (Art.key "foo") 42 ;;
# Art.insert tree (Art.key "bar") 21 ;;
# Art.find tree (Art.key "foo")
- : int = 42
Operation like minimum
or maximum
are available (which don't exist for a
simple Hashtbl.t
):
# let tree = Art.make () ;;
# Art.insert tree (Art.key "0") 0 ;;
# Art.insert tree (Art.key "1") 1 ;;
# Art.insert tree (Art.key "2") 2 ;;
# Art.minimum tree
- : int = 0
# Art.maximum tree
- : int = 2
If you want the order and the speed of Hashtbl.t
, Art is your library:
The function prefix_iter
is also available if you want to get a subset of your
tree:
# let t = Art.make () ;;
# Art.insert t (Art.key
# Art.insert t (Art.key "Dalton Joe") 0 ;;
# Art.insert t (Art.key "Dalton Jack") 1 ;;
# Art.insert t (Art.key "Dalton William") 2 ;;
# Art.insert t (Art.key "Dalton Averell") 3 ;;
# Art.insert t (Art.key "Rantanplan") 4 ;;
# let dalton = Art.prefix_iter ~prefix:(Art.key "Dalton")
(fun k _ a -> (k :> string) :: a) [] t ;;
- : string list = [ "Dalton Joe"
; "Dalton Jack"
; "Dalton William"
; "Dalton Averell" ]
ROWEX is a second implementation of ART with atomic operations. It's a functor
which expects an implementation of atomic operations such as load
or store
.
The current version of OCaml has a global lock for the GC. By this way, it's not
possible for us to execute ROWEX operations (find
/insert
) with true
parallelism if we use the same OCaml runtime. Even if you use LWT or ASYNC, you
execute jobs concurrently.
However, ROWEX wants to provide an implementation where find
/insert
can be
executed in parallel without any problems (race condition or ABA problem). So
ROWEX provides an implementation, persistent
, which implements atomic
operations on a memory area. Then, we are able, as parmap
, to
simulate true parallelism as long as each operations are executed into their own
fork()
.
The goal of this library is provide:
- the most easy way to switch the implementation to ocaml-multicore
- a baby step to be able to manipulate a file by several processes
(consumers/
find
, producers/insert
) in parallel
ROWEX follows two main papers:
The distribution comes with some tools to manipulate an index:
$ opam pin add -y https://github.com/dinosaure/art
$ opam install rowex
$ part.make index.idx
$ ls -lh
-rw-r--r-- 1 user user 8,0M ----- -- --:-- index.idx
prw------- 1 user user 0 ----- -- --:-- index.idx.socket
prw------- 1 user user 0 ----- -- --:-- index.idx-truncate.socket
$ part.insert index.idx foo 1
$ part.find index.idx foo
1
On the OCaml side, a Part
module exists which implements these functions:
type 'a t constraint 'a = [< `Rd | `Wr ]
val create : ?len:int -> string -> unit
val insert : [> `Rd | `Wr ] t -> string -> int -> unit
val lookup : [> `Rd ] t -> string -> int
part
is Unix dependent (and it need an Unix named pipe). It ensures with
explained internal mechanisms to use multiple readers and one writer:
- The writer can take the exclusive ownership on the index file and its named pipe
- readers don't need to take the ownership but they must send a signal into the named pipe (to the writer) that they start to introspect the index
For readers, some functions exist to signal their existence to the write:
val append_reader : Ipc.t -> unit
val delete_reader : Ipc.t -> unit
val ipc : _ t -> Ipc.t
This part of the distribution is experimental - even if the distribution
comes with several tests to ensure that the implementation works, ROWEX is
fragile! It still need a synchronization mechanism fsync()
which is added
pervasively in some parts of the code according to outcomes of errors.