pocus
project aims to offer a flexible way to produce cryptographic
hashes using fast and distributed computations. This project should
offer a framework providing a high level abstraction for different
kind of methods, mechanisms and strategies used in cryptography.
Keywords: Merkle Trees; Hash Chains; Sequential hashing; Hash Packing; Bloom Filters; Consistent Hashing; Hash Pointers; Cuckoo Hashing; Perfect Hashing; Hash Graphs; Salted Hashing; Hash Tables; Hash-based Encryption; Prefix Hash Tree; Sparse Merkle Trees; Hash-based Signatures; Incremental Hashing; Adaptive Hashing; Deduplication Hashing; Hash-Driven Key Stretching; Homomorphic Hashing; Deterministic Hashing; Trapdoor Hash Functions; SimHash; Keyed Hash Functions; Hash Tree Roots; Hash Aggregation; Hash Rings; Composite Hash Functions.
rebar3 compile
rebar3 ct
% create a new "plan"
{ok, R} = pocus:execute().
% check it
true = pocus:execute(R).
-
Academic and educational purpose
-
Documentation
-
Test
-
Notes and References
-
High Level Interfaces (methods)
- Sequential Hashing
- Hash Packing
- SimHash
-
Low Level Interfaces
- BearSSL
- PolarSSL
- WolfSSL
- Pure C
- Pure Assembly (x86-64/risc-v)
-
Cryptographic Hash Functions Support
- md4
- md5
- md6
- sha1
- sha-2 family
- sha-3 family
- blake family
- ripemd160
- skein
- whirlpool
- gost
-
Non-Cryptographic Hash Functions Support
- Fast-Hash
- Bernstein Hash
- GxHash
- pHash
- dHash
This project has been created at first as an assignment for an interview. Unfortunately during the implementation, I found definition on certain terms were not clear. Even more, no high level implementations in Erlang were available.
The tasks assigned were confusing, because of the terms used without clear definition. This document is rewriting things based on references found on internet or generated using OpenAI.
-
Hash Packing (cryto-currency): Hash packing can be interpreted as a method where multiple discrete data items are combined into a single composite data structure and then hashed using a cryptographic hash function to produce a single output hash. This process is commonly implemented using data structures like Merkle trees, as previously discussed. Here, the idea is to aggregate data efficiently so that the integrity and authenticity of the entire dataset can be verified through a single hash, the Merkle root. Hash packing is often used to create a compact and efficient representation of a large dataset for purposes like proving data integrity without needing to handle the entire dataset. Hash Packing output is a single hash representing a set of data items. In blockchain technology, transactions within a block can be hashed together using a Merkle tree, resulting in a single hash that effectively represents all transactions.
-
Combines many data items into one hash.
-
Utilizes structures like Merkle trees for efficient data aggregation and verification.
-
Commonly used to ensure data integrity and enable efficient verification processes (e.g., in blockchain transactions).
-
-
Sequential Hashing (cryptography): Sequential Hashing refers to the process of applying a hash function to data sequentially. This could mean hashing data in a series where the output of hashing one piece of data may be used as input for the next hashing operation, or continuously feeding data into a hash function as it is received. Sequential hashing is typically used in scenarios where the order and continuity of data are critical, such as in securing a timeline of records or events. In a blockchain, each block's header contains the hash of the previous block's header, creating a sequential chain of block hashes. This chain secures the entire blockchain by making it resistant to tampering.
-
Data is processed in a sequence, one piece after another.
-
Each piece of data can be dependent on the hash of the previous piece, creating a chain of hashes.
-
Often used for creating hash chains or for situations where data needs to be timestamped or added in a specific order, such as in some forms of ledger or blockchain technology.
-
-
Proof of Custody (blockchain/crypto-currency): Proof of custody in the context of cryptocurrency and blockchain technology generally refers to a mechanism or protocol by which a participant (often called a prover) can demonstrate to others (called verifiers) that they correctly possess certain data without actually revealing the data itself. This is particularly important in scenarios where large amounts of data are stored off-chain or in decentralized storage systems, as in the case of some scaling solutions like sharding.
This exercise has been solved using only
crypto
module. A NIF
using fast SHA-256
implementation
has been used, but this implementation does not support SHA2
extensions, that
means this code is slower than
OpenSSL/LibreSSL
SHA-2 implementation. Others test have been conducted with:
Those libraries offer barely the same speed over SHA-256. In conclusion, only Erlang crypto module implementation has been retained, but the NIF has been kept as example, here how to use it.
# build pocus_nif.so and copy it into priv
cd c_src
# gmake on openbsd.
make
cp pocus_nif.so ../priv
cd ..
# execute rebar3 shell
rebar3 shell
% start playing with the implementation
pocus_nif:sha256(<<"test">>).
This implementation is ~10x slower than crypto module using OpenSSL/LibreSSL.
To improve the computation on some part of the code, in particular in chunk generation part, a small jobs manager has been created, dividing the computation in dedicated processus, spawned on demand. A more classical method with a pool of worker already spawned could have been created though.
The current model is not really scalable and flexible. A
pocus_sequential
module defined as
gen_server
would have probably better. When started a process would
then keep the state of the computation and all event would have been
treated as step to produce the final hash. A tested PoC has been also
added in this exercise.
Choosing Best Hashing Strategies and Hash Functions by Mahima Singh and Deepak Garg
Parallel cryptographic hashing: Developments in the last 25 years by Neha Kishore and Priya Raina
Cipher and Hash Function Design Strategies based on linear and differential cryptanalysis by Joan Daemen
Analysis and Design of Cryptographic Hash Functions by Bart PRENEEL
Sequential Hashing with Minimum Padding by Shoichi Hirose
Suffcient conditions for sound tree and sequential hashing modes by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche
New Methodes for Sequential Hashing with Supertrace by Jürgen Eckerle and Thomas Lais
Sakura: a flexible coding for tree hashing by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche
Block-Cipher-Based Tree Hashing by Aldo Gunsing
SimHash: Hash-based Similarity Detection by Caitlin Sadowski and Greg Levin
In Defense of MinHash Over SimHash by Anshumali Shrivastava and Ping Li