Deggen (deggen@kschw.com)
Damian Orzepowski (damian.orzepowski@4chain.studio)
Wojciech Regulski (wojciech.regulski@4chain.studio)
Arkadiusz Osowski (arkadiusz.osowski@4chain.studio)
We propose a binary format for sending Transactions between peers to allow Simple Payment Verification (SPV). The format is optimized for minimal bandwidth while maintaining data required to independently validate the transaction in full.
Assumption: Every user has an independent source of Block Headers indexed by Merkle Root.
The simplest form is two transactions, one confirmed in a block which has a corresponding Merkle path and includes an output which is being spent in the second. The second is a newly signed transaction which constitutes the actual payment.
In cases where one or more inputs are not yet mined, each ancestral transaction is included. We step back through the Transaction DAG until every input has a corresponding parent transaction with a Merkle path.
This format is aligned with Dr. Craig Wright's explanation of SPV from this article. He makes reference to a new paradigm, this format is an attempt to bring forth that paradigm.
Users can adopt this format to transmit the data required to independently verify that a transaction is valid and is spending a real utxo from an existing block. The control mechanism to ensure no previous spend of the utxo is the economics and law. The format is intended for micropayments, so the risk is low. It is also the case that there is a cost to faking the data, since the attacker would really have to have previously owned an actual utxo. Any malfeasant would be signing incriminating evidence and sending it directly to the plaintiff if they were to defraud someone using this format. Without Merkle paths the data would be easier to fake, and there is no skin in the game required since the data could be randomly generated. Considering all these factors, the validation process is detailed in a later section.
This BRC is licensed under the Open BSV license.
Simplified Payment Verification formats for transmitting transactions between peers has yet to see wide adoption. This proposal advocates for complete ecosystem adoption of the principles of SPV; acknowledges that the system is secured by economic incentives, and law; and lays out a binary format for transmitting the data between parties optimizing for minimal bandwidth.
Three prior formats should be mentioned:
Extended Format BRC-30 incorporates the utxo script for script evaluation and satoshis for checking amounts. This format would still work when sending to nodes as they have a utxo lookup which is indexed by a hash of the extended data, meaning that invalid EF data would be detected immediately.
Tx Ancestors which was created for use within the Lite Client Toolbox, this uses an array of rawtxs, Merkle proofs, and Mapi responses to transport the data required for SPV.
Everett-style Transaction Envelopes BRC-8 uses a recursive JSON format which is otherwise similar to the above format originally designed for use within DPP. One of the ideas which this proposal highlights is that the target of merkle proofs could be height rather than blockhash or root, to save on bytes. Thinking along these lines, we propose that no target be provided at all, and the root simply be calculated at the receieving end as needed. Using the root to look up headers also avoids any implementation complexity associated with competing blocks. For example, when a competing block encapsulates the transactions it may not have the same height, if all transactions are being stored with paths at a height rather than blockhash or merkle root (which are unique to specific versions of a block) then updating a set of the transactions with merkle paths from the new block will be difficult to sort out. If they're indexed by blockhash then it would be trivial to set them all as "in need of updated paths" without the need for disambiguation.
Mapi is to be deprecated in the near future, so any new recommended formats should not include Mapi Responses as part of the solution.
The array of rawtxs within the Tx Ancestors spec makes some sense in that there are strange cases where the same rawtx has two outputs which are spent in different transactions, both within the ancestry of the tx we are validating. You don't want to have embedded a copy of the same proof twice, hence a hash map would make more sense than just including the merkle path bytes in the tx itself. Everett-style Transaction Envelope deals with this well even given its recursive object design. Txids are used as pointers to existing transactions when complex dependency graphs occur. This proposal uses the same thinking, but for a binary format lists are used rather than recursive objects.
The ordering of data has streaming validation in mind - an idea raised in EF BRC-30 - such that a receiver can start processing the validation immediately on receipt of the first few bytes, and any later bytes rely on previous bytes for their validation to even begin.
- Merkle Paths
- Oldest Tx Anchored by Path
- Newer Txs depending on Oldest parent
- Newest Tx
As soon as the Merkle Paths are receieved we can calculate the roots and lookup their blockheaders. If they're not valid then validation can stop there - rejected. Then we look at the oldest ancestor - if it's valid then its children can be validated, and so on until we reach the most recent tx.
BEEF combines thinking from several formats into one binary stream, prefixed with a very specific version number for disambiguation.
The encoding version number 4022206465 is chosen such that when seen in hex encoded as 4 bytes little endian it reads: 0100BEEF
. This is to allow multiple versions to be defined in future while keeping the data minimal and leaving an obvious sequence which developers can eyeball. Often Bitcoin developers will see a sequence 0100000000... for rawtx. When they instead see 0100BEEF... they will know this data is BEEF format when debugging and so on. If there are future improvements then the next version would be 0200BEEF
for example, this marker will remain "BEEF" for tens of thousands of versions until we reach 4022271999 which is obviously much more than would ever be required.
Field | Description | Size |
---|---|---|
Version no | Version number starts at 4022206465, encoded Uint32LE => 0100BEEF |
4 bytes |
nBUMPs | VarInt number of BSV Unified Merkle Paths which follow | 1-9 bytes |
BUMP data | All of the BUMPs required to prove inclusion of inputs in longest chain of blocks BRC-74 | many bytes x nBUMPs |
nTransactions | VarInt number of transactions which follow | 1-9 bytes |
Raw Transaction | RawTx bytes as in standard format BRC-12 | many bytes |
Has BUMP | 01 if so, followed by the BUMP index; 00 if not, followed by nothing. |
1 byte |
BUMP index | VarInt index number - indicating the BUMP to which the prior tx belongs if there is one. | 1-9 bytes |
Order is important - we must ensure that we end with the tx being evaluated, and its inputs are above, and their inputs are above that. Khan's algorithm is a well known solution to the problem in Graph Theory. Running this on the transaction DAG subset is recommended for complex transaction chains where order is not clear. Example below to demonstrate with extraneous data removed for clarity.
// khan's algorithm
function khanTopologicalSort(graph) {
const inDegree = {}
const queue = []
const result = []
for (let node in graph) {
inDegree[node] = 0
}
for (let node in graph) {
for (let neighbor in graph[node]) {
inDegree[neighbor]++
}
}
for (let node in inDegree) {
if (inDegree[node] === 0) {
queue.push(node)
}
}
while (queue.length) {
let node = queue.shift()
result.push(node)
for (let neighbor in graph[node]) {
inDegree[neighbor]--
if (inDegree[neighbor] === 0) {
queue.push(neighbor)
}
}
}
return result.reverse()
}
const txs = [
{
txid: '2222222222222222222222222222222222222222222222222222222222222222',
inputs: ['1111111111111111111111111111111111111111111111111111111111111111'],
},
{
txid: '1111111111111111111111111111111111111111111111111111111111111111',
inputs: ['0000000000000000000000000000000000000000000000000000000000000000'],
},
{
txid: '0000000000000000000000000000000000000000000000000000000000000000',
inputs: [],
},
{
txid: '4444444444444444444444444444444444444444444444444444444444444444',
inputs: [
'3333333333333333333333333333333333333333333333333333333333333333',
'2222222222222222222222222222222222222222222222222222222222222222',
],
},
{
txid: '3333333333333333333333333333333333333333333333333333333333333333',
inputs: [
'2222222222222222222222222222222222222222222222222222222222222222',
'1111111111111111111111111111111111111111111111111111111111111111',
],
},
]
const graph = {}
for (let tx of txs) {
graph[tx.txid] = {}
for (let input of tx.inputs) {
graph[tx.txid][input] = true
}
}
console.log({ graph })
console.log({ correctOrder: khanTopologicalSort(graph) })
0100beef01fe636d0c0007021400fe507c0c7aa754cef1f7889d5fd395cf1f785dd7de98eed895dbedfe4e5bc70d1502ac4e164f5bc16746bb0868404292ac8318bbac3800e4aad13a014da427adce3e010b00bc4ff395efd11719b277694cface5aa50d085a0bb81f613f70313acd28cf4557010400574b2d9142b8d28b61d88e3b2c3f44d858411356b49a28a4643b6d1a6a092a5201030051a05fc84d531b5d250c23f4f886f6812f9fe3f402d61607f977b4ecd2701c19010000fd781529d58fc2523cf396a7f25440b409857e7e221766c57214b1d38c7b481f01010062f542f45ea3660f86c013ced80534cb5fd4c19d66c56e7e8c5d4bf2d40acc5e010100b121e91836fd7cd5102b654e9f72f3cf6fdbfd0b161c53a9c54b12c841126331020100000001cd4e4cac3c7b56920d1e7655e7e260d31f29d9a388d04910f1bbd72304a79029010000006b483045022100e75279a205a547c445719420aa3138bf14743e3f42618e5f86a19bde14bb95f7022064777d34776b05d816daf1699493fcdf2ef5a5ab1ad710d9c97bfb5b8f7cef3641210263e2dee22b1ddc5e11f6fab8bcd2378bdd19580d640501ea956ec0e786f93e76ffffffff013e660000000000001976a9146bfd5c7fbe21529d45803dbcf0c87dd3c71efbc288ac0000000001000100000001ac4e164f5bc16746bb0868404292ac8318bbac3800e4aad13a014da427adce3e000000006a47304402203a61a2e931612b4bda08d541cfb980885173b8dcf64a3471238ae7abcd368d6402204cbf24f04b9aa2256d8901f0ed97866603d2be8324c2bfb7a37bf8fc90edd5b441210263e2dee22b1ddc5e11f6fab8bcd2378bdd19580d640501ea956ec0e786f93e76ffffffff013c660000000000001976a9146bfd5c7fbe21529d45803dbcf0c87dd3c71efbc288ac0000000000
0100beef // version
01 // VarInt nBUMPs
fe636d0c0007021400fe507c0c7aa754cef1f7889d5fd395cf1f785dd7de98eed895dbedfe4e5bc70d1502ac4e164f5bc16746bb0868404292ac8318bbac3800e4aad13a014da427adce3e010b00bc4ff395efd11719b277694cface5aa50d085a0bb81f613f70313acd28cf4557010400574b2d9142b8d28b61d88e3b2c3f44d858411356b49a28a4643b6d1a6a092a5201030051a05fc84d531b5d250c23f4f886f6812f9fe3f402d61607f977b4ecd2701c19010000fd781529d58fc2523cf396a7f25440b409857e7e221766c57214b1d38c7b481f01010062f542f45ea3660f86c013ced80534cb5fd4c19d66c56e7e8c5d4bf2d40acc5e010100b121e91836fd7cd5102b654e9f72f3cf6fdbfd0b161c53a9c54b12c841126331 // see BRC-74 for details of BUMP format
02 // VarInt nTransactions = 2
// rawtx parent follows
0100000001cd4e4cac3c7b56920d1e7655e7e260d31f29d9a388d04910f1bbd72304a79029010000006b483045022100e75279a205a547c445719420aa3138bf14743e3f42618e5f86a19bde14bb95f7022064777d34776b05d816daf1699493fcdf2ef5a5ab1ad710d9c97bfb5b8f7cef3641210263e2dee22b1ddc5e11f6fab8bcd2378bdd19580d640501ea956ec0e786f93e76ffffffff013e660000000000001976a9146bfd5c7fbe21529d45803dbcf0c87dd3c71efbc288ac00000000
01 // above tx has merkle path
00 // VarInt the index of the path for this tx in the above list
// rawtx current payment follows
0100000001ac4e164f5bc16746bb0868404292ac8318bbac3800e4aad13a014da427adce3e000000006a47304402203a61a2e931612b4bda08d541cfb980885173b8dcf64a3471238ae7abcd368d6402204cbf24f04b9aa2256d8901f0ed97866603d2be8324c2bfb7a37bf8fc90edd5b441210263e2dee22b1ddc5e11f6fab8bcd2378bdd19580d640501ea956ec0e786f93e76ffffffff013c660000000000001976a9146bfd5c7fbe21529d45803dbcf0c87dd3c71efbc288ac00000000
00 // above tx doesn't have merkle path, but instead has local parent
- We parse the BUMPs storing each in an array so that we can address them by index later.
- We parse each transaction.
- RawTx bytes double sha256 to get the txid.
- Store in hashmap from txid => parsedTx
- If there is a Merkle path after the tx then we lookup the BUMP using the BUMP index number.
- Lookup the txid within level 0 leaves of the BUMP to get the index of the txid within a block.
- Calculate the Merkle root with the index, txid, and BUMP data.
- Add the merkle root to an array which will be used in a request to our local header service once all transactions have been parsed.
- Mark the tx as valid or not as soon as the header service responds.
- Otherwise we run local validation on the transaction, stopping at the soonest failure.
- Check the txid exists in memory, and is marked as valid.
- Check that all scripts evaluate to TRUE.
- Check that the sum of satoshis in > satoshis out + fee
- Mark the tx as valid.
- We make a request to the local header service as soon as we have all merkle roots calculated.
- This involves sending a list of Merkle roots to the Pulse service which will validate that the merkle roots provided are all part of headers within the longest chain.
- If any of the roots are not part of the longest chain, the response is negative and the whole validation has failed.
- We await the final transaction being marked as valid since it depends on all other processes.
The format will be built into BUX to begin real world testing. Thereafter common libraries such as bsv and go-bt for easy incorporation into existing stacks.