From 51386f4a48b56aa7d9c0c4dfb51297365011fd18 Mon Sep 17 00:00:00 2001
From: Michael Peyton Jones
Date: Thu, 14 Dec 2023 15:00:42 +0000
Subject: [PATCH 1/5] CPS for better builtin datastructures in Plutus

---
 CIP-????/README.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 CIP-????/README.md

diff --git a/CIP-????/README.md b/CIP-????/README.md
new file mode 100644
index 000000000..87be829a4
--- /dev/null
+++ b/CIP-????/README.md
@@ -0,0 +1,97 @@
+---
+CPS: ????
+Title: Lack of builtin data structures with good asymptotic performance in Plutus Core
+Status: Proposed
+Category: Plutus
+Authors:
+    - Michael Peyton Jones
+    - Philip DiSarro
+    - Pi Lanningham
+Discussions:
+Created: 2023-12-14
+License: CC-BY-4.0
+---
+
+## Abstract
+
+Plutus Core lacks builtin data structures with good asymptotic performance for some use cases.
+
+## Problem
+
+Plutus Core has a few builtin data structures, but these are mostly used to make a minimally adequate representation of the `Data` type.
+It does not have builtin data structures optimized for performance.
+
+Users can implement their own data structures (since Plutus Core is an expressive programming language), but in practice this has not happened much.
+In particular, we will focus on two examples here:
+
+1. Arrays with constant-time lookup
+2. Maps with logarithmic-time lookup (also Sets, but we can treat them as a special case of Maps)
+
+Both of these are difficult to implement in Plutus Core:
+
+1. Arrays are (we believe) impossible without some kind of primitive with constant-time lookup
+2. Maps are possible but are typically moderately complex data structures which require a lot of code, and this has not been done in practice
+
+## Use cases
+
+### Arrays
+
+#### Order matching
+
+A common pattern in DEXs is to have a list of inputs/outputs to match up in a datum.
+For example, we might have:
+```
+inputIdxs :: BuiltinList Integer
+outputIdxs :: BuiltinList Integer
+```
+We then want to go through these lists, looking up the corresponding inputs and outputs and check some property (e.g. that the value is directly transferred from one to the other).
+
+This requires a quadratic amount of work, which puts a low ceiling on how many orders can be processed at once.
+Empirically, many are capped at about 30, whereas if they were limited only by the amount of space in the transaction for inputs and outputs the limit would be hundreds.
+
+If we had arrays with constant time indexing, we could make this linear instead (we would still need to do a linear amount of work to create arrays for the transaction inputs and outputs from the lists in the script context).
+
+#### `Constr` arguments
+
+The `Data` type has a `Constr` alternative which is used for encoding datatype constructors.
+This is used for encoding the script context, and is used by languages such as Aiken extensively for representing user-defined datatypes also.
+
+The fields of the constructor are encoded in a list; hence to access a particular field the compiled code needs to do a linear amount of work.
+If the arguments to a `Constr` were an array, we could access the fields in constant time.
+
+### Maps
+
+#### Operations on `Value`
+
+The `Value` type is a nested map: it is a map from bytestrings (representing policy IDs) to maps from bytestrings (representing token names) to integers (representing quantities).
+Since map operations are currently linear, this means that even simple operations like checking whether one value is less than another can have quadratic cost.
+
+This would be much better if map operations were logarithmic cost.
+
+#### Indexing by party
+
+Many applications have a known set of participants identified by some bytestring, typically a public key.
+It is therefore natural to store per-party state in a map indexed by the party identifier.
+
+Since map operations are currently linear, this is needlessly expensive and imposes a limit on the number of parties.
+
+## Goals
+
+1. Reduce the cost of operations on `Value` by a factor of 2-10
+2. Reduce the cost of a matching algorithm such that we can handle hundreds of matches for the same cost it currently takes to do 30.
+
+## Open questions
+
+- Can we implement a set/map data structure in Plutus Core code that has acceptable performance and doesn’t require too much size overhead?
+- Do we need generic maps or is a map-from-bytestring sufficient? What about map-from-integer?
+    - Generic maps are harder since we typically need to know how to order the key type
+- Is an array type useful even if it is immutable?
+    - We are unlikely to be able to offer mutable arrays
+- Are builtin data structures useful enough even if they can only contain builtin types?
+    - This would mean that complex data structures would have to be stored inside arrays as `Data`, rather than using Scott encoding or sum-of-products representation
+- Can we feasibly change the structure of the builtin `Data` type so that `Constr` arguments are in an array?
+    - We would need to retain both versions for backwards compatibility
+
+## References
+
+- https://x.com/Quantumplation/status/1733298551571038338?s=20

From 6bbbff9e0c028eb1a178c2b57358da0ebbbf7a21 Mon Sep 17 00:00:00 2001
From: Robert Phair
Date: Tue, 2 Jan 2024 23:30:12 -0500
Subject: [PATCH 2/5] match CIP title to broad, brief PR title

---
 CIP-????/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CIP-????/README.md b/CIP-????/README.md
index 87be829a4..e8da5a6ad 100644
--- a/CIP-????/README.md
+++ b/CIP-????/README.md
@@ -1,6 +1,6 @@
 ---
 CPS: ????
-Title: Lack of builtin data structures with good asymptotic performance in Plutus Core
+Better builtin data structures in Plutus
 Status: Proposed
 Category: Plutus
 Authors:

From 8d73071ab2c0175fbb3c14e514199fb4b1440d1c Mon Sep 17 00:00:00 2001
From: Michael Peyton Jones
Date: Tue, 30 Jan 2024 11:24:54 +0000
Subject: [PATCH 3/5] Address comments

---
 CIP-????/README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/CIP-????/README.md b/CIP-????/README.md
index e8da5a6ad..8bb24605f 100644
--- a/CIP-????/README.md
+++ b/CIP-????/README.md
@@ -39,6 +39,8 @@ Both of these are difficult to implement in Plutus Core:
 #### Order matching
 
 A common pattern in DEXs is to have a list of inputs/outputs to match up in a datum.
+In some cases the order is highly significant, e.g. earlier orders should be processed first, and the outcome of processing an earlier order may affect later ones.
+
 For example, we might have:
 ```
 inputIdxs :: BuiltinList Integer
@@ -49,9 +51,10 @@ We then want to go through these lists, looking up the corresponding inputs and
 This requires a quadratic amount of work, which puts a low ceiling on how many orders can be processed at once.
 Empirically, many are capped at about 30, whereas if they were limited only by the amount of space in the transaction for inputs and outputs the limit would be hundreds.
 
-If we had arrays with constant time indexing, we could make this linear instead (we would still need to do a linear amount of work to create arrays for the transaction inputs and outputs from the lists in the script context).
+If we had arrays with constant time indexing, we could make this linear instead.
+Note that unless we also implemented the "Data fields" suggestion below we would still need to do a linear amount of work to create arrays for the transaction inputs and outputs from the lists in the script context.
 
-#### `Constr` arguments
+#### Data fields
 
 The `Data` type has a `Constr` alternative which is used for encoding datatype constructors.
 This is used for encoding the script context, and is used by languages such as Aiken extensively for representing user-defined datatypes also.
@@ -59,6 +62,8 @@ This is used for encoding the script context, and is used by languages such as A
 The fields of the constructor are encoded in a list; hence to access a particular field the compiled code needs to do a linear amount of work.
 If the arguments to a `Constr` were an array, we could access the fields in constant time.
 
+Similarly, the `List` and `Map` constructors of `Data` could use arrays.
+
 ### Maps
 
 #### Operations on `Value`
@@ -73,7 +78,7 @@ This would be much better if map operations were logarithmic cost.
 Many applications have a known set of participants identified by some bytestring, typically a public key.
 It is therefore natural to store per-party state in a map indexed by the party identifier.
 
-Since map operations are currently linear, this is needlessly expensive and imposes a limit on the number of parties.
+Since map operations currently have much worse complexity than a good map data structure (often linear/quadratic instead of logarithmic/linear), this is needlessly expensive and imposes a limit on the number of parties.
 
 ## Goals
 

From 11cd9d3af587e56b396df939cacb3f06e4d2d434 Mon Sep 17 00:00:00 2001
From: Robert Phair
Date: Tue, 30 Jan 2024 20:47:48 +0530
Subject: [PATCH 4/5] fix missing Title term from YAML header

---
 CIP-????/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CIP-????/README.md b/CIP-????/README.md
index 8bb24605f..8515c9ae6 100644
--- a/CIP-????/README.md
+++ b/CIP-????/README.md
@@ -1,6 +1,6 @@
 ---
 CPS: ????
-Better builtin data structures in Plutus
+Title: Better builtin data structures in Plutus
 Status: Proposed
 Category: Plutus
 Authors:

From 5ec27da12ce01197fc9202f890e4914231ee9d4a Mon Sep 17 00:00:00 2001
From: Robert Phair
Date: Tue, 6 Feb 2024 23:47:20 +0530
Subject: [PATCH 5/5] assign CPS number 13

---
 CIP-????/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CIP-????/README.md b/CIP-????/README.md
index 8515c9ae6..40d869b06 100644
--- a/CIP-????/README.md
+++ b/CIP-????/README.md
@@ -1,5 +1,5 @@
 ---
-CPS: ????
+CPS: 13
 Title: Better builtin data structures in Plutus
 Status: Proposed
 Category: Plutus