Inefficient payload encoding #289

AdamZWu · 2023-10-05T16:21:34Z

The in-toto specs use DSSE to contain the statement data and carry signatures.

1. The statement is first serialized as a JSON string.
2. It is then base64 encoded and stored in the "payload" field of DSSE.
3. The whole envelope is then serialized as a JSON string.
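For concreteness, the three steps can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: the statement contents are placeholders, `wrap_in_envelope` is a hypothetical helper name, and signing is left out entirely.

```python
import base64
import json

# Illustrative statement; the field values are placeholders, not real data.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "example.tar.gz", "digest": {"sha256": "0" * 64}}],
    "predicateType": "https://example.com/predicate/v1",
    "predicate": {"note": "placeholder"},
}


def wrap_in_envelope(stmt: dict) -> str:
    # Step 1: serialize the statement as a JSON string.
    serialized = json.dumps(stmt).encode("utf-8")
    # Step 2: base64 encode it and store it in the "payload" field.
    envelope = {
        "payload": base64.b64encode(serialized).decode("ascii"),
        "payloadType": "application/vnd.in-toto+json",
        "signatures": [],  # signing is omitted in this sketch
    }
    # Step 3: serialize the whole envelope as a JSON string.
    return json.dumps(envelope)


envelope_json = wrap_in_envelope(statement)
```

The JSON text of the statement ends up nested inside another JSON document, with base64 as the layer in between.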

The base64 encoding step incurs a 33% overhead with no real benefit, because the JSON-serialized statement is already a valid string. This overhead will make adoption harder for resource-constrained CI/CD systems and is generally wasteful (e.g. some complex artifacts may generate 1 GB of provenance data, and 33% of that is roughly 330 MB of extra data to store and transfer for every build).
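The overhead figure follows from base64 mapping every 3 input bytes to 4 output characters. A small sketch to check the arithmetic, assuming standard padded base64 (`base64_overhead` is a name made up for this example):

```python
import base64


def base64_overhead(raw_len: int) -> int:
    """Extra bytes added by padded base64: every 3 input bytes become 4 characters."""
    encoded_len = 4 * ((raw_len + 2) // 3)
    return encoded_len - raw_len


# Compare the formula against the real encoder on a small sample.
sample = b"x" * 3000
measured = len(base64.b64encode(sample)) - len(sample)
print(measured, base64_overhead(len(sample)))

# Estimated extra bytes for a 1 GB payload (roughly one third of the input size).
print(base64_overhead(10**9))
```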

Could we have some more efficient solutions?

For example, the DSSE spec could be updated so that not every payload must be base64 encoded: the MIME type "application/vnd.in-toto+json" would indicate that the payload is in text format and can be consumed directly.

Alternatively, if we must use base64 for the DSSE payload, could we introduce something like "application/vnd.in-toto+json+lzma", which compresses the serialized statement first?

TomHennen · 2023-10-05T18:59:42Z

I seem to recall that base64 was chosen regardless of the string type, partly because it helps avoid deserialization attacks and probably also to avoid having to escape things like this.

However, I do wonder if it would help to specify an encoding that is base64(compress(SERIALIZED_BODY)).

I seem to remember we may have done some experimentation on this internally?
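A rough sketch of what base64(compress(SERIALIZED_BODY)) could look like, to make the idea concrete. Everything here is an assumption for illustration: LZMA is picked only because "+lzma" was floated earlier in this thread, the function names are made up, and the sample statement is synthetic.

```python
import base64
import hashlib
import json
import lzma


def encode_compressed(stmt: dict) -> str:
    # base64(compress(SERIALIZED_BODY)), with LZMA as the assumed compressor
    body = json.dumps(stmt, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(lzma.compress(body)).decode("ascii")


def decode_compressed(payload: str) -> dict:
    return json.loads(lzma.decompress(base64.b64decode(payload)))


# Synthetic statement with many subjects, so the compressor has something to work with.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": f"artifact-{i}.bin",
         "digest": {"sha256": hashlib.sha256(str(i).encode()).hexdigest()}}
        for i in range(200)
    ],
    "predicateType": "https://example.com/predicate/v1",
    "predicate": {},
}

plain = base64.b64encode(json.dumps(statement, separators=(",", ":")).encode("utf-8"))
compressed = encode_compressed(statement)
print(len(plain), len(compressed))  # payload sizes without and with compression
```

How much this saves depends heavily on the statement contents; digests are high-entropy, so the gain comes mostly from repeated keys and structure.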

adityasaky · 2023-10-05T19:06:16Z

Side note: does this belong in https://github.com/secure-systems-lab/dsse, apart from any changes to in-toto's media type as a consequence of a DSSE change?

AdamZWu · 2023-10-05T19:09:23Z

@adityasaky: if we are changing the base64 encoding, then it is more of a DSSE matter; if we are not changing that, but compressing serialized statements, then I think it is an in-toto matter.

TomHennen · 2023-10-05T19:11:02Z

I suspect:

- We'd want https://github.com/secure-systems-lab/dsse to make some recommendations on what/how to do this.
- https://github.com/in-toto/attestation/blob/main/spec/v1/envelope.md#fields would need to be updated either way?

TomHennen · 2023-10-13T15:14:36Z

So generally folks are open to some solution here. We'd probably be looking for a PR that defines whatever the proposal is along with some code that actually does it.

MarkLodato · 2023-10-16T12:47:21Z

This issue should be closed in favor of secure-systems-lab/dsse#63. All changes need to happen there, since this is a DSSE issue.

AdamZWu · 2023-10-17T14:12:18Z

Also mentioned in secure-systems-lab/dsse#63:

As another alternative, the bundle format selected by in-toto, JSON lines, also offers a compression mode.

If we were to compress the bundle, does that make it again an in-toto issue? :P

Compressing the bundle might yield better data reduction, since a bundle will likely contain multiple attestations, and if these attestations are for the same set of artifacts, the "subject" fields will be repeated multiple times. Bundle-level compression can exploit that redundancy, which statement-level compression cannot.
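A quick way to see the effect is to compare compressing each line separately against compressing the whole JSON Lines file. This is only a sketch under simplifying assumptions: the statements are synthetic, zlib stands in for whatever compressor would actually be chosen, and the lines are raw statements rather than DSSE envelopes (base64-encoded payloads may hide some of this redundancy from the compressor).

```python
import hashlib
import json
import zlib

# The same synthetic subject list is shared by every statement in the bundle.
shared_subject = [
    {"name": f"artifact-{i}.bin",
     "digest": {"sha256": hashlib.sha256(str(i).encode()).hexdigest()}}
    for i in range(50)
]

statements = [
    {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": shared_subject,
        "predicateType": f"https://example.com/predicate-{n}/v1",
        "predicate": {"index": n},
    }
    for n in range(5)
]

# One JSON document per line, as in JSON Lines.
lines = [json.dumps(s, separators=(",", ":")).encode("utf-8") for s in statements]

# Statement-level: each line is compressed on its own.
per_statement_total = sum(len(zlib.compress(line)) for line in lines)
# Bundle-level: the whole file is compressed at once.
bundle_level = len(zlib.compress(b"\n".join(lines) + b"\n"))

print(per_statement_total, bundle_level)
```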

deeglaze · 2024-01-12T22:50:28Z

The CoRIM draft is proposing a COSE_Sign1 envelope around a CBOR-serialized object for compact representation, which avoids JSON altogether. It seems really similar to in-toto, except that it has some notion of predefined predicates and limited extensibility... I'm still trying to suss out where we can remove redundancy across the two efforts.

marcelamelara · 2024-06-28T14:15:07Z

Related #361

TomHennen · 2024-06-28T14:17:29Z

Discussed at today's attestation maintainers meeting.

We're open to both of these things. Our main concern would be interoperability. Having multiple ways to encode and represent attestations could significantly hinder adoption. One way to resolve this might be a 'generic' converter that could convert newer encodings to a canonical JSON encoding as needed.
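As a very rough sketch of what such a converter might look like, assuming payloads are dispatched on their payload type: the registry, the "+lzma" type (borrowed from the suggestion earlier in this thread), and the function names are all hypothetical, and sorted keys with fixed separators are only a stand-in for a real canonicalization scheme.

```python
import base64
import json
import lzma

# Hypothetical registry: payload type -> function turning decoded payload bytes into JSON bytes.
DECODERS = {
    "application/vnd.in-toto+json": lambda raw: raw,
    "application/vnd.in-toto+json+lzma": lzma.decompress,
}


def to_plain_json(payload_type: str, payload_b64: str) -> str:
    try:
        decode = DECODERS[payload_type]
    except KeyError:
        raise ValueError(f"unsupported payload type: {payload_type}") from None
    statement = json.loads(decode(base64.b64decode(payload_b64)))
    # Sorted keys and fixed separators stand in for a real canonical form.
    return json.dumps(statement, sort_keys=True, separators=(",", ":"))
```

Consumers that only understand the JSON form could then call `to_plain_json` regardless of how the payload was originally encoded.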

We'd be happy to review PRs if folks who are highly motivated here want to submit them.

AdamZWu mentioned this issue Oct 5, 2023

Reducing overhead for payload encoding secure-systems-lab/dsse#63

Open

marcelamelara added the triage label Oct 10, 2023

TomHennen added enhancement New feature or request and removed triage labels Oct 13, 2023

marcelamelara added the triage label Jan 10, 2024

marcelamelara removed the triage label Jun 28, 2024
