Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization #176

Open
adria0 opened this issue Oct 1, 2024 · 8 comments
Open

Serialization #176

adria0 opened this issue Oct 1, 2024 · 8 comments

Comments

@adria0
Copy link
Member

adria0 commented Oct 1, 2024

from @davidnevadoc

In halo2curves, the serialization of fields could be improved. There are methods that support different:
Endianness
Limbs, Bytes, bits
Motgomery form, non-montgomery form
Some are implemented through traits like SerdeObject others are implemented in PrimeField, others are implemented directly on the field. It would be great to remove redundancies, estabilish a clear name convention and document the end result.

Related issue: #109

First round, initial idea

Halo2curves serializations comes with two flavours:

  • Standard serialization

    • API
      • Implement trait Field.{from,to}_repr fn
      • Implement trait group::GroupEncoding (for compressed), and group::UncompresseEncoding (for uncompressed), allowing "unchecked" reads.
    • To match the serialization "standards", needs predefined format, configurable via macro at field/curve definition, not configurable when serialize/deserialize:
      • Fields: field endian in field_impl! macro
      • Curves: impl EndianRepr for endianess, impl Compressed for flags and to uncompress.
    • Since repr does not have to match with encoding endianess, fields implements {from,to}_bytes(). Let's say that since we are using PrimeFields, we do not need checked reads here because this is only checking that fits inside modulos.
    • Implements serde wrappers over group:xxxxEncodings
      • Note that allows hex encodings (::hex::serde::deserialize(deserializer)) and I'm not very sure if this could break constant-time without notice
  • Vainilla "raw" serialization

    • API
      • fn {from, to}_raw_bytes_[unchecked]
      • fn {read,write}_raw_[unchecked]
    • Quick, just little endian, no special flags.
    • Implememented via SerdeObject trait (which is not serde) at Field, Curve and tower
  • Others

Notes:

  • I really would like to get rid of EndianRepr but it seems like some kind of type-hack to allow to knowing the endiannes of CurveAffine::BASE when encoding compressing curves. Since our halo2curves takes traits from pasta, we are not able to modify it.
  • There are other ways to build fields:
    • Field.from_raw() is not designed for serialization, and it seems using this name. A more rational name will be from_integer IMO, but we cannot change this because from_raw is used in a derive calling pasta_curves at this moment.
    • from_uniform_bytes for random ones
    • from_mont for montgomery

Unserialize performance:

image

legend:

  • cur / afi : Curve / Affine
  • chk / unchk : Checked / Unchecked
  • comp / uncomp : Compressed / Uncompressed

code: https://gist.github.com/adria0/c440185d765a368aaf21ca5741a63ab7

Initial proposal:

  • Remove SerdeObject. Why we need this? If we want to quick process data between trusted parties we have UncompresseEncoding::from_uncompressed_unchecked (that is in the order of magnitude of SerdeObject decodings), if not we can use GroupEncoding::from_bytes that fits into standards. Also having a mix of endianess if we use one or another API just creates confusion IMO. The memory footprint that would save by using {read,write}_raw will be really low, so even if we decide to enter into zerocopy stuff, r/w traits here are not a significant improvment.
  • IMO Arkworks is the standard of interoperability at this moment, we must encode everything as this guys do.
  • Implement serde for Field, is not implemented
  • EndianRepr::from_bytes, EndianRepr::to_bytes is implemented in Field (that aready have these fields). This is extra confusing, these are just intended to be used internally as helpers. So rename this fields for something like endianrepr_{from,to}_bytes

Comments: @davidnevadoc @duguorong009 @kilic ?

@guorong009
Copy link

guorong009 commented Oct 2, 2024

@adria0
First of all, thx for all research and proposal!

Here is my idea about your proposal:

  1. Remove SerdeObject

Why we need this? If we want to quick process data between trusted parties we have UncompresseEncoding::from_uncompressed_unchecked (that is in the order of magnitude of SerdeObject decodings), if not we can use GroupEncoding::from_bytes that fits into standards.

I think this SerdeObject is for field stuff, while UncompressedEncoding is for curve one.
https://github.com/duguorong009/halo2curves/blob/b753a832e92d5c86c5c997327a9cf9de86a18851/derive/src/field/mod.rs#L485
Maybe, they do not bring any meaningful difference in underlying implementation.
But, I think it's odd to implement UncompressedEncoding trait for any field.

  1. Follow Arkworks standard
    I think @davidnevadoc did some research on serialization in Arkworks, in previous works.

  2. Implement serde for Field, is not implemented
    I think SerdeObject is for this purpose.
    They help to convert field to bytes and vice versa.

  3. Rename EndianRepr::[from, to]_bytes
    Do we need this renaming?
    As you said, these are for internal use.

Overall, I think we first need documentation.
The documentation, I mean, is both of one explaining all current traits(role, use, ...), and one of planning how to improve.

@davidnevadoc
Copy link
Contributor

davidnevadoc commented Oct 2, 2024

Thanks @adria0 for the in depth description of the issue!

I think we should tackle fields and curve serialization separately:

For curves:

  • GroupEncoding and UncompressedEncoding are traits meant for curves (groups), not for fields.
  • Curve point serialization is, with the exception of compression, little more than serializing field elements. So choices such as endianness should be made at the field level, and the curves just inherit them.

For fields, let's stick to prime fields only for now ( field extensions can be represented in multiple ways and that is a bigger problem that exceeds the scope of this issue).

  • I agree with @adria0 point on SerdeObject. I think it is one of the main points of redundancy and it may need to be removed.
    In my view, its name makes it easy to confuse with the usual serde::Serialize and serde::Deserialize traits.
    Then, the doc that explains its functionality states:
/// Trait for converting raw bytes to/from the internal representation of a type.
/// For example, field elements are represented in Montgomery form and serialized/deserialized without Montgomery reduction.

I can't see why this kind of access needs to be exposed publicly. Specially when Montgomery/non-Montgomery form introduces
one of the main sources of confusion when it comes to field representation.

  • I also agree with point made about from_raw. Unfortunately this is not a clear name, raw could mean anything.
    Is it raw bytes? raw limbs/ u64s? Montgomery or non-Montgomery?.
    I think just the opposite about from_uniform_bytes. I think it is crystal-clear that the input for this function is a series of random bytes and therefore, it has to be transformed into a field element w/o validity checks.

  • from_mont and to_mont look to me like good names. (perhaps replacing SerdeObject?)

Lastly, I agree 100% with @guorong009 remark about documentation. Whichever design we land on, we should make an effort to have it well documented to prevent more confusion.

@adria0
Copy link
Member Author

adria0 commented Oct 4, 2024

Round two,

proposed changes

  • Remove SerdeObject

documentation

Field Encodings

  • from_bytes/to_bytes: They use an industry-standard format that is consistent with how curves are encoded. This format is what will be used internally by the Serde library to ensure interoperability. Provides a unified format for both field and curve serialization. Ensures a consistent, industry-standard serialization, using big or little endian depending on the curve
  • from_mont/to_mont: These methods convert elements to and from the Montgomery form, which is an internal representation that is commonly used for efficient field arithmetic. Use these when working specifically with Montgomery-reduced values, especially in cryptographic computations.
  • from_raw: Creates a field element from a raw integer (typically limbs or u64s). Use this method when directly converting an integer value into a field element.
  • from_uniform_bytes: Converts a uniform random byte array into a valid field element. This is particularly useful in scenarios requiring a random element in the field, such as in cryptographic protocols or when hashing to the field.

Curve Encodings

  • GroupEncoding trait methods: Implements the serialization and deserialization of curve points in a compressed format. Compression is slower but generates standardized encodings that are smaller in size. Suitable for storing and transmitting curve points efficiently. Serde will use this.
  • UncompressedEncoding trait methods: Provides faster serialization/deserialization of curve points in an uncompressed format. The output is larger than that of GroupEncoding, but it's quicker to generate. When speed is prioritized over size.

Notes:

  • from_bytes, to_bytes from EndianRepr trait is only intended for internal use only. Do not use.

@adria0
Copy link
Member Author

adria0 commented Oct 4, 2024

Related #39

@adria0 adria0 linked a pull request Oct 4, 2024 that will close this issue
@davidnevadoc
Copy link
Contributor

davidnevadoc commented Oct 4, 2024

I tried to new version of the serialization where SerdeObject is removed in halo2, and found out that there are more serialization traits!
SerdePrimeField for example, implemented in Halo2...
Concretely, we should decide what to do with the parts of serialization implemented in halo2 (see https://github.com/privacy-scaling-explorations/halo2/blob/main/halo2_backend/src/helpers.rs)
Should we bring this kind of functionality to halo2curves ?

@adria0
Copy link
Member Author

adria0 commented Oct 7, 2024

I tried to new version of the serialization where SerdeObject is removed in halo2, and found out that there are more serialization traits! SerdePrimeField for example, implemented in Halo2... Concretely, we should decide what to do with the parts of serialization implemented in halo2 (see https://github.com/privacy-scaling-explorations/halo2/blob/main/halo2_backend/src/helpers.rs) Should we bring this kind of functionality to halo2curves ?

I think that implementing this helper in halo2curves, just adds more confusion, a library only has to provide one way to do the same thing.

I also question, why does this helper exists? It's just a simple wrapper. Exists maybe because halo2curves serialization was not really documented and was a way for implementer to clarify what can be done in halo2curves? I do not see this usefull in Halo2 in terms of cryptographic protocol.

@guorong009
Copy link

@adria0 @davidnevadoc
I've done some research why these helpers SerdeFormat(halo2), SerdeObject(halo2curves) are there.
The SerdeObject trait was introduced in this PR. 5bcc891
The SerdeFormat was introduced in this PR. privacy-scaling-explorations/halo2#111
All of work was done by @jonathanpwang .
Before that, there was no concept of Serde* in the halo2 and halo2curves.
For example, the zcash/halo2 has only one option - CurveRead(compressed). https://github.com/zcash/halo2/blob/main/halo2_proofs/src/helpers.rs

I believe we should check why @jonathanpwang added these.
If those traits & helpers do not have specific purpose we don't know, we can remove them, as @adria0 mentioned.

@jonathanpwang
Copy link
Contributor

jonathanpwang commented Oct 12, 2024

The reason they were added is that previously the default in halo2 and halo2curves was that serde always serialized and deserialized to the human-readable canonical representation of a BigInt as some u64 digits. However the internal struct stores the BigInt primes in Montgomery form. Therefore the serialization and deserialization was doing Montgomery reduction each time, which is not desirable if the serialization is being used for raw storage purposes that don't need human readability. For halo2 this was particularly relevant for proving keys.

So Serde* was created as a way to offer these separate "raw" (de)serde methods without interfering with the existing human readable serde methods.

I haven't been up to date on the current situation, so I can't tell if they are needed anymore, but this was the original motivation.

Edit: SerdeObject can be safely removed if you are able to define from_mont and to_mont somewhere else. The Serde* traits in halo2 was purely a way to extend traits beyond ff::Field. SerdeFormat is an enum to toggle between the different serialization methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants