Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize field presence tracking of default/optional/extended fields #324

Merged
merged 26 commits into from
Oct 25, 2024

Conversation

Nicceboy
Copy link
Contributor

@Nicceboy Nicceboy commented Sep 14, 2024

This assumes that #323 is merged first.

Field count of Sequence/Set is possible to know in compile-time and this will make such a change.
Because of that, it possible to use map with correct size in stack without any heap allocations.

It will increase performance quite a bit: (the screenshot is against alloc PR branch)

There is also untested scenario, which causes bug with HashMap (surprise...), this also adds test for that. It will probably apply for PER too.

Further optimization likely requires creating own data structure and improve the logic of field_bitfield usage. For some reason it still adds more overhead than it should.

If this seems good, I can make the change for PER too.

image

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Sep 14, 2024

I also added dynamic default allocation here, because code was more related to that area.

When initializing the encoder with:

Encoder::<0>::new(enc::EncoderOptions::coer(), core::mem::size_of::<T>());

Allocation is a bit closer to the real requirements and on very complex data structures (for example IEEE 1609.2-2022 https://standards.ieee.org/ieee/1609.2/10258/ this is around 20% performance increase. There isn't any other good defaults I guess which does not come with downsides.

src/enc.rs Outdated
///
/// Const `N` is the count of root components in sequence or set.
/// Generic `C` is the sequence value.
fn encode_sequence<const N: usize, C, F>(
Copy link
Collaborator

@XAMPPRocky XAMPPRocky Sep 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't N just be C::FIELDS.len() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently you can't, the problem is similar as in #318

You will get a compiler error:

error[E0401]: can't use generic parameters from outer item
   --> src/oer/enc.rs:849:28
    |
840 |     fn encode_sequence<const N: usize, C, F>(
    |                                        - type parameter from outer item
...
849 |         const LEN: usize = C::FIELDS.len();
    |                            ^^^^^^^^^ use of generic parameter from outer item
    |
    = note: a `const` is a separate item from the item that contains it

error: generic parameters may not be used in const operations
   --> src/oer/enc.rs:850:39
    |
850 |         let mut encoder = Encoder::<{ C::FIELDS.len() }>::new(
    |                                       ^^^^^^^^^ cannot perform const operation using `C`
    |
    = note: type parameters may not be used in const expressions

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Sep 15, 2024

Just a reference flamegraph why I have been so interested about this one field:

image

For example, if we encode Integer 32 866 times as part of the sequence, the field bitmap is extended 12 217 times in Integer bench test.

@Nicceboy
Copy link
Contributor Author

I think I will try to replace the use of LinearMap with own data structure or try to track the presence in different way, so this change would be also temporal. However, the constant field count is always required.

@kurkpitaine
Copy link
Contributor

kurkpitaine commented Sep 28, 2024

I encountered a case where using HashMap for OER Encoder::field_bitfield produces the wrong preamble for a sequence type containing optional fields. The iterator over HashMap visits values in arbitrary order, which results in an erroneous preamble since the order of elements is not respected. The preamble construction algorithm assumes the field_bitfield is ordered as the elements of the sequence, but HashMap usage breaks this. LinearMap iterator has the same behavior so this will not fix my issue.
Also, this might be the cause (didn't dig) of another issue I have with an UPER encoding.
Do we rollback to a BTreeMap, which is ordered, or update the preamble construction algorithm ?

@Nicceboy
Copy link
Contributor Author

I encountered a case where using HashMap for OER Encoder::field_bitfield produces the wrong preamble for a sequence type containing optional fields. The iterator over HashMap visits values in arbitrary order, which results in an erroneous preamble since the order of elements is not respected. The preamble construction algorithm assumes the field_bitfield is ordered as the elements of the sequence, but HashMap usage breaks this. LinearMap iterator has the same behavior so this will not fix my issue. Also, this might be the cause (didn't dig) of another issue I have with an UPER encoding. Do we rollback to a BTreeMap, which is ordered, or update the preamble construction algorithm ?

I am currently working with custom data structure so that we could avoid them completely. Can you provide some test case for your issue so I could verify it? There is one new test case for COER in the end of coer.rs, which seems to work with LinearMap for me.

@kurkpitaine
Copy link
Contributor

This appears when I COER encode a ToBeSignedCertificate, with some app_permissions. I can provide a test case later if necessary.

@Nicceboy
Copy link
Contributor Author

This appears when I COER encode a ToBeSignedCertificate, with some app_permissions. I can provide a test case later if necessary.

Ah, no need for that. The same standard which the new test case is based. I will look into it.

@Nicceboy
Copy link
Contributor Author

I made a PR #338 to fix this in a meantime. Finding a solution which would resolve all potential performance bottlenecks and hopefully removing needs to look this again seems to take some time.

@Nicceboy
Copy link
Contributor Author

There is one potential solution which makes Field and Fields types completely constant and moves the presence tracking to different layer, but that introduces quite many generics, if that is okay.

It should be almost as fast as it gets, and the improvement is quite big:

image

@XAMPPRocky
Copy link
Collaborator

but that introduces quite many generics, if that is okay.

Can you expand on what you mean by "many generics", can you show where the impact would be (not the full code but like examples), particularly for public APIs?

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Sep 30, 2024

but that introduces quite many generics, if that is okay.

Can you expand on what you mean by "many generics", can you show where the impact would be (not the full code but like examples), particularly for public APIs?

There is no impact for major public APIs I think. I haven't tried yet all the cases I had in mind.

Constructed trait would change as

/// A `SET` or `SEQUENCE` value.
/// `RL` is the number of fields in the "root component list".
/// `EL` is the number of fields in the list of extensions.
pub trait Constructed<const RL: usize = 0, const EL: usize = 0> {
    /// Fields contained in the "root component list".
    const FIELDS: super::fields::Fields<RL>;
    /// Whether the type is extensible.
    const IS_EXTENSIBLE: bool = false;
    /// Fields contained in the list of extensions.
    const EXTENDED_FIELDS: Option<super::fields::Fields<EL>> = None;
}

Fields type would be as:

/// Represents all of the values that make up a given value in ASN.1.
#[derive(Debug, Clone)]
pub struct Fields<const N: usize> {
    fields: [Field; N],
}

As a result, both encode_sequence/encode_set and decode methods would get generics, as well as the encoder trait itself and the implementing encoders:

E.g. encoder trait:

/// A **data format** encode any ASN.1 data type.
/// Const `RC` is the count of root components in sequence or set, EC about extended list.
pub trait Encoder<const RC: usize = 0, const EC: usize = 0> {
    type Ok;
    /// The associated error type returned on failure.
    type Error: Error + Into<crate::error::EncodeError> + From<crate::error::EncodeError>;
    type AnyEncoder<const R: usize, const E: usize>: Encoder<RC, EC, Ok = Self::Ok, Error = Self::Error>
        + Encoder;

And encode sequence would look like

    fn encode_sequence<const R: usize, const E: usize, C, F>(
        &mut self,
        tag: Tag,
        encoder_scope: F,
    ) -> Result<Self::Ok, Self::Error>
    where
        C: crate::types::Constructed<R, E>,
        F: FnOnce(&mut Self::AnyEncoder<R, E>) -> Result<(), Self::Error>;

And overall, this would make it possible to use static arrays everywhere and making it possible to construct logic for bitfield presence tracking with proc-macros only for default/optional/extended types, which would expand into code like:

#[derive(AsnType, Debug, Decode, Encode, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[rasn(automatic_tags)]
#[non_exhaustive]
pub struct ExtendedOptional {
    pub value: Integer,
    pub integer1: Option<Integer>,
    pub octet1: Option<OctetString>,
    pub integer2: Option<Integer>,
    pub octet2: Option<OctetString>,
    pub integer3: Option<Integer>,
    pub octet3: Option<OctetString>,
    #[rasn(extension_addition)]
    pub integer4: Option<Integer>,
    #[rasn(extension_addition)]
    pub octet4: Option<OctetString>,
    #[rasn(extension_addition)]
    pub integer5: Option<Integer>,
    #[rasn(extension_addition)]
    pub octet5: Option<OctetString>,
}


#[allow(clippy::mutable_key_type)]
impl rasn::Encode for ExtendedOptional {
    fn encode_with_tag_and_constraints<'constraints, EN: rasn::Encoder>(
        &self,
        encoder: &mut EN,
        tag: rasn::types::Tag,
        constraints: rasn::types::Constraints<'constraints>,
    ) -> core::result::Result<(), EN::Error> {
        #[allow(unused)]
        let __rasn_field_value = &self.value;
        #[allow(unused)]
        let __rasn_field_integer1 = &self.integer1;
        #[allow(unused)]
        let __rasn_field_octet1 = &self.octet1;
        #[allow(unused)]
        let __rasn_field_integer2 = &self.integer2;
        #[allow(unused)]
        let __rasn_field_octet2 = &self.octet2;
        #[allow(unused)]
        let __rasn_field_integer3 = &self.integer3;
        #[allow(unused)]
        let __rasn_field_octet3 = &self.octet3;
        #[allow(unused)]
        let __rasn_field_integer4 = &self.integer4;
        #[allow(unused)]
        let __rasn_field_octet4 = &self.octet4;
        #[allow(unused)]
        let __rasn_field_integer5 = &self.integer5;
        #[allow(unused)]
        let __rasn_field_octet5 = &self.octet5;
        let mut fields = Self::FIELDS;
        let mut extended_fields = Self::EXTENDED_FIELDS;
        encoder
            .encode_sequence::<7usize, 4usize, Self, _>(tag, |encoder| {
                self.value.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 0usize as u32),
                )?;
                if self.integer1.is_some() {
                    fields.set_field_present_by_index(1usize);
                }
                self.integer1.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 1usize as u32),
                )?;
                if self.octet1.is_some() {
                    fields.set_field_present_by_index(2usize);
                }
                self.octet1.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 2usize as u32),
                )?;
                if self.integer2.is_some() {
                    fields.set_field_present_by_index(3usize);
                }
                self.integer2.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 3usize as u32),
                )?;
                if self.octet2.is_some() {
                    fields.set_field_present_by_index(4usize);
                }
                self.octet2.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 4usize as u32),
                )?;
                if self.integer3.is_some() {
                    fields.set_field_present_by_index(5usize);
                }
                self.integer3.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 5usize as u32),
                )?;
                if self.octet3.is_some() {
                    fields.set_field_present_by_index(6usize);
                }
                self.octet3.encode_with_tag(
                    encoder,
                    rasn::types::Tag::new(rasn::types::Class::Context, 6usize as u32),
                )?;
                if self.integer4.is_some() {
                    if let Some(ref mut ext_fields) = extended_fields {
                        ext_fields.set_field_present_by_index(7usize);
                    }
                }
                encoder.encode_extension_addition(
                    rasn::types::Tag::new(rasn::types::Class::Context, 7usize as u32),
                    <Option<Integer> as rasn::AsnType>::CONSTRAINTS
                        .override_constraints(rasn::types::Constraints::default()),
                    &self.integer4,
                )?;
                if self.octet4.is_some() {
                    if let Some(ref mut ext_fields) = extended_fields {
                        ext_fields.set_field_present_by_index(8usize);
                    }
                }
                encoder.encode_extension_addition(
                    rasn::types::Tag::new(rasn::types::Class::Context, 8usize as u32),
                    <Option<OctetString> as rasn::AsnType>::CONSTRAINTS
                        .override_constraints(rasn::types::Constraints::default()),
                    &self.octet4,
                )?;
                if self.integer5.is_some() {
                    if let Some(ref mut ext_fields) = extended_fields {
                        ext_fields.set_field_present_by_index(9usize);
                    }
                }
                encoder.encode_extension_addition(
                    rasn::types::Tag::new(rasn::types::Class::Context, 9usize as u32),
                    <Option<Integer> as rasn::AsnType>::CONSTRAINTS
                        .override_constraints(rasn::types::Constraints::default()),
                    &self.integer5,
                )?;
                if self.octet5.is_some() {
                    if let Some(ref mut ext_fields) = extended_fields {
                        ext_fields.set_field_present_by_index(10usize);
                    }
                }
                encoder.encode_extension_addition(
                    rasn::types::Tag::new(rasn::types::Class::Context, 10usize as u32),
                    <Option<OctetString> as rasn::AsnType>::CONSTRAINTS
                        .override_constraints(rasn::types::Constraints::default()),
                    &self.octet5,
                )?;
                encoder.set_presence_bits::<7, 4>(fields, extended_fields);
                Ok(())
            })
            .map(drop)
    }
}

Only negative approach in this is (on top of added code complexity), that all encoders track the presence of used default/optional/extended fields while they might not need it.

@Nicceboy Nicceboy changed the title Replace HashMap with LinearMap (Only OER for now) Optimize field presence tracking of default/optional/extended fields Sep 30, 2024
@Nicceboy Nicceboy marked this pull request as draft September 30, 2024 15:09
@Nicceboy Nicceboy force-pushed the oer-linear-map branch 6 times, most recently from 1457693 to 9fcca73 Compare October 22, 2024 11:27
@Nicceboy
Copy link
Contributor Author

I think this can finally be reviewed.
There are a lot of changes, but almost all are directly connected.
Previous example struct and macro generated code can be ignored, as I moved all the logic into the encoders.

I also added optional and extended fields for the integer.rs test to better see the overall effect (against current main), which on my machine results in around a 3x increase in performance for OER encoding. Also, PER is changed and gets a boost. The effect is smaller as its overall role is currently smaller.

image

Summary of the changes (also notes for me)

Allocations are currently the biggest hindrance for performance and we can reduce them with fixed-size arrays. If we can pass the field counts of Sequences/Sets, there are many places where these constants are useful for defining arrays (e.g. premable construction, arrays for bitfield tracking, extensions' field data reference array). So the following should change to achieve that:

  • Field and Fields type are now completely constant as described earlier and everything related is computed on compile-time.
  • Constructed trait will use associated constants for Field counts
    • Add also IS_EXTENSIBLE const for easier detection of extensibility.
      • Fields can be absent but type can be still extensible.
  • proc-macros will generate and pass the field counts to the encode_sequence, encode_set decode_sequence and decode_set types.
    • In order to benefit from that (and use recursively), also encoder and decoder must have place for these constants, so it can use them globally and create a new recursive instance of encoder with different contant values.
      • Add type AnyEncoder<const R: usize, const E: usize>: Encoder<RC, EC, Ok = Self::Ok, Error = Self::Error> + Encoder; for Encoder trait so that recursive encoder with different constants can be created. (also in decoder)
  • Count the presense of optional and extended fields during encoding only when required
    • Basically add presence when encoding some or extension addition or some/none, no overhead anywhere else.
    • Get the SET canonised order for preamble by relying on the Ord of Tag.
    • Also removes the index tracking as it introduces "global" complex mutable state and is not needed if we rely on the field order otherwise.
    • I tried to preserve everything on the encoder in the end, so any custom implementation of the encoding methods does not bring issues as long as the field order is preserved.
  • There was a bug when decoding SET with OER - fields were not canonised for preamble.
  • Add is_present method for AsnType trait to see if the type is absent (easier detection of the presence of the optional values, especially on optional extended fields). Might not be needed in the future, see below.

Notes for the future:

  • If we add those encode_optional_extension_addition_ et. al. methods, field presence tracking can still be slightly simplified.
  • If I do not do the new PR soon, use of RC reference counter for output buffer makes reusing the same buffer significantly easier for recursive encoders (especially on encode_sequence). Lifetimes do not want to work, I will still give a try. Would be another 2x speedup at least for OER.
  • PER might need a larger rework on logic how it currently uses buffers to get a higher improvement for performance

@Nicceboy Nicceboy marked this pull request as ready for review October 23, 2024 22:01
Copy link
Collaborator

@XAMPPRocky XAMPPRocky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some doc comments and nitpicks. Well done on this!

src/de.rs Show resolved Hide resolved
src/de.rs Show resolved Hide resolved
src/enc.rs Outdated Show resolved Hide resolved
src/enc.rs Outdated Show resolved Hide resolved
src/enc.rs Outdated Show resolved Hide resolved
src/oer/de.rs Outdated Show resolved Hide resolved
src/oer/de.rs Outdated Show resolved Hide resolved
Nicceboy and others added 2 commits October 25, 2024 19:30
Co-authored-by: XAMPPRocky <4464295+XAMPPRocky@users.noreply.github.com>
@XAMPPRocky XAMPPRocky merged commit b8d294e into librasn:main Oct 25, 2024
19 of 65 checks passed
@github-actions github-actions bot mentioned this pull request Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants