Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Data Structures #33

Merged
merged 32 commits into from
Oct 1, 2024
Merged

Generic Data Structures #33

merged 32 commits into from
Oct 1, 2024

Conversation

elazarg
Copy link
Member

@elazarg elazarg commented Apr 14, 2024

Decompose the data structures into domain-specific and generic data structures.

The ds package has zero dependencies, and it knows nothing about numbers or network.

Interfaces (all in interfaces.go):

  • Comparable (Equal, Copy)
  • Hashable (Comparable, Hash)
  • Set (Hashable + set operations)
  • Product of A x B (partitions, left and right projections, swap)
  • TripleSet, convenience (and associativity agnostic) interface for A x B x C

Implementations:

  • HashMap - a simple adapter around a go map, allowing Hashable keys.
  • HashSet
  • MultiMap is a non-injective map. It gives a simple "inverse" on a HashMap.
  • ProductLeft and ProductRight is a map-based implementation of a cartesian product; it implements Product as a one-to-one mapping from sets to sets, where each key-value pairs encodes the cross product of its two items.
  • TripleSetLeft and TripleSetRight and TripleSetOuter are right-, left- and outer-associative implementations of TripleSet.
  • DisjointSum is tagged-union of two sets. Go's generics place some limitations on the implementation; The empty element must always be explicitly passed to the constructor.
  • UnitSet is an idea - using sets of size 1 as the identity element of Product. It's probably not really helpful.

The interval package has interval and intervalset. Nothing particularly new about them.

netp is unchanged.

The netset package is where handles sets of network elements.

  • ICMPSet is largely the one proposed in Add ICMP set #23.
  • IPBlock is adapted to implement Set.
  • TCPUDPSet is a TripleSet of ProtocolSet x PortSet x PortSet
  • TransportSet is a disjoint union of TCPUDPSet + ICMPSet
  • ConnectionSet is a TripleSet of IPBlock x IPBlock x TransportSet

The package connection now contains the state-aware connectionset (named Set), and its json-formatting. In my opinion it should be moved to the analyzer.

Many tests are adapted. but some data structures do not have dedicated tests.

There is no implementation of a dynamically-bounded tuple, since I could not find the proper way to fit that into a generic datastructure with a well defined degenerate edge case.

The diff is probably not useful; it should be easier to read the code.

Copy link
Contributor

@adisos adisos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added few initial comments.

pkg/netset/connectionset.go Outdated Show resolved Hide resolved
pkg/netset/tcpudpset.go Outdated Show resolved Hide resolved
pkg/ds/interfaces.go Outdated Show resolved Hide resolved
pkg/ds/hashmap.go Outdated Show resolved Hide resolved
pkg/ds/hashmap.go Show resolved Hide resolved
pkg/netset/connectionset.go Show resolved Hide resolved
@elazarg
Copy link
Member Author

elazarg commented Jun 19, 2024

We may need to use ordered maps instead of hash maps, so the lists of partitions are consistently ordered.

Copy link
Contributor

@adisos adisos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few more questions and comments

pkg/ds/interfaces.go Outdated Show resolved Hide resolved
pkg/ds/interfaces.go Outdated Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
pkg/netset/icmpset.go Outdated Show resolved Hide resolved
pkg/netset/connectionset.go Outdated Show resolved Hide resolved
pkg/ds/unitset.go Outdated Show resolved Hide resolved
pkg/ds/tripleset_outer.go Show resolved Hide resolved
pkg/netp/icmp.go Outdated Show resolved Hide resolved
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Also, rename NewMap to NewHashMap

Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
@elazarg
Copy link
Member Author

elazarg commented Jun 25, 2024

Added an elaborated README, and changed implementation of MultiMap to use HashSet.

@elazarg
Copy link
Member Author

elazarg commented Jun 26, 2024

BTW, I think the class ICMP is better named ICMPRule, and is a set. The file icmp.go in the netp package should only have functions that are related to ICMP proper. Such change would also solve the need to expose implementation details of the ICMP class to the ICMPSet class.

@elazarg elazarg requested a review from adisos July 3, 2024 15:44
@elazarg
Copy link
Member Author

elazarg commented Jul 3, 2024

@adisos is there anything left for me to implement/document so this PR can be merged?

@elazarg elazarg mentioned this pull request Jul 6, 2024
Copy link
Contributor

@adisos adisos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few small comments and questions

pkg/ds/hashmap.go Outdated Show resolved Hide resolved
return m.m.IsEmpty()
}

// Size returns the number of unique pairs in the Product object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Size documentation is incorrect? it does not return the number of pairs..

Copy link
Member Author

@elazarg elazarg Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. Maybe the issue here is the notion of "pair" - here it means the mathematical pair of underlying elements, [x, y], not a pair of sets [s1, s2] (usually implicitly encoding s1 x s2, as is the case in the return value of Partitions()).

pkg/ds/interfaces.go Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
pkg/ds/interfaces.go Show resolved Hide resolved
@adisos
Copy link
Contributor

adisos commented Jul 8, 2024

@adisos is there anything left for me to implement/document so this PR can be merged?

added few small comments, still reviewing the tests and some implementation details.

elazarg and others added 3 commits July 8, 2024 15:43
Co-authored-by: Adi Sosnovich <82078442+adisos@users.noreply.github.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
interval:
* Separate tests from interval_test.go
* Improve documentation.
* Export and set-like functions that are well defined.
* Rename interval.Subtract to interval.SubtractSplit, and add tests.
* Handle empty cases first.
* Preallocate Elements.

intervalset:
* Guard Size() from overflow, and use intervalset.CalculateSize().
* Handle empty cases first.
* Remove String() method, since it is not obvious; clients should implement.

Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
@YairSlobodin1
Copy link
Member

Hi @elazarg Could you please also add the following method?

func (b *IPBlock) LastIPAddress() string {
	return intToIP4(b.ipRange.Max())
}

Again - what is the use case?

@elazarg I'm working on optimizing SGs. I have divided all the rules according to the protocol and the remote and converted them into slices or pairs. In both cases, each unit contains one CIDR. These units are disjoint and do not overlap, thanks to the generic implementation.

It would be great if there was a way to compare two CIDRs from different slices (e.g., rules with the TCP protocol vs. all protocols). Could you please also add a method to check whether two CIDRs are touching?

Spec1:

nif -> 1.1.1.0/31, tcp ports 1-10
nif -> 1.1.1.1, tcp ports 1-20
nif -> 1.1.1.2/31, tcp ports 1-10

In this case two rules are enough:

nif -> 1.1.1.0/30, tcp ports 1-10
nif -> 1.1.1.1, tcp ports 1-20

Spec2:

nif -> 1.1.1.0/31, tcp ports 1-10
nif -> 1.1.1.22/32, tcp ports 1-20
nif -> 1.1.1.23/32, tcp ports 1-10

In this case three rules are sufficient because the CIDRs don't touch

Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
@elazarg
Copy link
Member Author

elazarg commented Sep 2, 2024

Isn't it what you get when you convert a set of CIDRs into an ipblock and then take the partitions of ToCidrList()?

Regardless of this specific case, much of the point of the models data structures is this kind of optimization, so if it's not already supported it should probably be added later, both for specific sets (such as ipblocks) and for products, as in the case of the triple IPBlock x Protocol x Port you describe.

I don't want to add too many features at this point, since already this PR is huge and is a lot of work for @adisos to review. It's better merged first, and updated later.

Signed-off-by: adisos <adisos@il.ibm.com>
@adisos
Copy link
Contributor

adisos commented Sep 11, 2024

I think the new types should implement basic String() functionality, the client code can add other string functionality as needed, but it's also useful for testing, demo, example usage, debugging, etc.. of this pkg as stand alone .
I would at least implement it for netset.connecionset, connection.connection, product, tripleset..

Signed-off-by: adisos <adisos@il.ibm.com>
@elazarg
Copy link
Member Author

elazarg commented Sep 11, 2024

I would at least implement it for netset.connecionset, connection.connection, product, tripleset

For Product and Tripleset this means adding String requirement to the interface of Set.

@elazarg
Copy link
Member Author

elazarg commented Sep 11, 2024

I think the new types should implement basic String() functionality, the client code can add other string functionality as needed

And in the case of Product (and TripleSet), the string format should be less specific to the analyzer. I suggest

{ (L1 x R1) | ... | (Ln x Rn) } for partitions 1...n

So the client - which might not be even aware of this library or its inner working - have an idea what's encoded. Commas can be ambiguous.

Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have second thoughts about restricting our model to only reason about standard ICMP types and codes.
In both firewalls mechanisms (e.g., NACL) and in actual traffic there is no such restriction.
One may write an NACL rule, denying ICMP traffic with type 20, and although this type is nonstandard, the NACL will do exactly as told: if it observes an ICMP packet with type==20 in its header (possibly tailor-made by an attacker?), the NACL will block this packet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From synthesis POV, I think the case for allowing nonstandard ICMP is likely to be unintentional. It can be denied by default, supported via a special class.

Which makes me think, for both analysis and synthesis, we can have a union of ICMPset | ICMPInvalid. That may be better than just handling interval set everywhere in that (a) it's trivial to check and work with well defined ICMP messages and (b) it forces the handling of invalid ICMP messages to be explicit.


Regardless, I think this discussion is better deferred to a dedicated PR (before creating a new release) to make the review feasible. I will gadly work on it, if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, checking if a given ICMP <code,type> pair is standard can be very useful in multiple places (e.g., analyzer's lint).

I prefer defining StandardICMP as a subset of the full allowed range (0-254 for type, 0-255 for code).

A separate PR for making this change is a good idea.

Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
@adisos
Copy link
Contributor

adisos commented Sep 29, 2024

  • Added unit tests for the various types
  • Added String() method to the relevant interfaces and types
  • opened issue generic PR followups  #68 with list of suggested followups after this PR is merged.

elazarg and others added 2 commits September 29, 2024 12:31
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
Signed-off-by: adisos <adisos@il.ibm.com>
@adisos adisos merged commit c4bc6d3 into main Oct 1, 2024
4 checks passed
@adisos adisos deleted the generic branch October 1, 2024 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants