-
-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FlatMerge should deduplicate "equivalent" entries and squash their data #245
base: main
Are you sure you want to change the base?
Conversation
…y (inherit equality methods, adapt MergeWith()) Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…s for different traces Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ity.KnownEntityTypeProperties[] Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ipping Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ipping Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ull) Scope value Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ipping Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…arious Scope values, refresh comments about spec versions involved Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…methodEquals and methodMergeWith via pre-cached BomEntity static Dicts Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…tyListMergeHelperReflection and BomEntityListReflection Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… in original JSON) NonNullable properties Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…use default Equals() and Equivalent() implementations Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ey for BomEntityListReflection[List<type>] Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…operty Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…property Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… hit (handled a "tmp" target entry which was equivalent or equal to an incoming "obj" entry) Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ble" helper properties Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…s", skip "NonNullable" helper properties Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…rialize(BomEntity) to rule them all Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…nternal Serialize() implementations Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…mEntity implem) Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ompact(BomEntity) with minimal markup overhead and use it in BomEntity.Equals() Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…tations for BomEntity itself as the handler for derived classes Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ithRefLinkType* interfaces Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ter/setter methods Could not just declare in interface anticipated getter/setter methods (for zero copy-pasta) - that conflicted with the later generated getters/setter actual methods. Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…eriments Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…latmerge-dedup-squash Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Not yet the complete intended solution, but a large leap in that direction, which took way longer than I had anticipated (or that colleagues are content with). Anyhow, recent dozen or so commits adds a way to "walk" BomEntities (including the Bom object which represents the whole document) to map the "bom-ref"s used to name things, and "ref"s or list items pointing back to these names and so entities. This in turn allows to consistently rename objects in the in-memory representation (which would be the way to ensure uniqueness where due and we decide something is not mergeable despite originally having same "bom-ref" names). As a practical example, the |
Missing DCO is on a revert-commit which is too long ago in history to rebase (had several merges from master since, unraveling the conflicts just to change a commit message via rebase does not seem worthwhile). |
…...} Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…...} Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…some messages to formatting strings, wrap longer others Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…lue, as planned Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…tter/setter; avoid initializing to default value du-jour Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…dd pragma for Sonar) Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… returns dictBackrefs) Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ivate Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…...} Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…lue, as planned - correct fix Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…erformance even if privatized with getter/setter Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… in (unlikely) case it is null Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…re-initialized or not (and a definitive value wins over compiler default du-jour), because Sonar analyzer wants opposite things Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…WalkResult.bomRoot Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…tainers() wrap Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…BomWalkResult.bomRoot Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…ing info exposure Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…BomWalkResult.debugPerformance Signed-off-by: Jim Klimov <jimklimov@gmail.com>
Primarily written as a practical test case for `Bom.WalkThis()` and `Bom.RenameBomRef()` methods introduced in the library, but may be useful to have exposed for end-users. Relies on CycloneDX/cyclonedx-dotnet-library#245 for the bulk of work (BomEntity base-class and interface family, etc.) and CycloneDX/cyclonedx-dotnet-library#256 for metadata update of the output document. --------- Signed-off-by: Jim Klimov <jimklimov@gmail.com>
The proposed PR is the result of hard labor of my past month or so, as I learned a lot about C# along the way, to solve the practical problem that after I merged hundreds of component Bom documents into one to craft a "release SBOM", the resulting document was invalid against the CycloneDX schemas, particularly that many entries were mentioned many times. 26K times too many in fact...
Fixes CycloneDX/cyclonedx-cli#219
Fixes CycloneDX/cyclonedx-cli#188
Fixes #82
Follows up from #199 and #216 AFAIK
@fnxpt kindly pointed me to the work posted and merged in #199 for the library - note that related PR CycloneDX/cyclonedx-cli#305 to actually benefit from the change in the CLI tool is still not merged; my practical work builds upon that.
There was quite a bit of back and forth as I was coercing C# to minimize copy-paste in application classes, so all activities are doable by addressing their new base class
BomEntity
with which they may, but not "must", implement customEquivalent()
,Equal()
andMergeWith()
methods. So there is a lot of reflection, with cached type, property and method information which had really noticeably sped up the millions of loops I've had executed, and some insane tricks to produce desired types at run-time (eh, suddenly Groovy is a very friendly language!) On the upside, this allowed to remove the several identical implementations ofEquals()
,Serialize()
etc. which popped up in earlier PRs since February.A relatively compact example
MergeWith()
implementation can be seen inHash.cs
which on one hand served as a PoC, and on another can notice if someone has several different hash values for the same algorithm.A major code block of such implementation is in the
Component.cs
since its "equivalence" is not "equality" of the whole record but just that of the "BomRef" - and numerous other properties tend to differ most in the data sets I had (different scopes, whose priority order of values is defined in spec; different collections of hashes; sometimes other info), where it loops over the class property lists and has a switch/case to choose further handling logic based on their types, names and other criteria.Finally, the original "faulty" relaxed merging logic which got me into this was redeemed due to its speed: tossing away the first 25K entries that were identical to something already present with a quick and cheap check cost some 3-5 sec on my laptop. Then comes the last pass with the careful
MergeWith()
to squash the remaining ~1.5K entries into 660 ultimately unique entries which have all the original info coming from different corners of the universe in another 10 sec. Given that there is a lot of loop-in-loop parsing involved, the original take to be careful all the way (with each of the incoming 330 BOM documents) was expensive: script took about 2-5 minutes to complete each attempt in different iterations, which made dev-testing poorly feasible. And debugging (with expression breakpoints to catch the interesting errors) took half an hour just to get to a test case with the original wealth of data...I currently do no plan to undertake the re-design of logic which went into the
Component
class to pull it apart into some helper methods (e.g. merge of properties which are lists and the default "un-interesting" property handling) which could be re-used if other classes get to define their non-default merging logic.Some other ideas were to use the ListMergeHelper for list properties in the classes (as part of their
MergeWith()
implem), and to consider making theBom
merge aMergeWith()
implementation too. This one feels difficult due to numerous (Flat, Hierarchical) merge strategies that are available; maybe subclassing with one implem each (and defined equality/equivalence between them) could be an answer. Comments about this have been left in the code.Finally, there may be or not be further entity types that would benefit from a diligent
MergeWith()
, but currently I don't have any on the plate - and my files do not upset the schemas, so...These are all reasonable exercises for some other day, takers welcome. I'm happy this finally works and the burden is off my shoulders (and a half-week sprint task that took an extra month can get closed) so would rest on laurels and sleep for a week ;)