Filter out APKINDEX entries that don't contribute any CVEs #2

Open
luhring opened this issue Nov 14, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@luhring
Contributor

luhring commented Nov 14, 2023

Description

In an effort to further increase the CVEs-to-bytes ratio, we could consider a scanner-informed filter pass that removes DB entries that aren't pulling their weight in CVEs...

luhring added the enhancement label Nov 14, 2023
@jspeed-meyers

I don't see this feature in the priorities spreadsheet. Who's the PM for this?

@lyoung-confluent
Contributor

@jspeed-meyers I think it's important that we spend time bikeshedding about which vulnerability scanner would be used as the source of truth instead of discussing the feature request itself

@lyoung-confluent
Contributor

I was curious what % of packages contribute at least one vulnerability to the final count. By my math, as of today, 7908 unique (package, version) tuples contribute at least one finding:

$ grype -o json ghcr.io/chainguard-dev/maxcve/maxcve > scan.json
$ jq -r '.matches[].matchDetails[].searchedBy.package | "\(.name):\(.version)"' scan.json | sort -u > vulnerable.list
$ wc -l vulnerable.list
    7908 vulnerable.list

For comparison, the image contains 48729 packages in total:

$ skopeo copy docker://ghcr.io/chainguard-dev/maxcve/maxcve dir:maxcve
$ tar -Oxf maxcve/8c4dde6aca507207b07666b9ab641592084ea0c81a887a1dd60de51438c296a7 | grep -c "^P:"
48729

We can infer that an image containing only vulnerable packages would be ~16% of the size of the current image.
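The ~16% figure is just the ratio of the two counts above. A minimal sketch of the arithmetic, where `vulnerable_share` is a hypothetical helper name and the result is a share of package entries, not of bytes:

```shell
# Hypothetical helper: share of installed entries that contribute a finding.
# $1 = number of vulnerable (package, version) tuples
# $2 = total number of package entries in the image
vulnerable_share() {
  awk -v v="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", v / t * 100 }'
}

vulnerable_share 7908 48729   # prints 16.2%
```

This treats every entry as equally sized, which is an assumption; the real byte savings depend on how large the removed entries are.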

@lyoung-confluent
Contributor

If we assume that the majority of vulnerabilities (i.e., anything not in the Wolfi secdb) are matched by vulnerability scanners based on the version and origin (upstream) properties of each APK package, we could reduce the number of unique package versions by de-duplicating the -r<epoch> revisions of each version.

For example, if versions 1.2.3-r0, 1.2.3-r1, 1.2.3-r2, and 1.2.3-r3 of a package exist, we could include only 1.2.3-r0, which presumably carries all of the vulnerabilities of the upstream 1.2.3 release.
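A rough sketch of that de-duplication, assuming the input is a list of `name:version` lines like the `vulnerable.list` format used earlier in this thread. `dedupe_revisions` is a hypothetical helper, and it keeps the lowest revision present for each upstream version:

```shell
# Hypothetical helper: keep only the lowest -r<N> revision per name:upstream-version.
# Reads "name:version-rN" lines on stdin, writes the survivors sorted to stdout.
dedupe_revisions() {
  awk '{
    key = $0; rev = 0
    # Split a trailing "-r<digits>" suffix off the line, if there is one.
    if (match($0, /-r[0-9]+$/)) {
      key = substr($0, 1, RSTART - 1)
      rev = substr($0, RSTART + 2) + 0
    }
    if (!(key in min) || rev < min[key]) min[key] = rev
  } END {
    for (k in min) printf "%s-r%d\n", k, min[k]
  }' | sort
}

printf '%s\n' foo:1.2.3-r0 foo:1.2.3-r1 foo:1.2.3-r2 foo:1.2.3-r3 bar:2.0-r5 | dedupe_revisions
# prints:
# bar:2.0-r5
# foo:1.2.3-r0
```

Lines without a revision suffix come out with `-r0` appended, which is a simplification. Whether dropping the later revisions loses findings depends on the scanner matching only on the upstream version, as assumed above.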

@jspeed-meyers

> @jspeed-meyers I think it's important that we spend time bikeshedding about which vulnerability scanner would be used as the source of truth instead of discussing the feature request itself

That's an excellent point. @luhring--Can you please weigh in on which scanner's results best approximate "truth"?

@jspeed-meyers

> We can infer that an image containing only vulnerable packages would be ~16% of the size of the current image.

Yet another reason the lack of a PM is troubling. Is the design goal to make the image have as many CVEs as possible and to have as many packages as possible? Or to make the image have as many CVEs per package as possible? Without a thorough PRD, we'll be stuck here forever.

YOU PEOPLE CALL THIS A LAUNCH?????

@imjasonh -- Please advise.

@imjasonh
Member

imjasonh commented Apr 2, 2024

Apologies for not discussing this thoroughly enough during the (extensive!) design and approval phase.

@jspeed-meyers you're right to feel that this launch was rushed and that the proper care and diligence, befitting the seriousness of this work, weren't demonstrated. I can only hope to do better in future launches.

However, as a silver lining, I wonder if we've exposed a loophole which might enable infinite CVEs! If scanners are looking for any version < $fixed-version then we may be able to have the image report infinitely many versions below the fixed version -- e.g., if a CVE was fixed in foo@1.2.3, we can report the image includes foo@1.2.2-r0, foo@1.2.2-r1, foo@1.2.2-r2, and so on, as well as foo@1.2.1-r*, 1.2.0-r*, etc.

("infinite" here is misleading; in reality we're constrained by the limits of epoch numbers -- int64s -- and the available space of unfixed versions -- nominally three int64 version components, possibly 4?)
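To make the enumeration concrete, here is a small sketch that lists revisions of one unfixed version. `enumerate_revisions` is a hypothetical helper, and the `name@version-rN` form is only the notation used in this comment, not a real database format:

```shell
# Hypothetical generator: emit "name@version-rN" for N = 0 .. count-1.
# $1 = package name, $2 = unfixed upstream version, $3 = how many revisions
enumerate_revisions() {
  i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s@%s-r%d\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

enumerate_revisions foo 1.2.2 3
# prints:
# foo@1.2.2-r0
# foo@1.2.2-r1
# foo@1.2.2-r2
```

If the revision really is an int64, the count per version tops out at 9223372036854775807 (2^63 - 1), which is large but finite.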

The question of vuln density then becomes how compactly we can encode and compress the existence of a vulnerable package-version.

Clearly this warrants further study.

@luhring
Contributor Author

luhring commented Apr 2, 2024

Depending on how the given scanner parses the installed DB, you might not need to find unique epoch numbers at all, and could instead repeat a given known vulnerable version's entry as many times as you want.
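As a sketch of what that repetition could look like, the snippet below emits the same entry several times using only the `P:` (name) and `V:` (version) fields. `repeat_entry` is a hypothetical helper, real installed-DB entries carry more fields, and whether a scanner counts the duplicates separately is exactly the open question:

```shell
# Hypothetical helper: write the same minimal installed-DB stanza N times.
# $1 = package name, $2 = version, $3 = number of copies
repeat_entry() {
  i=0
  while [ "$i" -lt "$3" ]; do
    # Stanzas are separated by a blank line.
    printf 'P:%s\nV:%s\n\n' "$1" "$2"
    i=$((i + 1))
  done
}

repeat_entry foo 1.2.2-r0 3 | grep -c '^P:'   # prints 3
```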

@luhring
Contributor Author

luhring commented Apr 2, 2024

But this starts us on a dark path, where to push the CVEs-to-bytes ratio to the limit, you'd simply repeat the DB entry with the highest CVEs-to-bytes value in place of incorporating the entire catalog of packages with their varied and inferior CVEs-to-bytes values.
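For illustration, picking that single best entry is a one-pass maximum over per-entry ratios. The input format (`name:version cve_count entry_bytes`) and the numbers below are made up for the example, and `best_ratio_entry` is a hypothetical helper:

```shell
# Hypothetical helper: print the entry with the highest CVEs-to-bytes ratio.
# Reads "<name>:<version> <cve_count> <entry_bytes>" lines on stdin.
best_ratio_entry() {
  awk '$3 > 0 {
    ratio = $2 / $3
    if (ratio > best) { best = ratio; pick = $1 }
  } END { print pick }'
}

# Made-up sample data, not real scan results.
printf '%s\n' 'foo:1.2.2-r0 12 300' 'bar:2.0-r1 40 500' 'baz:0.9-r0 3 120' | best_ratio_entry
# prints bar:2.0-r1
```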

@jspeed-meyers

> But this starts us on a dark path, where to push the CVEs-to-bytes ratio to the limit, you'd simply repeat the DB entry with the highest CVEs-to-bytes value in place of incorporating the entire catalog of packages with their varied and inferior CVEs-to-bytes values.

Whatever the PRD says is what we should do.

@lyoung-confluent
Contributor

> ("infinite" here is misleading; in reality we're constrained by the limits of epoch numbers -- int64s -- and the available space of unfixed versions -- nominally three int64 version components, possibly 4?)

It appears that grype, at least, uses go-apk-version, which does not parse the revision into an int64 and instead leaves it as a string, so we are not bound by such silly limits: https://go.dev/play/p/IpZRv7ID-cE 😅
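To show why keeping the revision as a string sidesteps the int64 ceiling, here is a sketch that orders two decimal revision strings without converting them to integers. This is not go-apk-version's implementation, only an illustration; `cmp_revision` is a hypothetical helper:

```shell
# Hypothetical helper: compare two decimal digit strings of any length.
# Prints "lt", "eq", or "gt" for $1 relative to $2.
cmp_revision() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    sub(/^0+/, "", a); sub(/^0+/, "", b)    # leading zeros do not affect order
    la = length(a); lb = length(b)
    if (la != lb) {                          # more digits means a larger number
      r = (la < lb) ? "lt" : "gt"
      print r
      exit
    }
    sa = "x" a; sb = "x" b                   # prefix forces string comparison
    if (sa < sb) r = "lt"
    else if (sa > sb) r = "gt"
    else r = "eq"
    print r
  }'
}

# Both values are above the int64 maximum of 9223372036854775807.
cmp_revision 9223372036854775808 9223372036854775809   # prints lt
```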


4 participants