Track repository_url #495
base: master
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
Making this a draft to try another approach. Please feel free to use the PR for testing purposes. Update: Ready for review.
… up confidence Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@hboutemy @stevespringett I would appreciate a review.
IIUC, repository_url is an origin repository URL for a dependency? From Maven experience with such a feature in dependency reports, if a repository manager is used, it's not possible to identify the origin repository of a dependency: that's why we removed the feature a few years ago. I don't think this is feasible in a consistent way; it's a general Maven issue, shared by every plugin that tried to do it.
@hboutemy Thank you for the comments. If a repository manager is used and we get the URL of the manager instance, that is still useful information for enterprise customers. Currently, there is no visibility into either the repository manager or the origin.
I just looked more closely at the example SBOM given; here is an example component with the new data that I see:
I see 2 parts:
let's discuss the
Thanks @hboutemy. The repository_url is added only when Maven resolves and pulls the jar from a remote repository, not from a local cache; an if condition in the code ensures this.
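A minimal sketch of the kind of check described above, under a simplified model: `Remote` and `LocalCache` are hypothetical stand-ins, not the plugin's types, and the PR's real condition may differ. (In Maven Resolver itself, `ArtifactResult#getRepository()` reports the repository an artifact was resolved from.) The sketch records a URL only for remote repositories other than Maven Central, which the PR description says is excluded, after stripping a trailing slash.

```java
import java.util.Optional;

/** Hypothetical sketch: should a repository_url be recorded for a resolved artifact? */
public class RepositoryUrlDecision {

    /** Where an artifact was resolved from (simplified stand-in, not Maven Resolver's types). */
    interface ResolvedFrom {}

    /** Downloaded from a remote repository, e.g. Maven Central or a proxy. */
    record Remote(String id, String url) implements ResolvedFrom {}

    /** Already present in the local repository cache (~/.m2/repository), so nothing was downloaded. */
    record LocalCache(String path) implements ResolvedFrom {}

    static final String MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2";

    /** Returns the URL to record, or empty for local-cache hits and for Maven Central. */
    static Optional<String> repositoryUrlFor(ResolvedFrom source) {
        if (source instanceof Remote remote) {
            String url = stripTrailingSlash(remote.url());
            if (!url.equals(MAVEN_CENTRAL)) {
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }

    static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        System.out.println(repositoryUrlFor(new Remote("internal", "http://nexus.example.com/repository/releases/")));
        System.out.println(repositoryUrlFor(new Remote("central", "https://repo.maven.apache.org/maven2/")));
        System.out.println(repositoryUrlFor(new LocalCache("/home/user/.m2/repository")));
    }
}
```

Returning an `Optional` keeps the "no qualifier" case explicit, so the caller can simply skip the qualifier when the result is empty.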
When executing this PR branch with a custom repository proxy and Maven repo directory using
@hboutemy Any feedback based on my last comment?
As you can see from your second example generated SBOM, the resulting repository_url is the URL of a proxy, which is something unusable (I'm even surprised by the IP in the http://0.0.0.0:8080/releases value). I understand the dream of this PR, but it is a dream: Maven cannot do that reliably. Thinking about it, if you want to work on this aspect, see #245: I'd propose focusing on the distribution-intake external reference. From the intake of a dependency, defining the distribution external reference of that dependency can make sense. And please, when sharing examples, share a simple example first before sharing complex ones: a simple one defines the feature precisely, while a complex one is useful to see it at scale.
@hboutemy I disagree with the word "unusable". It shows all the packages that were downloaded from a private internal repository (which happens to run on localhost). It gives confidence that no local cache (which could be malicious) was used. From the administrative settings of the internal registry, we can find the upstream public repository that was used for caching, and it is absolutely fine if the SBOM tool doesn't have this information. The second part of the PR sets identity evidence, which, IMHO, is quite important for any SBOM tool.
Apologies for the delayed message. I was hoping the issue might resolve itself, but it seems the PR has stalled. I believe it's important to move this forward to provide improved transparency for users. I am not an expert in PURL, so I would greatly appreciate any feedback from others on this matter. Based on my understanding, the ideal PURL structure should look something like this (?):
Also, I think including evidence is particularly valuable when dependencies are sourced from local caches or other repositories, as these sources can be susceptible to manipulation by attackers. That said, it would be beneficial to include evidence for all components, including those from Maven Central, providing more detailed information on how each component's identity was determined. By incorporating evidence, we offer transparency regarding the techniques and data sources used to establish component identity, thereby enhancing the reliability and trustworthiness of the SBOM. @stevespringett @jkowalleck @pombredanne @skhokhlov @aloubyansky I would be grateful for your input, and thank you in advance for your guidance.
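For reference, a CycloneDX 1.5 component can carry `evidence.identity` with a `field` (e.g. `purl`), an overall `confidence` between 0 and 1, and a list of `methods`, each with a `technique`, a `confidence` and a `value`. The sketch below renders that shape by hand. It is not the PR's `addComponentEvidence`: the method values and confidences in `main` are made up, and taking the highest method confidence as the overall confidence is only one assumed aggregation rule, not something the spec mandates.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch of a CycloneDX 1.5-style evidence.identity object, rendered as JSON by hand. */
public class IdentityEvidenceSketch {

    /** One method used to establish the component's identity. */
    record Method(String technique, double confidence, String value) {}

    /** Assumed aggregation rule: the overall confidence is the highest method confidence. */
    static double overallConfidence(List<Method> methods) {
        return methods.stream().mapToDouble(Method::confidence).max().orElse(0.0);
    }

    /** Minimal JSON string escaping (backslashes and double quotes only). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Renders {"identity":{"field":"purl","confidence":...,"methods":[...]}}. */
    static String identityJson(List<Method> methods) {
        String methodsJson = methods.stream()
                .map(m -> String.format(Locale.ROOT,
                        "{\"technique\":\"%s\",\"confidence\":%.2f,\"value\":\"%s\"}",
                        escape(m.technique()), m.confidence(), escape(m.value())))
                .collect(Collectors.joining(","));
        return String.format(Locale.ROOT,
                "{\"identity\":{\"field\":\"purl\",\"confidence\":%.2f,\"methods\":[%s]}}",
                overallConfidence(methods), methodsJson);
    }

    public static void main(String[] args) {
        // Illustrative values only: the techniques come from the CycloneDX 1.5 enum, the rest is made up.
        List<Method> methods = List.of(
                new Method("manifest-analysis", 0.7, "META-INF/maven/org.example/demo/pom.properties"),
                new Method("hash-comparison", 1.0, "SHA-1 matched the checksum published by the remote repository"));
        System.out.println(identityJson(methods));
    }
}
```

A real implementation would more likely use the CycloneDX Java library's model classes than hand-built JSON; the string version just keeps the example dependency-free.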
A little disclaimer: I'm just a community member, and don't consider myself an expert. But given you asked :)
I think it'd be great to clarify this further when it comes to SBOMs for Maven dependencies. There could be a couple of perspectives. An artifact producer deploys artifacts to specific Maven repositories, which, I guess, should be considered the official distributions of those artifacts. Then there is the consumer perspective. As @hboutemy mentioned, it's not always possible to track the original repository an artifact was downloaded from when mirrors and/or proxies are involved. Sometimes it will be possible, sometimes it won't; in many cases it won't be. In that case, from the supply chain perspective, it could be useful to record where a given artifact was downloaded from. It seems like it would fit well the
This is potentially a very nice feature. However, I have some considerations about data consistency and reproducibility.
Maven resolver exposes this info, so I'd expect the remote repo to be properly recorded. I haven't tested the changes in this PR though.
The resolver reports the repository that was used to resolve the artifact, i.e. the first one that successfully resolved it. So this would basically capture the supply chain from the build tool's perspective, possibly lacking information about the original distribution repositories of dependencies.
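The "first repository that successfully resolved it" behaviour can be illustrated with a toy model: repositories are tried in declaration order and the first one that serves the coordinates wins, so a later repository that also hosts the artifact (possibly its original distribution) never appears. `Repo` and `resolve` are hypothetical names; real resolution also involves the local repository, mirrors and update policies.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model: the first repository (in declaration order) that serves an artifact wins. */
public class FirstMatchResolution {

    /** A remote repository and the artifact coordinates it happens to serve (hypothetical). */
    record Repo(String id, String url, Set<String> artifacts) {}

    /** Returns the URL of the first repository that serves the coordinates, if any. */
    static Optional<String> resolve(String coordinates, List<Repo> repositories) {
        for (Repo repo : repositories) {
            if (repo.artifacts().contains(coordinates)) {
                return Optional.of(repo.url());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String gav = "org.example:demo:1.0.0";
        List<Repo> repos = List.of(
                new Repo("internal", "https://nexus.example.com/repository/maven-public", Set.of(gav)),
                new Repo("central", "https://repo.maven.apache.org/maven2", Set.of(gav)));
        // Both repositories host the artifact, but only the first one is reported.
        System.out.println(resolve(gav, repos).orElse("not found"));
    }
}
```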
Maven resolver exposes this info, so I'd expect the remote repo to be properly recorded. I haven't tested the changes in this PR though.
This lacks a unit test showing the expected result: please add one.
This will also permit further work to clarify edge cases: yes, the effective remote repository is properly recorded.
"Effective" means "downloaded through your Nexus proxy" when you use an MRM (Maven repository manager), which is something everybody should do to lower pressure on Maven Central, and which any company should do too: I would expect a lack of MRM only from individual developers working at home.
And please also remove the unrelated changes to DefaultModelConverter, which require a separate discussion.
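To illustrate why the "effective" repository is usually the MRM: with a mirror such as `<mirrorOf>*</mirrorOf>` in `settings.xml`, every declared repository is contacted through the mirror URL, so that is the only URL the build can record. The sketch implements a simplified subset of `mirrorOf` matching (`*`, comma-separated ids, `!id` exclusions) and takes the first matching mirror. Maven's real selector supports more patterns (e.g. `external:*`) and its own precedence rules, so treat this as an approximation with hypothetical names.

```java
import java.util.List;

/** Simplified model of how a settings.xml mirror replaces a declared repository's URL. */
public class MirrorSketch {

    /** A mirror entry: id, URL and mirrorOf pattern (hypothetical, modelled on settings.xml). */
    record Mirror(String id, String url, String mirrorOf) {}

    /** Simplified mirrorOf matching: "*" matches any id, "!id" excludes, other entries are exact ids. */
    static boolean matches(String mirrorOf, String repoId) {
        boolean matched = false;
        for (String raw : mirrorOf.split(",")) {
            String pattern = raw.trim();
            if (pattern.equals("!" + repoId)) {
                return false; // an explicit exclusion always wins
            }
            if (pattern.equals("*") || pattern.equals(repoId)) {
                matched = true;
            }
        }
        return matched;
    }

    /** The URL actually contacted for a declared repository: the first matching mirror's, else its own. */
    static String effectiveUrl(String repoId, String declaredUrl, List<Mirror> mirrors) {
        for (Mirror mirror : mirrors) {
            if (matches(mirror.mirrorOf(), repoId)) {
                return mirror.url();
            }
        }
        return declaredUrl;
    }

    public static void main(String[] args) {
        List<Mirror> mirrors = List.of(new Mirror("nexus", "https://nexus.example.com/repository/maven-public", "*"));
        // Even Maven Central is reached through the repository manager, so that URL is what gets recorded.
        System.out.println(effectiveUrl("central", "https://repo.maven.apache.org/maven2", mirrors));
    }
}
```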
 * @param component Component for which the evidence needs to be attached
 * @param artifact Maven artifact
 */
private static void addComponentEvidence(final Component component, final Artifact artifact) {
All that code is unrelated to repository_url tracking: please remove it.
Changing the title is easy, but if we do that, we start the discussion here instead of on #494 or a new issue based on it: there are so many questions about the whole evidence model, how it maps to build tools, and how it is done in other ecosystems. Are you sure you want to start here, and mix the two?
Just a remark from the side on how to make reviewers'/maintainers' lives easy:
These simple rules make reviews easy. But more importantly, they make change management easy. My 2 cents, from one maintainer to another:
Supersedes #494 by also setting component evidence for the 1.5 spec.
With this PR, the purl for components would include the repository_url qualifier for repositories other than https://repo.maven.apache.org/maven2, as per the purl spec.
An example BOM generated for the repo https://github.com/eclipse-jkube/jkube is attached.
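A sketch of how such a purl could be assembled, assuming (as described above) that the qualifier is omitted for https://repo.maven.apache.org/maven2. Qualifiers are emitted sorted by key, as the purl spec's canonical form requires, and values are percent-encoded conservatively (everything except RFC 3986 unreserved characters is escaped); check the purl spec for the exact set of characters that must be escaped. `mavenPurl` is a hypothetical helper, not the plugin's or packageurl-java's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: build a Maven purl, adding repository_url only for non-Central repositories. */
public class MavenPurlSketch {

    static final String MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2";

    /** Percent-encodes every byte except RFC 3986 unreserved characters (a conservative choice). */
    static String percentEncode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9')
                    || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    static String mavenPurl(String groupId, String artifactId, String version, String type, String repositoryUrl) {
        Map<String, String> qualifiers = new TreeMap<>(); // canonical purls list qualifiers sorted by key
        if (type != null) {
            qualifiers.put("type", type);
        }
        if (repositoryUrl != null) {
            String url = repositoryUrl.endsWith("/") ? repositoryUrl.substring(0, repositoryUrl.length() - 1) : repositoryUrl;
            if (!url.equals(MAVEN_CENTRAL)) {
                qualifiers.put("repository_url", url);
            }
        }
        StringBuilder purl = new StringBuilder("pkg:maven/")
                .append(percentEncode(groupId)).append('/')
                .append(percentEncode(artifactId)).append('@')
                .append(percentEncode(version));
        String separator = "?";
        for (Map.Entry<String, String> qualifier : qualifiers.entrySet()) {
            purl.append(separator).append(qualifier.getKey()).append('=').append(percentEncode(qualifier.getValue()));
            separator = "&";
        }
        return purl.toString();
    }

    public static void main(String[] args) {
        System.out.println(mavenPurl("org.example", "demo", "1.0.0", "jar", "https://repo.example.com/maven2/"));
        System.out.println(mavenPurl("org.example", "demo", "1.0.0", "jar", "https://repo.maven.apache.org/maven2"));
    }
}
```

With these inputs, the first call yields `pkg:maven/org.example/demo@1.0.0?repository_url=https%3A%2F%2Frepo.example.com%2Fmaven2&type=jar`, while the Maven Central case yields just `pkg:maven/org.example/demo@1.0.0?type=jar`.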
bom.xml.txt
bom.json.txt
Things to clarify
package-url/purl-spec#303