Improvements of exposed data #215
Comments
Hey @ClaudiuCristea, First off, thanks for bringing this up! It's always great to see community members contributing ideas to improve the project. I've got a few reservations on the proposed changes:
I think it all revolves around the fact that you'd like to know the base URL and the API type in order to query it, but I believe the task of detecting the API or the underlying technology might be better suited to a dedicated library. I think it's fair to think of This way, instead of relying on a potentially outdated On top of that, most projects are on github.com or gitlab.com, so just by looking at the URL, we can tell where they are from. Can you maybe reuse publiccode-crawler, or adapt it for your needs? I hope these points resonate with your thoughts. If there's anything else in the proposed change that might need attention, I'd be happy to discuss further. Let's keep the collaboration going!
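To illustrate the point about telling the platform from the URL alone, here is a minimal Python sketch. The `KNOWN_HOSTS` table and the `guess_platform` helper are hypothetical names made up for this example; they are not part of publiccode-crawler or go-vcsurl. The sketch only covers well-known public hosts and deliberately returns `None` for self-hosted instances.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup table: only well-known public hosts are listed.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def guess_platform(repo_url: str) -> Optional[str]:
    """Guess the hosting platform from the host name, or return None."""
    host = (urlparse(repo_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return KNOWN_HOSTS.get(host)
```

A self-hosted URL such as https://example.com/base/path/group1/project.git yields `None` here, which is exactly the case where some extra detection step would be needed.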
Thank you for the reply. A few remarks: I was looking at https://github.com/alranel/go-vcsurl, and I'm thinking of a very similar approach in order to guess the API from the URL. Given all your points, I understand that exposing the API type is not going to be supported here. I see most of them as valid points. Maybe some are debatable but, yes, that's it, I can live with maintaining my own API guesser. I was looking at the code from https://github.com/alranel/go-vcsurl (though I have zero Go knowledge), but I can't find an answer to my 2nd point: detecting the code hosting platform URL. Or, the other way around: detecting the project's full path from the full URL. Of course, I'm referring to the situation where a GitLab instance is located not at the host root, but in a subdirectory. It seems to me that the code assumes that the code platform is installed directly under the host (which, I agree, is the case most of the time). Maybe I'm missing something?
@claudiu-cristea you're spot on about The thought was to potentially extend go-vcsurl or a similar library to manage these cases. It doesn't have to be go-vcsurl specifically; another library could work just as well. The key is to centralize the logic for detecting platform URLs and extracting project paths, making it more adaptable to different setups. publiccode-crawler and/or go-vcsurl have to be extended regardless if we want new code hosting platforms (italia/publiccode-crawler#132) or plain git URLs (italia/publiccode-crawler#196). I'm not an expert in PHP, but I think there is a way to load a Go library and call it with FFI?
@bfabio, thank you for clarifying, and sorry for the late feedback. We took a slightly different approach because we're using code hosting platform plugins (e.g. a GitHub plugin), so each plugin knows how to determine whether it is in the business of handling a given URL. Then we cache the result so that next time we know which API to use. We also solved the "GitLab installed under a sub-dir" case by performing some additional HTTP requests, but only when we hit the non-standard case. Thank you again for the support. Closing this issue.
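The plugin-plus-cache approach described above could look roughly like the following Python sketch. All class and function names are hypothetical and do not come from any of the projects mentioned in this thread. The `probe` callable stands in for the "additional HTTP requests" and is injected, so the sketch makes no assumption about which endpoint would actually be queried.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlparse


class GitHubPlugin:
    name = "github"

    def handles(self, url: str) -> bool:
        return urlparse(url).hostname == "github.com"


class GitLabPlugin:
    name = "gitlab"

    def __init__(self, probe: Callable[[str], bool]) -> None:
        # `probe` is a stand-in for extra HTTP requests (assumption).
        self.probe = probe

    def handles(self, url: str) -> bool:
        if urlparse(url).hostname == "gitlab.com":
            return True
        # Non-standard case: only now pay for the extra detection step.
        return self.probe(url)


class PlatformResolver:
    def __init__(self, plugins: Iterable) -> None:
        self.plugins = list(plugins)
        self.cache: Dict[str, Optional[str]] = {}

    def resolve(self, url: str) -> Optional[str]:
        # Return the cached answer so plugins are asked only once per URL.
        if url in self.cache:
            return self.cache[url]
        result = None
        for plugin in self.plugins:
            if plugin.handles(url):
                result = plugin.name
                break
        self.cache[url] = result
        return result
```

Because the result is cached per URL, the potentially expensive probe runs at most once for each self-hosted project.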
@claudiu-cristea nice to know that approach makes sense, the plugins are kinda like the scanners in |
I'm missing some information that would allow me to fully understand an entry from https://api.developers.italia.it/v1/software:
The code platform
In the case of self-hosted GitLab or Bitbucket software, it's difficult to understand the underlying technology. Is it GitLab or Bitbucket? What API should I use if I want to fetch more info about that project? This info is not part of the `publiccodeYml` blob either. But https://github.com/italia/publiccode-crawler knows this information, and it would be nice for it to be exposed on the same level as `id`, `url`, `publiccodeYml`, etc. For instance:
Of course, `platformType` could be any of `github`, `gitlab`, `bitbucket`.
The project's full path
Let's take a hypothetical case: a self-hosted GitLab project with this URL: https://example.com/base/path/group1/group2/group3/project.git. Note that the GitLab instance is installed at https://example.com/base/path (in a subdirectory, relative to the domain).
If a consumer of the https://api.developers.italia.it/v1/software API wants to work out the project's full path (namespace and project) by extracting it from the URL, they will fail, because the URL alone is misleading. Most probably they will assume that everything that comes after the host is the project's full path:
base/path/group1/group2/group3
project
But this is wrong, as the project's namespace is `group1/group2/group3`. Again, this information is also missing from the `publiccodeYml` blob.
I think this info should be exposed, something like:
Moreover, this info is already available and exposed, as far as I can see, by the `/software/{softwareId}/logs` path. In this way, a consumer would understand how to derive the base URL of the self-hosted GitLab/Bitbucket platform.
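To make the full-path problem concrete, here is a small Python sketch of how a consumer could split a repository URL once the platform base URL is known. `split_project_url` is a hypothetical helper written for this illustration; it is not part of the API or of publiccode-crawler.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_project_url(repo_url: str, base_url: str) -> Tuple[str, str]:
    """Return (namespace, project) for repo_url, relative to base_url."""
    repo = urlparse(repo_url)
    base = urlparse(base_url)
    if repo.hostname != base.hostname:
        raise ValueError("repository and platform hosts differ")
    base_path = base.path.rstrip("/")
    path = repo.path
    # The repository path must live under the platform's base path.
    if base_path and not (path == base_path or path.startswith(base_path + "/")):
        raise ValueError("repository is not under the platform base path")
    relative = path[len(base_path):].strip("/")
    if relative.endswith(".git"):
        relative = relative[: -len(".git")]
    # The last segment is the project; everything before it is the namespace.
    namespace, _, project = relative.rpartition("/")
    return namespace, project
```

With the base URL https://example.com/base/path, the example URL splits into the namespace `group1/group2/group3` and the project `project`. Assuming the platform sits at the host root (base URL https://example.com) gives the misleading namespace `base/path/group1/group2/group3` instead.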