API for retrieving metrics on software catalog #174
Comments
Let's have the crawler query the ES and write the results to a JSON file. The file would be public, in a directory served by nginx. See italia/developers.italia.it#406 for an example of such a public directory.
@biancini we could close this issue, since the goal was achieved by italia/developers.italia.it#406.
@sebbalex I believe the solution in use right now is not an API, so I would leave this open for future improvements. I still think it would be nice to have an actual, proper API for this.
Moving the issue to developers-italia-api.
This should be doable now with something like:

import json
from collections import defaultdict

import requests
import yaml

API_BASE_URL = "https://api.developers.italia.it/v1"


def get_paginated(resource: str):
    """Fetch every item of a resource, following the pagination links."""
    items = []
    page = True
    page_after = ""

    while page:
        res = requests.get(f"{API_BASE_URL}/{resource}?all=true&{page_after}")
        res.raise_for_status()

        body = res.json()
        items += body["data"]

        page_after = body["links"]["next"]
        if page_after:
            # Remove the leading '?'
            page_after = page_after[1:]

        page = bool(page_after)

    return items


software = get_paginated("software")
publishers = get_paginated("publishers")

# Counters keyed by creation date (YYYY-MM-DD)
by_date = defaultdict(
    lambda: {
        "num_software_pa": 0,
        "num_software_thirdparty": 0,
        "num_administrations": 0,
    }
)

for s in software:
    date = s["createdAt"][:10]

    try:
        publiccode = yaml.safe_load(s["publiccodeYml"])

        # Software with a codiceIPA is counted as released by a PA
        if publiccode.get("it", {}).get("riuso", {}).get("codiceIPA"):
            by_date[date]["num_software_pa"] += 1
        else:
            by_date[date]["num_software_thirdparty"] += 1
    except Exception:
        # Skip entries whose publiccode.yml is missing or malformed
        pass

administrations = set()
for publisher in publishers:
    if publisher.get("alternativeId"):
        administrations.add(publisher["id"])
        date = publisher["createdAt"][:10]
        # Running count of administrations seen so far
        by_date[date]["num_administrations"] = len(administrations)

print(json.dumps([{date: counts} for date, counts in by_date.items()], indent=4))

but I wouldn't turn this into an API endpoint, as the data is easily available without hardcoding the metrics.
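The software counters in the script above are per creation day, while the original request asks for the evolution of the totals over time. A minimal sketch of that last step, assuming the per-day counts are already in a dict keyed by ISO date. The `running_totals` helper and the sample data are hypothetical and not part of the API:

```python
from itertools import accumulate


def running_totals(per_day: dict[str, int]) -> dict[str, int]:
    """Turn per-day counts into cumulative totals, ordered by date."""
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
    days = sorted(per_day)
    totals = accumulate(per_day[day] for day in days)
    return dict(zip(days, totals))


# Made-up sample data, for illustration only.
sample = {"2023-01-03": 2, "2023-01-01": 5, "2023-01-02": 1}
print(running_totals(sample))
```

The same helper could be applied separately to each software counter; `num_administrations` is already a running count in the script, so it would not need it.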
To support the ongoing work on metrics for Developers Italia, it would be great to have a JSON API that exposes the following data:
Numeric indicator with the number of software projects released in the catalog in section A.
Numeric indicator with the number of software projects released in the catalog in section B.
Numeric indicator with the number of unique administrations that have at least one software project published in the catalog.
Numeric indicator with the average vitality index of all projects in the catalog (both section A and section B).
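As a rough illustration of the four indicators listed above, here is a minimal sketch that computes them from an in-memory list of projects. The record fields (`section`, `administration`, `vitality`), the output key names and the sample data are all assumptions made for the example, not the crawler's or the API's actual schema:

```python
from statistics import mean


def catalog_metrics(projects: list[dict]) -> dict:
    """Compute the four requested indicators from hypothetical project records."""
    section_a = [p for p in projects if p["section"] == "A"]
    section_b = [p for p in projects if p["section"] == "B"]
    # Unique administrations with at least one project in the catalog.
    administrations = {
        p["administration"] for p in projects if p.get("administration")
    }
    # Projects without a vitality index are left out of the average.
    vitality = [p["vitality"] for p in projects if p.get("vitality") is not None]
    return {
        "num_software_section_a": len(section_a),
        "num_software_section_b": len(section_b),
        "num_administrations": len(administrations),
        "avg_vitality": mean(vitality) if vitality else None,
    }


# Made-up sample data, for illustration only.
projects = [
    {"section": "A", "administration": "admin-1", "vitality": 80.0},
    {"section": "A", "administration": "admin-1", "vitality": 60.0},
    {"section": "B", "administration": None, "vitality": 40.0},
]
metrics = catalog_metrics(projects)
print(metrics)
```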
If the crawler has the data, it would be great for this JSON API to also expose the evolution of these numbers over time (since the beginning of Developers Italia).
The output could be of this form: