Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETQ tech: je veux publier les fichiers opendata selon les bonnes pratiques de data.gouv #11100

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Benoit-MINT
Copy link
Contributor

@Benoit-MINT Benoit-MINT commented Nov 28, 2024

Dans le dataset "utilisation de DS" : https://www.data.gouv.fr/fr/datasets/utilisation-du-service-demarches-simplifiees/
Aujourd'hui, on ajoute chaque mois une nouvelle resource par type de données :

  • Nombre d'instructeurs connectés par mois
  • Nombre de comptes usagers créés avec France Connect par mois
  • Nombre de démarches supprimées par mois
  • Nombre de démarches closes par mois
  • Nombre de démarches publiées par mois
  • Nombre de comptes créés par mois
  • Nombre de dossiers créés par mois
  • Nombre d'instructeurs créés par mois
  • Nombre d'administrateurs créés par mois
    Ceci ne suit pas les bonnes pratiques de data.gouv pour faciliter la réutilisation.
    Il est préférable de maintenir à jour, chaque mois, une même resource par type de données. Cela revient à ajouter de nouvelles lignes aux mêmes fichiers.

PR en draft, cas d'usage sur "Nombre de comptes créés par mois"


def existing_data(dataset, resource)
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read

Check failure

Code scanning / CodeQL

Use of `Kernel.open` or `IO.read` or similar sinks with a non-constant value Critical

Call to URI.open with a non-constant value. Consider replacing it with URI().open.

Copilot Autofix AI about 8 hours ago

To fix the problem, we should replace the use of URI.open with a safer alternative. Specifically, we can use Net::HTTP.get to fetch the content of the URL, which does not have the same vulnerability as URI.open.

  1. Replace the call to URI.open(file_url).read with Net::HTTP.get(URI.parse(file_url)).
  2. Ensure that the net/http library is required at the top of the file if it is not already.
Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -1,2 +1,3 @@
 # frozen_string_literal: true
+require 'net/http'
 
@@ -46,3 +47,3 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
-    csv_data = URI.open(file_url).read
+    csv_data = Net::HTTP.get(URI.parse(file_url))
     data = []
EOF
@@ -1,2 +1,3 @@
# frozen_string_literal: true
require 'net/http'

@@ -46,3 +47,3 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read
csv_data = Net::HTTP.get(URI.parse(file_url))
data = []
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options

def existing_data(dataset, resource)
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read

Check failure

Code scanning / CodeQL

Use of `Kernel.open`, `IO.read` or similar sinks with user-controlled input Critical

This call to URI.open depends on a
user-provided value
. Consider replacing it with URI().open.

Copilot Autofix AI about 8 hours ago

To fix the problem, we need to replace the URI.open call with a safer alternative. The recommended approach is to use URI(<uri>).open or an HTTP client to fetch the content. This ensures that the URL is properly parsed and validated before being used.

In this case, we will replace URI.open(file_url).read with URI(file_url).open.read. This change will ensure that the URL is parsed and validated before being opened, mitigating the risk of injection attacks.

Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -46,3 +46,3 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
-    csv_data = URI.open(file_url).read
+    csv_data = URI(file_url).open.read
     data = []
EOF
@@ -46,3 +46,3 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read
csv_data = URI(file_url).open.read
data = []
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options

def existing_data(dataset, resource)
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The URL of this request depends on a
user-provided value
.

Copilot Autofix AI about 8 hours ago

To fix the problem, we need to ensure that the URL derived from the HTTP response is validated before it is used. One way to do this is to check that the URL belongs to a trusted domain. This can be done by parsing the URL and verifying its host against a whitelist of allowed hosts.

  1. Parse the URL using URI.parse.
  2. Check that the host of the URL is in a predefined list of allowed hosts.
  3. Raise an error or handle the case where the URL is not from an allowed host.
Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -1,2 +1,3 @@
 # frozen_string_literal: true
+require 'uri'
 
@@ -46,2 +47,7 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
+    uri = URI.parse(file_url)
+    allowed_hosts = ["trusted.domain.com"]
+    unless allowed_hosts.include?(uri.host)
+      raise "Untrusted URL host: #{uri.host}"
+    end
     csv_data = URI.open(file_url).read
EOF
@@ -1,2 +1,3 @@
# frozen_string_literal: true
require 'uri'

@@ -46,2 +47,7 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
uri = URI.parse(file_url)
allowed_hosts = ["trusted.domain.com"]
unless allowed_hosts.include?(uri.host)
raise "Untrusted URL host: #{uri.host}"
end
csv_data = URI.open(file_url).read
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
Copy link

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 0% with 76 lines in your changes missing coverage. Please review.

Project coverage is 44.61%. Comparing base (7932085) to head (146d00c).
Report is 31 commits behind head on main.

Files with missing lines Patch % Lines
app/jobs/cron/datagouv/account_by_month_job.rb 0.00% 37 Missing ⚠️
app/lib/api_datagouv/api.rb 0.00% 33 Missing ⚠️
app/services/generate_open_data_csv_service.rb 0.00% 6 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (7932085) and HEAD (146d00c). Click for more details.

HEAD has 14 uploads less than BASE
Flag BASE (7932085) HEAD (146d00c)
16 2
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #11100       +/-   ##
===========================================
- Coverage   84.35%   44.61%   -39.75%     
===========================================
  Files        1176     1178        +2     
  Lines       25944    30847     +4903     
  Branches     4898     3953      -945     
===========================================
- Hits        21886    13762     -8124     
- Misses       4058    17085    +13027     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant