Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETQ tech: je veux publier les fichiers opendata selon les bonnes pratiques de data.gouv #11100

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
51 changes: 44 additions & 7 deletions app/jobs/cron/datagouv/account_by_month_job.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,56 @@
class Cron::Datagouv::AccountByMonthJob < Cron::CronJob
include DatagouvCronSchedulableConcern
self.schedule_expression = "every month at 4:30"
FILE_NAME = "nb_comptes_crees_par_mois"
HEADERS = ["mois", "nb_comptes_crees_par_mois"]
FILE_NAME = "#{HEADERS[1]}.csv"

Check warning on line 7 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L6-L7

Added lines #L6 - L7 were not covered by tests

def perform(*args)
GenerateOpenDataCsvService.save_csv_to_tmp(FILE_NAME, data) do |file|
begin
if existing_resource?(:statistics_dataset, :statistics_account_by_month_resource)

Check warning on line 10 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L10

Added line #L10 was not covered by tests
# there is an existing file, so we update it
existing_data = existing_data(:statistics_dataset, :statistics_account_by_month_resource)
nb_months_to_query = nb_months_to_query(existing_data)
if nb_months_to_query > 0
nb_months_to_query.downto(1) do
existing_data << data(n)
end
end
GenerateOpenDataCsvService.save_csv_to_tmp(FILE_NAME, HEADERS, existing_data) do |file|
APIDatagouv::API.upload(file, :statistics_dataset, :statistics_account_by_month_resource)
end
else

Check warning on line 22 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L12-L22

Added lines #L12 - L22 were not covered by tests
# no existing file, so we create a new file with only the data of the previous month
data = [data(1)]
GenerateOpenDataCsvService.save_csv_to_tmp(FILE_NAME, HEADERS, data) do |file|

Check warning on line 25 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L24-L25

Added lines #L24 - L25 were not covered by tests
APIDatagouv::API.upload(file, :statistics_dataset)
ensure
FileUtils.rm(file)
end
end
end

def data
User.where(created_at: 1.month.ago.all_month).count
private

Check warning on line 31 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L31

Added line #L31 was not covered by tests

def data(n)
[date_last_month(n), User.where(created_at: n.month.ago.all_month).count]
end

Check warning on line 35 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L33-L35

Added lines #L33 - L35 were not covered by tests

def date_last_month(n)
Date.today.prev_month(n).strftime("%Y %B")
end

Check warning on line 39 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L37-L39

Added lines #L37 - L39 were not covered by tests

def existing_resource?(dataset, resource)
APIDatagouv::API.check_existing_resource(dataset, resource)
end

Check warning on line 43 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L41-L43

Added lines #L41 - L43 were not covered by tests

def existing_data(dataset, resource)
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read

Check failure

Code scanning / CodeQL

Use of `Kernel.open` or `IO.read` or similar sinks with a non-constant value Critical

Call to URI.open with a non-constant value. Consider replacing it with URI().open.

Copilot Autofix AI about 10 hours ago

To fix the problem, we should replace the use of URI.open with a safer alternative. Specifically, we can use Net::HTTP.get to fetch the content of the URL, which does not have the same vulnerability as URI.open.

  1. Replace the call to URI.open(file_url).read with Net::HTTP.get(URI.parse(file_url)).
  2. Ensure that the net/http library is required at the top of the file if it is not already.
Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -1,2 +1,3 @@
 # frozen_string_literal: true
+require 'net/http'
 
@@ -46,3 +47,3 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
-    csv_data = URI.open(file_url).read
+    csv_data = Net::HTTP.get(URI.parse(file_url))
     data = []
EOF
@@ -1,2 +1,3 @@
# frozen_string_literal: true
require 'net/http'

@@ -46,3 +47,3 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read
csv_data = Net::HTTP.get(URI.parse(file_url))
data = []
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options

Check failure

Code scanning / CodeQL

Use of `Kernel.open`, `IO.read` or similar sinks with user-controlled input Critical

This call to URI.open depends on a
user-provided value
. Consider replacing it with URI().open.

Copilot Autofix AI about 10 hours ago

To fix the problem, we need to replace the URI.open call with a safer alternative. The recommended approach is to use URI(<uri>).open or an HTTP client to fetch the content. This ensures that the URL is properly parsed and validated before being used.

In this case, we will replace URI.open(file_url).read with URI(file_url).open.read. This change will ensure that the URL is parsed and validated before being opened, mitigating the risk of injection attacks.

Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -46,3 +46,3 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
-    csv_data = URI.open(file_url).read
+    csv_data = URI(file_url).open.read
     data = []
EOF
@@ -46,3 +46,3 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
csv_data = URI.open(file_url).read
csv_data = URI(file_url).open.read
data = []
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The URL of this request depends on a
user-provided value
.

Copilot Autofix AI about 10 hours ago

To fix the problem, we need to ensure that the URL derived from the HTTP response is validated before it is used. One way to do this is to check that the URL belongs to a trusted domain. This can be done by parsing the URL and verifying its host against a whitelist of allowed hosts.

  1. Parse the URL using URI.parse.
  2. Check that the host of the URL is in a predefined list of allowed hosts.
  3. Raise an error or handle the case where the URL is not from an allowed host.
Suggested changeset 1
app/jobs/cron/datagouv/account_by_month_job.rb

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app/jobs/cron/datagouv/account_by_month_job.rb b/app/jobs/cron/datagouv/account_by_month_job.rb
--- a/app/jobs/cron/datagouv/account_by_month_job.rb
+++ b/app/jobs/cron/datagouv/account_by_month_job.rb
@@ -1,2 +1,3 @@
 # frozen_string_literal: true
+require 'uri'
 
@@ -46,2 +47,7 @@
     file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
+    uri = URI.parse(file_url)
+    allowed_hosts = ["trusted.domain.com"]
+    unless allowed_hosts.include?(uri.host)
+      raise "Untrusted URL host: #{uri.host}"
+    end
     csv_data = URI.open(file_url).read
EOF
@@ -1,2 +1,3 @@
# frozen_string_literal: true
require 'uri'

@@ -46,2 +47,7 @@
file_url = APIDatagouv::API.get_existing_file_url(dataset, resource)
uri = URI.parse(file_url)
allowed_hosts = ["trusted.domain.com"]
unless allowed_hosts.include?(uri.host)
raise "Untrusted URL host: #{uri.host}"
end
csv_data = URI.open(file_url).read
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
data = []
CSV.parse(csv_data, headers: true) do |row|
data << [row[0], row[1].to_i]
end
end

Check warning on line 52 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L45-L52

Added lines #L45 - L52 were not covered by tests

def nb_months_to_query(existing_data)
last_date = Date.parse("#{existing_data[0][0]}-01")
(Date.today.year * 12 + Date.today.month) - (last_date.year * 12 + last_date.month) - 1

Check warning on line 56 in app/jobs/cron/datagouv/account_by_month_job.rb

View check run for this annotation

Codecov / codecov/patch

app/jobs/cron/datagouv/account_by_month_job.rb#L54-L56

Added lines #L54 - L56 were not covered by tests
end
end
39 changes: 39 additions & 0 deletions app/lib/api_datagouv/api.rb
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,28 @@
end

class << self
def check_existing_resource(dataset, resource)
response = Typhoeus.get(
datagouv_check_url(dataset, resource),
headers: { "X-Api-Key" => datagouv_secret[:api_key] }
)

Check warning on line 20 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L16-L20

Added lines #L16 - L20 were not covered by tests

response.success?
end

Check warning on line 23 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L22-L23

Added lines #L22 - L23 were not covered by tests

def get_existing_file_url(dataset, resource)
response = Typhoeus.get(
datagouv_get_resource_url(dataset, resource),
followlocation: true
)

Check warning on line 29 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L25-L29

Added lines #L25 - L29 were not covered by tests

if response.success?
JSON.parse(response.body)["url"]
else
raise RequestFailed.new(datagouv_get_resource_url(dataset, resource), response)
end
end

Check warning on line 36 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L31-L36

Added lines #L31 - L36 were not covered by tests

def upload(io, dataset, resource = nil)
response = Typhoeus.post(
datagouv_upload_url(dataset, resource),
Expand All @@ -32,6 +54,23 @@

private

def datagouv_check_url(dataset, resource)
[
datagouv_secret[:api_url],
"/datasets/", datagouv_secret[dataset],
"/resources/", datagouv_secret[resource],
"/check/"
].join
end

Check warning on line 64 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L57-L64

Added lines #L57 - L64 were not covered by tests

def datagouv_get_resource_url(dataset, resource)
[
datagouv_secret[:api_url],
"/datasets/", datagouv_secret[dataset],
"/resources/", datagouv_secret[resource]
].join
end

Check warning on line 72 in app/lib/api_datagouv/api.rb

View check run for this annotation

Codecov / codecov/patch

app/lib/api_datagouv/api.rb#L66-L72

Added lines #L66 - L72 were not covered by tests

def datagouv_upload_url(dataset, resource = nil)
if resource.present?
[
Expand Down
22 changes: 6 additions & 16 deletions app/services/generate_open_data_csv_service.rb
Original file line number Diff line number Diff line change
@@ -1,21 +1,11 @@
# frozen_string_literal: true

class GenerateOpenDataCsvService
def self.save_csv_to_tmp(file_name, data)
f = Tempfile.create(["#{file_name}_#{date_last_month}", '.csv'], 'tmp')
f << generate_csv(file_name, data)
f.rewind
yield f if block_given?
f.close
end

def self.date_last_month
Date.today.prev_month.strftime("%B")
end

def self.generate_csv(file_name, data)
headers = ["mois", file_name]
data = [[date_last_month, data]]
SpreadsheetArchitect.to_csv(headers: headers, data: data)
def self.save_csv_to_tmp(file_name, headers, data)
Tempfile.create(file_name, 'tmp') do |file|
file << SpreadsheetArchitect.to_csv(headers:, data:)
file.rewind
yield file if block_given?
end

Check warning on line 9 in app/services/generate_open_data_csv_service.rb

View check run for this annotation

Codecov / codecov/patch

app/services/generate_open_data_csv_service.rb#L4-L9

Added lines #L4 - L9 were not covered by tests
end
end
1 change: 1 addition & 0 deletions config/env.example.optional
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,7 @@ OPENDATA_ENABLED="enabled" # disabled by default if `OPENDATA_ENABLED` not set
DATAGOUV_API_KEY="thisisasecret"
DATAGOUV_API_URL="https://www.data.gouv.fr/api/1"
DATAGOUV_STATISTICS_DATASET="dataset-id1"
DATAGOUV_STATICTICS_ACOUNT_RESOURCE="resource-id1-of-dataset-id1"
DATAGOUV_DESCRIPTIF_DEMARCHES_DATASET="dataset-id2"
DATAGOUV_DESCRIPTIF_DEMARCHES_RESOURCE="resource-id-of-dataset-id2"

Expand Down
1 change: 1 addition & 0 deletions config/secrets.yml
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ defaults: &defaults
descriptif_demarches_dataset: <%= ENV['DATAGOUV_DESCRIPTIF_DEMARCHES_DATASET'] %>
descriptif_demarches_resource: <%= ENV['DATAGOUV_DESCRIPTIF_DEMARCHES_RESOURCE'] %>
statistics_dataset: <%= ENV['DATAGOUV_STATISTICS_DATASET'] %>
statistics_account_by_month_resource: <%= ENV['DATAGOUV_STATICTICS_ACOUNT_RESOURCE'] %>


development:
Expand Down
14 changes: 9 additions & 5 deletions spec/jobs/cron/datagouv/account_by_month_job_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,18 +4,22 @@
let!(:user) { create(:user, created_at: 1.month.ago) }
let(:status) { 200 }
let(:body) { "ok" }
let(:stub) { stub_request(:post, /https:\/\/www.data.gouv.fr\/api\/.*\/upload\//) }
# let(:stub) { stub_request(:post, /https:\/\/www.data.gouv.fr\/api\/.*\/upload\//) }

describe 'perform' do
before do
stub
end
# before do
# stub
# end

subject { Cron::Datagouv::AccountByMonthJob.perform_now }

it 'send POST request to datagouv' do
subject
expect(stub).to have_been_requested
allow(APIDatagouv::API).to receive(:upload) do |file|
csv = CSV.read(file, headers: true)
expect(csv[0]['mois']).to eq(Date.today.prev_month.strftime("%B %Y"))
end
# expect(stub).to have_been_requested
end
end
end
Loading