jh-RLI edited this page Sep 18, 2022 · 10 revisions

OEP Data Review

Process and Workflow

Contributor

Open a GitHub issue

  • Use the OEP repository data-preprocessing

  • Use the issue template (ToDo: Create template)

    issue name: review/[data-set-name]
    issue tag: review
    issue text: (ToDo: Create template)

Data upload

  • (to be continued)
  1. Write data to the database schema model_draft using the OEP-API (template here)
  2. Make sure the table name follows the OEP Naming Conventions
  3. Write the metadata string (examples and templates here)
  4. Create an issue in this repository with the tag review
  5. Get in contact with OEP reviewers and find a responsible person
  6. If necessary, revise the data and metadata
  7. Get familiar with the OEP community and become an OEP data reviewer!

Naming Conventions

Dos & Don'ts

  • only use lower case
  • use the singular instead of the plural
  • use underscores
  • use ASCII characters only
  • no dots, no commas
  • no spaces
  • avoid dates
  • table and column names are limited to 50 characters
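Most of these rules can be checked mechanically before upload. A minimal sketch, assuming digits are allowed (the list does not forbid them) and treating any 19xx/20xx sequence as a possible date; `check_name` is a hypothetical helper, not part of any OEP tooling, and it cannot judge singular vs. plural:

```python
import re
from typing import List

MAX_LENGTH = 50  # table and column names are limited to 50 characters
# Lower-case ASCII letters, digits and underscores only: this also rules out
# upper case, spaces, dots, commas and non-ASCII characters.
ALLOWED = re.compile(r"[a-z0-9_]+")
# Rough heuristic for "avoid dates": flags four-digit years such as 2011 or 2035.
YEAR_LIKE = re.compile(r"(19|20)\d{2}")


def check_name(name: str) -> List[str]:
    """Return a list of naming-convention violations (empty if none found)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not ALLOWED.fullmatch(name):
        problems.append("use only lower-case ASCII letters, digits and underscores")
    if YEAR_LIKE.search(name):
        problems.append("appears to contain a date")
    return problems


print(check_name("zensus_population_by_gender_per_mun"))  # []
print(check_name("Zensus Population 2011"))
```

The year pattern is deliberately loose; a reviewer should treat its hits as hints, not as hard failures.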

Table Name

  • name starts with the copyright owner, source, project or model name (e.g. zensus, ego, oemof)
  • main value (e.g. population)
  • if separated by [attribute] (e.g. by_gender)
  • with resolution [tuple] (e.g. per_mun)

Example: zensus_population_by_gender_per_mun

Remember to add new energy-related abbreviations to the Glossary.
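The pattern can be expressed as a small builder that assembles the parts in the order listed above. This is a hypothetical helper (`build_table_name` is not an OEP function), and it only enforces ordering, lower case and the 50-character limit:

```python
from typing import Optional

MAX_LENGTH = 50


def build_table_name(source: str, value: str,
                     by: Optional[str] = None, per: Optional[str] = None) -> str:
    """Compose <source>_<value>[_by_<attribute>][_per_<resolution>]."""
    parts = [source, value]
    if by:
        parts.append(f"by_{by}")
    if per:
        parts.append(f"per_{per}")
    name = "_".join(part.lower() for part in parts)
    if len(name) > MAX_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_LENGTH} characters")
    return name


print(build_table_name("zensus", "population", by="gender", per="mun"))
# zensus_population_by_gender_per_mun
```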

Collection of Criteria

General

  • Data, metadata and additional material (e.g., documentation, article) have been provided
  • User rights are set

Metadata

  • Metadata string is included in the repository
  • Metadata file has header
    • Metadata licensed with Public Domain (CC0)
    • Authors included
  • Metadata follows additional information
  • All columns are described in resources/fields
  • All languages are listed
  • Open Data
    • Suitable open license
    • All sources included (all attributions correct)
    • All links to sources included
  • Add appropriate OEP tags
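Parts of this checklist can be automated. The sketch below assumes the metadata string follows the oemetadata JSON layout (keys such as `metaMetadata.metadataLicense`, `contributors`, `language`, `licenses`, `sources` and `resources[].schema.fields`); verify these key names against the metadata version you actually use. `review_metadata` and the example metadata are hypothetical, and the script cannot judge whether a license is suitable or whether attributions are correct; that stays with the reviewer.

```python
import json
from typing import List


def review_metadata(metadata: dict, table_columns: List[str]) -> List[str]:
    """Return checklist items that appear to be missing (assumed oemetadata keys)."""
    issues = []
    meta_license = (metadata.get("metaMetadata", {})
                    .get("metadataLicense", {}).get("name"))
    if meta_license != "CC0-1.0":
        issues.append("metadata is not licensed CC0-1.0")
    if not metadata.get("contributors"):
        issues.append("no authors/contributors listed")
    if not metadata.get("language"):
        issues.append("no languages listed")
    if not metadata.get("licenses"):
        issues.append("no data license given")
    for source in metadata.get("sources", []):
        if not source.get("path"):
            issues.append(f"source without link: {source.get('title', '?')}")
    # Every table column should be described in resources/fields.
    described = {
        field.get("name")
        for resource in metadata.get("resources", [])
        for field in resource.get("schema", {}).get("fields", [])
    }
    for column in table_columns:
        if column not in described:
            issues.append(f"column not described: {column}")
    return issues


# Hypothetical metadata string, trimmed to the keys checked above.
metadata = json.loads("""{
  "language": ["en-GB"],
  "contributors": [{"title": "Jane Doe"}],
  "licenses": [{"name": "ODbL-1.0"}],
  "sources": [{"title": "Example source", "path": ""}],
  "resources": [{"schema": {"fields": [{"name": "id"}, {"name": "population"}]}}],
  "metaMetadata": {"metadataLicense": {"name": "CC0-1.0"}}
}""")
print(review_metadata(metadata, ["id", "gender", "population"]))
```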

Data

  • Primary key is set
  • Fields that can be empty are specified as Nullable=True
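Both points can be checked against the table definition. The OEP runs PostgreSQL, where this information lives in `information_schema.columns` and `information_schema.table_constraints`; purely so the sketch runs without a database server, it uses Python's built-in `sqlite3` and `PRAGMA table_info`, whose rows are `(cid, name, type, notnull, dflt_value, pk)`. `check_table` and the set of intentionally nullable columns are hypothetical.

```python
import sqlite3
from typing import List, Set


def check_table(conn: sqlite3.Connection, table: str, nullable: Set[str]) -> List[str]:
    """Compare a table definition with the primary-key and Nullable rules."""
    issues = []
    # Each row: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not any(col[5] for col in columns):
        issues.append("no primary key")
    for _, name, _, notnull, _, pk in columns:
        if pk:
            continue  # primary-key columns are never meant to be empty
        if name in nullable and notnull:
            issues.append(f"{name}: may be empty but is declared NOT NULL")
        if name not in nullable and not notnull:
            issues.append(f"{name}: must not be empty but allows NULL")
    return issues


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zensus_population_by_gender_per_mun "
    "(id INTEGER PRIMARY KEY, gender TEXT NOT NULL, population INTEGER)"
)
print(check_table(conn, "zensus_population_by_gender_per_mun", {"population"}))  # []
```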

Geographic Data

  • (PostGIS)-Geometry is in column named geom (vector) or rast (raster)
  • Data type is geometry (or raster)
  • The CRS (SRID) is defined as an EPSG code
    • Original data stays with the original CRS
    • Preferred CRSs of the oedb are:
    1. WGS84 - EPSG: 4326
    2. ETRS89 / ETRS-LAEA - EPSG: 3035
    • Spatial index (GiST) on column geom
    • All geometries are valid (ST_IsValid)

To get a basic understanding of CRS, see e.g. the QGIS documentation.
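For vector tables, the geometry checks above map onto standard PostGIS functions (`ST_SRID`, `ST_IsValid`) and a GiST index. The sketch below only builds the SQL text, so it can be pasted into psql or run through any PostgreSQL driver; the table name in the usage line is hypothetical, and raster columns (`rast`) need different functions that are not covered here.

```python
from typing import Dict

# Preferred CRSs of the oedb (EPSG codes). Other SRIDs are not wrong per se,
# since original data keeps its original CRS.
PREFERRED_SRIDS = {4326: "WGS84", 3035: "ETRS89 / ETRS-LAEA"}


def geometry_review_sql(schema: str, table: str, column: str = "geom") -> Dict[str, str]:
    """Build review queries for a PostGIS geometry column."""
    target = f'"{schema}"."{table}"'
    return {
        # Which SRIDs are actually stored? Expect exactly one EPSG code.
        "srid": f'SELECT DISTINCT ST_SRID("{column}") FROM {target};',
        # Count geometries that fail the validity check.
        "invalid": f'SELECT count(*) FROM {target} WHERE NOT ST_IsValid("{column}");',
        # Spatial index on the geometry column.
        "index": (f'CREATE INDEX IF NOT EXISTS "{table}_{column}_idx" '
                  f'ON {target} USING gist ("{column}");'),
    }


def is_preferred_srid(srid: int) -> bool:
    return srid in PREFERRED_SRIDS


for name, sql in geometry_review_sql("model_draft", "example_boundary").items():
    print(f"-- {name}\n{sql}")
```

With the 50-character table-name limit, the generated index name stays below PostgreSQL's 63-character identifier limit.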

Data quality

The database set-up of the OEP is designed to support users in achieving good data quality:

  • Plausibility and integration tests are applied to identify mistakes in the data.
  • When the number of users and reviewers becomes large enough, user evaluations and ratings on data quality will be implemented.
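What a plausibility test checks depends on the dataset. As a hypothetical illustration for a table like zensus_population_by_gender_per_mun (rows with assumed keys `mun`, `gender` and `population`), a test might reject negative or missing counts and duplicate (municipality, gender) pairs:

```python
from collections import Counter
from typing import Dict, List


def plausibility_errors(rows: List[Dict]) -> List[str]:
    """Return human-readable findings; an empty list means no issues found."""
    errors = []
    # Each (municipality, gender) combination should occur only once.
    counts = Counter((row["mun"], row["gender"]) for row in rows)
    for key, n in counts.items():
        if n > 1:
            errors.append(f"duplicate entry {key} ({n}x)")
    # Population counts cannot be missing or negative.
    for i, row in enumerate(rows):
        value = row["population"]
        if value is None or value < 0:
            errors.append(f"row {i}: implausible population {value!r}")
    return errors


rows = [
    {"mun": "001", "gender": "female", "population": 1200},
    {"mun": "001", "gender": "female", "population": 1200},
    {"mun": "002", "gender": "male", "population": -5},
]
print(plausibility_errors(rows))
```

A script along these lines, shipped with the dataset, is the kind of testing script the Gold badge asks for.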

Further information and guidelines regarding data management and data publication can be found here: Open Knowledge Foundation, Open Data Foundation and Software Carpentry (e.g. here).

Badge system

Refer to https://cos.io/our-services/open-science-badges/

The quality of data is indicated by a badge, e.g.

  • Bronze
  • Silver
  • Gold
  • Platinum

Each badge implies that its defined criteria are fulfilled, including those of all lower badges (e.g. datasets holding a gold badge also fulfill the criteria of bronze and silver).

Badge criteria

  1. Bronze (must-have)
    • Primary key
    • Follows naming conventions
    • Metadata exists
    • ...
  2. Silver (should-have)
    • Metadata exhaustive
    • Spatial index defined
    • ...
  3. Gold (good-to-have)
    • Plausibility and integrity: a testing script is provided for verification
    • ...
  4. Platinum (best practice)
    • Approved/rated positively by XX users
    • ...
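Because badges are cumulative, the awarded badge is the highest level whose criteria, and those of every lower level, are met. A sketch with hypothetical criterion identifiers derived from the lists above (which are still incomplete, hence the "..."); for non-spatial data the spatial-index criterion would have to be treated as not applicable, e.g. by adding it to the fulfilled set:

```python
from typing import Optional, Set

# Ordered from lowest to highest; dicts keep insertion order (Python 3.7+).
BADGE_CRITERIA = {
    "Bronze": {"primary_key", "naming_conventions", "metadata_exists"},
    "Silver": {"metadata_exhaustive", "spatial_index"},
    "Gold": {"testing_script"},
    "Platinum": {"positive_user_ratings"},
}


def highest_badge(fulfilled: Set[str]) -> Optional[str]:
    """Return the highest badge whose criteria, and all lower ones, are met."""
    awarded = None
    for badge, criteria in BADGE_CRITERIA.items():
        if not criteria <= fulfilled:
            break  # a gap at this level blocks every higher badge
        awarded = badge
    return awarded


print(highest_badge({"primary_key", "naming_conventions", "metadata_exists",
                     "testing_script"}))  # Bronze
```

The example shows the cumulative rule at work: a testing script alone does not earn Gold while a Silver criterion is still open.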