João Moreira edited this page Feb 23, 2018 · 56 revisions

Enabling B2SHARE to behave as a FAIR Data Point: proof-of-concept

Project scope

The figure below illustrates the simplified lifecycle of the EUDAT [1] data management approach, supported by the FAIR data principles. The highlighted parts delimit the scope of this project for the B2SHARE service [2], which covers:

  • Preserving data with metadata for interoperability and reuse; and
  • Re-using data with metadata for findability.

Scope

Project goal

Design problem: how to improve the “FAIRness” of B2SHARE?

How to make B2SHARE compliant with the FAIR data principles?

The goal of the B2SHARE-FAIR project is to demonstrate how B2SHARE can behave as a FAIR Data Point (FDP). This proof-of-concept (POC) demonstrates how to make EUDAT B2SHARE compliant with the FDP specification [3], i.e. how to enable B2SHARE to capture metadata as an FDP by implementing a proxy of the B2SHARE API interfaces. The FDP metadata specification consists of 4 layers of metadata:

  • L1. Metadata about the data repository
  • L2. Metadata about the catalog/groups of datasets
  • L3. Metadata about the datasets
  • L4. Metadata about the dataset’s distributions

The content of these metadata layers needs to be stored and, once stored, exposed according to the FDP specification.

Requirements

R1. Develop a “non-intrusive” solution (decoupled from B2SHARE): use the B2SHARE REST API to access data.

R2. Improve semantic interoperability of B2SHARE: enable B2SHARE REST API to provide semantic data (annotated).

R3. Identify mappings between B2SHARE and FDP: align B2SHARE terminology with FDP metadata layers.

An ontological analysis of B2SHARE service was performed and the main parts of the conceptual model are illustrated below:

  • B2SHARE information system: B2SHARE information system

  • B2SHARE entities: B2SHARE entities ("concepts")

Obs.: The ontological language OntoUML [4], the language of the Unified Foundational Ontology (UFO), was used.

Basic concepts (from [5]):

  1. (Scientific) community: roles of creating and maintaining metadata schemas and curating the datasets which are part of a scientific domain or research project. B2SHARE users can be part of one or more communities.
  2. Community administrator: selected member of a community with grants needed for the metadata schema definitions and record curation tasks.
  3. Record: Any user can upload scientific datasets into B2SHARE within a data record. A record is comprised of data files and associated metadata schema. A record is always connected to one scientific community which has the role of curating and maintaining it.
  4. Metadata schema: a set of record metadata fields and their constraints/rules/formats. A record contains a set of common metadata fields and a set of custom metadata blocks. This metadata is not free form, however, but is governed by static schemas; the common metadata schema is set by B2SHARE and defines a superset of Dublin Core elements, while the schema for the custom metadata block is specific to each community and can be customized by the community administrators. The schemas are formally defined in the JSON Schema format. A special HTTP API call is available for retrieving the JSON Schema of a record in a specific community. In order to be accepted, the records submitted to a community must conform to the schema required by the community.
  5. Root (generic) metadata schema: a metadata schema applied to any record, i.e. common fixed metadata fields for all communities.
  6. Community (custom) metadata schema: custom metadata blocks, each block containing related metadata fields.
  7. Record states: a data record can exist in several states.
  8. Record 'draft' state: Immediately after creation a record enters the draft state. In this state the record is only accessible by its owner and can be freely modified: its metadata can be changed and files can be uploaded into or removed from it.
  9. Record 'published' state: a draft can be published at any time, and through this action it changes its state from 'draft' to 'published', is assigned Persistent Identifiers (PID), and becomes publicly accessible. The list of files in a published record cannot be changed.
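The draft/published life cycle described in items 7 to 9 can be sketched as a small state model. This is only an illustration of the rules stated above: the class and method names (`Record`, `add_file`, `publish`) are hypothetical and are not taken from the B2SHARE code base, and the PID passed to `publish` is a placeholder value.

```python
class Record:
    """Hypothetical model of a B2SHARE record and its two states."""

    def __init__(self, owner):
        self.owner = owner
        self.files = []
        self.state = "draft"   # a record enters the draft state on creation
        self.pid = None        # assigned only when the record is published

    def add_file(self, name):
        # Files may only be added while the record is still a draft.
        if self.state != "draft":
            raise ValueError("files of a published record cannot be changed")
        self.files.append(name)

    def publish(self, pid):
        # Publishing moves the record from 'draft' to 'published'
        # and attaches a persistent identifier.
        if self.state != "draft":
            raise ValueError("only a draft can be published")
        self.state = "published"
        self.pid = pid
```

In this sketch, any attempt to change the file list after `publish` raises an error, which mirrors the rule that the list of files in a published record cannot be changed.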

To achieve the aforementioned goal, the B2SHARE service must be extended to accommodate the 4 layers of FDP metadata:

  • L1. Metadata about the data repository: the data repository is “fixed” as the B2SHARE (EUDAT) website, so these fields describe a single instance and are configurable by the administrator. This metadata can therefore be stored in a configuration file.
  • L2. Metadata about the catalog/groups of datasets: the catalog of datasets can be mapped to the communities of B2SHARE. Therefore, the CRUD operations on this metadata must follow the same approach as the CRUD of communities (B2SHARE).
  • L3. Metadata about the datasets: the dataset concept from FDP can be mapped to the records registered within B2SHARE. Therefore, the CRUD operations on this metadata must follow the same approach as the CRUD of records (B2SHARE).
  • L4. Metadata about the dataset’s distributions: different concrete serializations of the same dataset, e.g. XML, CSV, Excel, relational database, etc. In B2SHARE this metadata is reflected by the files uploaded within a record. Therefore, the CRUD operations on this metadata must follow the same approach as the CRUD of files (B2SHARE).

Obs.: The analysis was performed from Jun/2017 to Aug/2017, which included the system analysis, detailing of requirements, scope and study of the solution considering the available resources.

Solution architecture

The solution is inspired by research on model-driven engineering (MDE) and semantic translations. MDE transformations are illustrated in the figure below, taken from the book by Brambilla.

B2SHARE entities ("concepts")

Semantic translation: “process of changing the underlying semantics of a piece of knowledge. Given some information described semantically, in terms of a source ontology, it is transformed into information described in terms of a target ontology” [6]

Although B2SHARE does not provide an ontology, we consider its metadata model as a "piece of knowledge". Therefore, the solution implements translations from B2SHARE REST API resources (source) to FDP levels (target), executed at runtime by a proxy of the B2SHARE REST API. The communication diagram below illustrates the translation of the first level of FDP, the data repository itself. The user accesses the B2SHARE-FAIR REST API, which makes a synchronous call to the B2SHARE REST API and transforms the data received (as JSON) into JSON-LD, providing semantically enriched data.
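The request flow described above (synchronous call to the source API, then translation of the JSON answer into JSON-LD) can be sketched as follows. This is a minimal sketch and not the code in `translators.py`: the function names `fetch_json` and `proxy_get` are hypothetical, the `@context` is reduced to two prefixes, and the `translate` argument stands for whichever translation applies to the requested level.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Synchronous GET returning the decoded JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def proxy_get(source_url, translate, fetch=fetch_json):
    """Call the source API, then translate its JSON answer to JSON-LD.

    `translate` maps the source dict to a dict of prefixed properties;
    `fetch` can be replaced, e.g. by a stub in tests.
    """
    source = fetch(source_url)
    document = {
        "@context": {
            "dct": "http://purl.org/dc/terms/",
            "dcat": "http://www.w3.org/ns/dcat#",
        },
        "@id": source_url,
    }
    document.update(translate(source))
    return document
```

Passing the fetch function as a parameter keeps the translation step testable without network access to B2SHARE.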

B2SHARE entities ("concepts")

The solution is implemented through GET endpoints, illustrated in figure below.

FDP level 1 (data repository) is implemented through /fdp/, which makes a synchronous call to the /api/ resource. FDP level 2 (catalog) is implemented through /catalogs/, which makes a synchronous call to the /communities/ resource. FDP level 3 (dataset) is implemented through /datasets/, which makes a synchronous call to the /records/ resource. FDP level 4 (distribution) is implemented through /distributions/, which makes a synchronous call to the /files/ resource.
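The endpoint correspondence above can be written down as a small routing table. The sketch below is illustrative only: `ROUTES` and `upstream_url` are hypothetical names, and the base URL is the B2SHARE test instance used later in the Validation section.

```python
B2SHARE_API = "https://trng-b2share.eudat.eu/api"

# B2SHARE-FAIR endpoint -> (FDP level, B2SHARE resource), as listed above.
ROUTES = {
    "fdp": (1, ""),                  # /fdp/ maps to the API root /api/
    "catalogs": (2, "communities"),
    "datasets": (3, "records"),
    "distributions": (4, "files"),
}


def upstream_url(path, base=B2SHARE_API):
    """Map a B2SHARE-FAIR path such as '/catalogs/<id>' to the B2SHARE URL."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ROUTES:
        raise KeyError("unknown endpoint: " + path)
    _level, resource = ROUTES[parts[0]]
    segments = [base.rstrip("/")]
    if resource:
        segments.append(resource)
    segments.extend(parts[1:])
    # Collection URLs keep a trailing slash; item URLs do not.
    return "/".join(segments) + ("" if parts[1:] else "/")
```

For example, `upstream_url("/datasets/1de2fed054f14efcbb1a9d68ef4e1878")` yields the same record URL that appears as `@id` in the dataset output of the Validation section.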

B2SHARE entities ("concepts")

A methodology based on MDE is adopted to design the translations as a set of mappings. The image below shows the design of the mappings of FDP level 2 (catalog), where each metadata field of the source is analyzed to find an equivalent ontology property, either in the FDP specification, in other standardized ontologies, or in a proprietary well-founded ontology.
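A set of mappings like the one designed for the catalog level can be expressed declaratively, as a table from source field to target property. In the sketch below, the target properties are the ones visible in the catalog output of the Validation section; the source field names on the left (`id`, `name`, `created`, and so on) are assumptions about the B2SHARE community JSON, and `CATALOG_MAPPINGS` and `apply_mappings` are hypothetical names.

```python
# Source field (B2SHARE community JSON, assumed names) -> target property.
CATALOG_MAPPINGS = {
    "id": "dct:identifier",
    "name": "dct:title",
    "description": "dct:description",
    "logo": "foaf:logo",
    "created": "dct:issued",
    "updated": "dct:modified",
    "publication_workflow": "b2:publication_workflow",
    "restricted_submission": "b2:restricted_submission",
}


def apply_mappings(source, mappings):
    """Rename every mapped source field; unmapped fields are dropped."""
    target = {}
    for field, prop in mappings.items():
        if field in source:
            target[prop] = source[field]
    return target
```

Keeping the mappings in data rather than in code makes each design decision (which ontology property stands for which field) easy to review and to change.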

B2SHARE entities ("concepts")

The implementation technology for the POC was chosen according to prior knowledge of JSON and JSON-LD libraries in different languages and the underlying technology of the B2SHARE REST API. Python 3 and PyLD [7] were chosen for this POC.

Code structure

  • B2SHARE-FAIR/src/fair/ contains the models that represent input data and the translation functions.
  • B2SHARE-FAIR/src/mapper/ contains the mapper from JSON to base models.
  • B2SHARE-FAIR/src/proxy/ contains the application endpoints and resources.
  • B2SHARE-FAIR/ontologies/ contains the B2SHARE ontology developed with the predicates (properties) that we considered specific to the B2SHARE service.

The /fair/, /mapper/ and /proxy/ folders are the sources required to execute the POC. Each of them has a /tests/ sub-folder whose tests can be executed with the pytest command.

Obs.: the translations serializing the data in JSON-LD are implemented in B2SHARE-FAIR/src/fair/translators.py

Deployment

This solution was developed and tested on an Ubuntu 16 server with Python 3.5.2. The requirements for deployment are in: https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/requirements.txt

To start the API: gunicorn proxy.app:api

Validation

The functional validation was done by making a call to each endpoint using test data from the B2SHARE test environment (data available on 23-02-2018): https://trng-b2share.eudat.eu/api/
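A functional check of this kind can be partly automated by inspecting each returned JSON-LD node. The helper below is a hypothetical sketch, not one of the tests in the repository: it only verifies that the JSON-LD keywords are present, that the node has the expected `@type`, and that every prefixed property uses a prefix declared in `@context`.

```python
def check_jsonld(document, expected_type):
    """Return a list of problems found in one JSON-LD node (empty if none)."""
    problems = []
    for key in ("@context", "@id", "@type"):
        if key not in document:
            problems.append("missing " + key)
    if "@type" in document and document["@type"] != expected_type:
        problems.append("unexpected @type: %r" % document["@type"])
    # Every property written as prefix:name needs its prefix in @context.
    prefixes = set(document.get("@context", {}))
    for key in document:
        if ":" in key and key.split(":", 1)[0] not in prefixes:
            problems.append("undeclared prefix in " + key)
    return problems
```

Such a check does not validate the semantics of the mapping; it only catches structural mistakes in the serialized output.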

Results of B2SHARE-FAIR REST API functional tests:

1. FDP (level 1): data repository http://localhost:8000/fdp/

{
	"r3d:startDate": "01/01/2016",
	"b2:b2note_url": "https://b2note.bsc.es/interface_main.html",
	"@context": {
		"r3d": "http://www.re3data.org/schema/3-0/",
		"foaf": "http://xmlns.com/foaf/",
		"dcat": "http://www.w3.org/ns/dcat#",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema/",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o/",
		"owl": "http://www.w3.org/2002/07/owl#",
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"dct": "http://purl.org/dc/terms/"
	},
	"@type": "r3d:Repository",
	"r3d:repositoryIdentifier": "https://trng-b2share.eudat.eu/fdp-repositoryID",
	"dct:publisher": "SURFsara",
	"b2:b2access_registration_link": "https://b2access.eudat.eu/",
	"b2:site_function": "trng",
	"b2:terms_of_use_link": "http://hdl.handle.net/11304/e43b2e3f-83c5-4e3f-b8b7-18d38d37a6cd",
	"@id": "https://trng-b2share.eudat.eu/",
	"dct:hasVersion": "2.1.1",
	"r3d:institution": "SURFsara",
	"r3d:institutionCountry": "The Netherlands",
	"dct:description": "The EUDAT B2SHARE data repository as a web application",
	"fdp:metadataIssued": "01/01/2016",
	"b2:training_site_link": "",
	"dct:identifier": "https://trng-b2share.eudat.eu/",
	"fdp:metadataModified": "23/02/2018",
	"fdp:metadataIdentifier": "https://trng-b2share.eudat.eu/fdp-metadataID",
	"dct:title": "EUDAT B2SHARE data repository",
	"r3d:lastUpdate": "23/02/2018",
	"rdfs:label": "EUDAT B2SHARE data repository"
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l1_fdp.json

2. Catalogs (level 2): catalogs http://localhost:8000/catalogs/

[
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/c4234f93-da96-4d2f-a2c8-fa83d0775212",
		"foaf:logo": "/img/communities/aalto.jpg",
		"dct:description": "Aalto University",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "c4234f93-da96-4d2f-a2c8-fa83d0775212",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"Aalto\"",
			"@id": "com:c4234f93da964d2fa2c8fa83d0775212:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 2,
			"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:member"
		},
		"dct:title": "Aalto",
		"b2:restricted_submission": true,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"Aalto\"",
			"@id": "com:c4234f93da964d2fa2c8fa83d0775212:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 1,
			"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/99916f6f-9a2c-4feb-a342-6552ac7f1529",
		"foaf:logo": "/img/communities/bbmri.png",
		"dct:description": "Biomedical Research.",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "99916f6f-9a2c-4feb-a342-6552ac7f1529",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"BBMRI\"",
			"@id": "com:99916f6f9a2c4feba3426552ac7f1529:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 4,
			"dct:title": "com:99916f6f9a2c4feba3426552ac7f1529:member"
		},
		"dct:title": "BBMRI",
		"b2:restricted_submission": false,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"BBMRI\"",
			"@id": "com:99916f6f9a2c4feba3426552ac7f1529:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 3,
			"dct:title": "com:99916f6f9a2c4feba3426552ac7f1529:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/0afede87-2bf2-4d89-867e-d2ee57251c62",
		"foaf:logo": "/img/communities/clarin.png",
		"dct:description": "Linguistic data",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "0afede87-2bf2-4d89-867e-d2ee57251c62",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"CLARIN\"",
			"@id": "com:0afede872bf24d89867ed2ee57251c62:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 6,
			"dct:title": "com:0afede872bf24d89867ed2ee57251c62:member"
		},
		"dct:title": "CLARIN",
		"b2:restricted_submission": false,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"CLARIN\"",
			"@id": "com:0afede872bf24d89867ed2ee57251c62:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 5,
			"dct:title": "com:0afede872bf24d89867ed2ee57251c62:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},

	(...)

]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l2_catalogs.json

3. Catalog (level 2): catalog http://localhost:8000/catalogs/c4234f93-da96-4d2f-a2c8-fa83d0775212

{
	"@id": "https://trng-b2share.eudat.eu/api/communities/c4234f93-da96-4d2f-a2c8-fa83d0775212",
	"foaf:logo": "/img/communities/aalto.jpg",
	"dct:description": "Aalto University",
	"@context": {
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"foaf": "http://xmlns.com/foaf/",
		"dcat": "http://www.w3.org/ns/dcat#",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
		"owl": "http://www.w3.org/2002/07/owl#",
		"dct": "http://purl.org/dc/terms/",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"pro": "http://purl.org/spar/pro/"
	},
	"b2:publication_workflow": "direct_publish",
	"dct:identifier": "c4234f93-da96-4d2f-a2c8-fa83d0775212",
	"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
	"@type": "dcat:Catalog",
	"b2:MemberRole": {
		"dct:description": "Member role of the community \"Aalto\"",
		"@id": "com:c4234f93da964d2fa2c8fa83d0775212:member",
		"@type": "pro:PublishingRole",
		"dct:identifier": 2,
		"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:member"
	},
	"dct:title": "Aalto",
	"b2:restricted_submission": true,
	"b2:AdminRole": {
		"dct:description": "Admin role of the community \"Aalto\"",
		"@id": "com:c4234f93da964d2fa2c8fa83d0775212:admin",
		"@type": "pro:PublishingRole",
		"dct:identifier": 1,
		"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:admin"
	},
	"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l2_catalog_c4234f93-da96-4d2f-a2c8-fa83d0775212.json

4. Datasets (level 3): datasets with query string http://localhost:8000/datasets/?page=1&q=test&size=10&sort=mostrecent&community:c4234f93-da96-4d2f-a2c8-fa83d0775212

[
	{
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-21T09:03:06.085923+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/records/5357770a5412453785fff358596a47c4",
		"dct:modified": "2018-02-21T09:03:06.085932+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/ea76492d-c113-4e15-a232-7415affb9dfc",
		"b2:hasDistributions": [
			{
				"dcat:distribution": "29f558ca-bffa-4be8-bf03-5e0ce405f48d"
			},
			{
				"dcat:distribution": "c07ad375-dee8-4859-86a4-c5ecbb272d00"
			}
		],
		"b2:hasThemes": [],
		"@type": "dcat:Dataset",
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"b2:hasDescriptions": [],
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},
	{
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-21T09:00:43.616327+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/records/73fdfca7f0fb4257a394e6a5ce1ab553",
		"dct:modified": "2018-02-21T09:00:43.616336+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/be2e5420-204b-4498-bb89-014a19b03a9d",
		"b2:hasDistributions": {
			"dcat:distribution": "c4c701c7-dca9-48be-8e8c-de9813498ba1"
		},
		"b2:hasThemes": [],
		"@type": "dcat:Dataset",
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"b2:hasDescriptions": [],
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},
	{
		"dct:modified": "2018-02-20T13:58:27.115806+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"dct:license": "Creative Commons Attribution-NoDerivs (CC-BY-ND)",
		"@id": "https://trng-b2share.eudat.eu/api/records/1de2fed054f14efcbb1a9d68ef4e1878",
		"b2:hasDescriptions": {
			"dct:description": "landcover data for EMEP. Test files."
		},
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-20T13:58:27.115798+00:00",
		"@type": "dcat:Dataset",
		"b2:hasThemes": {
			"dcat:theme": "EMEP"
		},
		"b2:hasDistributions": [
			{
				"dcat:distribution": "4665e237-0736-4753-a741-d6021aca646b"
			},
			{
				"dcat:distribution": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809"
			},
			{
				"dcat:distribution": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189"
			},
			{
				"dcat:distribution": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3"
			}
		],
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},

	(...)

]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l3_datasets_querystring_test_page_1_size_10.json

5. Datasets (level 3): dataset http://localhost:8000/datasets/1de2fed054f14efcbb1a9d68ef4e1878

{
	"dct:modified": "2018-02-20T13:58:27.115806+00:00",
	"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
	"dct:license": "Creative Commons Attribution-NoDerivs (CC-BY-ND)",
	"@id": "https://trng-b2share.eudat.eu/api/records/1de2fed054f14efcbb1a9d68ef4e1878",
	"b2:hasDescriptions": {
		"dct:description": "landcover data for EMEP. Test files."
	},
	"dct:identifier": "dataRecord",
	"dct:issued": "2018-02-20T13:58:27.115798+00:00",
	"@type": "dcat:Dataset",
	"b2:hasThemes": {
		"dcat:theme": "EMEP"
	},
	"b2:hasDistributions": [
		{
			"dcat:distribution": "4665e237-0736-4753-a741-d6021aca646b"
		},
		{
			"dcat:distribution": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809"
		},
		{
			"dcat:distribution": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189"
		},
		{
			"dcat:distribution": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3"
		}
	],
	"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
	"@context": {
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"dct": "http://purl.org/dc/terms/",
		"foaf": "http://xmlns.com/foaf/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"owl": "http://www.w3.org/2002/07/owl#",
		"dcat": "http://www.w3.org/ns/dcat/"
	}
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l3_dataset_1de2fed054f14efcbb1a9d68ef4e1878.json

6. Distributions (level 4): distributions (versions of a file) http://localhost:8000/distributions/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2

[
	{
		"dct:issued": "2018-02-20T13:58:27.035754+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "4665e237-0736-4753-a741-d6021aca646b",
		"dct:title": "emepGLC01.nc",
		"dct:modified": "2018-02-20T13:58:27.040867+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/emepGLC01.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.047180+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809",
		"dct:title": "glc2000xCLMf18.nc",
		"dct:modified": "2018-02-20T13:58:27.052225+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/glc2000xCLMf18.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.058480+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189",
		"dct:title": "glcSimple01degF18.nc",
		"dct:modified": "2018-02-20T13:58:27.063534+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/glcSimple01degF18.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.069858+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3",
		"dct:title": "NCDUMP.emepGLC01",
		"dct:modified": "2018-02-20T13:58:27.074953+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/NCDUMP.emepGLC01",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	}
]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l4_distributions_3172cf1b-e4fb-42db-82c4-ea5aa04c84c2.json

Performance analysis:

Performance analysis of the total transaction time to access a resource, comparing GET calls to (A) a B2SHARE-FAIR REST API resource and (B) the equivalent (mapped) B2SHARE REST API resource:

  1. (A) '/fdp/' ==> (B) '/api/' : B2SHARE-FAIR/src/proxy/tests/test_webapp.py
  2. (A) '/catalogs/' ==> (B) '/communities/' : B2SHARE-FAIR/src/proxy/tests/test_communities.py
  3. (A) '/catalogs/{_id}' ==> (B) '/communities/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_communities.py
  4. (A) '/datasets/' ==> (B) '/records/' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  5. (A) '/datasets/{_id}' ==> (B) '/records/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  6. (A) '/datasets/{_qs}' ==> (B) '/records/{_qs}' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  7. (A) '/distributions/{_id}' ==> (B) '/files/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_files.py
  • Test cases compute the total time to access the resource 1000 times (consecutively):
  1. One test case executing 10 calls for {10, 20, 30, 40, 50, 100} times.
  2. One test case executing 10 calls for {10, 20, 30, 40, 50, 100} times.
  3. Five test cases executing 10 calls for {10, 50, 100}: five communities pre-selected (list)
  4. One test case executing 10 calls for {10, 20, 30, 40, 50, 100} times.
  5. Ten test cases: ten records pre-selected, varying the number of files and metadata descriptions
  6. Two test cases: two query strings pre-selected
  7. Ten test cases: ten files (buckets) pre-selected, varying the number of contents (versions)
  • Test data (input):
  1. NA
  2. NA
  3. IDs: {c4234f93-da96-4d2f-a2c8-fa83d0775212, 99916f6f-9a2c-4feb-a342-6552ac7f1529, 0afede87-2bf2-4d89-867e-d2ee57251c62, 94a9567e-2fba-4677-8fde-a8b68bdb63e8, b344f92a-cd0e-4e4c-aa09-28b5f95f7e41 }
  4. NA
  5. IDs: {7547be3d2e93445783c4d343e6cdd1c0, a11736ab1b174028a1bbedea63e84411, ea735c4786f24ad4974fd7a58a7edc41, 3cb79e246ee34b3e9faaa3408feaf89e, 277e0971184242b1a80f4182e2f18aca, b2246d077d3e4d9396a47393eb3ff952, ad7cb0926f234428a850164e569e8162, d3f5b834ce404c2db22e071f2a2b7c77, 7ab78a953116446a9a18d45f42ba86ef, 79e55266573546238e4c80e5233c2f68 }
  6. QSs: { "?page=2&size=10&sort=mostrecent&q=test", "?page=1&size=10&sort=mostrecent&q=community:99916f6f-9a2c-4feb-a342-6552ac7f1529" }
  7. ID: { "88699ea0-e199-43f7-8a16-d311ecfa02e1", "5c11832e-444d-4740-8bdc-1fb55d12eeef", "25486e34-4f9c-4605-b0a5-f5f7e48d11b2", "c89a695c-f4c7-4ee5-a4b0-eda2f79dbdd9", "940fa97e-9a79-4ec0-9327-8f6b0b504b41", "eb6ebb0f-6b33-4972-87dd-78e6e281d3b9", "f91a4583-6f7e-4e6a-9bde-c75a635a4cef", "9bd0a681-d93f-46f9-8b37-c67e6edee571", "2d3af417-0de0-4b88-86d9-320b2084a945", "d5001514-5f6f-47f5-8ec2-5ed8c3629b7f" }
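The timing measurements outlined above can be sketched with two small helpers. This is a hedged illustration and not the code of the test scripts: `total_time` and `overhead_percent` are hypothetical names, and the overhead is assumed here to be the relative extra time of (A) over (B), which is one plausible reading of the percentages reported below.

```python
import time


def total_time(call, repetitions):
    """Total wall-clock seconds needed to run `call` consecutively."""
    start = time.perf_counter()
    for _ in range(repetitions):
        call()
    return time.perf_counter() - start


def overhead_percent(time_a, time_b):
    """Relative extra time of (A) the proxy over (B) the direct call.

    Assumed definition: (A - B) / B * 100.
    """
    return (time_a - time_b) / time_b * 100.0
```

In a real run, `call` would wrap one GET request against either the B2SHARE-FAIR endpoint (A) or the mapped B2SHARE endpoint (B), with the same number of repetitions for both.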

Results - total transaction time of FDP overhead:

Test environment (I)

  1. Min: 7.34% | Max: 11.71%
  2. Min: 35.61% | Max: 42.35%
  3. Average: (A) XXms ==> (B) YYms, difference =
  4. (A) XXms ==> (B) YYms, difference =
  5. Average: (A) XXms ==> (B) YYms, difference =
  6. Average: (A) XXms ==> (B) YYms, difference =
  7. Average: (A) XXms ==> (B) YYms, difference =

<add bar graph: x = #test case; y = XXms; bar 1 (A), bar 2 (B)

Obs.: a threat to validity observed in this scenario, which probably affected the results (for the worse), is that both the B2SHARE-FAIR REST API and the test scripts were executed together in the same environment (VM described above).

Conclusions and outcomes

The POC validated the efficacy of the approach, demonstrating how to enable B2SHARE data repository to be compliant with the FDP specification. Each requirement was addressed:

R1. Develop “non-intrusive” solution (decoupled from B2SHARE)

No changes in B2SHARE code and/or data insertion were necessary. The REST APIs fulfilled their role in enabling the integration of B2SHARE with B2SHARE-FAIR in a decoupled manner, following SOA 2.0 principles.

R2. Improve semantic interoperability of B2SHARE

The POC showed how data stored in B2SHARE can be annotated with metadata from diverse ontologies guided by the FDP specification and the creation of the B2SHARE ontology. Furthermore, the implementation of the REST API providing data following the JSON-LD syntax could enable the serialization of these ontologies (RDF).

R3. Identify mappings between B2SHARE and FDP

The solution was inspired by the approaches of MDE transformation and semantic translation, which enable mappings between data representations. Therefore, prior knowledge of these research areas supported the identification of mappings between the B2SHARE REST API and FDP.

Among the open issues discovered as outcomes of this POC, we highlight:

  • Adopt a methodology to create the translations for the root and community schemas, i.e. for each metadata field at the record level (FDP dataset):
    (a) Check the FDP specification for an equivalent metadata field;
    (b) If not found in the FDP specification, search standardized ontologies for an equivalent metadata field and choose one;
    (c) If not found, create or extend an ontology with the metadata field;
    (d) Validate the equivalence of the translation.

Example: in the message below, what would be an equivalent metadata for "open_access"?

 "metadata": {
      "$schema": "https://trng-b2share.eudat.eu/api/communities/e9b9792e-79fb-4b07-b6b4-b9c2bd06d095/schemas/0#/json_schema", 
      "DOI": "http://doi.org/XXXX/b2share.09f6c89f9af74bb5a5be9f75c35a3d63", 
      "community": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095", 
      "community_specific": {}, 
      "creators": [
        {
          "creator_name": "np"
        }
      ], 
      "descriptions": [
        {
          "description": "a description ", 
          "description_type": "Other"
        }
      ], 
      "ePIC_PID": "http://hdl.handle.net/0000/09f6c89f9af74bb5a5be9f75c35a3d63", 
      "language": "en", 
      "open_access": true, 
      "owners": [
        32
      ], 
      "publication_state": "published", 
      "publisher": "http://dendro.fe.up.pt", 
      "titles": [
        {
          "title": "this is another test project"
        }
      ]
    }, 
    "updated": "2018-01-23T15:19:58.378657+00:00"

Obs.1: Ontology-driven conceptual modeling can play a major role in this methodology, especially to support (c).

Obs.2: The process to change root (generic) and community (custom) metadata schemas in B2SHARE should be aligned with this methodology. Ideally, this process should be merged with this methodology.
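Steps (a) to (c) of the methodology amount to a lookup cascade, which could be sketched as below. Everything in this block is illustrative: `FDP_TERMS` is a tiny hand-picked subset, `STANDARD_TERMS` is an empty placeholder for terms found in other standardized ontologies, and `resolve_property` is a hypothetical name. Step (d), validating the equivalence, remains a manual task and is not covered.

```python
# Step (a): terms defined by the FDP specification (illustrative subset).
FDP_TERMS = {"title": "dct:title", "description": "dct:description"}
# Step (b): terms from other standardized ontologies (placeholder, empty here).
STANDARD_TERMS = {}


def resolve_property(field, fdp_terms=FDP_TERMS, standard_terms=STANDARD_TERMS):
    """Return (property, origin) for a source metadata field."""
    if field in fdp_terms:               # (a) FDP specification
        return fdp_terms[field], "fdp"
    if field in standard_terms:          # (b) standardized ontology
        return standard_terms[field], "standard"
    return "b2:" + field, "b2share"      # (c) extend the B2SHARE ontology
```

With these placeholder tables, `resolve_property("open_access")` falls through to step (c) and proposes `b2:open_access`; whether a standardized ontology offers a better equivalent is exactly the open question raised by the example.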

  • Discuss an approach to include FDP level 5

  • Validate this POC: (a) Insert diverse types of data from distinct domains; (b) Execute the metrics evaluation; (c) Validate accessibility of sensitive data ("A" from FAIR), e.g. patient data;

  • Discuss whether the Publishing Roles Ontology (PRO) [8] should play a role in the accessibility definition.

  • Apply a caching approach on the proxy side to speed up the solution
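As an illustration of the caching idea in the last item, a time-to-live cache placed in front of the upstream call could look like the sketch below. This is one possible design under stated assumptions, not part of the POC: `TTLCache` is a hypothetical class, entries are kept in memory only, and the 60-second default lifetime is arbitrary.

```python
import time


class TTLCache:
    """Tiny in-memory cache keyed by URL with a time-to-live in seconds."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}   # url -> (expiry time, value)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]              # fresh hit: no upstream call
        value = self._fetch(url)         # miss or expired: call upstream
        self._entries[url] = (now + self._ttl, value)
        return value
```

A cache like this trades freshness for speed: a record published in B2SHARE would only become visible through the proxy after the cached entry expires.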

Main EUDAT resources

References

[1] https://eudat.eu/eudat-cdi

[2] https://www.eudat.eu/services/b2share

[3] https://github.com/DTL-FAIRData/FAIRDataPoint/wiki/FAIR-Data-Point-Specification

[4] https://ontouml.org/

[5] https://B2SHARE.eudat.eu/help/api

[6] Ganzha, M., et al. Streaming semantic translations. In: 2017 21st International Conference on System Theory, Control and Computing (ICSTCC), 2017.

[7] https://github.com/digitalbazaar/pyld

[8] https://sparontologies.github.io/pro/current/pro.html
