João Moreira edited this page Feb 24, 2018 · 56 revisions

Enabling B2SHARE to behave as a FAIR Data Point: proof-of-concept

Project scope

The figure below illustrates the simplified lifecycle of the EUDAT [1] data management approach, supported by the FAIR data principles. The highlighted parts delimit the scope of this project for the B2SHARE service [2], which covers:

  • Preserving data with metadata for interoperability and reuse; and
  • Re-using data with metadata for findability.

Scope

Project goal

Design problem: how to improve the “FAIRness” of B2SHARE?

How to make B2SHARE compliant with the FAIR data principles?

The goal of the B2SHARE-FAIR project is to develop a proof of concept (POC) demonstrating how to make EUDAT B2SHARE compliant with the FAIR Data Point (FDP) specification [3], i.e. how to expose B2SHARE metadata as FDP metadata through a proxy of the B2SHARE API. The FDP metadata specification consists of 5 layers of metadata:

  • L1. Metadata about the data repository
  • L2. Metadata about the catalog/groups of datasets
  • L3. Metadata about the datasets
  • L4. Metadata about the dataset’s distributions
  • L5. Metadata about the data record (structure and content, e.g. domain, range of the values, relations, types)

Obs.: "L5" is OUT OF THE SCOPE of this project due to its complexity.

The content of these metadata needs to be stored and, once stored, exposed according to the FDP specification.

Requirements

R1. Develop a “non-intrusive” solution (decoupled from B2SHARE): use the B2SHARE REST API to access data.

R2. Improve semantic interoperability of B2SHARE: enable B2SHARE REST API to provide semantic data (annotated).

R3. Identify mappings between B2SHARE and FDP: align B2SHARE terminology with FDP metadata layers.

An ontological analysis of B2SHARE service was performed and the main parts of the conceptual model are illustrated below:

  • B2SHARE information system: B2SHARE information system

  • B2SHARE entities: B2SHARE entities ("concepts")

Obs.: The ontological language OntoUML [4], the language of the Unified Foundational Ontology (UFO), was used.

Basic concepts (from [5]):

  1. (Scientific) community: roles of creating and maintaining metadata schemas and curating the datasets which are part of a scientific domain or research project. B2SHARE users can be part of one or more communities.
  2. Community administrator: selected member of a community with grants needed for the metadata schema definitions and record curation tasks.
  3. Record: Any user can upload scientific datasets into B2SHARE within a data record. A record is comprised of data files and associated metadata schema. A record is always connected to one scientific community which has the role of curating and maintaining it.
  4. Metadata schema: a set of record metadata fields and their constraints/rules/formats. A record contains a set of common metadata fields and a set of custom metadata blocks. This metadata is not free form, however, but is governed by static schemas; the common metadata schema is set by B2SHARE and defines a superset of Dublin Core elements, while the schema for the custom metadata block is specific to each community and can be customized by the community administrators. The schemas are formally defined in the JSON Schema format. A special HTTP API call is available for retrieving the JSON Schema of a record in a specific community. In order to be accepted, the records submitted to a community must conform to the schema required by the community.
  5. Root (generic) metadata schema: a metadata schema applied to any record, i.e. common fixed metadata fields for all communities.
  6. Community (custom) metadata schema: custom metadata blocks, each block containing related metadata fields.
  7. Record states: a data record can exist in several states.
  8. Record 'draft' state: Immediately after creation a record enters the draft state. In this state the record is only accessible by its owner and can be freely modified: its metadata can be changed and files can be uploaded into or removed from it.
  9. Record 'published' state: a draft can be published at any time, and through this action it changes its state from 'draft' to 'published', is assigned Persistent Identifiers (PID), and becomes publicly accessible. The list of files in a published record cannot be changed.

To achieve the aforementioned goal, it is required to extend B2SHARE service, accommodating the 4 layers of FDP metadata:

  • L1. Metadata about the data repository: the data repository is “fixed” as the B2SHARE (EUDAT) website, so these fields describe a unique instance and are configurable by the administrator. This metadata can therefore be stored in a configuration file.
  • L2. Metadata about the catalog/groups of datasets: the catalog of datasets can be mapped to the communities of B2SHARE. Therefore, the CRUD operations on these metadata must follow the same approach as the CRUD of communities (B2SHARE).
  • L3. Metadata about the datasets: the dataset concept from FDP can be mapped to the records registered within B2SHARE. Therefore, the CRUD operations on these metadata must follow the same approach as the CRUD of records (B2SHARE).
  • L4. Metadata about the dataset’s distributions: different concrete serializations of the same dataset, e.g., XML, CSV, Excel, relational database, etc. In B2SHARE these metadata are reflected by the files uploaded within a record. Therefore, the CRUD operations on these metadata must follow the same approach as the CRUD of files (B2SHARE).

Obs.: The analysis was performed from Jun/2017 to Aug/2017, which included the system analysis, detailing of requirements, scope and study of the solution considering the available resources.

Solution architecture

The solution is inspired by research on model-driven engineering (MDE) and semantic translation. MDE transformations are illustrated in the figure below, from Brambilla's book.

B2SHARE entities ("concepts")

Semantic translation: “process of changing the underlying semantics of a piece of knowledge. Given some information described semantically, in terms of a source ontology, it is transformed into information described in terms of a target ontology” [6]

Although B2SHARE does not provide an ontology, we consider its metadata model as a "piece of knowledge". Therefore, the solution implements translations from B2SHARE REST API resources (source) to FDP levels (target), executed at runtime as a proxy of the B2SHARE REST API. The communication diagram below illustrates the translation of the first level of FDP, the data repository itself. The user accesses the B2SHARE-FAIR REST API, which makes a synchronous call to the B2SHARE REST API and transforms the data received (as JSON) into JSON-LD, providing semantically enriched data.

B2SHARE entities ("concepts")

The solution is implemented through GET endpoints, illustrated in figure below.

  • FDP level 1 (data repository) is implemented through /fdp/, which makes a synchronous call to the /api/ resource.
  • FDP level 2 (catalog) is implemented through /catalogs/, which makes a synchronous call to the /communities/ resource.
  • FDP level 3 (dataset) is implemented through /datasets/, which makes a synchronous call to the /records/ resource.
  • FDP level 4 (distribution) is implemented through /distributions/, which makes a synchronous call to the /files/ resource.
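
The endpoint mapping above can be sketched as a small lookup table. This is an illustrative sketch, not the project's actual code: the names FDP_TO_B2SHARE, B2SHARE_API and b2share_url are hypothetical, and the base URL is the training instance cited in the Validation section.

```python
from urllib.parse import urlencode

# Base URL of the B2SHARE REST API (training instance used in the validation).
B2SHARE_API = "https://trng-b2share.eudat.eu/api"

# FDP-style endpoint -> B2SHARE REST resource, following the list above.
FDP_TO_B2SHARE = {
    "fdp": "",                  # level 1: data repository -> /api/
    "catalogs": "communities",  # level 2: catalog -> /communities/
    "datasets": "records",      # level 3: dataset -> /records/
    "distributions": "files",   # level 4: distribution -> /files/
}


def b2share_url(fdp_endpoint, resource_id=None, query=None):
    """Build the B2SHARE URL that a proxy would call for an FDP endpoint."""
    resource = FDP_TO_B2SHARE[fdp_endpoint]  # KeyError for unknown endpoints
    parts = [B2SHARE_API]
    if resource:
        parts.append(resource)
    if resource_id:
        parts.append(resource_id)
    url = "/".join(parts)
    if not resource_id:
        url += "/"  # collection URLs end with a slash
    if query:
        url += "?" + urlencode(query)
    return url
```

Keeping the mapping in one table means that supporting a further FDP level would only require a new entry plus its translation function.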

B2SHARE entities ("concepts")

A methodology based on MDE is adopted to design the translations as a set of mappings. The image below shows the design of the mappings of FDP level 2 (catalog), where each metadata field of the source is analyzed to find an equivalent ontology property, either in the FDP specification, in other standardized ontologies, or in a proprietary, well-founded ontology.

B2SHARE entities ("concepts")
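
A mapping of this kind can be expressed as a table from source field names to ontology properties. The sketch below is illustrative only: the source field names (id, name, description, logo, created, updated) are assumptions about the B2SHARE community JSON, the function community_to_catalog is hypothetical, and the target properties are taken from the catalog output shown in the Validation section. The POC itself uses PyLD; plain dictionaries are used here to keep the example dependency-free.

```python
# Source field (assumed B2SHARE community JSON key) -> target ontology property.
CATALOG_MAPPINGS = {
    "id": "dct:identifier",
    "name": "dct:title",
    "description": "dct:description",
    "logo": "foaf:logo",
    "created": "dct:issued",
    "updated": "dct:modified",
}

# Prefixes needed by the properties above (subset of the @context in the results).
CATALOG_CONTEXT = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "foaf": "http://xmlns.com/foaf/",
}


def community_to_catalog(community, self_url):
    """Translate one community (plain JSON dict) into a JSON-LD catalog dict."""
    catalog = {
        "@context": dict(CATALOG_CONTEXT),
        "@id": self_url,
        "@type": "dcat:Catalog",
    }
    for source_field, target_property in CATALOG_MAPPINGS.items():
        if source_field in community:  # skip fields the source does not provide
            catalog[target_property] = community[source_field]
    return catalog
```

Fields without an equivalent in FDP or another standard ontology would, following the methodology above, be mapped to a property of the B2SHARE ontology (prefix b2) instead.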

The implementation technology for the POC was chosen according to prior knowledge in libraries for JSON and JSON-LD from different languages and the underlying technology of B2SHARE REST API. Python 3 and PyLD [7] were chosen for this POC.

Code structure

  • B2SHARE-FAIR/src/fair/ contains the models that represent input data and the translation functions.
  • B2SHARE-FAIR/src/mapper/ contains the mapper from JSON to base models.
  • B2SHARE-FAIR/src/proxy/ contains the application endpoints and resources.
  • B2SHARE-FAIR/ontologies/ contains the B2SHARE ontology, developed with the predicates (properties) that we considered specific to the B2SHARE service.

The /fair/, /mapper/ and /proxy/ folders contain the sources required to run the POC. Each has a /tests/ sub-folder whose tests can be run with the pytest command.

Obs.: the translations serializing the data in JSON-LD are implemented in B2SHARE-FAIR/src/fair/translators.py

Deployment

This solution was developed and tested on an Ubuntu 16 server with Python 3.5.2. The requirements for deployment are in: https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/requirements.txt

To start the API: gunicorn proxy.app:api

Validation

The functional validation was done by calling each endpoint with test data from the B2SHARE test environment (data available on 23-02-2018): https://trng-b2share.eudat.eu/api/

Results of B2SHARE-FAIR REST API functional tests:

1. FDP (level 1): data repository http://localhost:8000/fdp/

{
	"r3d:startDate": "01/01/2016",
	"b2:b2note_url": "https://b2note.bsc.es/interface_main.html",
	"@context": {
		"r3d": "http://www.re3data.org/schema/3-0/",
		"foaf": "http://xmlns.com/foaf/",
		"dcat": "http://www.w3.org/ns/dcat#",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema/",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o/",
		"owl": "http://www.w3.org/2002/07/owl#",
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"dct": "http://purl.org/dc/terms/"
	},
	"@type": "r3d:Repository",
	"r3d:repositoryIdentifier": "https://trng-b2share.eudat.eu/fdp-repositoryID",
	"dct:publisher": "SURFsara",
	"b2:b2access_registration_link": "https://b2access.eudat.eu/",
	"b2:site_function": "trng",
	"b2:terms_of_use_link": "http://hdl.handle.net/11304/e43b2e3f-83c5-4e3f-b8b7-18d38d37a6cd",
	"@id": "https://trng-b2share.eudat.eu/",
	"dct:hasVersion": "2.1.1",
	"r3d:institution": "SURFsara",
	"r3d:institutionCountry": "The Netherlands",
	"dct:description": "The EUDAT B2SHARE data repository as a web application",
	"fdp:metadataIssued": "01/01/2016",
	"b2:training_site_link": "",
	"dct:identifier": "https://trng-b2share.eudat.eu/",
	"fdp:metadataModified": "23/02/2018",
	"fdp:metadataIdentifier": "https://trng-b2share.eudat.eu/fdp-metadataID",
	"dct:title": "EUDAT B2SHARE data repository",
	"r3d:lastUpdate": "23/02/2018",
	"rdfs:label": "EUDAT B2SHARE data repository"
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l1_fdp.json
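
The @context in the response is what lets a client turn prefixed keys into full IRIs. As a rough illustration (a full JSON-LD processor such as PyLD does considerably more), the hypothetical helper below expands the prefixed top-level keys of a response using only its own @context.

```python
def expand_keys(document):
    """Expand 'prefix:name' keys of a JSON-LD dict into full IRIs using its @context.

    Simplified: handles only top-level keys and string-valued prefix definitions;
    JSON-LD keywords (keys starting with '@') are skipped.
    """
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords such as @id, @type, @context are not properties
        prefix, sep, local_name = key.partition(":")
        if sep and prefix in context:
            expanded[context[prefix] + local_name] = value
        else:
            expanded[key] = value  # no known prefix: keep the key as-is
    return expanded
```

Applied to the response above, the key dct:title would become http://purl.org/dc/terms/title, which is the identifier a linked-data client would match against the Dublin Core terms vocabulary.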

2. Catalogs (level 2): catalogs http://localhost:8000/catalogs/

[
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/c4234f93-da96-4d2f-a2c8-fa83d0775212",
		"foaf:logo": "/img/communities/aalto.jpg",
		"dct:description": "Aalto University",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "c4234f93-da96-4d2f-a2c8-fa83d0775212",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"Aalto\"",
			"@id": "com:c4234f93da964d2fa2c8fa83d0775212:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 2,
			"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:member"
		},
		"dct:title": "Aalto",
		"b2:restricted_submission": true,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"Aalto\"",
			"@id": "com:c4234f93da964d2fa2c8fa83d0775212:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 1,
			"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/99916f6f-9a2c-4feb-a342-6552ac7f1529",
		"foaf:logo": "/img/communities/bbmri.png",
		"dct:description": "Biomedical Research.",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "99916f6f-9a2c-4feb-a342-6552ac7f1529",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"BBMRI\"",
			"@id": "com:99916f6f9a2c4feba3426552ac7f1529:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 4,
			"dct:title": "com:99916f6f9a2c4feba3426552ac7f1529:member"
		},
		"dct:title": "BBMRI",
		"b2:restricted_submission": false,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"BBMRI\"",
			"@id": "com:99916f6f9a2c4feba3426552ac7f1529:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 3,
			"dct:title": "com:99916f6f9a2c4feba3426552ac7f1529:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},
	{
		"@id": "https://trng-b2share.eudat.eu/api/communities/0afede87-2bf2-4d89-867e-d2ee57251c62",
		"foaf:logo": "/img/communities/clarin.png",
		"dct:description": "Linguistic data",
		"@context": {
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"foaf": "http://xmlns.com/foaf/",
			"dcat": "http://www.w3.org/ns/dcat#",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dct": "http://purl.org/dc/terms/",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"pro": "http://purl.org/spar/pro/"
		},
		"b2:publication_workflow": "direct_publish",
		"dct:identifier": "0afede87-2bf2-4d89-867e-d2ee57251c62",
		"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
		"@type": "dcat:Catalog",
		"b2:MemberRole": {
			"dct:description": "Member role of the community \"CLARIN\"",
			"@id": "com:0afede872bf24d89867ed2ee57251c62:member",
			"@type": "pro:PublishingRole",
			"dct:identifier": 6,
			"dct:title": "com:0afede872bf24d89867ed2ee57251c62:member"
		},
		"dct:title": "CLARIN",
		"b2:restricted_submission": false,
		"b2:AdminRole": {
			"dct:description": "Admin role of the community \"CLARIN\"",
			"@id": "com:0afede872bf24d89867ed2ee57251c62:admin",
			"@type": "pro:PublishingRole",
			"dct:identifier": 5,
			"dct:title": "com:0afede872bf24d89867ed2ee57251c62:admin"
		},
		"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
	},

	(...)

]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l2_catalogs.json

3. Catalog (level 2): catalog http://localhost:8000/catalogs/c4234f93-da96-4d2f-a2c8-fa83d0775212

{
	"@id": "https://trng-b2share.eudat.eu/api/communities/c4234f93-da96-4d2f-a2c8-fa83d0775212",
	"foaf:logo": "/img/communities/aalto.jpg",
	"dct:description": "Aalto University",
	"@context": {
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"foaf": "http://xmlns.com/foaf/",
		"dcat": "http://www.w3.org/ns/dcat#",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
		"owl": "http://www.w3.org/2002/07/owl#",
		"dct": "http://purl.org/dc/terms/",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"pro": "http://purl.org/spar/pro/"
	},
	"b2:publication_workflow": "direct_publish",
	"dct:identifier": "c4234f93-da96-4d2f-a2c8-fa83d0775212",
	"dct:modified": "Wed, 21 Dec 2016 08:57:40 GMT",
	"@type": "dcat:Catalog",
	"b2:MemberRole": {
		"dct:description": "Member role of the community \"Aalto\"",
		"@id": "com:c4234f93da964d2fa2c8fa83d0775212:member",
		"@type": "pro:PublishingRole",
		"dct:identifier": 2,
		"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:member"
	},
	"dct:title": "Aalto",
	"b2:restricted_submission": true,
	"b2:AdminRole": {
		"dct:description": "Admin role of the community \"Aalto\"",
		"@id": "com:c4234f93da964d2fa2c8fa83d0775212:admin",
		"@type": "pro:PublishingRole",
		"dct:identifier": 1,
		"dct:title": "com:c4234f93da964d2fa2c8fa83d0775212:admin"
	},
	"dct:issued": "Wed, 21 Dec 2016 08:57:40 GMT"
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l2_catalog_c4234f93-da96-4d2f-a2c8-fa83d0775212.json

4. Datasets (level 3): datasets with query string http://localhost:8000/datasets/?page=1&q=test&size=10&sort=mostrecent&community:c4234f93-da96-4d2f-a2c8-fa83d0775212

[
	{
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-21T09:03:06.085923+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/records/5357770a5412453785fff358596a47c4",
		"dct:modified": "2018-02-21T09:03:06.085932+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/ea76492d-c113-4e15-a232-7415affb9dfc",
		"b2:hasDistributions": [
			{
				"dcat:distribution": "29f558ca-bffa-4be8-bf03-5e0ce405f48d"
			},
			{
				"dcat:distribution": "c07ad375-dee8-4859-86a4-c5ecbb272d00"
			}
		],
		"b2:hasThemes": [],
		"@type": "dcat:Dataset",
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"b2:hasDescriptions": [],
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},
	{
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-21T09:00:43.616327+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/records/73fdfca7f0fb4257a394e6a5ce1ab553",
		"dct:modified": "2018-02-21T09:00:43.616336+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/be2e5420-204b-4498-bb89-014a19b03a9d",
		"b2:hasDistributions": {
			"dcat:distribution": "c4c701c7-dca9-48be-8e8c-de9813498ba1"
		},
		"b2:hasThemes": [],
		"@type": "dcat:Dataset",
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"b2:hasDescriptions": [],
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},
	{
		"dct:modified": "2018-02-20T13:58:27.115806+00:00",
		"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"dct:license": "Creative Commons Attribution-NoDerivs (CC-BY-ND)",
		"@id": "https://trng-b2share.eudat.eu/api/records/1de2fed054f14efcbb1a9d68ef4e1878",
		"b2:hasDescriptions": {
			"dct:description": "landcover data for EMEP. Test files."
		},
		"dct:identifier": "dataRecord",
		"dct:issued": "2018-02-20T13:58:27.115798+00:00",
		"@type": "dcat:Dataset",
		"b2:hasThemes": {
			"dcat:theme": "EMEP"
		},
		"b2:hasDistributions": [
			{
				"dcat:distribution": "4665e237-0736-4753-a741-d6021aca646b"
			},
			{
				"dcat:distribution": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809"
			},
			{
				"dcat:distribution": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189"
			},
			{
				"dcat:distribution": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3"
			}
		],
		"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat/"
		}
	},

	(...)

]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l3_datasets_querystring_test_page_1_size_10.json

5. Datasets (level 3): dataset http://localhost:8000/datasets/1de2fed054f14efcbb1a9d68ef4e1878

{
	"dct:modified": "2018-02-20T13:58:27.115806+00:00",
	"b2:hasDistributionsLink": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
	"dct:license": "Creative Commons Attribution-NoDerivs (CC-BY-ND)",
	"@id": "https://trng-b2share.eudat.eu/api/records/1de2fed054f14efcbb1a9d68ef4e1878",
	"b2:hasDescriptions": {
		"dct:description": "landcover data for EMEP. Test files."
	},
	"dct:identifier": "dataRecord",
	"dct:issued": "2018-02-20T13:58:27.115798+00:00",
	"@type": "dcat:Dataset",
	"b2:hasThemes": {
		"dcat:theme": "EMEP"
	},
	"b2:hasDistributions": [
		{
			"dcat:distribution": "4665e237-0736-4753-a741-d6021aca646b"
		},
		{
			"dcat:distribution": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809"
		},
		{
			"dcat:distribution": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189"
		},
		{
			"dcat:distribution": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3"
		}
	],
	"b2:hasCommunity": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
	"@context": {
		"lang": "http://id.loc.gov/vocabulary/iso639-1/",
		"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"dct": "http://purl.org/dc/terms/",
		"foaf": "http://xmlns.com/foaf/",
		"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"b2": "https://b2share.eudat.eu/ontology/b2share/",
		"owl": "http://www.w3.org/2002/07/owl#",
		"dcat": "http://www.w3.org/ns/dcat/"
	}
}

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l3_dataset_1de2fed054f14efcbb1a9d68ef4e1878.json

6. Distributions (level 4): distributions (versions of a file) http://localhost:8000/distributions/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2

[
	{
		"dct:issued": "2018-02-20T13:58:27.035754+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "4665e237-0736-4753-a741-d6021aca646b",
		"dct:title": "emepGLC01.nc",
		"dct:modified": "2018-02-20T13:58:27.040867+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/emepGLC01.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.047180+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "c1eaf59c-b6ee-4347-8fd4-d46e0a80b809",
		"dct:title": "glc2000xCLMf18.nc",
		"dct:modified": "2018-02-20T13:58:27.052225+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/glc2000xCLMf18.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.058480+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "881e8d8a-ece5-4d28-bb5a-0f3f28a0e189",
		"dct:title": "glcSimple01degF18.nc",
		"dct:modified": "2018-02-20T13:58:27.063534+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/glcSimple01degF18.nc",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	},
	{
		"dct:issued": "2018-02-20T13:58:27.069858+00:00",
		"@type": "dcat:Distribution",
		"dct:hasVersion": "5b2bb25f-fe63-4b7e-b687-a1d44e1628d3",
		"dct:title": "NCDUMP.emepGLC01",
		"dct:modified": "2018-02-20T13:58:27.074953+00:00",
		"@id": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2/NCDUMP.emepGLC01",
		"dct:versionOf": "https://trng-b2share.eudat.eu/api/files/3172cf1b-e4fb-42db-82c4-ea5aa04c84c2",
		"@context": {
			"lang": "http://id.loc.gov/vocabulary/iso639-1/",
			"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
			"xsd": "http://www.w3.org/2001/XMLSchema#",
			"dct": "http://purl.org/dc/terms/",
			"foaf": "http://xmlns.com/foaf/",
			"fdp": "http://rdf.biosemantics.org/ontologies/fdp-o#",
			"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
			"b2": "https://b2share.eudat.eu/ontology/b2share/",
			"owl": "http://www.w3.org/2002/07/owl#",
			"dcat": "http://www.w3.org/ns/dcat#"
		}
	}
]

https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/src/fair/tests/data/fdp_l4_distributions_3172cf1b-e4fb-42db-82c4-ea5aa04c84c2.json

Performance analysis:

Performance analysis of the total transaction time to access a resource, comparing the method call GET resource (API):

  • (A) B2SHARE-FAIR REST x (B) B2SHARE REST (equivalent/mapped)
  1. (A) '/fdp/' ==> (B) '/api/' : B2SHARE-FAIR/src/proxy/tests/test_webapp.py
  2. (A) '/catalogs/' ==> (B) '/communities/' : B2SHARE-FAIR/src/proxy/tests/test_communities.py
  3. (A) '/catalogs/{_id}' ==> (B) '/communities/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_communities.py
  4. (A) '/datasets/' ==> (B) '/records/' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  5. (A) '/datasets/{_id}' ==> (B) '/records/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  6. (A) '/datasets/{_qs}' ==> (B) '/records/{_qs}' : B2SHARE-FAIR/src/proxy/tests/test_records.py
  7. (A) '/distributions/{_id}' ==> (B) '/files/{_id}' : B2SHARE-FAIR/src/proxy/tests/test_files.py
  • Test cases compute the total time to access the resource 1000 times (consecutively):
  1. One test case executing 10 calls for {10, 20, 30, 40, 50, 100} times.
  2. One test case executing 10 calls for {10, 20, 30, 40, 50, 100} times.
  3. Five test cases executing 10 calls for {10, 50, 100} times: five communities pre-selected (list)
  4. One test case executing 10 calls for {10, 50, 100} times.
  5. Ten test cases executing 10 calls for {10, 50, 100} times: ten records pre-selected, varying the number of files and metadata descriptions.
  6. Two test cases executing 10 calls for {10, 50, 100} times: two query strings pre-selected
  7. Ten test cases executing 10 calls for {10, 50, 100} times: ten files (buckets) pre-selected, varying the number of contents (versions)
  • Test data (input):
  1. NA
  2. NA
  3. IDs: {c4234f93-da96-4d2f-a2c8-fa83d0775212, 99916f6f-9a2c-4feb-a342-6552ac7f1529, 0afede87-2bf2-4d89-867e-d2ee57251c62, 94a9567e-2fba-4677-8fde-a8b68bdb63e8, b344f92a-cd0e-4e4c-aa09-28b5f95f7e41 }
  4. NA
  5. IDs: {7547be3d2e93445783c4d343e6cdd1c0, a11736ab1b174028a1bbedea63e84411, ea735c4786f24ad4974fd7a58a7edc41, 3cb79e246ee34b3e9faaa3408feaf89e, 277e0971184242b1a80f4182e2f18aca, b2246d077d3e4d9396a47393eb3ff952, ad7cb0926f234428a850164e569e8162, d3f5b834ce404c2db22e071f2a2b7c77, 7ab78a953116446a9a18d45f42ba86ef, 79e55266573546238e4c80e5233c2f68 }
  6. QSs: { "?page=2&size=10&sort=mostrecent&q=test", "?page=1&size=10&sort=mostrecent&q=community:99916f6f-9a2c-4feb-a342-6552ac7f1529" }
  7. ID: { "88699ea0-e199-43f7-8a16-d311ecfa02e1", "5c11832e-444d-4740-8bdc-1fb55d12eeef", "25486e34-4f9c-4605-b0a5-f5f7e48d11b2", "c89a695c-f4c7-4ee5-a4b0-eda2f79dbdd9", "940fa97e-9a79-4ec0-9327-8f6b0b504b41", "eb6ebb0f-6b33-4972-87dd-78e6e281d3b9", "f91a4583-6f7e-4e6a-9bde-c75a635a4cef", "9bd0a681-d93f-46f9-8b37-c67e6edee571", "2d3af417-0de0-4b88-86d9-320b2084a945", "d5001514-5f6f-47f5-8ec2-5ed8c3629b7f" }
  • Running the tests: run pytest from /src/. The performance tests end with assert False so that pytest prints the results in the console.
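
The measurement procedure can be sketched as follows. This is an assumed reconstruction for illustration, not the code in the test files listed above: time_blocks and its parameters are hypothetical names, and the callable passed in stands for one GET request against either API.

```python
import time


def time_blocks(call, num_calls, num_blocks):
    """Return a list with the total time (seconds) of each block of consecutive calls.

    call       -- zero-argument callable performing one request (assumed)
    num_calls  -- number of consecutive calls per block
    num_blocks -- number of blocks to measure
    """
    totals = []
    for _ in range(num_blocks):
        start = time.perf_counter()
        for _ in range(num_calls):
            call()
        totals.append(time.perf_counter() - start)
    return totals
```

Running the same function once against a B2SHARE-FAIR endpoint and once against the mapped B2SHARE endpoint yields two lists of block totals that can then be compared.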

Results: FDP overhead (total transaction time):

Test environment (I)

  1. Min: 06.13% | Max: 11.71%
  2. Min: 35.61% | Max: 42.35%
  3. Min: 11.81% | Max: 32.73%
  4. Min: 09.50% | Max: 12.91%
  5. Min: 14.92% | Max: 19.18%
  6. Min: 09.62% | Max: 12.03%
  7. Min: 10.33% | Max: 14.82%

Raw data in: https://github.com/jonimoreira/B2SHARE-FAIR/blob/master/docs/PerformanceAnalysis.xlsx

Obs.: an anomaly-detection algorithm was implemented to remove noisy measurements:

        import statistics

        # results: list of total transaction times measured in one test round
        std_dev = statistics.stdev(results)
        mean = statistics.mean(results)
        anomalies = 0
        filteredResults = []
        for result in results:
            # a measurement more than one standard deviation from the mean is noise
            if result > mean + std_dev:
                anomalies += 1
            elif result < mean - std_dev:
                anomalies += 1
            else:
                filteredResults.append(result)

The test rounds had the variables:

  • numCalls= number of synchronous calls
  • numCallsBlocks = number of blocks, each block executing numCalls calls
  • difference = % of the overhead comparing B2SHARE-FAIR with B2SHARE
  • anomalies = number of anomalies found within numCallsBlocks blocks
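
Putting these variables together, the overhead figure can be computed along the following lines. The sketch assumes that difference is the relative increase of the mean filtered time of B2SHARE-FAIR over that of B2SHARE; the exact formula used in the test scripts is not stated here, and the function names are hypothetical.

```python
import statistics


def filter_anomalies(results):
    """Drop measurements more than one standard deviation away from the mean.

    Returns (filtered_results, number_of_anomalies).
    """
    mean = statistics.mean(results)
    std_dev = statistics.stdev(results)
    filtered = [r for r in results if abs(r - mean) <= std_dev]
    return filtered, len(results) - len(filtered)


def overhead_percent(fair_times, b2share_times):
    """Relative overhead (%) of B2SHARE-FAIR over B2SHARE, using filtered means."""
    fair_filtered, _ = filter_anomalies(fair_times)
    b2share_filtered, _ = filter_anomalies(b2share_times)
    fair_mean = statistics.mean(fair_filtered)
    b2share_mean = statistics.mean(b2share_filtered)
    return (fair_mean - b2share_mean) / b2share_mean * 100.0
```

A one-standard-deviation threshold is fairly aggressive, so the anomalies count is worth reporting next to each difference value to show how many measurements were discarded.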

Obs.: a threat to the validity of these results, which probably made them worse, is that both the B2SHARE-FAIR REST API and the test scripts were executed together in the same environment (the VM described above).

Conclusions

The POC showed the efficacy of the approach, demonstrating how to enable B2SHARE data repository to be compliant with the FDP specification. Each requirement was addressed:

R1. Develop a “non-intrusive” solution (decoupled from B2SHARE)

No changes to the B2SHARE code and no data insertion were necessary. The REST APIs fulfilled their role of enabling the integration of B2SHARE with B2SHARE-FAIR in a decoupled manner, following SOA 2.0 principles.

R2. Improve semantic interoperability of B2SHARE

The POC showed how data stored in B2SHARE can be annotated with metadata from diverse ontologies guided by the FDP specification and the creation of the B2SHARE ontology. Furthermore, the implementation of the REST API providing data following the JSON-LD syntax could enable the serialization of these ontologies (RDF).
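As a rough illustration of this kind of annotation, the sketch below wraps a few fields of a B2SHARE record in a JSON-LD document. The context entries and the choice of Dublin Core terms are assumptions made for the example, not the mapping implemented in B2SHARE-FAIR, and the record URL is a placeholder.

```python
import json

# Illustrative context: the term choices are assumptions, not the project's mapping.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "title": "dcterms:title",
    "language": "dcterms:language",
    "publisher": "dcterms:publisher",
}


def to_json_ld(record_url, metadata):
    """Build a JSON-LD document from a few fields of a B2SHARE record."""
    return {
        "@context": CONTEXT,
        "@id": record_url,
        # B2SHARE stores titles as a list of {"title": ...} objects
        "title": [t["title"] for t in metadata.get("titles", [])],
        "language": metadata.get("language"),
        "publisher": metadata.get("publisher"),
    }


metadata = {
    "titles": [{"title": "this is another test project"}],
    "language": "en",
    "publisher": "http://dendro.fe.up.pt",
}
doc = to_json_ld("https://example.org/records/1", metadata)  # placeholder URL
print(json.dumps(doc, indent=2))
```

A JSON-LD processor such as pyld [7] could then expand a document like this into RDF triples.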

R3. Identify mappings between B2SHARE and FDP

The solution was inspired by MDE transformation and semantic translation approaches, which enable mappings between data representations. Prior knowledge from these research areas therefore supported the identification of mappings between the B2SHARE REST API and FDP.

The POC validation showed that the automatic translations could produce the desired results in terms of compliance with FDP specification. Results show that all metadata used by B2SHARE-FAIR follow the metadata of the FDP layers, metadata from other existing ontologies (e.g. PRO) and the B2SHARE ontology.

Furthermore, the performance analysis illustrated a worst-case scenario in which the test scripts and the B2SHARE-FAIR REST API ran in parallel, competing for resources, on a basic, low-powered VM. Even so, the results showed that the solution can provide adequate performance: the smallest measured overhead was 6.13% and the largest was 42.35%. We believe that the overhead caused by the proxy is due to:

  1. network latency between the servers;
  2. the low capacity of the application server where B2SHARE-FAIR was deployed.

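Therefore, we believe that this performance can be improved through deployment configuration, for example by running B2SHARE-FAIR on a more capable application server located closer to B2SHARE.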

Among the open issues identified as outcome of this POC, we highlight:

  1. A well-established process is needed to manage the translations for the root and community schemas, i.e. for each metadata element at the record level (FDP dataset). We believe that a procedure should be put in place, for example:
  • a. Check the FDP specification for an equivalent metadata element;
  • b. If not found in the FDP spec, search standardized ontologies for equivalent metadata elements and choose one;
  • c. If not found, create or extend an ontology with the metadata element;
  • d. Validate the equivalence translation.
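Steps (a) to (d) can be sketched as a simple lookup chain. This is only an outline under stated assumptions: the lookup tables, their entries, the b2share: prefix and the function name are hypothetical placeholders, not an agreed mapping.

```python
# Hypothetical lookup tables; the entries are illustrative placeholders.
FDP_TERMS = {
    "titles": "dcterms:title",
    "language": "dcterms:language",
    "publisher": "dcterms:publisher",
}
EXTERNAL_ONTOLOGY_TERMS = {
    "creators": "dcterms:creator",
}
B2SHARE_ONTOLOGY_PREFIX = "b2share:"  # placeholder prefix for newly created terms


def resolve_term(field):
    """Find a candidate term for a B2SHARE field, following steps a to c."""
    if field in FDP_TERMS:                    # a. FDP specification
        term, source = FDP_TERMS[field], "fdp"
    elif field in EXTERNAL_ONTOLOGY_TERMS:    # b. standardized ontology
        term, source = EXTERNAL_ONTOLOGY_TERMS[field], "external"
    else:                                     # c. create/extend an ontology
        term, source = B2SHARE_ONTOLOGY_PREFIX + field, "new"
    # d. every candidate starts unvalidated until the equivalence is reviewed
    return {"field": field, "term": term, "source": source, "validated": False}


for name in ("titles", "creators", "open_access"):
    print(resolve_term(name))
```

In this sketch a field with no known equivalent, such as "open_access", falls through to step (c) and is flagged for review in step (d).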

Example: in the message below, what would be an equivalent metadata for "open_access"? How to find an existing ontology providing an equivalent metadata?

 "metadata": {
      "$schema": "https://trng-b2share.eudat.eu/api/communities/e9b9792e-79fb-4b07-b6b4-b9c2bd06d095/schemas/0#/json_schema", 
      "DOI": "http://doi.org/XXXX/b2share.09f6c89f9af74bb5a5be9f75c35a3d63", 
      "community": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095", 
      "community_specific": {}, 
      "creators": [
        {
          "creator_name": "np"
        }
      ], 
      "descriptions": [
        {
          "description": "a description ", 
          "description_type": "Other"
        }
      ], 
      "ePIC_PID": "http://hdl.handle.net/0000/09f6c89f9af74bb5a5be9f75c35a3d63", 
      "language": "en", 
      "open_access": true, 
      "owners": [
        32
      ], 
      "publication_state": "published", 
      "publisher": "http://dendro.fe.up.pt", 
      "titles": [
        {
          "title": "this is another test project"
        }
      ]
    }, 
    "updated": "2018-01-23T15:19:58.378657+00:00"

Obs.1: Ontology-driven conceptual modeling can play a major role in this methodology, especially to support (c).

Obs.2: The process to change root (generic) and community (custom) metadata schemas in B2SHARE should be aligned with this methodology. Ideally, this process should be merged with this methodology.

  2. Validate "F" (FAIR), findability of data (B2FIND): The procedure to publish metadata in B2FIND is described at https://eudat.eu/services/userdoc/B2FIND-integration . B2FIND provides a joint metadata catalogue and a discovery portal, with the metadata stored through B2SHARE. “B2FIND is open to discuss metadata publishing with interested communities and accompanies participants through the integration process”. The semantic mapping of the harvested metadata uses an elaborate and flexible software stack, which allows the mapping rules to be formulated clearly and implemented easily according to each community's specific needs. Two prerequisites must be fulfilled to publish metadata in the B2FIND catalogue:

Therefore, a semantic mapping should be created and assessed with the communities to determine how the mapping can be configured and adapted. The available B2FIND API can be used to leverage the configuration of the mappings and new metadata ingestion. The publication of the new metadata follows the process: “After the iterated process of adaption and review of the mapping has achieved an agreed state, an initial upload of the mapped records is performed”.

  3. Validate "A" (FAIR), accessibility of sensitive data (e.g. health). Discuss whether the Publishing Roles Ontology (https://sparontologies.github.io/pro/current/pro.html) should play a role in the definition of accessibility.
  • a. Insert diverse types of data from distinct domains (e.g. e-health, logistics, emergency);
  • b. Execute metrics evaluation;
  • c. Analyze results and describe open issues that should be addressed.
  4. Discuss an approach to include FDP level 5.

  5. Apply a caching solution on the proxy side to speed up the solution.
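As an illustration of the caching idea, the sketch below keeps proxy responses in memory for a limited time so that repeated requests for the same record do not reach B2SHARE again. It is a minimal sketch under assumptions: the TTLCache class and the fake_fetch function are hypothetical, and a real deployment would also need to decide how to handle records updated in B2SHARE before their cache entry expires.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry, for proxy-side responses."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable that queries B2SHARE (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock      # injectable clock, which simplifies testing
        self._entries = {}       # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # fresh entry: skip the upstream call
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fake_fetch(record_id):
    """Stand-in for an HTTP request to the B2SHARE REST API."""
    calls.append(record_id)
    return {"id": record_id}


cache = TTLCache(fake_fetch, ttl_seconds=60.0)
cache.get("09f6c89f9af74bb5a5be9f75c35a3d63")
cache.get("09f6c89f9af74bb5a5be9f75c35a3d63")  # served from the cache
print(len(calls))
```

The time-to-live trades freshness for speed: a longer value removes more upstream calls but serves stale metadata for longer.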

Main EUDAT resources

References

[1] https://eudat.eu/eudat-cdi

[2] https://www.eudat.eu/services/b2share

[3] https://github.com/DTL-FAIRData/FAIRDataPoint/wiki/FAIR-Data-Point-Specification

[4] https://ontouml.org/

[5] https://B2SHARE.eudat.eu/help/api

[6] Ganzha, M., et al. Streaming semantic translations. In: 2017 21st International Conference on System Theory, Control and Computing (ICSTCC). 2017.

[7] https://github.com/digitalbazaar/pyld

[8] https://sparontologies.github.io/pro/current/pro.html