This repo stems from the time before the tool was integrated into the Wikimedia dumping infrastructure. The up-to-date repository can be found at wikimedia/operations-dumps-dcat.
A project aimed at generating a DCAT-AP document for Wikibase installations in general and Wikidata in particular.
Takes into account access through:
- Content negotiation (various formats)
- MediaWiki API (various formats)
- Entity dumps, e.g. JSON or TTL (these are assumed to be compressed)
An example result can be found at lokal-profil / dcatap.rdf. The live DCAT-AP description of Wikidata can be found here.
- Copy `config.example.json` to `config.json` and change the contents to match your installation. Refer to the key-by-key explanation of the config file below for the individual configuration parameters.
- Copy `catalog.example.json` to a suitable place (e.g. on-wiki) and update the translations to fit your Wikibase installation. Set its location as the `catalog-i18n` value in the config file.
- Create the `dcatap.rdf` file by running `php DCAT.php` or `php DCAT.php --config="<path_1>" --dumpDir="<path_2>" --outputDir="<path_3>"`, where each of the options can be left out. The options are:
  - `--config` is the relative path to the JSON file containing the configuration, defaults to `./config.json`
  - `--dumpDir` is the relative path to the directory containing the dumps (if any), defaults to the `directory` parameter in the config file
  - `--outputDir` is the relative path to the directory where the `dcatap.rdf` file should be created, defaults to the `directory` parameter in the config file
- Translations which are generic to the tool are handled by Intuition and should be translated through translatewiki.net.
- Translations which are specific to a project/catalog are added to the location specified in the `catalog-i18n` parameter of the config file.
A key-by-key explanation of the config file follows.
- `directory`: Relative path to the directory containing the dump subcategories (if any) and to which the final DCAT file is written.
- `api-enabled`: (`Boolean`) Is API access activated for the MediaWiki installation?
- `dumps-enabled`: (`Boolean`) Is JSON dump generation activated for the Wikibase installation?
- `uri`: URL used as the basis for RDF identifiers, e.g. http://www.example.org/about
- `catalog-homepage`: URL for the homepage of the Wikibase installation, e.g. http://www.example.org
- `catalog-issued`: ISO 8601 date on which the Wikibase installation was first issued, e.g. 2000-12-24
- `catalog-license`: License of the catalog, i.e. of the DCAT file itself (not the contents of the Wikibase installation), e.g. http://creativecommons.org/publicdomain/zero/1.0/
- `catalog-i18n`: URL or path to the JSON file containing i18n strings for the catalog title and description. Can be an on-wiki page, e.g. https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw
- `keywords`: (`array`) List of keywords applicable to all of the datasets
- `themes`: (`array`) List of theme IDs in accordance with EuroVoc, e.g. 2191 for http://eurovoc.europa.eu/2191
- `publisher`:
  - `name`: Name of the publisher
  - `homepage`: URL for the homepage of the publisher
  - `email`: Contact e-mail for the publisher, should be a functional (role-based) address, e.g. info@example.org
  - `publisherType`: Publisher type according to ADMS, e.g. NonProfitOrganisation
- `contactPoint`:
  - `name`: Name of the contact point
  - `email`: E-mail for the contact point, should ideally be a functional (role-based) address, e.g. support@example.org
  - `vcardType`: Type of contact point, either `Organization` or `Individual`
- `ld-info`:
  - `accessURL`: URL to the content negotiation endpoint of the Wikibase installation, e.g. http://www.example.org/entity/
  - `mediatype`: (`object`) List of IANA media types available through content negotiation, in the format file-ending:media-type
  - `license`: License of the data in the distribution, e.g. http://creativecommons.org/publicdomain/zero/1.0/
- `api-info`:
  - `accessURL`: URL to the MediaWiki API endpoint of the wiki, e.g. http://www.example.org/w/api.php
  - `mediatype`: (`object`) List of non-deprecated formats available through the API, see `ld-info:mediatype` above for formatting
  - `license`: See `ld-info:license` above
- `dump-info`:
  - `accessURL`: URL to the directory where the .json.gz files reside (`$1` is replaced on the fly by the actual filename), e.g. http://example.org/dumps/$1
  - `mediatype`: (`object`) List of media types, e.g. `{"json": "application/json"}`
  - `compression`: (`object`) List of compression formats, in the format name:file-ending, e.g. `{"gzip": "gz"}`
  - `license`: See `ld-info:license` above
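
To make the key list above easier to picture, here is a minimal sketch of what a filled-in `config.json` might look like. It is assembled from the example values given in the descriptions above and is not a copy of `config.example.json`, which remains the authoritative template. Several things are assumptions made purely for illustration: the nesting of the `publisher`, `contactPoint`, `ld-info`, `api-info` and `dump-info` sub-keys as JSON objects, the `directory` value, the publisher and contact names, the keyword, the `ttl` and `xml` media type entries, and whether theme IDs are written as strings or numbers.

```json
{
    "directory": "dumps",
    "api-enabled": true,
    "dumps-enabled": true,
    "uri": "http://www.example.org/about",
    "catalog-homepage": "http://www.example.org",
    "catalog-issued": "2000-12-24",
    "catalog-license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "catalog-i18n": "https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw",
    "keywords": ["knowledge base"],
    "themes": ["2191"],
    "publisher": {
        "name": "Example Organisation",
        "homepage": "http://www.example.org",
        "email": "info@example.org",
        "publisherType": "NonProfitOrganisation"
    },
    "contactPoint": {
        "name": "Example Support",
        "email": "support@example.org",
        "vcardType": "Organization"
    },
    "ld-info": {
        "accessURL": "http://www.example.org/entity/",
        "mediatype": {
            "json": "application/json",
            "ttl": "text/turtle"
        },
        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
    },
    "api-info": {
        "accessURL": "http://www.example.org/w/api.php",
        "mediatype": {
            "json": "application/json",
            "xml": "text/xml"
        },
        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
    },
    "dump-info": {
        "accessURL": "http://example.org/dumps/$1",
        "mediatype": {
            "json": "application/json"
        },
        "compression": {
            "gzip": "gz"
        },
        "license": "http://creativecommons.org/publicdomain/zero/1.0/"
    }
}
```

With a file like this saved as `config.json` next to `DCAT.php`, running `php DCAT.php` without options would pick it up through the `--config` default and use `directory` for both the dump lookup and the output location.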