florenthemmi/ckanext-oaipmh

 
 

OAI-PMH Plugin for CKAN

This plugin provides two things: a harvester that can be configured to harvest datasets from an OAI-PMH data source, and a fully compatible OAI-PMH interface that lists all datasets and resources in CKAN.

Installation

This plugin requires Pyoai. Pyoai has its own dependencies, which you need to install first with:

sudo apt-get install -y libxml2-dev libxslt1-dev

Then activate the CKAN virtualenv and install the plugin:

. /usr/lib/ckan/default/bin/activate

pip install -e git+https://github.com/opendatasoft/ckanext-oaipmh#egg=ckanext-oaipm

To enable the OAI-PMH harvester in CKAN, add the extension name 'oaipmh_harvester' to the 'ckan.plugins' configuration option of the CKAN ini file in use.

That option should already contain 'harvest ckan_harvester'.

Now restart CKAN.
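
Before restarting, it can help to confirm that the ini file lists all three plugin names. The snippet below is a minimal sketch, not part of this extension: the `[app:main]` section name, the sample ini text and the `missing_plugins` helper are assumptions made for illustration.

```python
import configparser

# Hypothetical excerpt of a CKAN ini file; a real file has many more options.
SAMPLE_INI = """
[app:main]
ckan.plugins = harvest ckan_harvester oaipmh_harvester
"""

REQUIRED = ["harvest", "ckan_harvester", "oaipmh_harvester"]


def missing_plugins(ini_text, required):
    """Return the required plugin names that are absent from ckan.plugins."""
    # interpolation=None so that values such as %(here)s in real files
    # are read verbatim instead of raising an interpolation error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    enabled = parser.get("app:main", "ckan.plugins", fallback="").split()
    return [name for name in required if name not in enabled]


print(missing_plugins(SAMPLE_INI, REQUIRED))
```

An empty list means every required plugin is enabled; any name that is printed still has to be added to 'ckan.plugins'.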

Harvester

Navigate to http://localhost:5000/harvest to see your existing harvest sources. Navigate to http://localhost:5000/harvest/new to add a new harvest source. For this source:

  • Fill in the URL of an OAI-PMH repository.
  • Set 'Source Type' to 'OAI-PMH'.
  • Click Save.

The OAI-PMH harvester supports a number of configuration options to control its behaviour. These must be defined as a JSON object in the configuration form field. The currently supported configuration options are:

  • default_tags: A list of tags that will be added to all harvested datasets. Tags don't need to previously exist.
  • default_extras: A dictionary of key-value pairs that will be added to the extras of the harvested datasets (existing extras are overwritten).
  • force_all: By default, after the first harvest, the harvester gathers only the packages that were modified on the remote site since the last harvest. Setting this property to true forces the harvester to gather all remote packages regardless of their modification date. Default is false.

Here is an example of a configuration object (the one that must be entered in the configuration field):

{
  "default_tags": ["ods"],
  "default_extras": {"company": "OpenDataSoft"},
  "force_all": true
}
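
A malformed configuration is easier to catch before it is pasted into the form. The following is a minimal sketch of such a check, not code from the harvester: the `check_harvest_config` helper is hypothetical, and the requirement that tags be strings is an assumption.

```python
import json


def check_harvest_config(raw):
    """Parse the configuration JSON and check the three documented options."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    tags = config.get("default_tags", [])
    # Assumption: each tag is a plain string.
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("default_tags must be a list of strings")
    if not isinstance(config.get("default_extras", {}), dict):
        raise ValueError("default_extras must be a JSON object")
    if not isinstance(config.get("force_all", False), bool):
        raise ValueError("force_all must be true or false")
    return config


example = (
    '{"default_tags": ["ods"], '
    '"default_extras": {"company": "OpenDataSoft"}, '
    '"force_all": true}'
)
print(check_harvest_config(example))
```

Note that JSON spells booleans in lowercase (`true`, `false`); `json.loads` converts them to Python's `True` and `False`.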

You may need to configure your fetch and gather consumers to run as daemons or via the paster commands.

This is documented in the ckanext-harvest extension README:

https://github.com/okfn/ckanext-harvest/blob/master/README.rst

Please note that this fork works with the latest version of CKAN (tested with v2.2a) and the CKAN harvester.

As documented in the code, the resource type of a dataset is now detected automatically. The following formats are supported by CKAN and are implemented in the OAI-PMH harvester: "rdf", "pdf", "api", "zip", "xls", "csv", "txt", "xml", "json" and "html". In the CKAN harvester, all unknown resource types use the "data" format for display purposes.

This plugin uses "html" as the default format of a resource (if none is found). To be recognized, the format needs to be at the end of the resource. For example:
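
One way to picture this suffix rule is the hypothetical helper below. It is a sketch, not the plugin's actual code: it assumes that "the end of the resource" means the end of the resource URL and that matching ignores case, and the example.org URLs are invented.

```python
# Formats listed in this README as implemented by the harvester.
SUPPORTED_FORMATS = (
    "rdf", "pdf", "api", "zip", "xls", "csv", "txt", "xml", "json", "html",
)


def guess_format(resource_url, default="html"):
    """Return the supported format that ends the URL, else the default."""
    lowered = resource_url.lower()  # assumption: case-insensitive matching
    for fmt in SUPPORTED_FORMATS:
        if lowered.endswith(fmt):
            return fmt
    return default


for url in (
    "http://example.org/data/report.csv",
    "http://example.org/dataset/42",
):
    print(url, "->", guess_format(url))
```

Under these assumptions the first URL is detected as "csv" and the second falls back to "html".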

Interface

The interface is simple to install: add the extension name 'oaipmh' to the 'ckan.plugins' configuration option of the CKAN ini file in use.

To access the interface, go to http://localhost:5000/oai. Use the interface as described in the OAI-PMH documentation.
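
OAI-PMH requests are ordinary HTTP requests that carry a `verb` query parameter. As a minimal sketch, the snippet below builds two request URLs against the endpoint above. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification; the `oai_request_url` helper is hypothetical, and how this plugin responds to each verb has not been checked here.

```python
from urllib.parse import urlencode

# Endpoint given in this README for a local CKAN instance.
BASE_URL = "http://localhost:5000/oai"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{BASE_URL}?{query}"


# Describe the repository.
print(oai_request_url("Identify"))
# List records in the Dublin Core format defined by the protocol.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

The printed URLs can be opened in a browser or fetched with any HTTP client; the response is an XML document.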

Tests

This extension comes with a test suite. To run it, issue the following command:

python setup.py nosetests

If you get an error about test.ini not being found, please modify the test-core.ini file to have:

use = config:../pyenv/src/ckan/test.ini

The path must point to the test.ini file of a CKAN source tree.

About

OAI PMH extension to CKAN
