The application consists of clients that call Entity Linking (EL) services for English text, and modules that operate on the results. It implements the Entity Linking System Combination described in our *SEM 2015 paper.
The EL services currently supported are:
- TagMe
- DBpedia Spotlight
- Wikipedia Miner (public instance no longer accessible)
- AIDA: both installed locally and in the public web service
- Babelfy
The application requires:
- Python 2.7
- lxml
- MySQL-python (aka MySQLdb)
- nltk
- pyspotlight
- requests
To call TagMe and Babelfy, you need to request a key from each service. The application's config module has variables for entering the keys.
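As a rough illustration, key handling in a config module could look like the sketch below. The variable names `TAGME_KEY` and `BABELFY_KEY` and the helper `missing_keys` are hypothetical; check the config module for the names the application actually uses.

```python
# Hypothetical sketch of API-key settings. The names below are
# assumptions for illustration, not the actual variables in config.py.
TAGME_KEY = ""    # paste the key obtained from TagMe here
BABELFY_KEY = ""  # paste the key obtained from Babelfy here


def missing_keys(keys):
    """Return the sorted names of services whose key is empty or blank."""
    return sorted(name for name, value in keys.items() if not value.strip())


if __name__ == "__main__":
    absent = missing_keys({"tagme": TAGME_KEY, "babelfy": BABELFY_KEY})
    if absent:
        print("No key set for: " + ", ".join(absent))
```

A check like this, run before any client is created, would surface a missing key early instead of as a failed request.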
The application is organized into the following modules:
- analysis: Parses client responses. Computes entity-cooccurrence tables.
- clients: Clients to call the services
- config: Configuration
- main: Example of how to use the application. Creates runners and calls them for each service
- model: Data types and some methods for them
- readers: To preprocess input before calling a client
- runners: Classes here use a reader, client and writer to create an annotation workflow
- utils: General tools useful for several modules
- writers: To postprocess the annotations and output them (to a file, etc.)
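The way a runner ties a reader, a client and a writer together can be sketched roughly as follows. This is a minimal illustration under stated assumptions: every class and method name here (`LineReader`, `DummyClient`, `ListWriter`, `Runner`, `read`, `annotate`, `write`, `run`) is hypothetical and does not reflect the actual code in the readers, clients, writers or runners modules. The dummy client calls no real service; it simply treats capitalized tokens as mentions.

```python
# Hypothetical sketch of a reader -> client -> writer workflow.
# All names are illustrative assumptions, not the application's API.


class LineReader(object):
    """Preprocess input: yield non-empty, stripped lines of a text."""

    def read(self, text):
        for line in text.splitlines():
            line = line.strip()
            if line:
                yield line


class DummyClient(object):
    """Stand-in for an EL client: returns capitalized tokens as mentions."""

    name = "dummy"

    def annotate(self, sentence):
        return [tok for tok in sentence.split() if tok[:1].isupper()]


class ListWriter(object):
    """Collect annotations in memory instead of writing them to a file."""

    def __init__(self):
        self.rows = []

    def write(self, service, sentence, mentions):
        self.rows.append((service, sentence, mentions))


class Runner(object):
    """Run one annotation workflow: read, annotate, write."""

    def __init__(self, reader, client, writer):
        self.reader = reader
        self.client = client
        self.writer = writer

    def run(self, text):
        for sentence in self.reader.read(text):
            mentions = self.client.annotate(sentence)
            self.writer.write(self.client.name, sentence, mentions)
        return self.writer
```

With this shape, supporting another service would only mean passing a different client to the runner, while the reader and writer stay the same.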
- Activate the services to call in config.py
- Call main.py
    usage: App to work with Entity Linking [-h] [-i MYINPUT] [-o MYOUT]
                                           [-r MYOUTRESPS] [-s MYSKIPLIST]
                                           [-c CORPUS_NAME]

    optional arguments:
      -h, --help            show this help message and exit
      -i MYINPUT, --input MYINPUT
                            Input file, directory or text. A default can be set
                            in config.py (default: /path/to/some/default/input)
      -o MYOUT, --output MYOUT
                            Output file or files. Default names are created
                            dynamically by code in writers.py module
                            (default: None)
      -r MYOUTRESPS, --resp_output MYOUTRESPS
                            Output directory for client responses. A default is
                            created dynamically by code in writers.py module
                            (default: None)
      -s MYSKIPLIST, --skip_list MYSKIPLIST
                            File with filenames to skip
                            (default: /path/to/some/default/list)
      -c CORPUS_NAME, --corpus CORPUS_NAME
                            Name of the corpus (for output files etc.). A
                            default can be set in config.py
                            (default: SOME_DEFAULT_NAME)
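For reference, a parser that would produce help text of this shape can be sketched with the standard `argparse` module. This is a reconstruction from the help output only, not the code in main.py: the function name `build_parser` is hypothetical, the destination names are inferred from the metavars, and the default values are the placeholders shown in the help text (the real ones presumably come from config.py).

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="App to work with Entity Linking",
        # Appends "(default: ...)" to each option's help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        "-i", "--input", dest="myinput",
        default="/path/to/some/default/input",  # placeholder default
        help="Input file, directory or text")
    parser.add_argument(
        "-o", "--output", dest="myout",
        help="Output file or files")
    parser.add_argument(
        "-r", "--resp_output", dest="myoutresps",
        help="Output directory for client responses")
    parser.add_argument(
        "-s", "--skip_list", dest="myskiplist",
        default="/path/to/some/default/list",  # placeholder default
        help="File with filenames to skip")
    parser.add_argument(
        "-c", "--corpus", dest="corpus_name",
        default="SOME_DEFAULT_NAME",  # placeholder default
        help="Name of the corpus (for output files etc.)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Options that are not given on the command line fall back to their defaults, so an invocation only needs to name the values that differ from the configured ones.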