✨ Add example & de-duplicate Readme (#43)
* ✨ Add example & de-duplicate Readme

* 👷 Add include extension
GabrielePicco authored Dec 23, 2022
1 parent e680379 commit b2c2a27
Showing 4 changed files with 12 additions and 183 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish-pages-doc.yml
@@ -13,5 +13,5 @@ jobs:
       - uses: actions/setup-python@v2
         with:
           python-version: 3.x
-      - run: pip install mkdocs mkdocs-material mkdocstrings[python] mkdocs-markdownextradata-plugin mdx_include
+      - run: pip install mkdocs mkdocs-material mkdocstrings[python] mkdocs-markdownextradata-plugin mdx_include mkdocs-include-markdown-plugin
       - run: mkdocs gh-deploy --force
7 changes: 7 additions & 0 deletions README.md
@@ -62,6 +62,13 @@ $ pip install zshot

</div>

## Examples

| Example | Notebook |
|:------------------------------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Installation and Visualization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/zshot/blob/examples/Zshot_Example.ipynb) |


## Zshot Approach

ZShot contains two different components, the **mentions extractor** and the **linker**.
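The split into a **mentions extractor** and a **linker** can be pictured as a two-stage function: the first stage returns character spans, the second assigns each span an entity (or nothing). The sketch below is a toy under stated assumptions: the capitalised-word extractor, the dictionary linker, and the names `run_pipeline`, `LinkedSpan` and `KNOWN` are hypothetical and are not ZShot's API or models.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Mention = Tuple[int, int]  # (start, end) character offsets


@dataclass
class LinkedSpan:
    """Hypothetical output record: a mention plus the entity it was linked to (or None)."""
    start: int
    end: int
    entity: Optional[str]


def run_pipeline(text: str,
                 extract: Callable[[str], List[Mention]],
                 link: Callable[[str], Optional[str]]) -> List[LinkedSpan]:
    """Stage 1 finds mentions; stage 2 links each mention's surface text to an entity."""
    return [LinkedSpan(start, end, link(text[start:end])) for start, end in extract(text)]


def capitalised_words(text: str) -> List[Mention]:
    """Toy mentions extractor: every word that starts with an uppercase letter."""
    return [(m.start(), m.end()) for m in re.finditer(r"\b[A-Z]\w*", text)]


# Toy linker: exact lookup in a small, made-up table of known entities (dict.get returns None if absent).
KNOWN: Dict[str, str] = {"IBM": "IBM", "Armonk": "Armonk"}

result = run_pipeline("IBM is headquartered in Armonk.", capitalised_words, KNOWN.get)
print(result)
```

Keeping the two stages as separate callables mirrors the idea that either component can be swapped independently of the other.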
185 changes: 3 additions & 182 deletions docs/index.md
@@ -1,182 +1,3 @@
<div align="center">
<img height="130px" width="130px" src="./img/graph.png" />

<h1>Zshot</h1>

<p>
<strong>Zero and Few shot named entity & relationships recognition</strong>
</p>
<p>

<a href="https://pages.github.ibm.com/Dublin-Research-Lab/zshot/"><img alt="Tutorials" src="https://img.shields.io/badge/docs-tutorials-green" /></a>
<a href="https://pypi.org/project/zshot/"><img src="https://img.shields.io/pypi/v/zshot" /></a>
<a href="https://pypi.org/project/zshot/"><img src="https://img.shields.io/pypi/dm/zshot" /></a>
<a href="https://github.com/IBM/zshot/actions/workflows/python-tests.yml"> <img alt="Build" src="https://github.com/IBM/zshot/actions/workflows/python-tests.yml/badge.svg" /></a>
<a href="https://app.codecov.io/github/ibm/zshot"> <img alt="Coverage" src="https://codecov.io/github/ibm/zshot/branch/main/graph/badge.svg" /></a>

</p>
</div>

**Documentation**: <a href="https://ibm.github.io/zshot" target="_blank">https://ibm.github.io/zshot</a>

**Source Code**: <a href="https://github.com/IBM/zshot/" target="_blank">https://github.com/IBM/zshot</a>


Zshot is a highly customisable framework for performing Zero and Few shot named entity recognition.

It can be used to perform:

- **Mentions extraction**: Identify globally relevant mentions or mentions relevant for a given domain
- **Wikification**: The task of linking textual mentions to entities in Wikipedia
- **Zero and Few Shot named entity recognition**: use language descriptions to perform NER that generalizes to unseen domains (work in progress)
- **Zero and Few Shot named relationship recognition** (work in progress)

## Requirements

* `Python 3.6+`

* <a href="https://spacy.io/" target="_blank"><code>spacy</code></a> - Zshot relies on <a href="https://spacy.io/" class="external-link" target="_blank">spaCy</a> for pipelining and visualization.
* <a href="https://pytorch.org/get-started" target="_blank"><code>torch</code></a> - Required to run PyTorch models.
* <a href="https://huggingface.co/docs/transformers/index" target="_blank"><code>transformers</code></a> - Required for pre-trained language models.
* <a href="https://huggingface.co/docs/evaluate/index" target="_blank"><code>evaluate</code></a> - Required for evaluation.
* <a href="https://huggingface.co/docs/datasets/index" target="_blank"><code>datasets</code></a> - Required to evaluate over datasets (e.g. OntoNotes).

## Optional Dependencies

* <a href="https://github.com/flairNLP/flair" target="_blank"><code>flair</code></a> - Required to use the Flair mentions extractor and the TARS linker.
* <a href="https://github.com/facebookresearch/BLINK" target="_blank"><code>blink</code></a> - Required if you want to use Blink for linking to Wikipedia pages.
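Because `flair` and `blink` are optional, a script may want to check for them before building a pipeline that needs the Flair mentions extractor, the TARS linker, or BLINK. The sketch below uses only the standard library's `importlib.util.find_spec`; the helper name `available_optional_deps` is hypothetical, and the import names `flair` and `blink` are assumptions about how these packages are imported.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed top-level import names for the optional dependencies listed above.
OPTIONAL_MODULES = ("flair", "blink")


def available_optional_deps(modules: Iterable[str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Hypothetical helper: report which optional modules can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in available_optional_deps().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```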


## Installation

<div class="termy">

```console
$ pip install zshot

---> 100%
```

</div>


## Example: Zero-Shot Entity Recognition

### How to use it

* Create a file `main.py` with:

```Python
import spacy
from zshot import PipelineConfig, displacy
from zshot.linker import LinkerRegen
from zshot.mentions_extractor import MentionsExtractorSpacy
from zshot.utils.data_models import Entity

nlp = spacy.load("en_core_web_sm")
nlp_config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    entities=[
        Entity(name="Paris",
               description="Paris is located in northern central France, in a north-bending arc of the river Seine"),
        Entity(name="IBM",
               description="International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York"),
        Entity(name="New York", description="New York is a city in U.S. state"),
        Entity(name="Florida", description="southeasternmost U.S. state"),
        Entity(name="American",
               description="American, something of, from, or related to the United States of America, commonly known as the United States or America"),
        Entity(name="Chemical formula",
               description="In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule"),
        Entity(name="Acetamide",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
        Entity(name="Armonk",
               description="Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States."),
        Entity(name="Acetic Acid",
               description="Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH"),
        Entity(name="Industrial solvent",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
    ]
)
nlp.add_pipe("zshot", config=nlp_config, last=True)

text = "International Business Machines Corporation (IBM) is an American multinational technology corporation" \
" headquartered in Armonk, New York, with operations in over 171 countries."

doc = nlp(text)
displacy.serve(doc, style="ent")
```


### Run it

Run with

```console
$ python main.py

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...
```


The script annotates the text with Zshot and uses displaCy to visualise the annotations.

### Check it

Open your browser at <a href="http://127.0.0.1:5000" class="external-link" target="_blank">http://127.0.0.1:5000</a>.

You will see the annotated sentence:

<img src="./img/annotations.png" />

### How to create a custom component

If you want to implement your own mentions extractor or linker and use it with ZShot, you can. To make this easier, ZShot provides base classes that you extend with your own code.

It is as simple as creating a new class that extends the base class (`MentionsExtractor` or `Linker`). You only need to implement the `predict` method, which receives the spaCy documents and returns, for each document, a list of `zshot.utils.data_models.Span` objects.

This is a simple mentions extractor that extracts as mentions all words containing the letter s:

```python
from typing import Iterable
import spacy
from spacy.tokens import Doc
from zshot import PipelineConfig
from zshot.utils.data_models import Span
from zshot.mentions_extractor import MentionsExtractor

class SimpleMentionExtractor(MentionsExtractor):
    def predict(self, docs: Iterable[Doc], batch_size=None):
        # One list per document: a character-offset Span for every token whose text contains "s"
        spans = [[Span(tok.idx, tok.idx + len(tok)) for tok in doc if "s" in tok.text] for doc in docs]
        return spans

new_nlp = spacy.load("en_core_web_sm")

config = PipelineConfig(
mentions_extractor=SimpleMentionExtractor()
)
new_nlp.add_pipe("zshot", config=config, last=True)
text_acetamide = "CH2O2 is a chemical compound similar to Acetamide used in International Business " \
"Machines Corporation (IBM)."

doc = new_nlp(text_acetamide)
print(doc._.mentions)

# [is, similar, used, Business, Machines]
```
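To sanity-check which tokens this rule selects without installing spaCy or zshot, the same logic can be replayed in plain Python. This is a sketch under assumptions: `CharSpan` is a hypothetical stand-in for zshot's `Span` (character offsets only), and the regular expression only approximates spaCy's tokenizer by splitting words from punctuation, which is close enough for this sentence.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CharSpan:
    """Hypothetical stand-in for zshot.utils.data_models.Span: character offsets only."""
    start: int
    end: int


# Rough approximation of spaCy tokenization: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def predict(texts: Iterable[str]) -> List[List[CharSpan]]:
    """Return, for each text, the spans of tokens containing the letter 's' (case-sensitive)."""
    return [
        [CharSpan(m.start(), m.end()) for m in TOKEN_RE.finditer(text) if "s" in m.group()]
        for text in texts
    ]


text = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
        "Machines Corporation (IBM).")
spans = predict([text])[0]
print([text[s.start:s.end] for s in spans])  # ['is', 'similar', 'used', 'Business', 'Machines']
```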

### How to evaluate ZShot

Evaluation is essential for improving model performance, which is why ZShot can evaluate its components on two predefined datasets, OntoNotes and MedMentions, in a Zero-Shot setting where the entities in the test and validation splits do not appear in the training set.

The `evaluation` package contains all the functionality needed to evaluate ZShot components. The main function is `zshot.evaluation.zshot_evaluate.evaluate`, which takes as input the spaCy `nlp` model and the dataset(s) and split(s) to evaluate, and returns a `str` containing a table with the evaluation results. For instance, evaluating the custom component implemented above would look like this:

```python
from zshot.evaluation.zshot_evaluate import evaluate
from datasets import Split

evaluation = evaluate(new_nlp, datasets="ontonotes",
                      splits=[Split.VALIDATION])
print(evaluation)
```
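Evaluation tables for NER commonly report precision, recall and F1 over predicted entity spans. The sketch below shows that arithmetic for exact-match spans; it is an illustrative assumption, not a description of how `zshot.evaluation.zshot_evaluate.evaluate` builds its table, and `span_prf` and the span values are hypothetical.

```python
from typing import Set, Tuple

# A span is (start_char, end_char, label); a prediction counts only on an exact match.
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 for exact-match entity spans."""
    tp = len(gold & pred)  # true positives: spans present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: one correct prediction, one wrong offset.
gold = {(0, 5, "ORG"), (10, 15, "LOC")}
pred = {(0, 5, "ORG"), (20, 25, "LOC")}
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```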
{%
include-markdown "../README.md"
%}
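After this change `docs/index.md` only holds an `include-markdown` directive, so the documentation home page is built from `README.md`. Conceptually, the directive is replaced by the contents of the referenced file, resolved relative to the including page. The sketch below illustrates only that idea under simplifying assumptions; `expand_includes` and `demo` are hypothetical names, and the real mkdocs-include-markdown-plugin supports further options that are not modelled here.

```python
import re
import tempfile
from pathlib import Path

# Matches {% include-markdown "path" %}, including when split over several lines.
INCLUDE_RE = re.compile(r'\{%\s*include-markdown\s+"([^"]+)"\s*%\}')


def expand_includes(page: Path) -> str:
    """Replace each include-markdown directive in `page` with the referenced file's text."""
    source = page.read_text(encoding="utf-8")

    def _inline(match):
        # Paths are resolved relative to the directory of the including page.
        return (page.parent / match.group(1)).read_text(encoding="utf-8")

    return INCLUDE_RE.sub(_inline, source)


def demo() -> str:
    """Build a throwaway docs/index.md + README.md pair and expand the include."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "README.md").write_text("# Zshot\n", encoding="utf-8")
        index = root / "docs" / "index.md"
        index.write_text('{%\ninclude-markdown "../README.md"\n%}\n', encoding="utf-8")
        return expand_includes(index)


if __name__ == "__main__":
    print(demo())
```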
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -60,6 +60,7 @@ repo_url: https://github.com/IBM/zshot
 edit_uri: ''
 plugins:
   - search
+  - include-markdown
   - mkdocstrings:
       handlers:
         python:
