Commit

📝 Update docs with GLiNER
Signed-off-by: Marcos Martínez Galindo <marcosmartinezgalindo@Marcoss-MacBook-Pro.local>
Marcos Martínez Galindo authored and Marcos Martínez Galindo committed Aug 15, 2024
1 parent 0a1f81a commit 8e13681
Showing 6 changed files with 35 additions and 10 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -47,9 +47,9 @@ Can be used to perform:

### Optional Dependencies

-* <a href="https://github.com/flairNLP/flair" target="_blank"><code>flair</code></a> - Required if you want to use Flair mentions extractor and for TARS linker.
+* <a href="https://github.com/flairNLP/flair" target="_blank"><code>flair</code></a> - Required if you want to use Flair mentions extractor and for TARS linker and TARS Mentions Extractor.
* <a href="https://github.com/facebookresearch/BLINK" target="_blank"><code>blink</code></a> - Required if you want to use Blink for linking to Wikipedia pages.

+* <a href="https://github.com/urchade/GLiNER" target="_blank"><code>gliner</code></a> - Required if you want to use GLiNER Linker or GLiNER Mentions Extractor.

## Installation

@@ -81,7 +81,7 @@ ZShot contains two different components, the **mentions extractor** and the **li
### Mentions Extractor
The **mentions extractor** will detect the possible entities (a.k.a. mentions), that will be then linked to a data source (e.g.: Wikidata) by the **linker**.

-Currently, there are 6 different **mentions extractors** supported, SMXM, TARS, 2 based on *SpaCy*, and 2 that are based on *Flair*. The two different versions for *SpaCy* and *Flair* are similar, one is based on Named Entity Recognition and Classification (NERC) and the other one is based on the linguistics (i.e.: using Part Of the Speech tagging (PoS) and Dependency Parsing(DP)).
+Currently, there are 7 different **mentions extractors** supported, SMXM, TARS, GLiNER, 2 based on *SpaCy*, and 2 that are based on *Flair*. The two different versions for *SpaCy* and *Flair* are similar, one is based on Named Entity Recognition and Classification (NERC) and the other one is based on the linguistics (i.e.: using Part Of the Speech tagging (PoS) and Dependency Parsing(DP)).

The NERC approach will use NERC models to detect all the entities that have to be linked. This approach depends on the model being used and the entities it was trained on, so depending on the use case and the target entities it may not be the best approach: entities that the NERC model does not recognize won't be linked.

@@ -90,14 +90,15 @@ The linguistic approach relies on the idea that mentions will usually be a synta
### Linker
The **linker** will link the detected entities to an existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.

-Again, there are 4 **linkers** available currently, 2 of them are *end-to-end* and 2 are not. Let's start with those thar are not *end-to-end*:
+Again, there are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.

| Linker Name | end-to-end | Source Code | Paper |
|:-----------:|:----------:|----------------------------------------------------------|--------------------------------------------------------------------|
| Blink | X | [Source Code](https://github.com/facebookresearch/BLINK) | [Paper](https://arxiv.org/pdf/1911.03814.pdf) |
| GENRE | X | [Source Code](https://github.com/facebookresearch/GENRE) | [Paper](https://arxiv.org/pdf/2010.00904.pdf) |
-| SMXM | &check; | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
-| TARS | &check; | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
+| SMXM | &check; | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
+| TARS | &check; | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
+| GLINER | &check; | [Source Code](https://github.com/urchade/GLiNER) | [Paper](https://arxiv.org/abs/2311.08526) |

### Relations Extractor
The **relations extractor** will extract relations among different entities *previously* extracted by a **linker**.
3 changes: 1 addition & 2 deletions docs/entity_linking.md
@@ -2,7 +2,6 @@

The **linker** will link the detected entities to an existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.

-There are 4 **linkers** available currently, 2 of them are *end-to-end* and 2 are not. Let's start with those thar are not *end-to-end*.

+There are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.

::: zshot.Linker
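
As a reading aid for the linker counts in this hunk, the linker table from the README can be modeled as plain data. This is only a hypothetical sketch: `LinkerInfo`, `LINKERS`, and `needs_mentions_extractor` are illustrative names, not part of zshot, and the flags are copied by hand from the README table in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkerInfo:
    """Hypothetical record describing one linker (not a zshot class)."""
    name: str
    end_to_end: bool  # True when the linker detects and links in one step


# Flags copied from the README table in this commit.
LINKERS = [
    LinkerInfo("Blink", False),
    LinkerInfo("GENRE", False),
    LinkerInfo("SMXM", True),
    LinkerInfo("TARS", True),
    LinkerInfo("GLiNER", True),
]


def needs_mentions_extractor(linker: LinkerInfo) -> bool:
    """A linker that is not end-to-end relies on a separate mentions extractor."""
    return not linker.end_to_end


end_to_end = [info.name for info in LINKERS if info.end_to_end]
two_stage = [info.name for info in LINKERS if needs_mentions_extractor(info)]
```

With this data, `end_to_end` holds three names and `two_stage` holds two, matching the "5 linkers, 3 end-to-end" wording introduced by the commit.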
11 changes: 11 additions & 0 deletions docs/gliner_linker.md
@@ -0,0 +1,11 @@
# GLiNER Linker

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.

The GLiNER **linker** uses the **entities** specified in the `zshot.PipelineConfig`. It uses only the names of the entities, not their descriptions.


- [Paper](https://arxiv.org/abs/2311.08526)
- [Original Source Code](https://github.com/urchade/GLiNER)

::: zshot.linker.LinkerGLINER
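
The "names only" behaviour described in this new page can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the `Entity` dataclass is a stand-in rather than zshot's own entity model, `link_predictions` is a hypothetical helper, and the prediction dictionaries are hand-written examples with assumed keys (`start`, `end`, `label`, `score`), not real model output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """Hypothetical stand-in for an entity listed in the pipeline config."""
    name: str
    description: str = ""


def link_predictions(
    entities: List[Entity],
    predictions: List[Dict],
    threshold: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Keep confident predictions whose label matches a configured entity name.

    Only `Entity.name` is consulted; `Entity.description` is never read.
    """
    known_names = {entity.name for entity in entities}
    linked = []
    for pred in predictions:
        if pred["score"] >= threshold and pred["label"] in known_names:
            linked.append((pred["start"], pred["end"], pred["label"]))
    return linked


entities = [Entity("company", "A business organisation"), Entity("city")]
labels = [entity.name for entity in entities]  # what a label-based model would see

# Hand-written example predictions (assumed shape, not real model output).
predictions = [
    {"start": 0, "end": 3, "label": "company", "score": 0.91},
    {"start": 20, "end": 26, "label": "city", "score": 0.32},
]
linked = link_predictions(entities, predictions)
```

Changing an entity's description in this sketch has no effect on `labels` or `linked`, which is the property the page describes.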
11 changes: 11 additions & 0 deletions docs/gliner_mentions_extractor.md
@@ -0,0 +1,11 @@
# GLiNER Mentions Extractor

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.

The GLiNER **mentions extractor** uses the **mentions** specified in the `zshot.PipelineConfig`. It uses only the names of the mentions, not their descriptions.


- [Paper](https://arxiv.org/abs/2311.08526)
- [Original Source Code](https://github.com/urchade/GLiNER)

::: zshot.mentions_extractor.MentionsExtractorGLINER
5 changes: 4 additions & 1 deletion docs/mentions_extractor.md
@@ -1,7 +1,7 @@
# MentionsExtractor
The **mentions extractor** will detect the possible entities (a.k.a. mentions), that will be then linked to a data source (e.g.: Wikidata) by the **linker**.

-Currently, there are 6 different **mentions extractors** supported, 2 of them are based on *SpaCy*, 2 of them are based on *Flair*, TARS and SMXM. The two different versions for *SpaCy* and *Flair* are similar, one is based on NERC and the other one is based on the linguistics (i.e.: using PoS and DP). The TARS and SMXM models can be used when the user wants to specify the mentions wanted to be extracted.
+Currently, there are 7 different **mentions extractors** supported, 2 of them are based on *SpaCy*, 2 of them are based on *Flair*, TARS, SMXM and GLiNER. The two different versions for *SpaCy* and *Flair* are similar, one is based on NERC and the other one is based on the linguistics (i.e.: using PoS and DP). The TARS and SMXM models can be used when the user wants to specify the mentions wanted to be extracted.

The NERC approach will use NERC models to detect all the entities that have to be linked. This approach depends on the model being used and the entities it was trained on, so depending on the use case and the target entities it may not be the best approach: entities that the NERC model does not recognize won't be linked.

@@ -10,4 +10,7 @@ The linguistic approach relies on the idea that mentions will usually be a synta
The SMXM model uses the description of the mentions to give the model information about them.

TARS model will use the labels of the mentions to detect them.

+The GLiNER model will use the labels of the mentions to detect them.

::: zshot.MentionsExtractor
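
The difference between the three configurable extractors described in this file (SMXM reads descriptions; TARS and GLiNER read labels) can be sketched as a small lookup. This is a hypothetical illustration, not zshot code: `Mention`, `USES_DESCRIPTION`, and `model_input` are assumed names, and the mapping simply restates the three sentences in the page.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for a mention definition from the pipeline config."""
    name: str
    description: str = ""


# Restates the page: SMXM uses descriptions, TARS and GLiNER use labels.
USES_DESCRIPTION = {"SMXM": True, "TARS": False, "GLiNER": False}


def model_input(extractor: str, mention: Mention) -> str:
    """Return the text that would describe `mention` to the given extractor."""
    if extractor not in USES_DESCRIPTION:
        raise ValueError(f"Unknown extractor: {extractor}")
    return mention.description if USES_DESCRIPTION[extractor] else mention.name


fruit = Mention(name="fruit", description="An edible plant product")
inputs = {name: model_input(name, fruit) for name in USES_DESCRIPTION}
```

One practical consequence under these assumptions: for SMXM a well-written description matters, while for TARS and GLiNER the label wording itself carries all the information.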
2 changes: 1 addition & 1 deletion docs/tars_mentions_extractor.md
@@ -7,4 +7,4 @@ The TARS **mentions extractor** will use the **mentions** specified in the `zsho
- [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf)
- [Original Source Code](https://github.com/flairNLP/flair)

-::: zshot.linker.LinkerTARS
+::: zshot.mentions_extractor.MentionsExtractorTARS
