From 8e136811b7ba0082d992cbbcb7a220d7524dffd3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marcos=20Mart=C3=ADnez=20Galindo?=
Date: Thu, 15 Aug 2024 05:41:03 +0100
Subject: [PATCH] :memo: Update docs with GLiNER
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Marcos Martínez Galindo
---
 README.md                         | 13 +++++++------
 docs/entity_linking.md            |  3 +--
 docs/gliner_linker.md             | 11 +++++++++++
 docs/gliner_mentions_extractor.md | 11 +++++++++++
 docs/mentions_extractor.md        |  5 ++++-
 docs/tars_mentions_extractor.md   |  2 +-
 6 files changed, 35 insertions(+), 10 deletions(-)
 create mode 100644 docs/gliner_linker.md
 create mode 100644 docs/gliner_mentions_extractor.md

diff --git a/README.md b/README.md
index a69167d..fe27d4e 100644
--- a/README.md
+++ b/README.md
@@ -47,9 +47,9 @@ Can be used to perform:
 ### Optional Dependencies
-* flair - Required if you want to use Flair mentions extractor and for TARS linker.
+* flair - Required if you want to use Flair mentions extractor and for TARS linker and TARS Mentions Extractor.
 * blink - Required if you want to use Blink for linking to Wikipedia pages.
-
+* gliner - Required if you want to use GLiNER Linker or GLiNER Mentions Extractor.
 ## Installation
@@ -81,7 +81,7 @@ ZShot contains two different components, the **mentions extractor** and the **li
 ### Mentions Extractor
 The **mentions extractor** will detect the possible entities (a.k.a. mentions), that will be then linked to a data source (e.g.: Wikidata) by the **linker**.
-Currently, there are 6 different **mentions extractors** supported, SMXM, TARS, 2 based on *SpaCy*, and 2 that are based on *Flair*. The two different versions for *SpaCy* and *Flair* are similar, one is based on Named Entity Recognition and Classification (NERC) and the other one is based on the linguistics (i.e.: using Part Of the Speech tagging (PoS) and Dependency Parsing(DP)).
+Currently, there are 7 different **mentions extractors** supported, SMXM, TARS, GLiNER, 2 based on *SpaCy*, and 2 that are based on *Flair*. The two different versions for *SpaCy* and *Flair* are similar, one is based on Named Entity Recognition and Classification (NERC) and the other one is based on the linguistics (i.e.: using Part Of the Speech tagging (PoS) and Dependency Parsing(DP)).
 The NERC approach will use NERC models to detect all the entities that have to be linked. This approach depends on the model that is being used, and the entities the model has been trained on, so depending on the use case and the target entities it may be not the best approach, as the entities may be not recognized by the NERC model and thus won't be linked.
@@ -90,14 +90,15 @@ The linguistic approach relies on the idea that mentions will usually be a synta
 ### Linker
 The **linker** will link the detected entities to a existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.
-Again, there are 4 **linkers** available currently, 2 of them are *end-to-end* and 2 are not. Let's start with those thar are not *end-to-end*:
+Again, there are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.
 | Linker Name | end-to-end | Source Code | Paper |
 |:-----------:|:----------:|----------------------------------------------------------|--------------------------------------------------------------------|
 | Blink | X | [Source Code](https://github.com/facebookresearch/BLINK) | [Paper](https://arxiv.org/pdf/1911.03814.pdf) |
 | GENRE | X | [Source Code](https://github.com/facebookresearch/GENRE) | [Paper](https://arxiv.org/pdf/2010.00904.pdf) |
-| SMXM | ✓ | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
-| TARS | ✓ | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
+| SMXM | ✓ | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
+| TARS | ✓ | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
+| GLINER | ✓ | [Source Code](https://github.com/urchade/GLiNER) | [Paper](https://arxiv.org/abs/2311.08526) |
 ### Relations Extractor
 The **relations extractor** will extract relations among different entities *previously* extracted by a **linker**..
diff --git a/docs/entity_linking.md b/docs/entity_linking.md
index 010c4fc..0eb68d6 100644
--- a/docs/entity_linking.md
+++ b/docs/entity_linking.md
@@ -2,7 +2,6 @@
 The **linker** will link the detected entities to a existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.
-There are 4 **linkers** available currently, 2 of them are *end-to-end* and 2 are not. Let's start with those thar are not *end-to-end*.
-
+There are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.
 ::: zshot.Linker
\ No newline at end of file
diff --git a/docs/gliner_linker.md b/docs/gliner_linker.md
new file mode 100644
index 0000000..fd26c1b
--- /dev/null
+++ b/docs/gliner_linker.md
@@ -0,0 +1,11 @@
+# GLiNER Linker
+
+GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.
+
+The GLiNER **linker** will use the **entities** specified in the `zshot.PipelineConfig`, it just uses the names of the entities, it doesn't use the descriptions of the entities.
+
+
+- [Paper](https://arxiv.org/abs/2311.08526)
+- [Original Source Code](https://github.com/urchade/GLiNER)
+
+::: zshot.linker.LinkerGLINER
\ No newline at end of file
diff --git a/docs/gliner_mentions_extractor.md b/docs/gliner_mentions_extractor.md
new file mode 100644
index 0000000..47929ea
--- /dev/null
+++ b/docs/gliner_mentions_extractor.md
@@ -0,0 +1,11 @@
+# GLiNER Mentions Extractor
+
+GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.
+
+The GLiNER **mentions extractor** will use the **mentions** specified in the `zshot.PipelineConfig`, it just uses the names of the mentions, it doesn't use the descriptions of the mentions.
+
+
+- [Paper](https://arxiv.org/abs/2311.08526)
+- [Original Source Code](https://github.com/urchade/GLiNER)
+
+::: zshot.mentions_extractor.MentionsExtractorGLINER
\ No newline at end of file
diff --git a/docs/mentions_extractor.md b/docs/mentions_extractor.md
index 26658f9..e732697 100644
--- a/docs/mentions_extractor.md
+++ b/docs/mentions_extractor.md
@@ -1,7 +1,7 @@
 # MentionsExtractor
 The **mentions extractor** will detect the possible entities (a.k.a. mentions), that will be then linked to a data source (e.g.: Wikidata) by the **linker**.
-Currently, there are 6 different **mentions extractors** supported, 2 of them are based on *SpaCy*, 2 of them are based on *Flair*, TARS and SMXM. The two different versions for *SpaCy* and *Flair* are similar, one is based on NERC and the other one is based on the linguistics (i.e.: using PoS and DP). The TARS and SMXM models can be used when the user wants to specify the mentions wanted to be extracted.
+Currently, there are 7 different **mentions extractors** supported, 2 of them are based on *SpaCy*, 2 of them are based on *Flair*, TARS, SMXM and GLiNER. The two different versions for *SpaCy* and *Flair* are similar, one is based on NERC and the other one is based on the linguistics (i.e.: using PoS and DP). The TARS and SMXM models can be used when the user wants to specify the mentions wanted to be extracted.
 The NERC approach will use NERC models to detect all the entities that have to be linked. This approach depends on the model that is being used, and the entities the model has been trained on, so depending on the use case and the target entities it may be not the best approach, as the entities may be not recognized by the NERC model and thus won't be linked.
@@ -10,4 +10,7 @@ The linguistic approach relies on the idea that mentions will usually be a synta
 The SMXM model uses the description of the mentions to give the model information about them.
 TARS model will use the labels of the mentions to detect them.
+
+The GLiNER model will use the labels of the mentions to detect them.
+
 ::: zshot.MentionsExtractor
\ No newline at end of file
diff --git a/docs/tars_mentions_extractor.md b/docs/tars_mentions_extractor.md
index 5a84d27..854cc32 100644
--- a/docs/tars_mentions_extractor.md
+++ b/docs/tars_mentions_extractor.md
@@ -7,4 +7,4 @@ The TARS **mentions extractor** will use the **mentions** specified in the `zsho
 - [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf)
 - [Original Source Code](https://github.com/flairNLP/flair)
-::: zshot.linker.LinkerTARS
\ No newline at end of file
+::: zshot.mentions_extractor.MentionsExtractorTARS
\ No newline at end of file