A web application designed for NLP data annotation using the Interactive Clustering methodology.
Interactive clustering is a method intended to assist in the design of a training data set.
This iterative process begins with an unlabeled dataset and alternates between two substeps:
- the user defines constraints on data sampled by the computer;
- the computer performs data partitioning using a constrained clustering algorithm.
Thus, at each step of the process:
- the user corrects the clustering of the previous step using constraints, and
- the computer offers a corrected and more relevant data partitioning for the next step.
Simplified diagram of how Interactive Clustering works.
Example of iterations of Interactive Clustering.
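The loop described above can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the application's implementation: word-overlap (Jaccard) similarity stands in for vectorization, a greedy agglomerative merge stands in for the constrained clustering algorithm, and the `annotate` callback stands in for the human annotator. All function names here are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Word-overlap similarity between two texts (toy stand-in for vectorization)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def constrained_clustering(texts, must_link, cannot_link, n_clusters):
    """Greedy agglomerative clustering that respects the pairwise constraints."""
    clusters = [{i} for i in range(len(texts))]

    def find(i):
        return next(c for c in clusters if i in c)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    # Must-link pairs are merged first and are never split afterwards.
    for i, j in must_link:
        a, b = find(i), find(j)
        if a is not b:
            merge(a, b)

    def allowed(a, b):
        # Two clusters may merge only if no cannot-link pair spans them.
        return not any((i in a and j in b) or (i in b and j in a) for i, j in cannot_link)

    def similarity(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(jaccard(texts[i], texts[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        candidates = [(similarity(a, b), a, b)
                      for a, b in combinations(clusters, 2) if allowed(a, b)]
        if not candidates:
            break  # the constraints forbid any further merge
        _, a, b = max(candidates, key=lambda c: c[0])
        merge(a, b)
    return {i: label for label, cluster in enumerate(clusters) for i in cluster}

def sample_pairs(texts, clustering, annotated, n):
    """Pick unannotated pairs of a same cluster, least similar first."""
    pairs = [p for p in combinations(range(len(texts)), 2)
             if p not in annotated and clustering[p[0]] == clustering[p[1]]]
    pairs.sort(key=lambda p: jaccard(texts[p[0]], texts[p[1]]))
    return pairs[:n]

def interactive_clustering(texts, annotate, n_clusters, n_iterations=3, batch=2):
    """Alternate constraint annotation and constrained clustering."""
    must_link, cannot_link = set(), set()
    clustering = constrained_clustering(texts, must_link, cannot_link, n_clusters)
    for _ in range(n_iterations):
        # Substep 1: the user annotates the sampled pairs.
        for pair in sample_pairs(texts, clustering, must_link | cannot_link, batch):
            (must_link if annotate(*pair) else cannot_link).add(pair)
        # Substep 2: the computer re-partitions the data under the constraints.
        clustering = constrained_clustering(texts, must_link, cannot_link, n_clusters)
    return clustering, must_link, cannot_link

texts = [
    "block my card",
    "my card was stolen",
    "order a new card",
    "open a new account",
    "close my account",
]
# Simulated annotator: answers "same intent?" from a hidden ground truth.
truth = ["card", "card", "card", "account", "account"]
clustering, must_link, cannot_link = interactive_clustering(
    texts, lambda i, j: truth[i] == truth[j], n_clusters=2
)
```

In this toy run the first unconstrained partition wrongly groups "close my account" with the card questions; the annotated constraints then steer later iterations toward a partition consistent with the annotator's answers.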
This web application implements this annotation methodology with several features:
- data preprocessing and vectorization in order to reduce noise in data;
- constrained clustering in order to automatically partition the data;
- constraint sampling in order to select the most relevant data to annotate;
- binary constraint annotation in order to correct clustering relevance;
- annotation review and conflict analysis in order to improve constraint consistency.
For more details, read the Documentation and the articles in the References section.
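To illustrate what a constraint conflict looks like, the sketch below flags cannot-link annotations that contradict the transitive closure of the must-link annotations (if A must link with B and B with C, then A cannot be separated from C). It is a minimal example using a union-find structure; the `find_conflicts` function, the `"MUST_LINK"`/`"CANNOT_LINK"` labels, and the data IDs are assumptions made for illustration and do not describe how the application performs its analysis.

```python
def find_conflicts(constraints):
    """Return the cannot-link pairs contradicted by chained must-link pairs.

    `constraints` maps a pair of data IDs to "MUST_LINK" or "CANNOT_LINK".
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Group all data connected by must-link constraints.
    for (a, b), kind in constraints.items():
        if kind == "MUST_LINK":
            parent[find(a)] = find(b)

    # A cannot-link pair inside a single must-link group is inconsistent.
    return [pair for pair, kind in constraints.items()
            if kind == "CANNOT_LINK" and find(pair[0]) == find(pair[1])]

annotations = {
    ("q1", "q2"): "MUST_LINK",
    ("q2", "q3"): "MUST_LINK",
    ("q1", "q3"): "CANNOT_LINK",  # contradicts q1 ~ q2 ~ q3
    ("q3", "q4"): "CANNOT_LINK",  # consistent
}
conflicts = find_conflicts(annotations)
```

Here `conflicts` contains only `("q1", "q3")`, the kind of annotation a reviewer would need to revisit.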
Interactive Clustering GUI requires Python 3.8 or above.
To install with pip:
# install package
python3 -m pip install cognitivefactory-interactive-clustering-gui
# install spacy language model dependencies (the one you want, with version "3.4.x")
python3 -m spacy download fr_core_news_md-3.4.0 --direct
To install with pipx:
# install pipx
python3 -m pip install --user pipx
# install package
pipx install --python python3 cognitivefactory-interactive-clustering-gui
# install spacy language model dependencies (the one you want, with version "3.4.x")
python3 -m spacy download fr_core_news_md-3.4.0 --direct
To display the help message:
cognitivefactory-interactive-clustering-gui --help
To launch the web application:
cognitivefactory-interactive-clustering-gui # launch on 127.0.0.1:8080
Then, go to one of the following pages in your browser:
- Welcome page (web application home): http://localhost:8080
- Swagger (interactive documentation): http://localhost:8080/docs
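To check from a script that the application is up, a small standard-library probe may help. This is a sketch assuming the default host and port shown above; `app_urls` and `is_reachable` are hypothetical helper names, not part of the package.

```python
from urllib.error import URLError
from urllib.request import urlopen

def app_urls(host="localhost", port=8080):
    """Build the two page URLs listed above for a given host and port."""
    base = f"http://{host}:{port}"
    return {"welcome": base, "swagger": f"{base}/docs"}

def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not reachable.
        return False
```

For example, calling `is_reachable(app_urls()["swagger"])` after launching the application should return `True` once the server has finished starting.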
To work on this project or contribute to it, please read:
- the Copier PDM template documentation;
- the Contributing page for environment setup and development help;
- the Code of Conduct page for contribution rules.
- Interactive Clustering:
  - PhD report: Schild, E. (2024, in press). De l'Importance de Valoriser l'Expertise Humaine dans l'Annotation : Application à la Modélisation de Textes en Intentions à l'aide d'un Clustering Interactif. Université de Lorraine.
  - First presentation: Schild, E., Durantin, G., Lamirel, J.C., & Miconi, F. (2021). Conception itérative et semi-supervisée d'assistants conversationnels par regroupement interactif des questions. In EGC 2021 - 21èmes Journées Francophones Extraction et Gestion des Connaissances. Edition RNTI. https://hal.science/hal-03133007.
  - Theoretical study: Schild, E., Durantin, G., Lamirel, J., & Miconi, F. (2022). Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering. International Journal of Data Warehousing and Mining (IJDWM), 18(2), 1-19. http://doi.org/10.4018/IJDWM.298007. https://hal.science/hal-03648041.
  - Methodological discussion: Schild, E., Durantin, G., & Lamirel, J.C. (2021). Concevoir un assistant conversationnel de manière itérative et semi-supervisée avec le clustering interactif. In Atelier - Fouille de Textes - Text Mine 2021 - En conjonction avec EGC 2021. https://hal.science/hal-03133060.
  - Implementation: Schild, E. (2021). cognitivefactory/interactive-clustering. Zenodo. https://doi.org/10.5281/zenodo.4775251.
- Web application:
  - FastAPI: https://fastapi.tiangolo.com/
- Several comparative studies of Interactive Clustering methodology on NLP datasets: Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255. (GitHub: cognitivefactory/interactive-clustering-comparative-study).
Organizational diagram of the different Comparative Studies of Interactive Clustering.
Schild, E. (2021). cognitivefactory/interactive-clustering-gui. Zenodo. https://doi.org/10.5281/zenodo.4775270.