Having downloaded the data from HuggingFace Datasets into
tsv
folder, run the data import.
python preprocess.py
python vectorize.py "e5-multilingual-large" --prefix query --show_progress_bar
Run Qdrant container in Docker and mount local storage to persist the collection.
docker run -p 6333:6333 -v $(pwd)/qdrant_mount/qdrant_storage:/qdrant/storage qdrant/qdrant
Create the collection in Qdrant.
python collection.py "e5-multilingual-large-query" --postfix "query"
python search_ui.py
@inproceedings{dorkin-sirts-2024-sonajaht,
title = "S{\~o}najaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation",
author = "Dorkin, Aleksei and
Sirts, Kairit",
editor = "Bollegala, Danushka and
Shwartz, Vered",
booktitle = "Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.starsem-1.33",
pages = "410--420",
}