Code for the experiments conducted in the paper 'Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift', published in the proceedings of the LREC 2020 conference.
Please cite the following paper [bib] if you use this code:
Matej Martinc, Petra Kralj Novak and Senja Pollak. Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France.
The published results were produced in a Python 3 environment on the Linux Mint 18 Cinnamon operating system. The installation instructions assume the pip package manager, which installs packages from PyPI.
To get the source code, train data, fine-tuned BERT model and trained embeddings, clone the project from the repository with 'git clone https://gitlab.com/matej.martinc/semantic_shift_detection'
To get only the source code, clone the repository from GitHub with 'git clone https://github.com/EMBEDDIA/semantic_shift_detection'
Install dependencies if needed: pip install -r requirements.txt
To reproduce the results on the LiverpoolFC corpus published in the paper, run the code from the command line using the following commands:
Generate a time-specific representation for each word using the already fine-tuned model:
python get_embeddings.py
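As a rough illustration of what a time-specific representation can look like, the sketch below averages toy embedding vectors of one word per time period and compares the two averages with cosine distance. It is not taken from get_embeddings.py: the helper names, the period labels and the 3-dimensional vectors are hypothetical, and averaging followed by cosine distance is an assumption about the general approach rather than a description of the script's internals.

```python
import math


def average_vectors(vectors):
    """Return the element-wise mean of a list of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy data: embeddings of one word's occurrences,
# grouped by the time period in which each occurrence appears.
occurrences = {
    "period_1": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "period_2": [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]],
}

# One averaged vector per time period.
representations = {
    period: average_vectors(vectors) for period, vectors in occurrences.items()
}

# A larger distance suggests a larger change in how the word is used.
shift = cosine_distance(representations["period_1"], representations["period_2"])
print(f"estimated shift: {shift:.4f}")
```

In a real run the vectors would come from the fine-tuned BERT model and have several hundred dimensions. The toy values only show the shape of the computation.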
Visualize everything and calculate Pearson correlation:
python visualize.py
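For readers unfamiliar with the evaluation measure, the following minimal sketch computes the Pearson correlation between predicted shift scores and gold-standard scores. The function name and the score lists are made-up placeholders, not values from the paper, and visualize.py may compute the coefficient differently (for example, through a library routine).

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical scores for four words: model output versus human annotation.
predicted_shift = [0.10, 0.40, 0.35, 0.80]
gold_shift = [0.0, 0.5, 0.3, 1.0]

r = pearson(predicted_shift, gold_shift)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 means the predicted scores rise and fall with the gold-standard scores, while a value near 0 means no linear relationship.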
To fine-tune a custom BERT model on the corpus:
python fine_tuning.py
Matej Martinc
- Knowledge Technologies Department, Jožef Stefan Institute, Ljubljana