Manual Version Number: 1.0.0
Why do some words have more meanings than others? Some researchers have argued for an explanation based on efficient communication: words that are shorter, more frequent, and easier to pronounce get more meanings, allowing for a more compact organization of the lexicon. However, these explanations mostly address synchronic effects, while linguistic ambiguity is inherently diachronic. We propose a novel approach where we rely on the longevity of words to estimate their degree of ambiguity. Using data covering more than half of a millennium, we find that words used for longer periods become more ambiguous. Our results support the intuition that the process of meaning accumulation is time-driven, indicating that time signatures are important predictors of linguistic features.
- analysis.R: R script for the statistical analysis of the results.
- data/: Contains input and output data related to word age estimation and frequencies:
  - age_estimations.csv: Main dataset.
  - age_estimation_1800.csv: Cross-validated dataset.
- figures/: Contains the figures used both in the paper and in the supplementary material.
- notebooks/: Contains the Jupyter notebooks used in the study:
  - change_point_detection.ipynb: Validates the etymology extraction using Google n-grams and change point detection algorithms.
  - semantic-snowball-model.ipynb: Main notebook containing the cultural evolutionary model of the Semantic Snowball Effect (a toy illustration of the underlying intuition appears after this list).
- scripts/: Contains Python scripts for processing data:
  - etymology_extraction.py: Extracts etymology information for words.
  - wiki_tokenizer.py: Tokenizes the Wiki40b dataset.
  - wikitionnary_preprocessing.py: Preprocesses the Wiktionary dump.
- requirements.txt: Lists the Python packages required to run the project.
- run_preprocessing.sh: Bash script that preprocesses the dataset.
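The model itself is implemented in semantic-snowball-model.ipynb. Purely as intuition for the Semantic Snowball Effect, the toy simulation below sketches time-driven, rich-get-richer meaning accumulation: older words get more opportunities to gain senses, and words with more senses gain new ones faster. This is a sketch of the general idea only, not the authors' model; the birth rate, gain rate, and time horizon are illustrative assumptions.

    # Toy sketch of time-driven meaning accumulation ("snowball" intuition).
    # NOT the model from semantic-snowball-model.ipynb; all parameters are
    # illustrative assumptions.
    import random

    random.seed(42)

    T = 500            # simulated time steps (assumption)
    BIRTH_RATE = 2     # new words entering the lexicon per step (assumption)
    GAIN_RATE = 0.002  # per-sense chance of spawning a new sense (assumption)

    words = []  # each entry: {"birth": step it appeared, "meanings": sense count}
    for step in range(T):
        for _ in range(BIRTH_RATE):
            words.append({"birth": step, "meanings": 1})
        for w in words:
            # Rich-get-richer: each existing sense can spawn a new one.
            w["meanings"] += sum(random.random() < GAIN_RATE
                                 for _ in range(w["meanings"]))

    # Words that have been around longer should carry more meanings on average.
    old = [w["meanings"] for w in words if T - w["birth"] > 400]
    young = [w["meanings"] for w in words if T - w["birth"] <= 100]
    print(f"mean meanings, old words:   {sum(old) / len(old):.2f}")
    print(f"mean meanings, young words: {sum(young) / len(young):.2f}")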
All analyses were run with R version 4.2.2 (2022-10-31) and Python 3.11.2.
Before running the project, ensure the following resources are available:
- Download the Wiki40b French dataset using the wiki-tokenizer tool and place it in the data/wiki-corpus/ folder.
- Download the French Wiktionary dump from WiktionaryX and place the unpacked files in the data/WikitionaryX folder.
- Install Dependencies: Install the required Python packages by running:
      pip install -r requirements.txt
- Preprocess Data:
  - Make the run_preprocessing.sh script executable:
        chmod +x run_preprocessing.sh
  - Run the script to preprocess the Wiki40b dataset and compute word ages and meanings:
        ./run_preprocessing.sh
    The results will be saved in data/age_estimations.csv.
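After the script finishes, a minimal sanity check (assuming it was run from the repository root) is to load the output with pandas and inspect its shape and schema:

    # Minimal sanity check of the preprocessing output; assumes the working
    # directory is the repository root.
    import pandas as pd

    df = pd.read_csv("data/age_estimations.csv")
    print(df.shape)             # rows (words) x columns
    print(df.columns.tolist())  # actual schema of the file
    print(df.head())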
- Validity Check: Use the change_point_detection.ipynb notebook to validate the etymology extraction method and obtain the cross-validated dataset.
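The notebook documents the actual validation procedure. As a stand-in illustration of change point detection on a word-frequency series, the sketch below runs PELT from the ruptures package on a synthetic series with one shift; the choice of algorithm, the penalty value, and the data are all assumptions, not the notebook's settings.

    # Stand-in illustration of change point detection on a frequency series.
    # The algorithm (PELT from the "ruptures" package), penalty, and synthetic
    # data are assumptions; the notebook uses real Google n-gram series.
    import numpy as np
    import ruptures as rpt

    rng = np.random.default_rng(0)
    years = np.arange(1500, 2000)
    # Synthetic relative frequency: near zero before 1750, then a jump,
    # mimicking a word entering common usage.
    freq = np.where(years < 1750, 1e-7, 1e-5) + rng.normal(0, 1e-6, years.size)

    algo = rpt.Pelt(model="rbf").fit(freq)
    change_points = algo.predict(pen=10)  # segment ends; the last is len(freq)
    print([int(years[i]) for i in change_points[:-1]])  # expect a year near 1750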
- Analyze the Results: Use the analysis.R script to reproduce the statistical analysis and generate the figures.
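analysis.R is the authoritative analysis. For orientation only, the sketch below shows in Python the kind of relationship the paper tests (older words carrying more meanings) as a Poisson regression; the column names "age" and "n_meanings" are assumptions about the CSV schema, not the variables used in the script.

    # Rough Python sketch of the relationship tested in analysis.R (older
    # words -> more meanings). The column names "age" and "n_meanings" are
    # assumptions; check the CSV header for the real schema.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("data/age_estimations.csv")

    # Poisson regression: sense counts modeled as a function of word age.
    model = smf.glm("n_meanings ~ age", data=df,
                    family=sm.families.Poisson()).fit()
    print(model.summary())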