This text comparison tool was built to support the evaluation of texts for a master's thesis. It compares texts using token-based similarity metrics, which makes it easier to see where texts agree and where they differ.
The tool is particularly useful for researchers working with large text datasets, because it offers a streamlined, repeatable way to compare and contrast many texts instead of checking them by hand.
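To illustrate what "token-based" means, here is a simplified, standard-library-only sketch of a token-sort comparison. It is not the tool's own code and not rapidfuzz's implementation: it follows the general idea behind rapidfuzz's `fuzz.token_sort_ratio` (split on whitespace, sort the tokens, then compute a normalized Indel similarity), but it may differ from rapidfuzz in preprocessing and edge cases. The function names are hypothetical.

```python
def _lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (character level)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: 100 * (1 - indel_distance / (len(a) + len(b)))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical in this sketch
    # Indel distance = total - 2 * LCS, so the normalized similarity is 2 * LCS / total.
    return 100.0 * 2 * _lcs_length(a, b) / total


def _sorted_tokens(text: str) -> str:
    """Split on whitespace and re-join the tokens in sorted order."""
    return " ".join(sorted(text.split()))


def token_sort_similarity(a: str, b: str) -> float:
    """Compare two texts after sorting their tokens, so word order is ignored."""
    return indel_ratio(_sorted_tokens(a), _sorted_tokens(b))
```

For example, `token_sort_similarity("the quick brown fox", "brown fox the quick")` returns `100.0` because both texts contain the same words, whereas the plain character-level `indel_ratio` of the same two strings is lower because the word order differs.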
The tool is implemented in Python. It uses the rapidfuzz library to compute token-based similarity scores, which compare texts word by word and can therefore tolerate differences in word order better than a plain character-level comparison. It also uses the pandas library for efficient data manipulation and the pathlib module for file handling.
- Python 3.6 or higher
- rapidfuzz
- pandas
- pathlib (part of the Python standard library since Python 3.4; no separate installation needed)
- Clone this repository or download it as a ZIP file.
- Make sure you have Python 3.6 or higher installed on your system.
- Install the required libraries by running the following command in your terminal:
pip install rapidfuzz pandas
- Place your text files and participant data file (in Excel format) in the res/ folder.
- Update the global variables in main.py to match your file names, and adjust any other settings as needed.
- Run the script using the following command in your terminal:
python3 main.py
- The script writes the comparison results to a new Excel file in the res/ folder.