TEIMMA: The First Content Reuse Annotator for Text, Images, and Math
Features.
1. Annotates text, math, and images.
2. Visualization of existing similarity: Users can apply any of the four algorithms to help them annotate the similarity present in a scientific document pair.
3. Input document formats: Currently LaTeX, PDF, and plain text (.txt); documents are handled internally as HTML. PDF can also be converted to LaTeX for accurate access to math.
4. Annotations can be exported from the database (PostgreSQL).
5. It remembers when an already annotated document pair is uploaded again for annotation, which allows multiple suspicious-source or source-suspicious recordings.
python3 -m venv mathReuseAnno
(More on creating virtual environment)
source mathReuseAnno/bin/activate
(activate virtual environment)
pip install -r requirements.txt
sudo apt-get install latexml
(Additional installation instructions if needed or for different operating systems)
python manage.py migrate
(The app can also be used without a database, but having a database is recommended. To use the app without a database, refer to the Docker installation section.)
python manage.py runserver
http://localhost:8000/mainUI/
docker pull ankitsatpute/reuseanno:latest
(configuration with PostgreSQL)
docker pull ankitsatpute/reuseanno:wthtdb
(configuration without PostgreSQL)
docker build -t reuseanno .
docker run --publish 8000:8000 reuseanno
(Remember to append /mainUI/ after 8000 when using the PostgreSQL configuration; without PostgreSQL, port 8000 alone suffices for viewing the main UI.)
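The local server and the two Docker configurations differ only in the path under which the main UI is served. A minimal sketch of that rule follows; the helper name `main_ui_url` and its parameters are hypothetical and not part of the tool, and the defaults are taken from the commands above.

```python
def main_ui_url(with_database: bool, host: str = "localhost", port: int = 8000) -> str:
    """Return the address of the main UI.

    Hypothetical helper, not part of TEIMMA. It only encodes the rule stated
    above: the PostgreSQL configuration serves the UI under /mainUI/, the
    configuration without PostgreSQL serves it at the root.
    """
    base = f"http://{host}:{port}/"
    return base + "mainUI/" if with_database else base


if __name__ == "__main__":
    print(main_ui_url(with_database=True))
    print(main_ui_url(with_database=False))
```

If the container port is published under a different host port, pass that port instead of 8000.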
- Choose the suspicious document (left-side upload) and the source document (right-side upload); both files must contain LaTeX source content.
- File(s) with any other extension will throw an unknown error.
- Also, choose your source and suspicious files in the correct columns; otherwise, the recorded cases will be wrong.
- Click the upload button after choosing both files from a local directory.
- This might take a while, as the conversion from LaTeX to HTML5 using LaTeXML is in progress.
- You can observe the progress on the terminal.
- Upon successful conversion, both documents will be shown on the main UI page. You can verify if the content is converted correctly and if the source and suspicious documents are correct.
- The generated HTML5 files are cached until the next upload overwrites them. If you upload the same suspicious and source files again, the UI selects the same files and displays the cached HTML5 files.
- You can also scroll within the individual documents to see further content.
- To start recording a plagiarism case, click on the Start Recording button and select a part of the text from the suspicious document first and then from the source document.
- You will see that both selected texts are highlighted with the same background colour.
- If the colour is not assigned, please redo the selection by clicking on Start Recording again.
- Select an option from Type of case.
- If not chosen, Text will be selected as the default option.
- You can choose more than one box to indicate a multi-type annotation.
- Select the appropriate Obfuscation type from the drop-down menu. If you think that the obfuscation in the text does not match any available choice, enter an option of your own in the Enter the custom name field.
- Click Finish Recording to save the recorded case. After this, the page will refresh, and your saved case will be shown again with the same assigned background colour.
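The upload rule from the start of these steps (LaTeX source files only, suspicious file on the left, source file on the right) can be checked before uploading. The sketch below is a hypothetical pre-upload check and not something the tool ships; the function name `check_upload_pair` and the accepted suffix `.tex` are assumptions based on the steps above.

```python
from pathlib import Path

# Assumption: the upload form accepts LaTeX source files only, as the
# steps above state. Adjust the set if your build accepts more formats.
ACCEPTED_SUFFIXES = {".tex"}


def check_upload_pair(suspicious: str, source: str) -> list:
    """Return a list of problems with the chosen files (empty if none).

    Hypothetical helper, not part of TEIMMA. It checks file extensions only;
    it cannot tell whether the two files were put in the correct columns.
    """
    problems = []
    for role, name in (("suspicious", suspicious), ("source", source)):
        if Path(name).suffix.lower() not in ACCEPTED_SUFFIXES:
            problems.append(f"{role} file {name!r} is not a LaTeX source file")
    return problems


if __name__ == "__main__":
    for problem in check_upload_pair("suspicious.tex", "source.pdf"):
        print(problem)
```

An empty list means both files pass the extension check; whether each file sits in the correct column still has to be verified by eye on the main UI page.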
View all recorded cases: All recorded cases are shown in JSON format.
jsonDocinfo = {
    "inspecDoc": {"inspecDocName": None, "inspecText": None,
                  "inspecMath": None, "inspecImages": None,
                  "inspecDocHTML": None,
                  "inspecDocHTMLParent": None
                  },
    "potsrcDoc": {"potsrcDocName": None, "potsrcText": None,
                  "potsrcMath": None, "potsrcImages": None,
                  "potsrcDocHTML": None,
                  "potsrcDocHTMLParent": None
                  },
    "contentType": None,
    "colorHighlight": None
}
recordings = {"inspecDocstart": None, "inspecDocend": None,
              "potsrcDocstart": None, "potsrcDocend": None,
              "obfuscation": None, "recordingType": None
              }
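To illustrate how the empty structure above turns into the JSON view, here is a minimal sketch that fills a copy of the document-info template for a text-only case and serializes it. The helper `fill_text_case`, the example file names, and the value types are assumptions for illustration; the `recordings` part is left out because the text does not specify how it is attached to a case.

```python
import copy
import json

# Empty template, copied from the structure shown above.
DOC_INFO_TEMPLATE = {
    "inspecDoc": {"inspecDocName": None, "inspecText": None,
                  "inspecMath": None, "inspecImages": None,
                  "inspecDocHTML": None, "inspecDocHTMLParent": None},
    "potsrcDoc": {"potsrcDocName": None, "potsrcText": None,
                  "potsrcMath": None, "potsrcImages": None,
                  "potsrcDocHTML": None, "potsrcDocHTMLParent": None},
    "contentType": None,
    "colorHighlight": None,
}


def fill_text_case(inspec_name, inspec_text, potsrc_name, potsrc_text):
    """Return a filled copy of the template for a text-only case.

    Hypothetical helper, not part of TEIMMA. Which value types the tool
    stores in each field is an assumption; fields not set here stay None.
    """
    case = copy.deepcopy(DOC_INFO_TEMPLATE)  # keep the nested template intact
    case["inspecDoc"]["inspecDocName"] = inspec_name
    case["inspecDoc"]["inspecText"] = inspec_text
    case["potsrcDoc"]["potsrcDocName"] = potsrc_name
    case["potsrcDoc"]["potsrcText"] = potsrc_text
    case["contentType"] = "Text"  # the default option of "Type of case"
    return case


if __name__ == "__main__":
    case = fill_text_case("suspicious.tex", "copied sentence",
                          "source.tex", "original sentence")
    # json.dumps writes Python's None as JSON null.
    print(json.dumps(case, indent=2))
```

The deep copy matters because the template contains nested dictionaries; a shallow copy would share them between cases.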
Clear all recorded cases: This permanently deletes all recorded cases. Make sure you have taken a backup of previously recorded cases; otherwise, they cannot be restored.
Home: Clears the uploaded documents and takes the user to the main UI.
About: Information about the development of the tool.
Delete last record: Deletes the last selected case from the cached annotations and the database.
MIT