This project is part of a semantic web course initiative, focusing on the analysis of various texts. The chosen text for this project is the "Odyssey" by Homer.
The project's objectives are structured into four distinct steps:
- Convert the initial file (`./etape_1/hom.od_eng.xml`) into a `.json` format.
- Disambiguate the texts using pywsd.
- Compare the disambiguation results with data in OntoSenticNet.
- Perform an ontological alignment between the project data and LemonUby.
The project is divided into four steps, each with specific tasks:
In the first step, the file `./etape_1/hom.od_eng.xml` was converted to `./etape_1/hom.od_eng.json` using the xmltodict library.
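The conversion can be sketched as follows. This is a minimal illustration rather than the project's actual script: the function names `xml_to_json` and `convert_file` are hypothetical, and the JSON layout is simply what `xmltodict.parse` produces by default.

```python
import json

import xmltodict


def xml_to_json(xml_text: str) -> str:
    """Parse an XML string and return the equivalent JSON text."""
    # By default xmltodict prefixes attributes with '@' and stores the
    # text of an element that has attributes under the '#text' key.
    data = xmltodict.parse(xml_text)
    return json.dumps(data, ensure_ascii=False, indent=2)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read an XML file and write its JSON equivalent (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        json_text = xml_to_json(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(json_text)
```

Under these assumptions, the step would amount to calling `convert_file("./etape_1/hom.od_eng.xml", "./etape_1/hom.od_eng.json")`.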
The text was then cleaned to remove special characters that were not encoded correctly, and the `milestones` and `locations` XML tags were removed. The file was restructured to retain only the relevant elements.
In the second step, disambiguation was performed using pywsd and nltk.
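The tag removal from the cleaning stage can be sketched with the standard library alone. This is an illustration under assumptions: it works on the XML tree (the project may have cleaned the JSON instead), the helper name `strip_tags` is hypothetical, and the tag name `milestone` in the sample is assumed rather than taken from the source file.

```python
import xml.etree.ElementTree as ET


def strip_tags(root, tags):
    """Remove every element whose tag is in `tags`, keeping the text after it."""
    for parent in list(root.iter()):
        prev = None  # last sibling that was kept
        for child in list(parent):
            if child.tag in tags:
                # ElementTree stores the text following an element in its
                # `tail`; reattach it so no sentence text is lost.
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return root


# Hypothetical sample; the real file's markup may differ.
sample = '<p>Tell me, <milestone n="1"/>O Muse, of that <b>ingenious</b><milestone n="2"/> hero</p>'
cleaned = strip_tags(ET.fromstring(sample), {"milestone"})
```

Reattaching the `tail` text matters here because the removed tags are markers placed inside running text, so dropping them naively would also drop the words that follow.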
In the third step, the synsets obtained from the previous step were used to query OntoSenticNet through Fuseki for the most similar concept. For each word, the output gives its synset, the matched concept, and the associated sentics.
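A lookup of this kind could be prepared as in the sketch below. Everything specific here is an assumption: the endpoint URL and dataset name are hypothetical, and the substring match on the concept URI is a simple stand-in for whatever similarity measure the project used. The query deliberately avoids naming OntoSenticNet properties, since those are not given in this document.

```python
from urllib.parse import urlencode

# Hypothetical local Fuseki endpoint; adjust host, port, and dataset name.
ENDPOINT = "http://localhost:3030/ontosenticnet/sparql"


def lemma_from_synset_name(synset_name: str) -> str:
    """Extract the lemma from a WordNet synset name such as 'hero.n.01'."""
    return synset_name.rsplit(".", 2)[0]


def build_concept_query(lemma: str, limit: int = 5) -> str:
    """Build a SPARQL query for resources whose URI contains the lemma."""
    # Escape backslashes and quotes so the lemma is a valid string literal.
    safe = lemma.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?concept ?property ?value WHERE {\n"
        "  ?concept ?property ?value .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?concept)), "{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(lemma: str) -> str:
    """Return the GET URL that would send the query to the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": build_concept_query(lemma)})
```

The sketch only builds the request; sending it (for example with `urllib.request.urlopen`) and parsing the results depend on how the Fuseki server is configured.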
In the fourth step, the ontological alignment was built on the Lemon ontology. Two alignment methods were explored:
- A direct alignment inspired by LemonUby's structure.
- A triplet-based alignment, with three variations of the alignment file.
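To give an idea of what a triplet-based alignment might look like, the sketch below links a word to a concept through the core Lemon pattern (a lexical entry has a sense, and the sense references an ontology entity). It is not the project's alignment file: the base namespace, the URI layout, and the function names are hypothetical.

```python
from urllib.parse import quote

LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/odyssey/"  # hypothetical project namespace


def sense_triples(word, synset_name, concept_uri):
    """Return (subject, predicate, object) URI triples linking a word to a concept."""
    entry = f"{BASE}entry/{quote(word)}"
    sense = f"{BASE}sense/{quote(synset_name)}"
    return [
        (entry, RDF_TYPE, LEMON + "LexicalEntry"),
        (entry, LEMON + "sense", sense),
        (sense, RDF_TYPE, LEMON + "LexicalSense"),
        (sense, LEMON + "reference", concept_uri),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

For example, `to_ntriples(sense_triples("hero", "hero.n.01", concept_uri))` yields four lines that can be loaded into a triple store, where `concept_uri` stands for the concept found in the previous step.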
The code produces results that meet the project's objectives. The project is designed as a pipeline, so properly formatted input data can be processed from one step to the next.