This project aims to enhance information retrieval systems by integrating semantic search and event extraction. It builds a multilingual semantic search system for Southeast Asian news articles, using a fine-tuned bi-encoder to produce multilingual semantic embeddings that enable efficient retrieval. In addition, a token classification model extracts and tags events within each article. Elasticsearch handles storage, indexing and retrieval of the data.
Ensure that Docker and Docker Compose are installed.
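One way to confirm that both tools are available on the `PATH` before continuing is sketched below. The helper name `check_cmd` is made up for this example, and the sketch assumes the standalone `docker-compose` binary that the commands in this README use.

```shell
# check_cmd is a hypothetical helper, not part of this repository.
# It reports whether the given command can be found on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

Running `check_cmd docker && check_cmd docker-compose` succeeds only when both commands are installed.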
First, initialize the Elasticsearch users and groups by executing the command:
docker-compose up setup
If the setup completes without error, start the remaining stack components together with the frontend and backend:
docker-compose up
Note
By default, Elasticsearch users are initialized with the passwords defined in the .env file ("changeme" by default). For details on changing user passwords and other configuration, refer to docker-elk
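A small check for whether the default password is still in place could look like the sketch below. The helper name `uses_default_password` is hypothetical, and the location of the .env file is an assumption, so pass whichever path applies to your checkout.

```shell
# uses_default_password is a hypothetical helper, not part of docker-elk.
# It succeeds if the given env file still contains the default value "changeme".
uses_default_password() {
  grep -q "changeme" "$1"
}
```

For example, `uses_default_password .env && echo "default passwords still in use"` prints a reminder when the file has not been edited.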
Once the stack is up and running, the frontend can be accessed at http://localhost:3000 and the backend documentation at http://localhost:8000/docs
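Because the containers can take a while to start, it may help to poll the two URLs above until they respond. The sketch below is one way to do that. It assumes `curl` is installed and that both endpoints answer with a successful HTTP status once ready. The helper name `wait_for_url` and its default retry values are made up for this example.

```shell
# wait_for_url is a hypothetical helper, not part of this repository.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL with curl until it answers with a successful HTTP status.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: quiet, --max-time: cap each try
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:3000` followed by `wait_for_url http://localhost:8000/docs` returns once both services respond, or fails after the retry limit is reached.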
- docker-elk/: Contains configuration for Elasticsearch, Logstash and Kibana
- frontend/: Contains the frontend application built in Next.js
- backend/: Contains the backend application code