Project for the Introduction to Machine Learning course
Data: https://www.kaggle.com/datasets/amrwael/nlp-project-fcis-23
The aim of this project was to cluster the provided dataset, a collection of 20,000 documents. We preprocessed the
data (stopword removal, lemmatization, vectorization, etc.) and built several models, focusing on the KMeans method.
We also interpreted the final clusters; see our presentation: Presentation/presentation.pdf.
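To illustrate the kind of pipeline described above, here is a minimal, dependency-free sketch of stopword removal, TF-IDF vectorization and k-means clustering on four toy sentences. It is not the project's actual code: the stopword list, the toy documents, the function names and the choice of the first `k` documents as initial centroids are all assumptions made for illustration, and lemmatization is left out to keep the example short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return the vocabulary and one L2-normalised TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency, one common variant.
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with the first k vectors as initial centroids.

    The deterministic initialisation is a simplification for this sketch;
    real implementations usually use random or k-means++ seeding.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy documents (hypothetical): two about spaceflight, two about baking.
docs = [
    "The rocket launched into orbit around the moon",
    "Bake the bread in a hot oven",
    "The moon orbit of the rocket is stable",
    "Bread and cake are baked in the oven",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Note that "bake" and "baked" stay separate terms here; a lemmatization step would merge them, which is one reason it helps on real text.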
Authors:
- Magdalena Jeczeń (@m24jeczen)
- Marta Szuwarska (@szuvarska)