NLPClustering

Project for the Introduction to Machine Learning course.

Data: https://www.kaggle.com/datasets/amrwael/nlp-project-fcis-23

The aim of this project was to cluster the provided dataset, a collection of 20,000 documents. We preprocessed the data (stopword removal, lemmatization, vectorization, etc.) and built several models, focusing on the KMeans method. We also interpreted the final clusters ourselves; our interpretation is in the presentation: Presentation/presentation.pdf.
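The preprocessing and clustering steps described above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the code from NLPClustering.ipynb: the stopword list and lemma table are hypothetical stand-ins for a full stopword list and a real lemmatizer, vectorization is assumed to be TF-IDF, and k-means simply seeds its centroids with the first k documents. In practice, scikit-learn's TfidfVectorizer and KMeans are the usual tools for the last two steps.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list (a real pipeline would use a full list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "for"}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"games": "game", "teams": "team", "players": "player",
          "elections": "election", "votes": "vote", "voters": "voter"}


def preprocess(doc):
    """Lowercase, tokenize on letters, drop stopwords, map tokens to lemmas."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def tfidf(docs_tokens):
    """Return the vocabulary and L2-normalized TF-IDF vectors (smoothed idf)."""
    vocab = sorted({t for toks in docs_tokens for t in toks})
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=100):
    """Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "The team won the games and the players celebrated",
    "Voters cast votes in the elections",
    "Players of the team trained for the next games",
    "The elections results and votes are counted",
]
tokens = [preprocess(d) for d in docs]
vocab, vectors = tfidf(tokens)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

L2-normalizing the TF-IDF vectors before clustering keeps long documents from dominating the distances, a common choice for text; on this toy input the sports and election documents land in separate clusters.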

Authors

Magdalena Jeczeń (@m24jeczen)
Marta Szuwarska (@szuvarska)
