Clustered unlabeled text from IMDb and Wikipedia using two unsupervised methods: K-means and Latent Dirichlet Allocation (LDA)
Preprocessed the text by tokenizing, stemming, and removing stop words, then extracted features with TF-IDF
Identified the latent topics and keywords of each cluster, and visualized the clustering results after dimensionality reduction with Principal Component Analysis (PCA)
Source data is attached. Please open with Jupyter Notebook.