1. Text Extraction
1.1. Question (1)
1.2. Solution: Extract the Compressed File
2. Understanding the Data
2.1. Understanding the Problem Statement
2.2. Basic EDA and Visualization Before Cleaning
3. Text Preprocessing
3.1. Removing Noise
3.2. Removing Punctuation
3.3. Tokenization
3.4. Removing Stopwords
4. Embedding Representation
4.1. Question (2.1)
4.2. Solution: Embedding Visualization
4.3. Question (2.2)
4.4. Solution: Query Similarity with gensim
5. Text Classification
5.1. Question (2.3)
5.2. Solution: Text Classification with Naive Bayes (NB)
5.3. Question (2.4)
5.4. Solution: Improving Model Accuracy
6. Extraction of Characteristic Words
6.1. Question (3)
6.2. Solution: Topic Modelling with LDA
7. Conclusion