Japanese-Text-Similarity

  1. Text Extraction:

1.1. Question (1):

1.2. Solution: Uncompress the compressed file
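
A minimal sketch of the uncompress step, assuming the data ships as a standard zip or tar archive. The README does not name the file or its format, so `archive_path` and `out_dir` are hypothetical parameters.

```python
import shutil
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, out_dir: str) -> List[str]:
    """Unpack a zip/tar archive and return the relative paths of the extracted files."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil infers the archive format (zip, tar, gztar, ...) from the file extension.
    shutil.unpack_archive(archive_path, str(target))
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())
```

Returning the file list makes it easy to check what was extracted before moving on to the data-understanding step.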

  2. Understanding the data:

2.1. Understand the problem statement:

2.2. Basic EDA and visualization before cleaning:

  3. Text Preprocessing:

3.1. Removing Noise:

3.2. Removing Punctuation:

3.3. Tokenization:

3.4. Removing Stopwords:
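
The noise, punctuation, and stopword steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's actual pipeline: the `STOPWORDS` set is a tiny hypothetical sample, and tokenization is left out because Japanese has no whitespace word boundaries and needs a morphological analyzer, so `remove_stopwords` expects text that has already been tokenized.

```python
import re
import unicodedata

# Hypothetical, tiny stopword sample (common particles and auxiliaries).
# A real stopword list would be much longer.
STOPWORDS = {"の", "は", "が", "を", "に", "で", "と", "も", "です", "ます"}

TAG_RE = re.compile(r"<[^>]+>")
# ASCII-only match so a URL does not swallow Japanese text that follows it without a space.
URL_RE = re.compile(r"https?://[\x21-\x7e]+")


def remove_noise(text):
    """Strip HTML-like tags and URLs, apply NFKC normalization, collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # NFKC folds full-width ASCII and half-width katakana into their standard forms.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_punctuation(text):
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def remove_stopwords(tokens):
    """Filter an already-tokenized list against the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]
```

Checking the Unicode category rather than `string.punctuation` matters here, because Japanese marks such as `、`, `。`, `「`, and `」` are not in the ASCII punctuation set.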

  4. Embedded Representation

4.1. Question (2.1)

4.2. Solution: Embedding Visualization

4.3. Question (2.2)

4.4. Solution: Query similarity with gensim
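
To illustrate what a query-similarity ranking computes, here is a plain bag-of-words cosine similarity using only the standard library. It is a stand-in, not gensim: the solution named above would compare learned embedding vectors instead of raw token counts. The function names are hypothetical, and inputs are assumed to be already tokenized.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts as vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    # An empty document has a zero vector; define its similarity as 0.
    return dot / norm if norm else 0.0


def rank_by_similarity(query_tokens, corpus):
    """Return (document index, score) pairs sorted by descending similarity to the query."""
    scores = [(i, cosine_similarity(query_tokens, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Swapping the count vectors for embedding vectors keeps the ranking logic the same; only the vector construction changes.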

  5. Text Classification:

5.1. Question (2.3):

5.2. Solution: Text Classification with Naive Bayes (NB)

5.3. Question (2.4):

5.4. Solution: Improve the accuracy of the model
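
As a rough picture of the Naive Bayes baseline from 5.2, the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing in plain Python. The class name and the toy interface are hypothetical, and the project may well rely on a library implementation instead. The smoothing strength `alpha` is one example of a knob that could be tuned when trying to improve accuracy.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Work in log space to avoid underflow on long documents.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A short usage example with made-up data:

```python
model = NaiveBayes().fit(
    [["猫", "かわいい"], ["犬", "散歩"], ["猫", "魚"], ["犬", "ボール"]],
    ["cat", "dog", "cat", "dog"],
)
label = model.predict(["猫"])
```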

  6. Extraction of Characteristic words

6.1. Question (3)

6.2. Solution: Topic Modelling with LDA

  7. Conclusion