NLP-Classification

This repo contains Python implementations of text feature extraction methods that I have used in my research, mostly for user-input classification tasks.
Two approaches are implemented:

  • One based on word embeddings, described as one of the baseline methods in [1].
  • A standard statistical n-gram language modeling approach, which estimates the conditional probability of a sentence given a class.
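The two approaches can be sketched roughly as follows. This is a minimal, self-contained illustration under my own assumptions, not the code in this repo. The embedding features assume a simple element-wise aggregation (mean, min and max) of the word vectors; the exact aggregation used for the baselines in [1] and here may differ, and `embeddings` is a hypothetical word-to-vector dictionary. The language-model side assumes one word n-gram model per class (bigrams by default) with add-one smoothing, combined with a class prior to pick the most likely class. `embedding_features` and `NGramClassifier` are hypothetical names.

```python
import math
from collections import Counter


def embedding_features(tokens, embeddings, dim):
    """Map a tokenized text to a fixed-length feature vector.

    Concatenates the element-wise mean, minimum and maximum of the vectors
    of all in-vocabulary tokens. `embeddings` is a hypothetical dict from
    word to a list of `dim` floats (e.g. loaded from pre-trained vectors).
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:  # no known words: fall back to an all-zero vector
        return [0.0] * (3 * dim)
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    mean = [sum(c) / len(vectors) for c in columns]
    return mean + [min(c) for c in columns] + [max(c) for c in columns]


class NGramClassifier:
    """One n-gram language model per class, with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = {}    # class -> Counter of n-gram tuples
        self.context_counts = {}  # class -> Counter of (n-1)-gram histories
        self.class_counts = Counter()
        self.vocab = {"</s>"}     # possible next tokens, incl. end-of-sentence

    def _ngrams(self, tokens):
        # Pad with start symbols so the first words also get a full history.
        padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
        return [tuple(padded[i:i + self.n]) for i in range(len(padded) - self.n + 1)]

    def fit(self, sentences, labels):
        for tokens, label in zip(sentences, labels):
            self.class_counts[label] += 1
            self.vocab.update(tokens)
            ngrams = self.ngram_counts.setdefault(label, Counter())
            contexts = self.context_counts.setdefault(label, Counter())
            for gram in self._ngrams(tokens):
                ngrams[gram] += 1
                contexts[gram[:-1]] += 1
        return self

    def log_prob(self, tokens, label):
        """log P(sentence | class): sum of smoothed log P(w_i | history)."""
        v = len(self.vocab)
        ngrams, contexts = self.ngram_counts[label], self.context_counts[label]
        return sum(math.log((ngrams[g] + 1) / (contexts[g[:-1]] + v))
                   for g in self._ngrams(tokens))

    def predict(self, tokens):
        """Return the class maximizing log P(class) + log P(sentence | class)."""
        total = sum(self.class_counts.values())
        return max(self.class_counts,
                   key=lambda c: math.log(self.class_counts[c] / total)
                   + self.log_prob(tokens, c))
```

For example, `NGramClassifier(n=2).fit(train_tokens, train_labels).predict(["free", "cash", "prize"])` returns the label whose language model (plus prior) scores that sentence highest. Add-one smoothing keeps unseen n-grams from zeroing out a sentence's probability; a fuller implementation would also map unknown test words to an `<unk>` token and typically use a better smoothing scheme.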

API Reference

To do.

Toy Example

A toy example is provided to play around with. The dataset used is a randomly selected subset of the "SMS Spam Collection" dataset, available at the UCI Machine Learning Repository.
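If you want to draw your own subset from the full dataset, a sketch along these lines may help. It assumes the UCI download's layout of one message per line as `<label>\t<message>` (labels `ham`/`spam`); the function names, the regex tokenizer and the fixed seed are hypothetical choices, not how the bundled toy subset was produced.

```python
import random
import re


def load_sms_spam(lines):
    """Parse lines of the form '<label>\\t<message>' into (labels, messages)."""
    labels, messages = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # skip blank lines
            continue
        label, message = line.split("\t", 1)
        labels.append(label)
        messages.append(message)
    return labels, messages


def random_subset(labels, messages, k, seed=0):
    """Draw k (label, message) pairs without replacement, reproducibly."""
    rng = random.Random(seed)
    pairs = rng.sample(list(zip(labels, messages)), k)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())
```

Typical use would be opening the downloaded file (assumed here to be named `SMSSpamCollection`), passing the file object to `load_sms_spam`, then calling `random_subset` and `tokenize` on each message before feature extraction. Fixing the seed makes the subset reproducible across runs.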

References

  1. Cedric De Boom, Steven Van Canneyt, Thomas Demeester, and Bart Dhoedt. 2016. Representation learning for very short texts using weighted word embedding aggregation. Pattern Recogn. Lett. 80, C (September 2016), 150-156. DOI: https://doi.org/10.1016/j.patrec.2016.06.012