Introduction

This program classifies legal issues by assigning a binary value for each class in the National Subject Matter Index (NSMI, https://nsmi.lsntap.org/browse-v2). A "category" is one of the 20 top-level indexes. A "class" is a subcategory under a category.

Each article has a binary value (0 or 1) that indicates whether the article is related to a specific legal class.
We ignore unlabeled entries when constructing a model.
Among the 100+ classes in NSMI-v2, we extracted the 36 classes that have more than 10 positive submissions.
After training the classifiers, we chose the 16 classes that have reasonable recall.
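
The class selection described above can be sketched as follows. This is a minimal illustration, not the code in the notebooks: the function names, the dictionary layout, and the use of `None` for unlabeled entries are all assumptions made for the example.

```python
def drop_unlabeled(texts, values):
    """Keep only (text, label) pairs whose label is 0 or 1.

    Assumption: unlabeled entries are represented as None.
    """
    return [(t, v) for t, v in zip(texts, values) if v is not None]


def select_classes(labels, min_positives=10):
    """Return the class names with more than `min_positives` positive labels.

    `labels` is a hypothetical dict mapping class name -> list of 0, 1, or None.
    """
    selected = []
    for name, values in labels.items():
        positives = sum(1 for v in values if v == 1)
        if positives > min_positives:
            selected.append(name)
    return selected


# Toy data, invented for illustration only.
toy_labels = {
    "housing": [1] * 11 + [0] * 5 + [None] * 3,  # 11 positives: kept
    "tax": [1] * 10 + [0] * 10,                  # exactly 10 positives: dropped
    "family": [None, 1, 0],                      # 1 positive: dropped
}
kept = select_classes(toy_labels)
pairs = drop_unlabeled(["a", "b", "c"], toy_labels["family"])
```

The threshold is strict (more than 10), matching the wording above. The later recall-based cut from 36 to 16 classes is not shown because the recall threshold is not stated here.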

First, the program converts each text into a tf-idf vector.
We then fit an sklearn LogisticRegression model with an L1 penalty on these vectors.
We evaluate the model with 10-fold cross-validation.
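
The pipeline above could look roughly like the sketch below for a single class. It assumes the L1 term is the regularization penalty of `LogisticRegression` (which requires a solver such as `liblinear`), uses default `TfidfVectorizer` settings, and uses stratified shuffled folds. The function name, the fold seed, and the toy texts are hypothetical and not taken from the notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline


def cross_validated_recall(texts, labels, n_splits=10, seed=0):
    """Estimate recall for one binary class with k-fold cross-validation."""
    # The vectorizer sits inside the pipeline so that tf-idf statistics are
    # learned from the training folds only.
    model = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(penalty="l1", solver="liblinear"),
    )
    # Stratified folds keep some positives in every fold (an assumption here).
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    predictions = cross_val_predict(model, texts, labels, cv=cv)
    return recall_score(labels, predictions)


# Toy data, invented for illustration only.
toy_texts = (
    [f"my landlord is evicting me from apartment {i}" for i in range(20)]
    + [f"question about filing a tax return for year {i}" for i in range(20)]
)
toy_labels = [1] * 20 + [0] * 20
recall = cross_validated_recall(toy_texts, toy_labels)
```

Running this once per selected class gives a per-class recall that can be used to decide which classes to keep.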

