A machine learning project that classifies SMS messages as spam or ham. It compares the accuracy of several algorithms (MultinomialNB, LogisticRegression, SVC, DecisionTreeClassifier, RandomForestClassifier, KNeighborsClassifier, AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier, GradientBoostingClassifier and XGBClassifier) and applies text cleaning and preprocessing techniques such as PorterStemmer, CountVectorizer and TfidfVectorizer. The final model uses MultinomialNB and reaches an accuracy of 97.09% (TF-IDF + PorterStemmer).
Accuracy of each Naive Bayes variant by text preprocessing:

| Text Preprocessing | GaussianNB | MultinomialNB | BernoulliNB |
|---|---|---|---|
| TfidfVectorizer + PorterStemmer | 86.94% | 97.09% | 98.35% |
| CountVectorizer + PorterStemmer | 88.00% | 96.42% | 97.00% |
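The comparison in the table can be outlined with scikit-learn as below. This is a sketch on a tiny hand-written toy corpus (an assumption, not the SMS dataset), so its accuracies will not match the figures above; it only shows how each vectorizer is paired with each Naive Bayes variant.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Tiny toy corpus (1 = spam, 0 = ham), for illustration only.
train_texts = [
    "win a free prize now",
    "free entry to win cash",
    "urgent claim your prize",
    "are we meeting for lunch",
    "see you at home tonight",
    "can you call me later",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["claim your free cash prize", "lunch at home later"]
test_labels = [1, 0]

results = {}
for vec_name, vectorizer in [("tfidf", TfidfVectorizer()), ("count", CountVectorizer())]:
    # Fit the vocabulary on training text only, then reuse it for the test text.
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    for model in (GaussianNB(), MultinomialNB(), BernoulliNB()):
        # GaussianNB does not accept sparse matrices, so densify for it only.
        if isinstance(model, GaussianNB):
            model.fit(X_train.toarray(), train_labels)
            preds = model.predict(X_test.toarray())
        else:
            model.fit(X_train, train_labels)
            preds = model.predict(X_test)
        results[(vec_name, type(model).__name__)] = accuracy_score(test_labels, preds)

for (vec_name, model_name), acc in results.items():
    print(f"{vec_name:>5} + {model_name:<13} accuracy={acc:.2f}")
```

GaussianNB assumes continuous, normally distributed features, which sparse word counts are not; MultinomialNB models term counts or frequencies directly, and BernoulliNB binarizes each feature (present/absent, `binarize=0.0` by default). That difference in assumptions is one plausible reason GaussianNB trails the other two in the table.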
The dataset is the SMS Spam Collection from the UCI Machine Learning Repository; it is also available on Kaggle.

The project is organized into the following steps:
- Loading Data
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Classification Model Building
- Classification Model Testing
- Model Results
- Performance Evaluation
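The steps above can be sketched end to end as follows. This is a minimal sketch rather than the project's notebook: it assumes the UCI file layout (one `label<TAB>message` record per line), substitutes a few made-up messages for the real file, skips EDA and stemming for brevity, and the split ratio and `random_state` are arbitrary choices.

```python
import csv
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the UCI file (one "label<TAB>message" record per line).
# To use the real data, pass the file path instead of io.StringIO(RAW).
RAW = """ham\tAre we still meeting for lunch today?
spam\tWINNER! Claim your free prize now
ham\tSee you at home tonight
spam\tFree entry to win cash, text WIN now
ham\tCan you call me later?
spam\tUrgent: claim your cash prize today
ham\tSee you at home tonight
ham\tI'll bring the notes to class tomorrow
spam\tYou have won a free holiday, reply now
ham\tRunning late, start without me
"""

# Loading Data: tab-separated, no header row; QUOTE_NONE stops quote
# characters inside messages from being treated as CSV quoting.
df = pd.read_csv(io.StringIO(RAW), sep="\t", header=None,
                 names=["label", "text"], quoting=csv.QUOTE_NONE)

# Data Cleaning: drop exact duplicate rows and encode ham=0, spam=1.
df = df.drop_duplicates().reset_index(drop=True)
df["target"] = (df["label"] == "spam").astype(int)

# Model Building: a stratified split keeps the spam/ham ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["target"], test_size=0.33,
    stratify=df["target"], random_state=2,
)
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Model Testing and Performance Evaluation.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, zero_division=0)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"accuracy={acc:.2f} precision={prec:.2f}")
print(cm)
```

Precision on the spam class is reported alongside accuracy because, for an SMS filter, a legitimate message wrongly flagged as spam is usually the costlier mistake, and accuracy alone can hide it when ham heavily outnumbers spam.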