K-Nearest Neighbours (KNN) From Scratch

This repository is an implmentation of K-Nearest Neighbours (KNN) from scratch by just using NumPy as it's main processing library. It also includes k-fold cross-validation that has been implemented from scratch. Dataset used here is a binary image dataset for handwritten numbers. It is similar to MNIST dataset but it is not exactly the same.

Running the program

The code resides in the file kNN.py. To run it simply run it as a usual Python program.

Changable Parameters

There are two parameters:

NO_OF_FOLDS defines the number of folds in k-fold cross validation
NO_OF_NEIGHBORS defines the maximum number of neighbours to be considered

Methodology

For a given number of NO_OF_NEIGHBORS the given dataset is iterated through 1 to NO_OF_NEIGHBORS only considering odd numbers. For each iteration k-fold cross validation is also performed. The distance metric is Euclidean distance. Other distancs were not included to reduce the overall time taken for the program to run.

Accuracy and output

Although done for a single dataset, the accuracy is around 99% for training. Please feel free to try with other datasets (you will need to cdit some helper functions).

A typical output is as follows:

==================================================
Current fold: 1
________________________________________
Training for fold: 1
k is 3
Classification errors: 20, Training set accuracy: 98.70801033591732
k is 5
Classification errors: 29, Training set accuracy: 98.1266149870801
k is 7
Classification errors: 34, Training set accuracy: 97.80361757105943
k is 9
Classification errors: 40, Training set accuracy: 97.41602067183463
k is 11
Classification errors: 41, Training set accuracy: 97.3514211886305
________________________________________
Validating for fold: 1
Classification errors: 8 Validation set accuracy: 0.9792746113989638
==================================================
Current fold: 2
________________________________________
(1162, 1024)
Training for fold: 2
k is 3
Classification errors: 13, Training set accuracy: 98.88123924268503
k is 5
Classification errors: 23, Training set accuracy: 98.02065404475043
k is 7
Classification errors: 27, Training set accuracy: 97.67641996557659
k is 9
Classification errors: 32, Training set accuracy: 97.24612736660929
k is 11
Classification errors: 35, Training set accuracy: 96.98795180722891
________________________________________
Validating for fold: 2
Classification errors: 7 Validation set accuracy: 0.9818652849740933
==================================================
Current fold: 3
________________________________________
(1162, 1024)
Training for fold: 3
k is 3
Classification errors: 15, Training set accuracy: 98.7091222030981
k is 5
Classification errors: 24, Training set accuracy: 97.93459552495698
k is 7
Classification errors: 29, Training set accuracy: 97.50430292598968
k is 9
Classification errors: 32, Training set accuracy: 97.24612736660929
k is 11
Classification errors: 34, Training set accuracy: 97.07401032702238
________________________________________
Validating for fold: 3
Classification errors: 9 Validation set accuracy: 0.9766839378238342
==================================================
Current fold: 4
________________________________________
(1162, 1024)
Training for fold: 4
k is 3
Classification errors: 13, Training set accuracy: 98.88123924268503
k is 5
Classification errors: 23, Training set accuracy: 98.02065404475043
k is 7
Classification errors: 29, Training set accuracy: 97.50430292598968
k is 9
Classification errors: 33, Training set accuracy: 97.16006884681583
k is 11
Classification errors: 39, Training set accuracy: 96.64371772805508
________________________________________
Validating for fold: 4
Classification errors: 9 Validation set accuracy: 0.9766839378238342
==================================================
Current fold: 5
________________________________________
Training for fold: 5
k is 3
Classification errors: 19, Training set accuracy: 98.76943005181347
k is 5
Classification errors: 32, Training set accuracy: 97.92746113989638
k is 7
Classification errors: 37, Training set accuracy: 97.60362694300518
k is 9
Classification errors: 41, Training set accuracy: 97.34455958549223
k is 11
Classification errors: 44, Training set accuracy: 97.15025906735751
________________________________________
Validating for fold: 5
Classification errors: 12 Validation set accuracy: 0.9692307692307692
==================================================
Testing for given test dataset
________________________________________
Classification errors for testing dataset: 8
Testing accuracy with the the best yet k, k=3 is 99.15433403805497%

Reviews and feedback

I am not clainming that this is the best from scracth solution available for KNN by any means so any feedback, reviews and comments are welcome.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
binary-hand-digits		binary-hand-digits
LICENSE		LICENSE
README.md		README.md
kNN.py		kNN.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

K-Nearest Neighbours (KNN) From Scratch

Running the program

Changable Parameters

Methodology

Accuracy and output

Reviews and feedback

About

Languages

License

nikhilraj1997/knn-scratch

Folders and files

Latest commit

History

Repository files navigation

K-Nearest Neighbours (KNN) From Scratch

Running the program

Changable Parameters

Methodology

Accuracy and output

Reviews and feedback

About

Topics

Resources

License

Stars

Watchers

Forks

Languages