Automatic-Removal-of-Outliers-to-Improve-Density-Based-Clustering-Performance

This is a course project for the University of Alberta course CMPUT 697, Fall 2019. The project aims to improve the clustering performance of HDBSCAN, a well-known hierarchical density-based clustering algorithm, by automatically removing outliers.
We propose 6 different methods that leverage well-known algorithms to remove outliers from data automatically. Experiments on simulated data demonstrate that one of these variants consistently performs well at removing noise automatically, thereby improving the performance of HDBSCAN.
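
To make the general idea concrete, here is a minimal sketch of one way to prune outliers before clustering. It is not the code from this repository: it assumes a simple k-nearest-neighbour distance score, and the function names, the `k` parameter, and the median-based `factor` threshold are all hypothetical choices made for illustration.

```python
import math
import statistics


def knn_distance_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point sits in a sparser region.
    """
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def remove_outliers(points, k=3, factor=3.0):
    """Keep points whose score is at most `factor` times the median score.

    The median-based threshold is an illustrative assumption, not the
    rule used by the project.
    """
    scores = knn_distance_scores(points, k)
    threshold = factor * statistics.median(scores)
    kept = [score <= threshold for score in scores]
    cleaned = [p for p, keep in zip(points, kept) if keep]
    return cleaned, kept


# A 3x3 grid of inliers plus one far-away point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
cleaned, kept = remove_outliers(grid + [(50.0, 50.0)], k=3)
```

The cleaned points would then be passed to an HDBSCAN implementation in place of the raw data. The `kept` mask is returned as well so that pruned points can later be compared against the ground truth.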

Details of the dataset

For this task, 6 datasets were generated with ground truth values. Each dataset is 2D, and the datasets differ in the number of clusters, the cluster densities, and the distribution of noise. The following figures show a visual representation of the datasets and statistics about the data.
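
As a rough illustration of how a labelled 2D dataset of this kind could be produced, the sketch below draws Gaussian clusters with different spreads and adds uniformly distributed noise. The generator actually used for the 6 datasets is not described here, so the function name, its parameters, and the convention of labelling noise as `-1` are assumptions.

```python
import random


def make_dataset(centers, spreads, points_per_cluster, n_noise, bounds, seed=0):
    """Generate 2D points with ground-truth labels.

    Cluster i is an isotropic Gaussian around centers[i] with standard
    deviation spreads[i]; noise points are uniform over a square given
    by `bounds` and carry the label -1 (an assumed convention).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, ((cx, cy), spread) in enumerate(zip(centers, spreads)):
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    low, high = bounds
    for _ in range(n_noise):
        points.append((rng.uniform(low, high), rng.uniform(low, high)))
        labels.append(-1)
    return points, labels


# Two clusters of different density plus 20 noise points.
points, labels = make_dataset(
    centers=[(0.0, 0.0), (10.0, 10.0)],
    spreads=[0.5, 2.0],
    points_per_cluster=100,
    n_noise=20,
    bounds=(-5.0, 15.0),
)
```

Fixing the seed makes the dataset reproducible, which matters when several outlier-removal variants are compared on the same data.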

Results

We examine each dataset individually and report the number of clusters discovered, the number of ground-truth clusters, the number of mis-clustered points, and the number of pruned inliers, among other statistics. We also report two performance evaluation metrics: DBCV (Density-Based Clustering Validation) and ARI (Adjusted Rand Index).
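
For readers unfamiliar with these quantities, the sketch below shows how ARI and the number of pruned inliers can be computed from ground-truth labels. It is a plain-Python illustration, not the project's evaluation code. The helper names and the use of `-1` as the ground-truth noise label are assumptions, and how noise points enter the ARI computation is a choice left to the caller.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of points that share a cell, a true cluster, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every point in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def count_pruned_inliers(labels_true, kept, noise_label=-1):
    """Count ground-truth inliers that the outlier-removal step discarded."""
    return sum(
        1 for label, keep in zip(labels_true, kept)
        if label != noise_label and not keep
    )


same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
pruned = count_pruned_inliers([0, 0, -1, 1], [True, False, False, True])
```

ARI depends only on which points are grouped together, so renaming the cluster labels leaves the score unchanged.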

Dataset 1:

Dataset 2:

Dataset 3:

Dataset 4:

Dataset 5:

Dataset 6:

Folders and files:
data
KNN-HDBSCAN.py
LICENSE
LOF-HDBSCAN.py
LOF-KNN-Common-HDBSCAN.py
LOF-LSCP-HDBSCAN.py
LOF-Recurse-KNN-HDBSCAN.py
README.md

mdabedr/Automatic-Removal-of-Outliers-to-Improve-Density-Based-Clustering-Performance
