Analysis-of-Hard-drive-failures

Data centers can benefit greatly from a service that uses data mining to predict hard drive failures. Although hard drive failures are rare, they are costly: a failure can cause temporary system unavailability, data loss, or both. Hard drive manufacturers use Self-Monitoring, Analysis and Reporting Technology (SMART) attributes collected during normal operation to predict failures. These SMART attributes report daily diagnostics of hard drives such as read/write error rates, spin retry count, and power cycle count. We used publicly available data from Backblaze, which started recording the statistics of a large number of hard drives (~47000) in its own data center. In this project, we analyze and compare the performance of several machine learning algorithms (Linear Regression, Decision Tree, AdaBoost, XGBoost, Gradient Boosting, k-Nearest Neighbors and Random Forest) at predicting hard drive failures, using Backblaze data from 2018.
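Because failures are rare, plain accuracy can be a misleading way to compare classifiers on this kind of data. The following is a minimal sketch in plain Python of computing precision and recall for the failure class. The function name `precision_recall` and the toy labels are illustrative assumptions and are not taken from the project's notebooks.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (1 = failed drive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 2 failures among 1000 drive-days.
y_true = [1, 1] + [0] * 998
always_healthy = [0] * 1000  # a "model" that never predicts a failure

accuracy = sum(t == p for t, p in zip(y_true, always_healthy)) / len(y_true)
precision, recall = precision_recall(y_true, always_healthy)
```

In this toy case the accuracy is very high even though no failure is ever detected, which is why recall on the failure class is worth reporting alongside accuracy.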

The initial data cleaning and filtering was done in Spark. The Python notebooks for Analysis, Data Preparation, Model Assessment and Cross-Validation are also included.
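As a rough illustration of what a cleaning and filtering step can look like, here is a small sketch using only the Python standard library rather than Spark. It assumes the Backblaze daily-snapshot CSV layout (`date`, `serial_number`, `model`, `capacity_bytes`, `failure`, and `smart_*_raw` columns). The sample values, the helper `clean_rows`, and the choice of SMART columns are hypothetical; which columns the project actually kept is not stated here.

```python
import csv
import io

# Tiny in-memory sample in the Backblaze daily-snapshot layout (values are made up).
SAMPLE = """date,serial_number,model,capacity_bytes,failure,smart_5_raw,smart_187_raw
2018-01-01,A1,ModelX,4000787030016,0,0,0
2018-01-01,B2,ModelX,4000787030016,0,,0
2018-01-02,A1,ModelX,4000787030016,1,8,3
"""


def clean_rows(lines, smart_columns):
    """Yield rows that have every selected SMART column, with numeric fields cast to int."""
    for row in csv.DictReader(lines):
        if any(row[c] == "" for c in smart_columns):
            continue  # drop rows with a missing SMART reading
        cleaned = {
            "date": row["date"],
            "serial_number": row["serial_number"],
            "failure": int(row["failure"]),
        }
        for c in smart_columns:
            cleaned[c] = int(row[c])
        yield cleaned


rows = list(clean_rows(io.StringIO(SAMPLE), ["smart_5_raw", "smart_187_raw"]))
```

The same filter-then-project pattern carries over to a Spark DataFrame, where it would be expressed with column filters and a select instead of a Python loop.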

Analysis.ipynb
Cross_Validation.ipynb
Model_Assesment.ipynb
Preperation.ipynb
README.md
Report.pdf
