This project is a machine learning approach to classify heartbeat sounds into four categories:

- Normal (0)
- Extrahls (1)
- Murmur (2)
- Extrastole (4)
The model utilizes the Dangerous Heartbeat Dataset (DHD) from Kaggle to learn and predict heartbeat sound types. Initial experiments and model iterations are detailed below.
To run the pre-trained model, follow the steps below:

- Edit `run_model.py` to specify the audio file path to be classified.
- Run `run_model.py` to load the trained model and classify the audio file.
- That's it! The script will output the predicted heartbeat sound type (0 = Normal, 1 = Extrahls, 2 = Murmur, 4 = Extrastole).
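The label codes above can be captured in a small helper for turning a predicted class id back into a readable name (a sketch mirroring the mapping listed above; the `decode_prediction` helper is illustrative and not part of `run_model.py`):

```python
# Map the model's integer output to a human-readable heartbeat class.
# Note the label codes skip 3: Extrastole is coded as 4.
LABELS = {0: "Normal", 1: "Extrahls", 2: "Murmur", 4: "Extrastole"}

def decode_prediction(class_id: int) -> str:
    """Return the heartbeat sound type for a predicted class id."""
    try:
        return LABELS[class_id]
    except KeyError:
        raise ValueError(f"Unknown class id: {class_id}")
```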
The goal of this project is to develop an efficient, lightweight machine learning model capable of classifying heartbeat sounds into distinct categories. This model, with its compact size, is intended to be deployable on mobile devices such as smartphones.
- Baseline Model:
  - Feature Extraction: Used audio frequency as the primary feature.
  - Classifier: Basic neural network model.
  - Result: Accuracy capped at around 30%.
- Gradient Boosting:
  - Algorithm: Implemented using "PerpetualBoosters" for gradient boosting.
  - Result: No significant improvement over the baseline.
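A gradient-boosting baseline of this kind looks roughly as follows (a minimal sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in, since the exact PerpetualBoosters setup is not shown here; the feature matrix is synthetic toy data):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in features: one row of frequency statistics per recording.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))     # 200 recordings, 8 frequency features
y = rng.integers(0, 4, size=200)  # 4 heartbeat classes

clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
```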
To improve model performance, custom preprocessing techniques were developed, as follows:
- Preprocessor 1 (`preprocessor1.py`):
  - Method: Divided audio into frames based on delta time and extracted frequency-amplitude pairs from each frame using `librosa`.
  - Padding: End of data was padded to ensure uniform length across samples.
  - Datasets: Generated three datasets: mini, small, and main.
  - Result: Achieved ~72% accuracy with the main dataset. However, the model size was still large (~770 MB).
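The framing and end-padding steps can be sketched as follows (a minimal numpy illustration of the idea; the actual `preprocessor1.py` uses `librosa`, and its details may differ):

```python
import numpy as np

def extract_frames(signal, sr, delta_t=0.1):
    """Split a 1-D audio signal into frames of delta_t seconds and
    return a (frequency, amplitude) pair for the dominant bin per frame."""
    frame_len = int(sr * delta_t)
    n_frames = len(signal) // frame_len
    pairs = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        k = int(np.argmax(spectrum))  # dominant frequency bin
        pairs.append((freqs[k], spectrum[k]))
    return np.array(pairs)

def pad_to_length(features, target_rows):
    """End-pad the feature array with zeros so all samples share one shape."""
    pad = target_rows - features.shape[0]
    return np.pad(features, ((0, max(pad, 0)), (0, 0)))
```

For a pure 440 Hz sine wave, each frame's dominant-frequency entry comes out at 440 Hz, and `pad_to_length` then brings every sample's feature array to a common row count.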
- Preprocessor 2 (`preprocessor2.py`):
  - Method: Employed alignment padding, using Euclidean distance to align smaller samples with larger ones for minimal distance across padding. This ensures heartbeat samples are consistently aligned in the padded arrays.
  - Datasets: Due to processing time, only the mini and small datasets were generated.
  - Result: Achieved ~73% accuracy using the small dataset, with a drastically reduced model size of ~6 MB (over 99% smaller than previous models), making it feasible for mobile deployment.
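The alignment-padding idea can be sketched roughly as follows (an illustrative 1-D implementation, not the project's `preprocessor2.py`): slide the shorter sample across a reference, keep the offset with the smallest Euclidean distance, and zero-pad around it.

```python
import numpy as np

def align_pad(sample, reference):
    """Pad a shorter 1-D sample to the reference length, choosing the
    offset that minimizes Euclidean distance to the reference segment."""
    gap = len(reference) - len(sample)
    if gap < 0:
        raise ValueError("sample longer than reference")
    best_offset, best_dist = 0, np.inf
    for offset in range(gap + 1):
        segment = reference[offset:offset + len(sample)]
        dist = np.linalg.norm(segment - sample)
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    # Zero-pad on both sides so the sample sits at the best offset.
    return np.pad(sample, (best_offset, gap - best_offset))
```

Compared with plain end-padding, this keeps the heartbeat peaks of different recordings in roughly the same array positions, which is what makes a much smaller classifier viable.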
The current model achieves ~73% accuracy on the small dataset, and its compact size (~6 MB) supports local execution on mobile devices.
- Data Preprocessing
  - `preprocessor1.py`: Initial preprocessing script that segments audio and pads data to uniform length.
  - `preprocessor2.py`: Advanced preprocessing script for alignment padding based on Euclidean distance.
- Model Training
  - `train.py`: Script used to train the model on the processed datasets.
- Model Inference
  - `run_model.py`: Single script to load the trained model and run inference on a given audio file; uses `alignment_reference.pkl`, `amp_scaler.pkl`, `freq_scaler.pkl`, and `final_model_fcnn_classifier_16_8_7368.pth`.
- Additional Feature Engineering: Exploring features beyond frequency and amplitude, such as Mel-frequency cepstral coefficients (MFCCs) or spectral contrast.
- Model Optimization: Experimenting with quantization and pruning techniques to further reduce model size without sacrificing accuracy.
- Other approaches: Trying other architectures such as CNNs or RNNs instead of a fully connected neural network.
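The quantization idea mentioned above can be illustrated with a minimal numpy sketch (not the project's code): storing weights as int8 plus a single float scale cuts storage roughly 4x versus float32, at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map float weights into
    [-127, 127] using one shared scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of the classifier.
w = np.random.default_rng(1).normal(size=(16, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

In a real deployment this would be done with a framework's quantization tooling (e.g. PyTorch's built-in quantization) rather than by hand, but the storage arithmetic is the same.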
Dataset used: Dangerous Heartbeat Dataset (DHD).