Grade: 40/40
Implement steps described in the research paper and produce similar results.
Paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188756
Create a Python virtual environment:
[ ! -d "venv" ] && (echo "Creating python3 virtual environment"; python3 -m venv venv)
Install required packages:
pip install -r requirements.txt
Directory | Description |
---|---|
data | dataset |
models | saved and trained models |
references | research paper |
reports | model statistics and figures
src | Python source code
- Create a report file saver and loader for an easy and reproducible way to check results
- Apply filters to remove noise:
  - 50 Hz notch filter
  - 0.15 Hz to 40 Hz band-pass filter
- Crop the signal to 5 minutes (300 seconds)
- Load the signal for all drivers
- Create epochs from the raw signal using a 1-second window (see the MNE sketch after this list)
- Create a class for easier signal preprocessing procedures
- Create a class for dynamic feature extraction
- Define features:
  - 4 different entropies for each 1-second epoch
  - standard deviation, mean, power spectral density
- Use the product of preprocessing procedures and feature extraction to extract more features
- Concatenate the features into a final dataframe
- Clean the dataframe and replace bad values
- Create a normalization process that avoids data leakage
- Use the LOO (leave one participant out) approach to find the best `C` and `gamma` parameters for the SVM model
- Train the SVM model with multiple combinations of entropies (function `powerset`, sketched below) to find out which entropy combination has the highest accuracy on the train dataset
- Train the following models using the Grid Search method:
- SVM
- Neural network (BP)
- KNN
- Random Forest (RF)
- Validate accuracy using the testing set and report the performance of each model
- Determine significant electrodes by calculating the weight of each electrode for each driver with the formula described in the research paper (see the weight formula and sketch below)
- Fix the code in the EntropyHub library (sample entropy), where `np.log` returns NaN if the power density for a given frequency equals 0
- Add an additional preprocessing procedure which decomposes the signal into alpha and beta waves
  - ICA (independent component analysis)
  - 1 Hz high-pass filter to remove drifts
- Fix data leakage: don't scale the whole dataset. Scale the train dataset separately from the test data: fit on train, transform train, transform test
- Fix the code for dataframe creation: add an interface that allows feature concatenation
- Explore which features are most important for prediction
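For reference, a minimal sketch of the filtering, cropping and epoching steps above using MNE. The file path and filename are hypothetical; the real pipeline lives in `src`:

```python
import mne

# Hypothetical path to one driver's recording (.cnt file from the Neuroscan amplifier)
raw = mne.io.read_raw_cnt("data/driver_01_fatigue.cnt", preload=True)

raw.notch_filter(freqs=50)          # remove 50 Hz power-line noise
raw.filter(l_freq=0.15, h_freq=40)  # keep the 0.15-40 Hz band
raw.crop(tmin=0, tmax=300)          # keep the first 5 minutes (300 seconds)

# Slice the continuous signal into non-overlapping 1-second epochs
epochs = mne.make_fixed_length_epochs(raw, duration=1.0, preload=True)
print(len(epochs))  # roughly 300 epochs
```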
Optional:
- Visualize training/testing error
- Visualize weight-based topographies for each subject
- Visualize the average weight-based topography
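The entropy-combination search mentioned in the task list relies on a `powerset` helper. A minimal sketch, with the entropy labels taken from the dataset's PE/AE/SE/FE naming (model fitting omitted):

```python
from itertools import chain, combinations

def powerset(items):
    """Yield every non-empty subset of items, e.g. (PE,), (PE, AE), ..."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    )

entropies = ["PE", "AE", "SE", "FE"]
for combo in powerset(entropies):
    # select the feature columns for this entropy combination and
    # evaluate the SVM on the training set (omitted)
    print(combo)
```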
Model name | F1 score | Accuracy | Area under curve |
---|---|---|---|
RandomForestClassifier | 1 | 1 | 1 |
SVC | 0.999182 | 0.999167 | 0.999172 |
MLPClassifier | 0.994481 | 0.994444 | 0.994444 |
KNeighborsClassifier | 0.982515 | 0.9825 | 0.98258 |
ROC curve figures: RandomForestClassifier (AUC = 1.000), SVM (AUC = 0.999), KNeighborsClassifier (AUC = 0.983), MLPClassifier (AUC = 0.994)
Each row is defined by a triplet:
- driver (driver_id)
- epoch (epoch_id)
- driving state (y=is_fatigue_state)
Number of rows:
drivers (NUM_DRIVERS=12) *
epochs (SIGNAL_DURATION_SECONDS_DEFAULT=300) *
driving_states (len(driving_states)=2)
12 * 300 * 2 = 7200 rows
Each feature column is defined by a triplet:
- feature (`mean`, approximate entropy (`AE`), standard deviation (`std`)...)
- channel (`FC4`, `T6`, `P3`...) - electrodes on the cap which the driver wears during the driving session
- preprocess procedure (`standard`, alpha waves (`AL`), channel rereferencing...)
Number of columns:
driver_id (1) +
is_fatigue_state (1) +
epoch_id (1) +
features (len(feature_names)=7) *
channels (len(channels_good)=30) *
N preprocess procedures (N >= 1)
1 + 1 + 1 + 7 * 30 * N = 3 + 210 * N
in most cases N = 5 (standard, AL, AH, BL, BH)
number of feature columns = 7 * 30 * 5 = 1050, i.e. 1053 columns in total
The number of feature columns is the product of the features (7), channels (30) and preprocess procedures (N); the three id columns are added on top.
Each driver has two driving states (normal and fatigue) which gives 2 raw signals for each driver.
Each signal consists of 300 seconds which will be transformed to 300 epochs.
For each epoch (row), all columns are calculated.
(index) | is_fatigued | driver_id | epoch_id | CP3_PE_standard | CP3_PE_AL | ... | FT7_PE_standard | FT7_PE_AL | ...
---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0.361971 | 0.361971 | ... | 1.84037e-23 | 1.84037e-23 | ... |
1 | 0 | 0 | 1 | 0.232837 | 0.232837 | ... | 1.4759e-23 | 1.4759e-23 | ... |
2 | 0 | 0 | 2 | 0.447734 | 0.447734 | ... | 1.27735e-23 | 1.27735e-23 | ... |
3 | 1 | 0 | 0 | 3.18712 | 3.18712 | ... | 1.4759e-23 | 1.4759e-23 | ... |
4 | 1 | 0 | 1 | 2.81654 | 2.81654 | ... | 1.27735e-23 | 1.27735e-23 | ... |
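To make the arithmetic concrete, the expected dataframe shape can be asserted directly; a small sketch using the constants above (names assumed to match the source):

```python
NUM_DRIVERS = 12
SIGNAL_DURATION_SECONDS_DEFAULT = 300  # 1-second epochs -> 300 epochs per signal
NUM_DRIVING_STATES = 2                 # normal and fatigue
NUM_FEATURES = 7
NUM_CHANNELS = 30
N_PREPROCESS = 5                       # standard, AL, AH, BL, BH

expected_rows = NUM_DRIVERS * SIGNAL_DURATION_SECONDS_DEFAULT * NUM_DRIVING_STATES
expected_cols = 3 + NUM_FEATURES * NUM_CHANNELS * N_PREPROCESS  # 3 id columns

assert expected_rows == 7200
assert expected_cols == 1053
```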
EEG data:
- .cnt files were created by a 40-channel Neuroscan amplifier and contain the EEG data for the two driving states recorded during the driving process
Entropy data (not used):
- four entropies of twelve healthy subjects for driver fatigue detection
- the number in the filename identifies the participant
- each .mat file includes five variables:
- FE - fuzzy entropy
- SE - sample entropy
- AE - approximate entropy
- PE - spectral entropy
- Class_label 0 or 1
- 1 represents the fatigue state
- 0 represents the normal state
Analyze the multiple entropy fusion method and evaluate several channel regions to effectively detect a driver's fatigue state based on electroencephalogram (EEG) records
- collected by attaching electrodes to the driver's scalp
- non-fatigue data: driver was driving for 20 minutes. Last 5 minutes are captured as non-fatigue
- fatigue data: driver was driving for 40-60 minutes. Last 5 minutes are captured as fatigue data.
- dataset is split randomly 50:50 train/test
- 5 minute EEG data from 30 electrodes
- sectioned into 1 second epoch
- 5 * 60 = 300 seconds, i.e. 300 one-second epochs per participant per state
- total 3600 fatigue units and 3600 normal units
- 32 channels (30 effective and 2 reference channels)
- PE - spectral entropy - calculated by applying the Shannon function to the normalized power spectrum based on the peaks of a Fourier transform
- AE - Approximate entropy - calculated in time domain without phase-space reconstruction of signal (short-length time series data) [41]
- SE - Sample entropy - similar to AE. SE is less sensitive to changes in data length, with larger values corresponding to greater complexity or irregularity in the data [41]
- FE - Fuzzy entropy - stable results for different parameters. Best noise resistance using fuzzy membership function.
- m: dimension of phase space
- m = 2
- r: similarity tolerance
- r = 0.2 * SD (SD = standard deviation of the time series)
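A plain NumPy sketch of sample entropy with these parameters (m = 2, r = 0.2 * SD). The project itself uses the EntropyHub/Antropy implementations, so this is only a simplified variant to make the definition concrete:

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with embedding dimension m and tolerance r = r_factor * SD."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = len(x)

    def count_matches(dim):
        # Embed the series into overlapping vectors of length `dim`
        emb = np.array([x[i:i + dim] for i in range(n - dim + 1)])
        # Chebyshev distance between every pair of template vectors
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        # Count ordered pairs (i != j) closer than r; drop self-matches
        return np.sum(dist <= r) - len(emb)

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    # Guard against log(0), mirroring the NaN issue noted in the task list
    return -np.log(a / b) if a > 0 and b > 0 else np.nan

rng = np.random.default_rng(0)
print(sample_entropy(rng.standard_normal(300)))
```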
Features were normalized to [-1, 1] using min-max normalization:
- The feature vector is built by concatenating the individual features.
- The min-max normalization of each feature $x_i$, $i = 1,\dots,n$, is computed as:

$\bar{x}_i = \frac{2\,(x_i - x_{min})}{x_{max} - x_{min}} - 1$
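In scikit-learn terms, and tying in the data-leakage fix from the task list, the normalization can be sketched as follows; the data here is a synthetic stand-in for the real feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 7200-row feature dataframe
rng = np.random.default_rng(0)
X = rng.standard_normal((7200, 1050))
y = rng.integers(0, 2, size=7200)

# 50:50 split, as in the paper
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit on train only, then transform train and test separately to avoid leakage
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```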
- Support vector machine (SVM)
- Back propagation neural network (BP)
- Random forest (RF)
- K-nearest neighbor (KNN)
Parameters found with leave-one-out (LOO) cross-validation:
- c = -1 - the penalty parameter
- g = -5 - the kernel parameter
- AR order 10.
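A sketch of the leave-one-participant-out parameter search with scikit-learn. The grid values are illustrative only (the reported c = -1 and g = -5 read like libsvm log2 exponents, but that is an assumption), and the random data stands in for the entropy features, with groups playing the role of driver_id:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.svm import SVC

# Stand-in data: 12 drivers, 10 samples each
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 10))
y = rng.integers(0, 2, size=120)
groups = np.repeat(np.arange(12), 10)  # one group id per driver

# Search C and gamma on a log2 grid, leaving one driver out per fold
param_grid = {"C": 2.0 ** np.arange(-5, 6), "gamma": 2.0 ** np.arange(-10, 1)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneGroupOut())
search.fit(X, y, groups=groups)
print(search.best_params_)
```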
Combining multiple entropies consistently yields better accuracy than any single entropy.
Significant electrodes were chosen from 30 electrodes.
- Calculate Acc(i) of each single electrode i using the multiple entropy fusion method on the training data with the SVM classifier
- Obtain the accuracy for each electrode, then recalculate it by pairing electrode i with each of the other 29 electrodes (Acc(ij))
- Calculate the weight for each electrode
$V_i=\frac{Acc_{(i)} + \sum_{j=1, j\neq i}^{30}\left(Acc_{(ij)} + Acc_{(i)} - Acc_{(j)}\right)}{30}$
Pick the 10 electrodes with the largest weights. These 10 electrodes form 4 clusters/regions: A, B, C, D.
- Region A gives the best prediction results, even better than when all electrodes were used for prediction (see the weight sketch below)
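A small NumPy sketch of the weight formula above; the accuracy arrays are random stand-ins for the real single-electrode and pairwise training accuracies:

```python
import numpy as np

rng = np.random.default_rng(0)
acc_single = rng.uniform(0.5, 1.0, 30)      # Acc(i): electrode i alone
acc_pair = rng.uniform(0.5, 1.0, (30, 30))  # Acc(ij): electrode pair (i, j)

def electrode_weights(acc_single, acc_pair):
    """V_i = (Acc(i) + sum_{j != i} (Acc(ij) + Acc(i) - Acc(j))) / 30"""
    n = len(acc_single)
    weights = np.empty(n)
    for i in range(n):
        total = acc_single[i]
        for j in range(n):
            if j != i:
                total += acc_pair[i, j] + acc_single[i] - acc_single[j]
        weights[i] = total / n
    return weights

V = electrode_weights(acc_single, acc_pair)
top10 = np.argsort(V)[::-1][:10]  # the 10 highest-weight electrodes
print(top10)
```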
- Many channels are flat during the driving process and spike only at certain moments
- Each BCIT dataset also includes 4 additional EOG channels placed vertically above the right eye (veou), vertically below the right eye (veol), horizontally outside the right eye (heor), and horizontally outside the left eye (heol)
Use `ipython kernel install --user --name=eeg` to make the venv available in Jupyter.
- Two different libs (EntropyHub and Antropy) produce the same result for sample entropy
- Applying the filter before conversion to epochs vs. after gives different results