https://doi.org/10.1155/2022/7902418
-
Diabetes is a chronic disease that causes widespread suffering and socioeconomic losses. There are two main types of diabetes:
-
Type 1 diabetes (T1DM) - The β cells of the pancreas cannot produce insulin, so the body cannot regulate its own Blood Glucose (BG).
-
Type 2 diabetes (T2DM) - Also known as non-insulin-dependent diabetes mellitus: insulin is still produced, but cells become resistant to it, so insulin cannot perform its role properly.
-
-
T2DM patients account for more than 90% of all diabetes patients, yet most previous studies were conducted on T1DM patients, so we collected data from T2DM patients.
-
Thus, this project aims to predict patients' BG levels 15, 30, and 60 minutes ahead (the Prediction Horizon, PH) using the data below:
-
Internal factors: EGV (Estimated Glucose Value) from the CGM (Continuous Glucose Monitoring) device, height, weight, underlying diseases
-
External factors: insulin administration, carbohydrate intake
-
-
We used these factors to build five models: Simple RNN, LSTM, Bidirectional LSTM, Stacked LSTM, and GRU.
-
Then, we applied a genetic algorithm (GA) to find ensemble weights for the five models' predictions that minimize the RMSE.
-
Finally, we compared the proposed model against a traditional time series prediction model, ARIMA, as the baseline, and found that the proposed model performed better.
-
In cooperation with a university hospital, we collected data from 55 diabetic patients who visited or were hospitalized between May 2019 and March 2021.
-
Previous studies often used fewer than 10 patients or simulated data, so using real data from 55 diabetic patients is a meaningful contribution of this project.
-
Data were collected by having each patient wear a CGM device for 5 to 7 consecutive days.
-
CGM device - Dexcom G5
-
Sampling interval - 5 minutes
-
BG is recorded as 'high' if it is over 400 mg/dL and 'low' if it is below 60 mg/dL.
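Out-of-range readings are stored as text rather than numbers, so they must be converted before modeling. The source does not say how this project handled them. The sketch below assumes the simple choice of clipping 'high' to 400 and 'low' to 60, which are the reporting limits above. `parse_egv` is a hypothetical helper name.

```python
from typing import Optional, Union

HIGH_LIMIT = 400.0  # reporting ceiling from the text (mg/dL)
LOW_LIMIT = 60.0    # reporting floor from the text (mg/dL)


def parse_egv(raw: Union[str, float, int, None]) -> Optional[float]:
    """Convert one raw EGV cell to a number (hypothetical helper).

    Assumption: 'high'/'low' are clipped to the reporting limits, and
    empty cells become None so they can be interpolated or dropped later.
    """
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    text = raw.strip().lower()
    if text == "high":
        return HIGH_LIMIT
    if text == "low":
        return LOW_LIMIT
    if text == "":
        return None
    return float(text)


print([parse_egv(v) for v in ["125", "High", "low", "", 98]])
# [125.0, 400.0, 60.0, None, 98.0]
```

Clipping keeps every time step numeric, but it understates true extremes. Dropping or flagging those readings is an equally valid choice.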
-
Now, let's take a look at the data
-
First, the raw data are shown in the table below.
-
As you can see, there are 13 features, including timestamps, event types, patient information, and glucose values.
-
However, only five features are used in this study: administered insulin value, carbohydrate value, glucose value, timestamp, and event type.
-
Therefore, the raw data are preprocessed into a table of the shape shown below (for the detailed preprocessing steps, please refer to sections 0.1 ~ 0.3 of the source code - Source code).
- As the table above shows, three features are now used: insulin, carbohydrates, and blood glucose values.
- The preprocessed data can now be plotted as a clean graph, as shown above.
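The pivot from event rows (one row per CGM reading, insulin dose, or meal) to one row per time step can be sketched as follows. Several details are assumptions. The tuple layout `(timestamp, event_type, value)` and the labels 'EGV', 'Insulin', and 'Carbs' are hypothetical simplifications of the 13-column export. Summing events within each 5-minute slot is also assumed. The project's actual steps are in sections 0.1 ~ 0.3 of its source code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLOT_MINUTES = 5  # CGM sampling interval from the text


def floor_to_slot(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute slot."""
    return ts.replace(minute=ts.minute - ts.minute % SLOT_MINUTES, second=0, microsecond=0)


def to_wide_table(events):
    """Turn (timestamp, event_type, value) rows into one row per 5-minute slot.

    Hypothetical labels 'EGV', 'Insulin', 'Carbs'. Insulin and carbohydrate
    amounts in the same slot are summed; a slot with only a glucose reading
    gets 0.0 for both, so nonzero values mark when a dose or meal occurred.
    """
    table = defaultdict(lambda: {"insulin": 0.0, "carbs": 0.0, "glucose": None})
    for ts, event_type, value in events:
        row = table[floor_to_slot(ts)]
        if event_type == "EGV":
            row["glucose"] = float(value)
        elif event_type == "Insulin":
            row["insulin"] += float(value)
        elif event_type == "Carbs":
            row["carbs"] += float(value)
    return [(slot, r["insulin"], r["carbs"], r["glucose"]) for slot, r in sorted(table.items())]


t0 = datetime(2020, 1, 1, 8, 0)
events = [
    (t0, "EGV", 110),
    (t0 + timedelta(minutes=2), "Insulin", 4),
    (t0 + timedelta(minutes=5), "EGV", 118),
    (t0 + timedelta(minutes=6), "Carbs", 45),
]
rows = to_wide_table(events)
for row in rows:
    print(row)
```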
< "Developing an individual Glucose Prediction Model Using Recurrent Neural Network" > reference link
- The research framework above is a univariate prediction model that uses only CGM data as input variables
- In contrast, this study makes the following improvements:
- Multivariate models that add insulin administration and carbohydrate intake time-point features
- Application of an optimization method, the genetic algorithm
- The ARIMA model as a baseline
- Most importantly, improved prediction performance
-
I applied five RNN-based algorithms, which are well suited to sequence data such as time series:
- Vanilla RNN
- LSTM
- Stacked LSTM
- Bidirectional LSTM
- GRU
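To show what "RNN-based" means for this kind of data, here is a toy forward pass of the simplest variant: a vanilla RNN with a single hidden unit, in plain Python. The weights are arbitrary illustration values, not trained parameters. The other four variants build on this same recurrence. LSTM and GRU add gating, the stacked LSTM adds layers, and the bidirectional LSTM adds a reversed pass.

```python
import math
from typing import List, Sequence


def rnn_forward(window: Sequence[Sequence[float]], w_x: List[float], w_h: float,
                b: float, w_out: float, b_out: float) -> float:
    """Vanilla RNN with one hidden unit, run over a single input window.

    h_t = tanh(w_x . x_t + w_h * h_{t-1} + b)
    prediction = w_out * h_T + b_out
    """
    h = 0.0  # initial hidden state h_0
    for x_t in window:  # x_t = feature vector at one time step
        pre_activation = sum(w * x for w, x in zip(w_x, x_t)) + w_h * h + b
        h = math.tanh(pre_activation)  # new state mixes input and previous state
    return w_out * h + b_out  # linear read-out of the final hidden state


# Three time steps of scaled (insulin, carbs, glucose) features; weights are arbitrary.
window = [[0.0, 0.0, 0.55], [0.0, 0.0, 0.60], [0.4, 0.0, 0.62]]
print(rnn_forward(window, w_x=[0.5, 0.3, 1.2], w_h=0.8, b=0.0, w_out=1.0, b_out=0.0))
```

The hidden state `h` carries information from earlier steps forward, which is why these models suit time series. In practice, deep learning frameworks learn the weights and use many hidden units.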
-
Below is a description of how the data are framed for the model:
< "Developing an individual Glucose Prediction Model Using Recurrent Neural Network" > reference link
-
Lookback: how many past BG readings (time steps) are used as input variables
-
Delay: the Prediction Horizon (PH), i.e., how far ahead we want to predict BG (15, 30, or 60 minutes)
-
Windowing step: we applied a fixed-size sliding window that moves one time step to the right at a time
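Below is a minimal sketch of this framing. It assumes the target is the reading `delay` steps after the last input in the window. With 5-minute sampling, PH = 15, 30, and 60 minutes then corresponds to delay = 3, 6, and 12 steps. `make_windows` is a hypothetical helper name, not taken from the project's source code.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lookback: int,
                 delay: int) -> Tuple[List[List[float]], List[float]]:
    """Slide a fixed-size window one step at a time over a BG series.

    Each input holds `lookback` consecutive readings; the target is the
    reading `delay` steps after the last reading in the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - delay + 1):
        end = start + lookback  # one past the last input index
        inputs.append(list(series[start:end]))
        targets.append(series[end - 1 + delay])
    return inputs, targets


bg = [float(v) for v in range(100, 140, 2)]  # 20 dummy readings, 5 minutes apart
X, y = make_windows(bg, lookback=6, delay=6)  # 30-minute history, PH = 30 minutes
print(len(X), X[0], y[0])
```

Each shift of the window produces one training pair. A series of n readings therefore yields n - lookback - delay + 1 pairs.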
-
-
The figure illustrates the lookback in more detail.
< "Artificial Neural Network Algorithm for Online Glucose Prediction from Continuous Glucose Monitoring" > reference link
-
We tried lookback values of 3, 6, 9, and 12. None showed a significant difference in performance, and we judged 6 to be the best fit.
-
With a 5-minute sampling interval, a lookback of 6 means that BG values from up to 30 minutes earlier were used as input variables.
-
So far, we have built five prediction models.
-
However, there is a problem: the RNN may perform well on some patients' data, while the GRU performs better on others.
-
Therefore, a Genetic Algorithm (GA) is applied to combine the models, reducing the variability caused by each model's data-dependent performance and making the prediction more robust.
-
The flow of the GA is as follows:
-
Initial chromosome generation
-
Fitness evaluation for each generation
-
Crossover and mutation to produce the next generation
-
Fitness evaluation of the next generation
-
-
In this study, the GA is applied as follows:
-
Each prediction model outputs an array of predicted BG values. For example, with PH = 30 min (6 time steps), if the inputs are [t1, t2, t3, ...], the outputs are [pre_t7, pre_t8, pre_t9, ...].
-
The predicted BG arrays of the five models are named Model1, Model2, ..., Model5.
-
Our aim is to find the combination of weights, one per model, that minimizes the RMSE of the weighted ensemble prediction.
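The objective can be written out as follows, assuming the ensemble is a linear combination of the five model outputs. The constraints on the weights are given separately and are not reproduced in this text. Here $y_t$ is the measured BG and $N$ is the number of predicted time points:

$$
\hat{y}_t(\mathbf{w}) = \sum_{k=1}^{5} w_k \,\mathrm{Model}_k(t),
\qquad
\mathrm{RMSE}(\mathbf{w}) = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\bigl(y_t - \hat{y}_t(\mathbf{w})\bigr)^2}
$$

The GA searches for the $\mathbf{w} = (w_1, \dots, w_5)$ that minimizes $\mathrm{RMSE}(\mathbf{w})$.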
-
The constraints of each weight are as follows.
Parameters used in the GA:

| Parameter | Value |
|---|---|
| Num_iteration | 25000 |
| Population_Size | 100 |
| mutation_probability | 0.1 |
| elit_ratio | 0.01 |
| crossover_probability | 0.5 |
| parents_portion | 0.3 |
| crossover_type | uniform |
| selection_type | roulette |
| max_iteration_without_improvement | None |
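The weight search can be sketched as a simplified GA in plain Python. This is an illustration, not the project's implementation, and it rests on several assumptions:

- The ensemble is a weighted sum with weights in [0, 1], normalized to sum to 1. The actual weight constraints are not reproduced here.
- Roulette selection uses 1 / (1 + RMSE) as each parent's slice of the wheel.
- `parents_portion` is read as the fraction of best individuals eligible to breed.
- `ga_optimize` and `ensemble_rmse` are hypothetical names.

The default arguments mirror the parameter values listed above, except the iteration count, which is lowered so the example runs quickly.

```python
import math
import random
from typing import List, Sequence, Tuple


def ensemble_rmse(weights: Sequence[float], preds: Sequence[Sequence[float]],
                  actual: Sequence[float]) -> float:
    """RMSE of the weighted ensemble; weights are normalized to sum to 1 (assumption)."""
    total = sum(weights)
    norm = [w / total for w in weights] if total > 0 else [1.0 / len(weights)] * len(weights)
    sq_err = 0.0
    for t, y in enumerate(actual):
        y_hat = sum(w * model[t] for w, model in zip(norm, preds))
        sq_err += (y - y_hat) ** 2
    return math.sqrt(sq_err / len(actual))


def ga_optimize(preds: Sequence[Sequence[float]], actual: Sequence[float],
                iterations: int = 200, pop_size: int = 100, mutation_p: float = 0.1,
                elit_ratio: float = 0.01, crossover_p: float = 0.5,
                parents_portion: float = 0.3, seed: int = 0) -> Tuple[List[float], float]:
    """Simplified GA: search weights in [0, 1] that minimize ensemble RMSE."""
    rng = random.Random(seed)

    def fit(ind: Sequence[float]) -> float:
        return ensemble_rmse(ind, preds, actual)

    # Initial chromosomes: one random weight per model.
    population = [[rng.random() for _ in preds] for _ in range(pop_size)]
    n_elite = max(1, int(elit_ratio * pop_size))
    n_parents = max(2, int(parents_portion * pop_size))
    for _ in range(iterations):
        population.sort(key=fit)  # fitness evaluation: lowest RMSE first
        parents = population[:n_parents]
        # Roulette wheel: lower RMSE gives a larger slice.
        wheel = [1.0 / (1.0 + fit(p)) for p in parents]
        children = [ind[:] for ind in population[:n_elite]]  # elitism
        while len(children) < pop_size:
            a, b = rng.choices(parents, weights=wheel, k=2)
            if rng.random() < crossover_p:
                # Uniform crossover: each gene is taken from either parent.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            # Mutation: resample each gene with probability mutation_p.
            child = [rng.random() if rng.random() < mutation_p else g for g in child]
            children.append(child)
        population = children
    best = min(population, key=fit)
    return best, fit(best)


actual = [100.0 + 3 * t for t in range(20)]
preds = [actual, [v + 10 for v in actual], [v - 20 for v in actual]]
best, err = ga_optimize(preds, actual, iterations=50, pop_size=30)
print("weights:", [round(w, 3) for w in best], "RMSE:", round(err, 3))
```

Elitism guarantees that the best RMSE never gets worse from one generation to the next. Because `max_iteration_without_improvement` is None in the table, the sketch also has no early stopping.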
-
-
The ARIMA model is a traditional time series forecasting method.
-
In practice, time series data are often non-stationary, but AR(p), MA(q), and ARMA(p, q) models cannot capture this non-stationarity.
-
The ARIMA(p, d, q) model addresses this by differencing the series d times to remove the non-stationarity before fitting the ARMA part.
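The differencing step that gives ARIMA its "I" (integrated) can be shown in a few lines. This plain-Python sketch only shows how d rounds of differencing turn a trending series into one with a constant mean. Fitting the AR and MA parts is left to a time series library. `difference` is a hypothetical helper name.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply d rounds of first differencing: y'_t = y_t - y_{t-1}.

    Each round shortens the series by one value.
    """
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


trend = [2.0 * t + 5.0 for t in range(6)]  # linear trend: non-stationary mean
curve = [float(t * t) for t in range(6)]   # quadratic trend
print(difference(trend, d=1))  # [2.0, 2.0, 2.0, 2.0, 2.0]
print(difference(curve, d=2))  # [2.0, 2.0, 2.0, 2.0]
```

A linear trend needs d = 1 and a quadratic trend needs d = 2. This is why d is chosen from the data rather than fixed.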
-
However, if the ARIMA model is used without any special setup, it only produces the forecasts t + 5, t + 10, t + 15, ... minutes from a single fixed time point t.
Result example of the worst case (PH = 15 minutes, with GA ensemble weight optimization): Clarke Error Grid Analysis
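The Clarke Error Grid sorts each (reference, predicted) pair into zones A to E, where zone A means clinically accurate. Below is a minimal sketch of the zone A check only, using the commonly cited rule: within 20% of the reference, or both values below 70 mg/dL. Boundary handling (< vs <=) varies slightly between implementations. The full grid also needs the boundary lines of zones B to E. The helper names are hypothetical.

```python
from typing import Sequence


def in_zone_a(reference: float, predicted: float) -> bool:
    """Zone A of the Clarke Error Grid (values in mg/dL)."""
    if reference < 70 and predicted < 70:
        return True  # both hypoglycemic: treated as clinically accurate
    return abs(predicted - reference) <= 0.2 * reference  # within 20%


def zone_a_percentage(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of (reference, predicted) pairs that fall in zone A, in percent."""
    hits = sum(in_zone_a(r, p) for r, p in zip(reference, predicted))
    return 100.0 * hits / len(reference)


print(zone_a_percentage([100.0, 200.0, 60.0], [115.0, 150.0, 65.0]))
```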