Blood Glucose Level Prediction

https://doi.org/10.1155/2022/7902418

1. Introduction

Diabetes is a chronic disease that affects many people and causes substantial socioeconomic losses. There are two main types of diabetes:
- Type 1 diabetes (T1DM) - The β cells of the pancreas cannot produce insulin, so the body cannot regulate blood glucose (BG) on its own
- Type 2 diabetes (T2DM) - Also known as non-insulin-dependent diabetes mellitus: insulin is produced, but cells become resistant to it, so insulin cannot perform its role properly
T2DM patients account for more than 90% of all diabetes patients, but most previous studies were conducted on T1DM patients, so we collected data from T2DM patients
Thus, this project aims to predict patients' BG levels 15, 30, and 60 minutes ahead (the prediction horizon, PH) using the data below.
- Internal Factors : EGV (Estimated Glucose Value) from CGM (Continuous Glucose Monitoring) device, height, weight, underlying disease
- External Factors : Insulin administration, carbohydrate intake
We used the above factors to build five models: Simple RNN, LSTM, Bidirectional LSTM, Stacked LSTM, and GRU
Then, we applied a genetic algorithm (GA) to find the optimized weights that minimize RMSE
Finally, we compared the performance of the model against ARIMA, a traditional time series prediction model, as the baseline and found that the model proposed in this work performed better

2. Dataset

In cooperation with the university hospital, we collected data from 55 diabetic patients who visited or were hospitalized between 2019.05 and 2021.03
Previous studies often used fewer than 10 patients or simulated data, so using data from 55 diabetic patients is a meaningful contribution of this project
Data were collected by attaching a CGM device for 5 to 7 consecutive days
CGM device - Dexcom G5
Sampling interval - 5 minutes
BG is recorded as 'high' if it is over 400 mg/dL and 'low' if it is below 60 mg/dL
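The text does not say how the 'high' and 'low' entries are handled before modeling. As one possible approach (an assumption, not the project's confirmed preprocessing), the sketch below caps them at the thresholds stated above. The names `parse_glucose`, `HIGH_CAP`, and `LOW_CAP` are hypothetical.

```python
# Hypothetical caps taken from the recording rule in the text:
# 'high' means above 400, 'low' means below 60.
HIGH_CAP = 400.0
LOW_CAP = 60.0


def parse_glucose(raw):
    """Convert one raw CGM reading to a float, capping out-of-range labels."""
    text = str(raw).strip().lower()
    if text == "high":
        return HIGH_CAP
    if text == "low":
        return LOW_CAP
    return float(text)


# Made-up readings for illustration only.
readings = ["112", "High", "145", "Low", 98]
cleaned = [parse_glucose(r) for r in readings]
print(cleaned)
```

Capping is only one option. Dropping or interpolating these samples would be equally plausible, and the right choice depends on how often they occur.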
Now, let's take a look at the data
First, the raw data is shown in the table below

As you can see, there are 13 features, including timestamps, event types, patient information, and glucose values
However, only five features are used in this study: administered insulin value, carbohydrate value, glucose value, timestamp, and event type
Therefore, the raw data is preprocessed into a table of the shape shown below (if necessary, please refer to 0.1 ~ 0.3 of the source code for the detailed preprocessing process - Source code)

As the above table shows, three features are used at this point: insulin, carbohydrates, and blood glucose values

Now we can see a neat graph like the one above

3. Materials and methods

3.1. Research framework

< "Developing an individual Glucose Prediction Model Using Recurrent Neural Network" > reference link

The above research framework is a univariate prediction model that uses only CGM data as input
In contrast, this study makes the following developments:
1. Multivariate models that add insulin administration and carbohydrate intake time points as features
2. A genetic algorithm applied as the optimization methodology
3. An ARIMA model used as the baseline
4. Most of all, improved model performance

3.2. Prediction models

We applied five RNN-based algorithms, which show clear advantages on sequence data such as time series data
- Vanilla RNN
- LSTM
- Stacked LSTM
- Bidirectional LSTM
- GRU
Below is a description of the frame that is fed to the models

< "Developing an individual Glucose Prediction Model Using Recurrent Neural Network" > reference link
- Lookback : How far back in time BG values are used as input variables
- Delay : Prediction horizon (PH), i.e., how far ahead the BG we want to predict lies (15 min, 30 min, 60 min)
- Windowing step : We applied a fixed-size sliding window that moves one timestep to the right at each step
The figure illustrates lookback in more detail

< "Artificial Neural Network Algorithm for Online Glucose Prediction from Continuous Glucose Monitoring" >reference link
We varied the lookback value (3, 6, 9, and 12 timesteps) and found no significant difference in performance, so we chose 6
With a 5-minute sampling interval, this means that BG values from up to 30 minutes ago are used as input variables
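The frame described above can be sketched as follows. This is a minimal univariate illustration, not the project's code: `make_frames` and its argument names are hypothetical, and it assumes the target lies `delay` samples after the last input sample.

```python
import numpy as np


def make_frames(bg, lookback=6, delay=6, step=1):
    """Build (inputs, targets) from a 1-D BG series sampled every 5 minutes.

    lookback: number of past samples per input window (6 samples = 30 min)
    delay:    prediction horizon in samples (3, 6, 12 = 15, 30, 60 min)
    step:     how far the window slides between consecutive frames
    """
    bg = np.asarray(bg, dtype=float)
    X, y = [], []
    last_start = len(bg) - lookback - delay
    for start in range(0, last_start + 1, step):
        end = start + lookback           # index one past the input window
        X.append(bg[start:end])
        y.append(bg[end - 1 + delay])    # `delay` samples after the last input
    return np.array(X), np.array(y)


# Toy series 0, 1, ..., 19 so that each value equals its own index.
X, y = make_frames(np.arange(20), lookback=6, delay=6)
print(X.shape, y.shape)
print(X[0], y[0])
```

For the multivariate models, each window would hold one row per feature (BG, insulin, carbohydrates) instead of a single BG row.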

3.3. Optimization

So far, we have made five prediction models
However, there is a problem: RNN may perform well on some patients' data while GRU performs well on others
Therefore, a genetic algorithm (GA) is applied to reduce the variability caused by performance differences between models across datasets and to make the prediction robust
The flow of GA is as follows
1. Initial chromosome generation
2. Evaluation of fitness by generation
3. Generation crossover and mutations
4. Evaluate the fitness for the next generation

In this study, GA is applied in the following order:

Each prediction model outputs an array of predicted BG values (if the input variables are [t1, t2, t3, ...], the output is [pre_t7, pre_t8, pre_t9, ...] when PH = 30min)
The predicted BG arrays of the models are named Model1, Model2, ..., Model5
Our aim is to find the weight combination that minimizes the RMSE
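As a rough illustration of this objective, the sketch below assumes the ensemble prediction is a weighted sum of the five model outputs. The exact formula is not reproduced in this text, so treat that form as an assumption; `ensemble_rmse` is a hypothetical name.

```python
import numpy as np


def ensemble_rmse(weights, predictions, actual):
    """RMSE of the weighted sum of model predictions.

    weights:     shape (n_models,)
    predictions: shape (n_models, n_samples), one row per model (Model1, ...)
    actual:      shape (n_samples,), measured BG
    """
    weights = np.asarray(weights, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    actual = np.asarray(actual, dtype=float)
    combined = weights @ predictions     # weighted sum across models
    return float(np.sqrt(np.mean((combined - actual) ** 2)))


# Made-up numbers: two models that err by -2 and +2 respectively.
actual = [100, 110, 120]
predictions = [[98, 108, 118], [102, 112, 122]]
print(ensemble_rmse([1.0, 0.0], predictions, actual))   # first model alone
print(ensemble_rmse([0.5, 0.5], predictions, actual))   # errors cancel out
```

In this toy case the equal-weight ensemble beats either model alone, which is the effect the weight search is meant to exploit.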

The constraints of each weight are as follows.

Parameters used in GA

Parameter_Name	Parameter_value
Num_iteration	25000
Population_Size	100
mutation_probability	0.1
elit_ratio	0.01
crossover_probability	0.5
parents_portion	0.3
crossover_type	uniform
selection_type	roulette
max_iteration_without_improvement	None
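
To show how these parameters interact, here is a small self-contained GA in NumPy. It is a sketch under stated assumptions, not the project's implementation: weights are assumed to lie in [0, 1], `crossover_prob` is used as the per-gene chance of inheriting from the first parent, the data are synthetic, and the iteration count is cut to 200 to keep the demo fast. `ga_minimize` is a hypothetical name.

```python
import numpy as np


def ga_minimize(objective, dim, pop_size=100, n_iter=200, mutation_prob=0.1,
                crossover_prob=0.5, elit_ratio=0.01, parents_portion=0.3,
                seed=0):
    """Minimize `objective` over vectors in [0, 1]^dim (assumed bounds)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, dim))                  # initial chromosomes
    n_elite = max(1, int(round(elit_ratio * pop_size)))
    n_parents = max(2, int(round(parents_portion * pop_size)))
    history = []                                       # best cost per generation
    for _ in range(n_iter):
        cost = np.array([objective(ind) for ind in pop])
        order = np.argsort(cost)
        pop, cost = pop[order], cost[order]            # best individual first
        history.append(float(cost[0]))
        # Roulette selection: lower cost gives a larger selection probability.
        fitness = cost.max() - cost + 1e-12
        probs = fitness / fitness.sum()
        parents = pop[rng.choice(pop_size, size=n_parents, p=probs)]
        children = [pop[i].copy() for i in range(n_elite)]   # elitism
        while len(children) < pop_size:
            a, b = parents[rng.integers(n_parents, size=2)]
            mask = rng.random(dim) < crossover_prob    # uniform crossover
            child = np.where(mask, a, b)
            mutate = rng.random(dim) < mutation_prob   # per-gene mutation
            child = np.where(mutate, rng.random(dim), child)
            children.append(child)
        pop = np.array(children)
    cost = np.array([objective(ind) for ind in pop])
    best = int(np.argmin(cost))
    history.append(float(cost[best]))
    return pop[best], float(cost[best]), history


# Synthetic stand-ins for Model1..Model5: the true signal plus different noise.
data_rng = np.random.default_rng(1)
actual = 120 + 30 * np.sin(np.linspace(0, 6, 200))
preds = np.stack([actual + data_rng.normal(0, s, actual.size)
                  for s in (5, 8, 12, 6, 10)])


def rmse(w):
    return float(np.sqrt(np.mean((w @ preds - actual) ** 2)))


best_w, best_rmse, history = ga_minimize(rmse, dim=5)
print(best_w.round(3), round(best_rmse, 3))
```

Because the elite individuals are copied unchanged, the best cost never gets worse from one generation to the next.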

3.4. Baseline - ARIMA (Autoregressive Integrated Moving Average) Model

The ARIMA model is a traditional time series forecasting methodology
In practice, time series data are often non-stationary, but AR(p), MA(q), and ARMA(p, q) models cannot account for this non-stationarity
Therefore, the ARIMA(p,d,q) model adds a differencing step, applied d times, that removes this non-stationarity
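The differencing step, the "I" in ARIMA, can be illustrated on its own with NumPy. This is only a toy example on made-up data, not the baseline code: `difference` is a hypothetical helper.

```python
import numpy as np


def difference(series, d=1):
    """Apply d-th order differencing, the 'I' step of ARIMA(p, d, q)."""
    return np.diff(np.asarray(series, dtype=float), n=d)


t = np.arange(10)
trend = 100 + 3 * t                 # non-stationary: the mean drifts upward
print(difference(trend, d=1))       # every entry is 3.0, the trend is gone

quadratic = t ** 2                  # needs two rounds of differencing
print(difference(quadratic, d=2))   # every entry is 2.0
```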
However, if the ARIMA model is used without any preset, t + 5, t + 10, t + 15, ... at a fixed point in time t only these predictions are

4. Results

Example result for the worst case - PH = 15 minutes, with GA ensemble weight optimization applied, Clarke Error Grid Analysis

dongsikchoi/Blood-Glucose-Prediction-LSTM
