-
Notifications
You must be signed in to change notification settings - Fork 0
/
Parkinsons Disease Analysis.Rmd
195 lines (155 loc) · 6.3 KB
/
Parkinsons Disease Analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
---
title: "Parkinson's Data Analysis"
output:
html_document:
toc: yes
toc_float:
collapsed: yes
smooth_scroll: yes
collapse_button_text: "Click to show/hide code"
bibliography : 'URL'
---
```{r, echo=FALSE, eval=FALSE, showtext=TRUE}
knitr::opts_chunk$set(echo = TRUE)
```
---
```{r, include=FALSE}
options(tinytex.verbose = TRUE)
```
```{r,message=FALSE, warning=FALSE, echo=FALSE, eval=FALSE, showtext=TRUE}
library(readr)
library(ggplot2)
library(tidyverse)
```
# Importing the dataset
```{r, message=FALSE, warning=FALSE}
library(readr)
data <- read_csv("C:/Users/Admins/Desktop/parkinsons.csv")
data$name<- NULL
```
# Boxplot of the 'MDVP:Shimmer' feature
```{r}
ggplot(data, aes(x = factor(status), y = `MDVP:Shimmer`)) +
geom_boxplot() +
ggtitle("Boxplot of MDVP:Shimmer") +
xlab("Status (0 = Healthy, 1 = Parkinson's)") +
ylab("MDVP:Shimmer")+
theme_bw()
```
# scatterplot of the 'NHR' vs. 'HNR' features
```{r}
ggplot(data, aes(x = NHR, y = HNR, color = factor(status))) +
geom_point() +
ggtitle("Scatterplot of NHR vs. HNR") +
xlab("NHR") +
ylab("HNR") +
scale_color_discrete(name = "Status", labels = c("Healthy", "Parkinson's"))+theme_bw()
```
# Barplot of the distribution of the 'status' feature
```{r}
ggplot(data, aes(x = factor(status))) +
geom_bar(fill= (col="darkgrey")) +
ggtitle("Distribution of Status") +
xlab("Status (0 = Healthy, 1 = Parkinson's)") +
ylab("Count")+
theme_bw()
```
# Density plot of the 'D2' feature
```{r}
ggplot(data, aes(x = D2, color = factor(status))) +
geom_density() +
ggtitle("Density Plot of D2") +
xlab("D2") +
ylab("Density") +
scale_color_discrete(name = "Status", labels = c("Healthy", "Parkinson's"))+theme_bw()
```
# histogram of "MDVP:Fhi(Hz)" column
```{r}
ggplot(data, aes(x = `MDVP:Fhi(Hz)`)) +
geom_histogram(fill = "blue", alpha = 0.5, binwidth = 30) +
ggtitle("Histogram of MDVP:Fhi(Hz)") +
xlab("MDVP:Fhi(Hz)") +
ylab("Frequency")+theme_bw()
```
# linear logistic model
```{r, message=FALSE,warning=FALSE}
# Load the caret library
library(caret)
```
```{r}
# Load the necessary libraries
library(caret)
```
# Split the data into training and testing sets
```{r}
set.seed(123)
splitIndex <- createDataPartition(data$status, p = 0.7, list = FALSE)
train_data <- data[splitIndex, ]
test_data <- data[-splitIndex, ]
# Exclude the "name" column from the data
train_data_no_name <- train_data[,-1]
test_data_no_name <- test_data[,-1]
```
# Fit a logistic regression model
```{r, message=FALSE, warning=FALSE}
logistic_model <- train(status ~ ., data = train_data_no_name, method = "glm", family = "binomial")
```
# Predict the status on the test data
```{r}
predictions <- predict(logistic_model, newdata = test_data_no_name)
# Evaluate the model performance using Mean Absolute Error (MAE)
mae <- mean(abs(predictions - test_data$status))
mae
```
# Linear regression model
```{r, message=FALSE, warning=FALSE}
# Load the necessary libraries
library(caret)
# Split the data into training and testing sets
set.seed(123)
splitIndex <- createDataPartition(data$status, p = 0.7, list = FALSE)
train_data <- data[splitIndex, ]
test_data <- data[-splitIndex, ]
# Exclude the "name" column from the data
train_data_no_name <- train_data[,-1]
test_data_no_name <- test_data[,-1]
```
# simple linear regression model
```{r}
model <- lm(status ~ ., data = train_data)
```
# simple linear regression Model predictions
```{r}
predictions <- predict(model, newdata = test_data)
# Evaluate the model performance using Mean Absolute Error (MAE)
mae <- mean(abs(predictions - test_data$status))
mae
```
# Knn linear model
```{r}
library(class)
# Split the data into training and test sets
set.seed(123)
train_index <- sample(1:nrow(data), 0.7 * nrow(data))
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
```
```{r, message=FALSE, warning=FALSE}
model_knn <- train(status ~ ., data = train_data, method = "knn")
summary(model_knn)
```
```{r, message=FALSE, warning=FALSE}
# Predict on the test data
predictions_knn <- predict(model_knn, newdata = test_data)
# Calculate the MSE
mse <- mean((predictions - test_data$status)^2)
mse
```
# Report
The mean of the “MDVP:Fo(Hz)” variable, which represents the mean fundamental frequency, is significantly higher in individuals with Parkinson’s disease (127.7 Hz) compared to healthy individuals (115.2 Hz). This suggests that a higher mean fundamental frequency may be associated with Parkinson’s disease. Similarly, the mean of the “Jitter:DDP” variable, which is significantly higher in individuals with Parkinson’s disease (0.014) compared to healthy individuals (0.012).
The mean of the “MDVP:Shimmer” variable, which represents the local variations in amplitude, is also significantly higher in individuals with Parkinson’s disease (0.066) compared to healthy individuals (0.055). This suggests that higher values of jitter and shimmer may be indicative of Parkinson’s disease.
To identify the best model between the Knn and the linear regression model, we used the The Mean Absolute Error (MAE). THe model with the lowest model is the best model for the data, therefore the KNN model is the best model with MAE of 0.1618094, followed by the linear logistic model with 0.2018143, and the linear regression model with a MAE of 0.2984471.
Conclusion :
the Parkinson’s disease data set provides several insights into the characteristics of individuals with and without Parkinson’s disease. A higher mean fundamental frequency, higher values of jitter and shimmer, and positive associations between these variables and the presence of Parkinson’s disease suggest that they may be useful indicators of Parkinson’s disease. Further analysis and modeling is necessary to determine the strength of these associations and the utility of these variables in predicting Parkinson’s disease.
references
https://api.citedrive.com/bib/dc64ec6f-3d25-480a-9add-b702bb9d2faa/references.bib?x=eyJpZCI6ICJkYzY0ZWM2Zi0zZDI1LTQ4MGEtOWFkZC1iNzAyYmI5ZDJmYWEiLCAidXNlciI6ICIyNzQ2IiwgInNpZ25hdHVyZSI6ICIxODA1ZmZlMzhlMzI3YTUyNzAwYThlMDUzY2RlN2RiMWJkM2E4NzNmNTA3NjI3ZTA3YWU0MzI1MWVjMjE5ODcwIn0=/bibliography.bib