vignette update + function addition #19
base: main
Changes from 1 commit
7e6ed14
246d2ca
81f6f02
83876d1
fe5ed86
77a385d
bd4b806
delete this file / add to gitignore
done, added to gitignore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
#' @title predictPreeclampsia | ||
#' | ||
#' @description Uses 45 CpGs to predict early preeclampsia (PE delivered before or at 34 weeks of gestation) | ||
#' on placental DNA methylation microarray data. | ||
#' | ||
#' @details Assigns the class labels "early-PE" or "normotensive" to each sample | ||
#' and returns a class probability. | ||
#' | ||
#' It is recommended that users apply beta-mixture quantile normalization (BMIQ) to their data | ||
#' prior to prediction. This was the normalization method used on the training data. | ||
#' | ||
#' @param betas matrix or array of methylation values on the beta scale (0, 1), | ||
#' where the samples are arranged in rows, and the variables (CpGs) in columns. | ||
#' | ||
#' @return a data frame with one row per sample, containing the sample ID (`Sample_ID`), | ||
#' the class probabilities, and the predicted class (`Pred_Class`) | ||
#' | ||
#' @examples | ||
#' | ||
#' # To predict early preeclampsia on 450k/850k samples | ||
#' | ||
#' # Load data | ||
#' data(peBetas) | ||
#' predictPreeclampsia(peBetas, dist = "max.dist") | ||
#' | ||
#' @export predictPreeclampsia | ||
#' | ||
|
||
predictPreeclampsia <- function(betas, ...){ | ||
|
||
# read in data to generate model | ||
data(trainBetas, envir=environment()) | ||
data(trainLabels, envir=environment()) | ||
|
||
# model | ||
set.seed(2022) | ||
mod = mixOmics::splsda(trainBetas, trainLabels, ncomp = 1, keepX = 45) | ||
trainCpGs = colnames(mod$X) | ||
peCpGs = mixOmics::selectVar(mod)$name | ||
|
||
# check that all predictive CpGs are present in the input data | ||
pp <- intersect(colnames(betas), peCpGs) | ||
|
||
if(length(pp) < length(peCpGs)){ | ||
stop(paste( | ||
"Only", length(pp), "out of 45 predictive CpGs present. All 45 predictive CpGs are needed to run the function." | ||
)) | ||
} else { | ||
message(paste(length(pp), "of 45 predictive CpGs present.")) | ||
message("BMIQ normalization is recommended for best results. If choosing another method, it is recommended to compare results to predictions on BMIQ-normalized data.") | ||
} | ||
|
||
# set up data for prediction | ||
|
||
# if the input data is missing any of the CpGs present in the training data, | ||
# the missing CpGs must be added as NA columns for `mixOmics::predict` to work; | ||
# `outersect` returns the elements found in only one of x and y | ||
|
||
outersect = function(x, y) { | ||
sort(c(x[!x%in%y], | ||
y[!y%in%x])) | ||
} | ||
|
||
if(!inherits(betas, c('matrix', 'array'))){ | ||
|
||
# throw an error | ||
stop("Input data must be a matrix or an array") | ||
} | ||
|
||
subset <- betas[,colnames(betas) %in% trainCpGs] | ||
consider renaming "subset" to something else, since
||
|
||
# order | ||
subset <- subset[, trainCpGs, drop=FALSE] | ||
|
||
if(!identical(colnames(subset), trainCpGs)){ | ||
stop("Columns of the input data could not be matched to the training CpGs.") | ||
} | ||
|
||
# predict | ||
out <- predict(mod, subset, ...) | ||
|
||
# get class probabilities | ||
CP <- out$predict[,,1] | ||
CP <- t(apply(as.matrix(CP), 1, function(data) exp(data)/sum(exp(data)))) | ||
CP <- tibble::rownames_to_column(as.data.frame(CP), "Sample_ID") | ||
CP <- dplyr::mutate(CP, Pred_Class = dplyr::case_when(EOPE > 0.55 ~ "EOPE", | ||
Consider renaming
Agreed - will make this change too!
||
TRUE ~ "Normotensive")) | ||
|
||
return(CP) | ||
} |
let's delete this
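The class-probability step in `predictPreeclampsia` (a row-wise softmax over the predicted scores, followed by the 0.55 cutoff on the EOPE probability) can be sketched in base R as below. This is only an illustration: the `scores` matrix is made-up data standing in for `out$predict[,,1]`, and `softmax_rows` is a hypothetical helper name, not part of the PR or of mixOmics.

```r
# Hypothetical raw scores for three samples and two classes (made-up numbers,
# standing in for out$predict[,,1]).
scores <- matrix(
  c( 1.2, -0.3,
     0.1,  0.4,
    -0.8,  1.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("EOPE", "Normotensive"))
)

# Row-wise softmax. Subtracting each row's maximum leaves the result
# unchanged but avoids overflow in exp() for large scores.
softmax_rows <- function(m) {
  shifted <- m - apply(m, 1, max)
  e <- exp(shifted)
  e / rowSums(e)
}

probs <- softmax_rows(scores)

# Label a sample "EOPE" only when its EOPE probability exceeds 0.55;
# everything else, including a value of exactly 0.55, is "Normotensive".
pred_class <- ifelse(probs[, "EOPE"] > 0.55, "EOPE", "Normotensive")

result <- data.frame(Sample_ID = rownames(probs), probs,
                     Pred_Class = pred_class, row.names = NULL)
print(result)
```

Using a single cutoff with a catch-all second branch means every sample receives a label, whereas a pair of strict comparisons (`> 0.55` and `< 0.55`) would leave a probability of exactly 0.55 unlabelled.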