-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Baptiste Nusaibah validation #11
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Très bon début c'est super !
Il faut qu'on discute de l'output.
A terme, il faut que ça ait la structure d'un test de package, donc avec la syntaxe de testthat que je vous avais présentée.
https://github.com/Oncostat/grstat/blob/main/tests/testthat/test-ae-tables.R
Cette syntaxe n'a pas de nuance, soit le test passe, soit il échoue.
Proposition :
- utiliser
expect_xxx()
pour tester les différences de N et pct (diff majeure) - utiliser
message()
pour signaler les différences de style (diff mineure), genre les informations manquantes dans une tables mais pas dans l'autre
Par contre c'est difficile de lire le code sans voir l'application vu que les data sont private.
Ce serait compliqué de faire ce qu'il y a dans #9 pour pouvoir tout faire sur GitHub ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ce seront des fonctions de testing, elles ne doivent pas aller dans R/
Utilise usethis::use_test_helper()
Tu peux aussi aller voir la doc de testthat: https://cran.r-project.org/web/packages/testthat/vignettes/special-files.html
|
||
if (ncol(tabR)!=ncol(tabSAS)){stop("Different number of arm")} | ||
if (all(dim(tabR)==dim(tabSAS))){ | ||
print("Check: same dimension of tables") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
programmation défensive: on ne print pas si tout va bien, on warn s'il y a un problème
if (all(dim(tabR)==dim(tabSAS))){ | ||
print("Check: same dimension of tables") | ||
df=tabR%>%arrange(grade)%>%full_join(tabSAS,by="grade",suffix = c(".r",".sas")) | ||
indice=which(df[,paste0(tabR%>%select(-grade)%>%colnames(),paste=".r")]!=df[,paste0(tabSAS%>%select(-grade)%>%colnames(),paste=".sas")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
je n'aime vraiment pas les indices, je trouve que c'est à risque d'erreur
Cf mon commentaire sur Teams
mutate(grade = replace_na(grade, 0)) %>% | ||
group_by(grade) %>% | ||
mutate(across(starts_with("N"), ~sum(., na.rm = T))) %>% | ||
distinct(grade, .keep_all = T) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On ne peut pas remplacer mutate+distinct par summarise ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
je crois que c'est parce que je n'arrivais pas à keep toutes les variables dans la base avec summarize(.by=) quand il y avait plusieurs bras
data <- colnames(data) %>% | ||
imap( | ||
~data %>% select(all_of(.x)) %>% | ||
separate(.x, into = c(paste0("N", .y), paste0("pct", .y)), sep = "\\(") | ||
) %>% | ||
bind_cols() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://tidyr.tidyverse.org/reference/separate_wider_delim.html
On doit pouvoir s'en sortir sans boucle avec separate_wider_regex()
, pas évident mais l'exemple aide beaucoup.
ngroups <- (ncol(data) - 1) / 2 | ||
for(i in 1:ngroups){ | ||
npatients <- sum(data[, paste0("N", i)]) | ||
data[data$grade == 0, paste0("pct", i)] <- round(data[data$grade == 0, paste0("N", i)] * 100 / npatients, round) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Je vais avoir besoin de lancer le code pour trouver comment appliquer purrr
mutate(grade = replace_na(grade, 0)) %>% | ||
group_by(grade) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutate(.by=grade)
, plus concis et ne nécessite pas de ungroup()
if (nrow(tabR)!=nrow(tabSAS)){stop("Different number of grade levels") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
soit sur une ligne, sans les {},
soit sur 3 lignes
jamais sur 2
data <- colnames(data) %>% | ||
imap( | ||
~data %>% select(all_of(.x)) %>% | ||
separate(.x, into = c(paste0("N", .y), paste0("pct", .y)), sep = "\\(") | ||
) %>% | ||
bind_cols() | ||
|
||
#extraction of figures into numeric columns | ||
data <- data %>% | ||
mutate( | ||
across(everything(), ~as.numeric(str_extract(.x, "\\d+\\.?\\d*"))) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Belle utilisation de imap()
:-)
Il y a une nouvelle fonction detidyr
qui ferait le taff aussi (à adapter à plusieurs bras, c'est juste pour l'exemple):
data %>%
separate_wider_regex(cols=-c(.id, label, variable, grade),
patterns=c(N="\\d+", " \\(", pct="\\d+", "%\\)"))
Add of functions for validation of function ae_table_grade