ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron' #156
Comments
Originally reported by @Al-Murphy. Potentially related to:
One thing I'm noticing is that the error only occurs with specific combinations of CTD level and test type. Specifically, CTD level 2 with the linear tests is the only one that's failing.
We can see the celltype names aren't duplicated in the original CTD:
colnames(HCL$level_2$specificity_quantiles)[duplicated(colnames(HCL$level_2$specificity_quantiles))]
> character(0)
This remains true even after restandardising the CTD:
HCL2 <- EWCE::standardise_ctd(HCL, force_standardise = TRUE)
colnames(HCL2$level_2$specificity_quantiles)[duplicated(colnames(HCL2$level_2$specificity_quantiles))]
> character(0)
So something is happening further downstream of this step.
Ok, I think I pinpointed the reason. At level 2 the CTD contains the cell types "Fetal_Neuron" and "Fetal_neuron".
gcf <- data.table::fread("/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df")
cols <- grep("fetal_neuron", names(gcf), ignore.case = TRUE, value = TRUE)
cols
> "Fetal_neuron" "Fetal_Neuron"
gcf[, cols, with = FALSE]
R doesn't recognize these as duplicates because its name comparison is case-sensitive, but internally MAGMA must be ignoring case, so it does treat them as duplicates and throws the error. Specifically at this step:
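For anyone who wants to check their own CTD for this, here is a minimal sketch of the difference between the case-sensitive check used earlier in this thread and a case-insensitive one. The `celltypes` vector is a toy stand-in for the level 2 column names, not the real HCL data.

```r
# Toy column names standing in for the level 2 cell types (illustrative only).
celltypes <- c("Fetal_Neuron", "Fetal_neuron", "Astrocyte")

# Case-sensitive check, as used earlier in this thread: finds nothing.
dups_exact <- celltypes[duplicated(celltypes)]

# Case-insensitive check: lower-case the names first, then report every
# name that collides with another one once case is ignored.
lower <- tolower(celltypes)
dups_nocase <- celltypes[lower %in% lower[duplicated(lower)]]

print(dups_exact)
print(dups_nocase)
```

The second check reports both members of each colliding pair, which makes it easier to see which annotations need correcting.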
I could add a step to drop duplicate columns when ignoring case, but the real solution is to regenerate the CTD after correcting the cell type annotations, because this will alter the expression and specificity scores.
I've made some updates in MAGMA.Celltyping 2.0.14 (now pushed to GH), so that it automatically drops duplicate celltypes, but gives users more informative messages about why they're being dropped and which ones. It also recommends that they reprocess the CTD accordingly.
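As a rough illustration of the behaviour described above, a case-insensitive drop with an informative message could look something like the sketch below. This is not the code in MAGMA.Celltyping 2.0.14; `drop_case_duplicates` and `gcf_toy` are hypothetical names made up for the example, and it uses a plain data.frame rather than the real gene covariate file.

```r
# Hypothetical helper, NOT the actual MAGMA.Celltyping implementation.
# Drops columns whose names duplicate an earlier column once case is ignored.
drop_case_duplicates <- function(df, verbose = TRUE) {
  lower <- tolower(colnames(df))
  # TRUE for the second and later occurrences of each lower-cased name.
  is_dup <- duplicated(lower)
  if (any(is_dup) && verbose) {
    message(
      "Dropping ", sum(is_dup),
      " column(s) that duplicate another name when case is ignored: ",
      paste(colnames(df)[is_dup], collapse = ", "),
      ". Consider correcting the annotations and regenerating the CTD."
    )
  }
  df[, !is_dup, drop = FALSE]
}

# Toy gene covariate table (made-up values).
gcf_toy <- data.frame(
  GENE = c("A", "B"),
  Fetal_Neuron = c(1, 2),
  Fetal_neuron = c(3, 4),
  check.names = FALSE
)

gcf_clean <- drop_case_duplicates(gcf_toy)
print(colnames(gcf_clean))
```

Note that this keeps whichever spelling appears first, which is an arbitrary choice. That is another reason regenerating the CTD with corrected annotations is the better long-term fix.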
Checklist
Affected version
2.0.13
I'm guessing @Al-Murphy is using the latest version.
Steps to reproduce the bug
Actual behavior
Expected behavior
Returns enrichment results.
Session info