-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
addition of MetaboLightsParam #17
Conversation
@jorainer the tests pass, but not for linux aha, do you know what is happening ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really nice @philouail ! Thanks!
I have some change requests - one is actually open for discussion: maybe it would help if we in addition add a parameter check.names = FALSE
that is passed to all read.table()
functions as well as the conversion of the sample data data.frame
to a DataFrame
(which will again start messing with column names). So, with check.names = FALSE
users would get what they see online in the MetaboLights database, and with check.names = TRUE
the columns would be formatted/fixed in the standard R fashion.
R/MsExperiment.R
Outdated
fl <- object@spectra@backend@spectraData[1, "derived_spectral_data_file"] | ||
nme <- colnames(merged_data)[which(merged_data[1, ] == fl)] | ||
merged_data <- merged_data[grepl(param@filePattern, | ||
merged_data[, nme]), ] | ||
|
||
object@sampleData <- DataFrame(merged_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would suggest to also call DataFrame(merged_data, check.names = FALSE)
to avoid renaming of column names. IMHO it would be best if the content (column names and content) of the sampleData()
is exactly the same as what the user sees in MetaboLights online/web. Thus, also no replacing of
with _
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing the DataFrame thing but see my other comment regarding the rest :)
files <- unlist(files) | ||
files <- gsub("href=\"|\"", "", files) | ||
function(object, param, keepOntology = TRUE, keepProtocol = TRUE, | ||
simplify = TRUE, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a check.names = FALSE
parameter too? that one could be passed to both the read.table()
calls as well as the object@sampleData <- DataFrame
call. To me that would be a nice parameter that allows users to configure how they want the sample data column names to be processed - and saves us from defining how to handle white spaces in column names etc.
R/MsExperiment.R
Outdated
.clean_merged <- function(x, keepProtocol, keepOntology, simplify) { | ||
# remove ontology | ||
if (!keepOntology) | ||
x <- x[, -which(grepl("Term", names(x)))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorainer continuing the discussion about checking column names or not here. I believe we cannot give the choice to the user whether or not to have numbered column. because as soon as this (or the one below) subsetting happen, there is an automatic renaming of duplicated columns. I could not find a way to prevent that..
Also I don't really want to allow the read.table(check.names = TRUE) when importing the assay and sample info, because in each there is already duplication of column, so after merging AND subsetting there would be double numbering...
Also the tables will be different compared to MetaboLights anyway because a lot of columns are not even present on the MetaboLights website...(they derive from the ontology term that the user input)
Lastly, even If i find a way to have as an output a table with duplicated column names with NO numbering, if the user subset the MsExperiment
object by calling []
the numbering will get automatically added..
So I think it is a bit more complicated than expected this numbering thing....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! thanks Phili!
Here is the code in order to get a full MsExperiment object with spectra and sample data from the database.
I changed from
merged()
tocbind()
, it is more stable i found and does not show warnings because of duplicated columns.Tell me what you think about the extra parameters I implemented in order to simplify the sampleData !
Also see my note in the code (in the helper function description) about the duplicated ontology column automatic numbering so we can discuss it.
Lastly, unfortunately the GHA beaks because of MsBackendMetaboLights, is there a workaround ? For me the unit tests work locally.
Any other things I'm missing also ? Should there be an
offline
parameter similar to MsBackendMetaboLights ?