Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function to append multiple spectra variables to a Spectra #342

Open
jorainer opened this issue Nov 21, 2024 · 3 comments
Open

Add a function to append multiple spectra variables to a Spectra #342

jorainer opened this issue Nov 21, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@jorainer
Copy link
Member

We have already the joinSpectraData() function, but I would still like to have a different (maybe easier?) way of adding new spectra variable columns to a Spectra object. With [[<- and $<- it's already possible to add a single new spectra variable (or also to replace an existing variable), but we have frequently a data.frame or DataFrame (or matrix) with multiple columns I would like to add.

I would thus propose to have an additional function, e.g. cbind,Spectra,ANY <- function(x, y) that allows to add data for multiple values in one go. The properties would be:

  • nrow(y) has to be same as length(x)
  • throw an error if x has already a spectra variables with a name, to avoid replacing values, but just adding new values.

ideally, this should be added as a new method for MsBackend with a default implementation that (re)uses the [[<- method and a loop - some backends, such as MsBackendMemory could implement their own version that directly calls cbind on its @spectraData data.frame.

@lgatto @sgibb , any thoughts?

ping @philouail

@jorainer jorainer added the enhancement New feature or request label Nov 21, 2024
@lgatto
Copy link
Member

lgatto commented Nov 21, 2024

That sounds like a useful thing to have, indeed.

nrow(y) has to be same as length(x) is equivalent to nrow(y) is identical to nrow(spectraData(x)). Any consideration regarding regarding rownames(x), or some mechanism to insure that the rows or y and spectraData(x) are matching when cinding?

@jorainer
Copy link
Member Author

yes, nrow(y) is equivalent to nrow(spectraData(x)) - but a backend can have a more efficient lenght() method implemented that does not require to first get the spectra data with the spectraData(x) call.

I would actually keep the function simple and don't do any checks on rownames - or change row names. we don't explicitly support or force row names on spectraData(), as e.g. if you have a SQL backend, there are no row names available.

So, cbind() should blindly append columns (i.e. spectraVariables) to a Spectra assuming the user provides the rows in the correct order (similar to a $<- call). I would throw an error if a) nrow() does not match the number of spectra in Spectra or if any of the colnames in y are already spectraVariables().

And actually, I would maybe implement cbind2() instead of cbind() - then we have a bit better control over the dispatching.

@lgatto
Copy link
Member

lgatto commented Nov 22, 2024

So, cbind() should blindly append columns (i.e. spectraVariables) to a Spectra assuming the user provides the rows in the correct order (similar to a $<- call).

My suggestion would be to be explicit in the manual page and mention these risks, and then point to joinSpectraData() for a more complete/safe option (and inversely, mention cbind[2]() in the joinSpectraData() section). More seasoned users will be aware of the dangers or cbind(), but new users who are used to tidyverse's joins might not be.

And actually, I would maybe implement cbind2() instead of cbind() - then we have a bit better control over the dispatching.

👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants