Bug in metadata reading and spectral matching #109
Comments
OK, so it seems there are several problems. I will look into it, thanks for reporting.
The bug with the metadata reading and missing scores is very bizarre and I also have no idea what causes it. Are spectra somehow read in batches or so?
Looks like we have problems with the msp files you provided. Are these in "standard" format? I have trouble finding a proper definition of the file format - NIST however defines that each spectrum has to start with
The other problem is the peak list: in addition to the 2 elements per row (m/z and intensity), you sometimes also have a third element with an annotation. That is at present not properly handled. I could add support for that, but it would be nice to have some reference/format definition. It could well be that the problem you see later with the scores is related to the peak values not being handled correctly.
There is no proper definition of the MSP file format :D I would advise against trying to fix it, because you will run into the same issues as we do with matchms, where you have to support a million flavours of MSP. Maybe I can just convert the spectra to NIST format and then force NAME to be the first row. How does Spectra deal with it if there is no NAME present?
If there is no NAME, it will consider the full content of an msp file as being one single spectrum... we're essentially splitting by NAME. But the thing is, we could split by whitespace instead, which would then not require any specific order of elements. A bigger problem for now is the 3rd column of the peaks data. I will have to think about how to support that (it makes sense to also provide peak annotations if available...)
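To make the two points in the comment above concrete, here is a minimal Python sketch (not the code of Spectra, MsBackendMsp or matchms). It assumes that "splitting by whitespace" means splitting records on blank lines, that metadata lines have the form `KEY: value`, and that peak lines start with a digit. The function names `split_records`, `parse_peak_line` and `parse_record` are hypothetical.

```python
import re


def split_records(text):
    """Split MSP text into records on blank lines.

    Assumption: records are separated by at least one empty line, so the
    position of NAME inside a record does not matter.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.splitlines() for block in blocks if block.strip()]


def parse_peak_line(line):
    """Parse 'mz intensity [annotation]'; annotation is None when absent."""
    parts = line.split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"not a peak line: {line!r}")
    mz, intensity = float(parts[0]), float(parts[1])
    annotation = parts[2].strip().strip('"') if len(parts) == 3 else None
    return mz, intensity, annotation


def parse_record(lines):
    """Return (metadata dict, list of peak tuples) for one record."""
    metadata, peaks = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Assumption: metadata keys never start with a digit, peak lines do.
        if ":" in line and not line[0].isdigit():
            key, value = line.split(":", 1)
            metadata[key.strip().upper()] = value.strip()
        else:
            peaks.append(parse_peak_line(line))
    return metadata, peaks


example = """NAME: A
INCHIKEY: AAA
Num Peaks: 2
100.0 10 "frag 1"
150.5 999

INCHIKEY: BBB
NAME: B
Num Peaks: 1
77.1 50
"""

records = [parse_record(r) for r in split_records(example)]
```

Because each record is parsed on its own, a field that is missing in one spectrum cannot shift values into a neighbouring spectrum, and the second record is read correctly even though NAME is not its first line.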
I'm not sure if this makes sense. I'd rather see an R implementation of mzSpecLib and abandon MSP files altogether - nothing is standardized etc. We can remove the comments with matchms, that is already implemented. So overall, I will try to minimize the msp, remove the comments and switch to NIST format.
Maybe mgf would be more standardized as an alternative?
Yeah, this is also something to try.
Anyway. We need to at least throw an error or similar if we encounter an unexpected MSP format.
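As an illustration of failing loudly instead of silently producing NA values, the following Python sketch validates a single record before it is used. The rules it checks (a NAME field must exist, every peak line needs two numeric columns, a declared `Num Peaks` must match the number of peak lines) are assumptions chosen for this example, not the checks implemented in any package. `MspFormatError` and `check_record` are hypothetical names.

```python
class MspFormatError(ValueError):
    """Raised when a record does not match the layout this sketch expects."""


def check_record(lines, index):
    """Validate one record; return (fields, number of peaks) or raise."""
    fields, n_peaks = {}, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line[0].isdigit():
            # Peak line: m/z, intensity and an optional third annotation column.
            cols = line.split(None, 2)
            try:
                float(cols[0])
                float(cols[1])
            except (IndexError, ValueError):
                raise MspFormatError(
                    f"record {index}: malformed peak line {line!r}"
                ) from None
            n_peaks += 1
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().upper()] = value.strip()
        else:
            raise MspFormatError(f"record {index}: unrecognised line {line!r}")
    if "NAME" not in fields:
        raise MspFormatError(f"record {index}: no NAME field")
    declared = fields.get("NUM PEAKS")
    if declared is not None and (not declared.isdigit() or int(declared) != n_peaks):
        raise MspFormatError(
            f"record {index}: Num Peaks is {declared!r} but {n_peaks} peak lines were found"
        )
    return fields, n_peaks


good = ["NAME: A", "Num Peaks: 2", '100.0 10 "frag 1"', "150.5 999"]
fields, n_peaks = check_record(good, 0)
```

A record without NAME, or one whose peak count does not match, raises `MspFormatError` with the record index, so the user learns which spectrum is affected.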
I have a PR in
When reading spectra from the files in the attached archive, multiple things go wrong.
Firstly, some metadata is not read correctly (missing and inserted as NA) and also the individual entries end up in the wrong places, so the InChIKey from spectrum 2 is assigned to spectrum 1 and InChIKey of spectrum 2 is then NA.
Also, during matching, not all scores are calculated or they are listed as NA.
The code used for matching is the following:
Also, to actually get the 0 scores, the threshold functions have to be extended with `| TRUE` because 0 scores seem to be represented as NA or so.
Attachment: problematic.zip