
reaction_string corresponds to multiple ec #6

Open
yufengwhy opened this issue Sep 20, 2024 · 2 comments

Comments

@yufengwhy

Is it reasonable that a single reaction_string corresponds to multiple EC numbers in the data you provide in cached_enzymemap.p?

For example:

O.O=[N+]([O-])c1ccc(OP(=O)(O)O)cc1>>O=P(O)(O)O.O=[N+]([O-])c1ccc(O)cc1 {'3.1.3.5', '3.1.3.41', '3.1.1.2', '3.1.8.1', '3.1.3.26', '3.1.3.8', '3.1.3.2', '3.6.1.1', '3.1.4.46', '3.1.3.21', '3.9.1.3', '3.1.3.25', '3.1.3.23', '3.1.6.6', '3.1.4.16', '3.1.3.9', '3.1.3.16', '3.1.3.62', '3.1.3.89', '3.9.1.2', '3.6.1.9', '3.1.3.75', '3.1.3.85', '3.1.3.48', '3.1.3.3', '3.1.3.1', '3.1.3.18', '3.1.3.58', '3.1.6.1', '3.1.3.73'} 30
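A minimal sketch for finding every reaction string that carries more than one EC number. It assumes (not confirmed from the repository) that the contents of cached_enzymemap.p have already been reshaped into a plain dict of {reaction_string: set_of_ec_numbers}, which is how the example above is printed. The function name `multi_ec_reactions` and the toy data are hypothetical.

```python
from collections.abc import Mapping


def multi_ec_reactions(rxn_to_ecs: Mapping[str, set[str]], min_ecs: int = 2) -> list[tuple[str, int]]:
    """Return (reaction_string, n_ecs) pairs with at least `min_ecs` EC numbers, most ECs first.

    Hypothetical helper: assumes the data is a mapping of reaction SMILES -> set of EC strings.
    """
    hits = [(rxn, len(ecs)) for rxn, ecs in rxn_to_ecs.items() if len(ecs) >= min_ecs]
    # Sort by EC count, descending, so the most ambiguous reactions come first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for the real mapping (hypothetical reactions and ECs).
toy = {
    "A>>B": {"1.1.1.1"},
    "C>>D": {"3.1.3.1", "3.1.3.2", "3.6.1.1"},
    "E>>F": {"2.7.1.1", "2.7.1.2"},
}
print(multi_ec_reactions(toy))  # [('C>>D', 3), ('E>>F', 2)]
```

Raising `min_ecs` (for example to 10) narrows the output to extreme cases like the 30-EC reaction above.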

@pgmikhael
Owner

Hi,

This is parsed directly from the EnzymeMap dataset, and I think their paper discusses the mapping in some detail. Additionally, some ECs are broad enough that they apply to multiple reactions (like 1.1.1.1); others may refer to slightly different substrates but are still closely related.

@benjamin-perry-duke

I work in this space and think this definitely warrants further investigation. EnzymeMap takes the BRENDA .txt download from a year or two ago and parses out the information before filtering. The reaction you included falls broadly under 3.1.3 (phosphoric-monoester hydrolases), but these specific reactants should map to only one EC number (maybe a few more if the EC numbers are non-specific).
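To quantify how far the 30 ECs from the original report spread beyond 3.1.3, the sketch below groups them by their first three EC levels (the sub-subclass). The EC list is copied from the issue; the helper name `sub_subclass` is hypothetical.

```python
from collections import Counter

# The 30 EC numbers printed in the issue for the p-nitrophenyl phosphate hydrolysis reaction.
ecs = {'3.1.3.5', '3.1.3.41', '3.1.1.2', '3.1.8.1', '3.1.3.26', '3.1.3.8', '3.1.3.2',
       '3.6.1.1', '3.1.4.46', '3.1.3.21', '3.9.1.3', '3.1.3.25', '3.1.3.23', '3.1.6.6',
       '3.1.4.16', '3.1.3.9', '3.1.3.16', '3.1.3.62', '3.1.3.89', '3.9.1.2', '3.6.1.9',
       '3.1.3.75', '3.1.3.85', '3.1.3.48', '3.1.3.3', '3.1.3.1', '3.1.3.18', '3.1.3.58',
       '3.1.6.1', '3.1.3.73'}


def sub_subclass(ec: str) -> str:
    """Return the first three levels of an EC number, e.g. '3.1.3.18' -> '3.1.3'."""
    return ".".join(ec.split(".")[:3])


# Count how many of the ECs fall into each sub-subclass.
breakdown = Counter(sub_subclass(ec) for ec in ecs)
for prefix, n in breakdown.most_common():
    print(prefix, n)
```

Counting by hand from the list, 20 of the 30 ECs are in 3.1.3; the other 10 are split across six other sub-subclasses (3.1.1, 3.1.4, 3.1.6, 3.1.8, 3.6.1 and 3.9.1). The print order among tied counts depends on set iteration order.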

I've identified the problem: many of these EC entries contain a table of enzyme-ligand interactions. This table (for example, for this EC number) lists p-nitrophenyl phosphate + H2O (the substrate of the reaction @yufengwhy mentioned), yet the EC number for that page is for phosphoglycolate phosphatase.
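A minimal sketch of how this failure mode produces multi-EC reactions. It assumes, without having checked EnzymeMap's actual code, that every substrate-table row on an EC page is turned into a reaction tagged with that page's EC, and that rows are then grouped by reaction. The EC keys (`EC_A`, `EC_B`) and table rows are toy, hypothetical data; `EC_A` merely plays the role of a page like phosphoglycolate phosphatase.

```python
from collections import defaultdict


def invert_substrate_tables(ec_to_reactions: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build reaction -> set of ECs by inverting per-EC substrate tables.

    Any reaction listed on several EC pages (for example, a generic assay substrate)
    ends up associated with every one of those ECs.
    """
    rxn_to_ecs: dict[str, set[str]] = defaultdict(set)
    for ec, reactions in ec_to_reactions.items():
        for rxn in reactions:
            rxn_to_ecs[rxn].add(ec)
    return dict(rxn_to_ecs)


# Hypothetical pages: both list the same assay substrate alongside their own reactions.
pages = {
    "EC_A": ["2-phosphoglycolate + H2O", "p-nitrophenyl phosphate + H2O"],
    "EC_B": ["p-nitrophenyl phosphate + H2O"],
}
result = invert_substrate_tables(pages)
print(sorted(result["p-nitrophenyl phosphate + H2O"]))  # ['EC_A', 'EC_B']
```

Under this assumption, a substrate that is listed on many phosphatase pages would accumulate many ECs, consistent with the 30-EC example above.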

So either BRENDA has a problem with its data curation, or EnzymeMap did not filter properly (perhaps a filter that checks the reaction's name against the actual EC class name would be sufficient here). Either way, the dataset likely contains a significant number of false entries that could alter the results of the paper and of many other models built on this dataset. I am not sure how you filtered, but it might be worth taking a look. I'm at Duke BME, so I'm happy to collaborate further, @pgmikhael, if you'd like.


3 participants