MEO Node ID Error #189

Open
ntalluri opened this issue Oct 10, 2024 · 3 comments

@ntalluri
Collaborator

ntalluri commented Oct 10, 2024

If a node ID has the format [name]_[info], MEO drops the [info] part and uses only [name] in its reconstructions.

This happened during parameter tuning on the EGFR dataset with MEO: the node names in the output were missing the "_HUMAN" suffix. The gold standard dataset expects nodes such as EGF_HUMAN and EGFR_HUMAN, but in the MEO output they appear without the suffix. As a result, zero output nodes matched the gold standard nodes during evaluation.

Input prize file:

NODEID	prize	sources	targets	active
1433Z_HUMAN	1.041379133		True	True
41_HUMAN	3.389112802		True	True
4ET_HUMAN	2.569973509		True	True

raw pathway:

Source	Type	Target	Oriented	Weight
ERBB2	pd	CTNB1	true	0.553333
ERBB2	pd	CDK1	true	0.773333
EGFR	pd	SHC1	true	0.666667
EGFR	pd	41	true	0.553333
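
A minimal sketch of why the evaluation finds no matches, assuming MEO keeps only the text before the first underscore (this is inferred from the output above, not confirmed from the MEO source). The function `simulate_meo_truncation` and the gold standard set are hypothetical and for illustration only:

```python
def simulate_meo_truncation(node_id: str) -> str:
    """Assumed MEO behavior: keep only the part before the first underscore."""
    return node_id.split("_", 1)[0]


# Node IDs taken from the input prize file above.
input_nodes = ["1433Z_HUMAN", "41_HUMAN", "4ET_HUMAN"]

# Illustrative gold standard; the real one contains IDs such as EGF_HUMAN.
gold_standard = {"EGF_HUMAN", "41_HUMAN"}

# What the pathway nodes look like after the assumed truncation.
meo_output_nodes = {simulate_meo_truncation(n) for n in input_nodes}

# Exact string matching against the gold standard finds nothing,
# because every output ID has lost its "_HUMAN" suffix.
overlap = gold_standard & meo_output_nodes
print(sorted(meo_output_nodes))
print(len(overlap))
```

Under this assumption, "41_HUMAN" becomes "41", which matches the last row of the raw pathway, so the intersection with the gold standard is empty.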
@agitter
Collaborator

agitter commented Oct 10, 2024

MEO may not support node IDs with underscores based on Neha's EGFR dataset testing and my initial inspection of its Vertex class: https://github.com/agitter/meo/blob/master/src/alg/Vertex.java#L45-L50

@ntalluri
Collaborator Author

Doing something similar to what DOMINO does would help fix this issue.

For example, replace each "_" with a "-" before running MEO, then replace each "-" with "_" in post-processing.

@agitter
Collaborator

agitter commented Oct 21, 2024

The relevant DOMINO code is:

spras/spras/domino.py

Lines 215 to 231 in a26f4d0

def pre_domino_id_transform(node_id):
    """
    DOMINO requires module edges to have the 'ENSG0' string as a prefix for visualization.
    Prepend each node id with this ID_PREFIX.
    @param node_id: the node id to transform
    @return the node id with the prefix added
    """
    return ID_PREFIX + node_id


def post_domino_id_transform(node_id):
    """
    Remove ID_PREFIX from the beginning of the node id if it is present.
    @param node_id: the node id to transform
    @return the node id without the prefix, if it was present, otherwise the original node id
    """
    return node_id.removeprefix(ID_PREFIX)

I don't believe replacing "_" with "-" will work because then "A_B" and "A-B" would be the same. Those IDs are unlikely to both be in the network, but they could be. We'll need to replace "_" with another string that is very unlikely to appear in real protein names. If MEO supports Unicode, using some combination of strange Unicode characters would be safe.
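
A rough sketch of what an analogous pair of transforms for MEO might look like, modeled on the DOMINO functions above. The names `pre_meo_id_transform`, `post_meo_id_transform`, and `UNDERSCORE_PLACEHOLDER` are hypothetical, the placeholder string is an arbitrary choice, and whether MEO accepts Unicode node IDs is an open question that would need testing:

```python
# Hypothetical placeholder for "_" (two double-dagger characters).
# Assumption: this string never appears in real node IDs and MEO can read it.
UNDERSCORE_PLACEHOLDER = "\u2021\u2021"


def pre_meo_id_transform(node_id: str) -> str:
    """
    Replace every underscore with the placeholder before writing MEO input.
    Fail loudly if the placeholder is already present, since the round trip
    would then no longer be reversible.
    """
    if UNDERSCORE_PLACEHOLDER in node_id:
        raise ValueError(f"Node ID already contains the placeholder: {node_id!r}")
    return node_id.replace("_", UNDERSCORE_PLACEHOLDER)


def post_meo_id_transform(node_id: str) -> str:
    """Restore underscores when reading node IDs back from MEO output."""
    return node_id.replace(UNDERSCORE_PLACEHOLDER, "_")


if __name__ == "__main__":
    for original in ["EGFR_HUMAN", "1433Z_HUMAN", "A_B", "A-B"]:
        encoded = pre_meo_id_transform(original)
        # The round trip should return the original ID unchanged.
        print(original, post_meo_id_transform(encoded) == original)
```

Unlike the "_" to "-" substitution, "A_B" and "A-B" remain distinct after encoding here, and the explicit check turns an unexpected placeholder collision into an error instead of a silent ID merge.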
