Understanding and reformatting the knowledge graph. #2

sgfin · 2018-07-14T18:32:38Z

Thanks for sharing this knowledge graph! I would love to be able to do a compare and contrast with some other methods, and ideally expand it a bit by joining it with other resources.

Apologies if this is a naive question, but as a preliminary step I am trying to convert the knowledge graph into a simpler triple format that I can load as a flat file into something like numpy, so I want to be sure I understand the structure correctly.

Could you confirm whether I am reading this correctly? It appears that each triple is made up of two rows like this:

<http://www.ncbi.nlm.nih.gov/gene/448835> <http://purl.obolibrary.org/obo/RO_0000085> <http://aber-owl.net/go/instance_0> .
<http://aber-owl.net/go/instance_0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/GO_0031424> .

Of the six bracketed terms, the first appears to identify the source node, the second the edge's relationship, and the sixth the target node. The third and fourth appear to be an identifier for the tuple, and the fifth appears to be the same everywhere.
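To check this reading, here is a minimal Python sketch (not code from the repository) that parses N-Triples lines whose three terms are all IRIs, then joins each instance node back to its class through `rdf:type`. Substituting the `rdf:type` object for the instance node is an assumption drawn only from the two example rows above.

```python
import re

# One N-Triples statement with IRI subject, predicate and object, ending in " .".
# Literals and blank nodes are NOT handled by this simplified pattern.
TRIPLE_RE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_ntriples(lines):
    """Yield (subject, predicate, object) IRI tuples, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        yield match.groups()


def collapse_instances(triples):
    """Assumption: an object that has an rdf:type is replaced by that type,
    and the rdf:type rows themselves are dropped."""
    triples = list(triples)
    instance_class = {s: o for s, p, o in triples if p == RDF_TYPE}
    return [(s, p, instance_class.get(o, o)) for s, p, o in triples if p != RDF_TYPE]


example = [
    "<http://www.ncbi.nlm.nih.gov/gene/448835> <http://purl.obolibrary.org/obo/RO_0000085> <http://aber-owl.net/go/instance_0> .",
    "<http://aber-owl.net/go/instance_0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/GO_0031424> .",
]
print(collapse_instances(parse_ntriples(example)))
```

Under that assumption the two rows collapse into one flat (gene, RO_0000085, GO_0031424) edge. For the full file, `rdflib` can also parse N-Triples via `Graph().parse(path, format="nt")`, which handles the literals and blank nodes this regex skips.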

Is this interpretation correct? If so, is there an easy way to build a simple dictionary of the node and edge URLs? I'd prefer to encode them as integers, with a separate table mapping each integer to its string, but I couldn't find a node/edge dictionary in the repo.
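A sketch of that integer encoding in plain Python, assuming the triples are already available as (subject, predicate, object) string tuples. Giving nodes and relations separate ID spaces is a design choice made here, not something the repo prescribes; the toy strings "A", "r1", etc. are placeholders.

```python
def encode_triples(triples):
    """Map node and relation strings to dense integer IDs.

    Returns (encoded, node_ids, rel_ids): encoded is a list of
    (head_id, relation_id, tail_id) tuples; the dicts map string -> ID.
    """
    node_ids, rel_ids = {}, {}
    encoded = []
    for s, p, o in triples:
        # setdefault assigns the next free ID only when the key is new.
        head = node_ids.setdefault(s, len(node_ids))
        rel = rel_ids.setdefault(p, len(rel_ids))
        tail = node_ids.setdefault(o, len(node_ids))
        encoded.append((head, rel, tail))
    return encoded, node_ids, rel_ids


def invert(ids):
    """Build the ID -> string lookup table as a list indexed by ID."""
    table = [None] * len(ids)
    for key, i in ids.items():
        table[i] = key
    return table


# Toy triples with placeholder names.
triples = [("A", "r1", "B"), ("C", "r1", "B"), ("A", "r2", "C")]
encoded, node_ids, rel_ids = encode_triples(triples)
print(encoded)           # [(0, 0, 1), (2, 0, 1), (0, 1, 2)]
print(invert(node_ids))  # ['A', 'B', 'C']
```

`numpy.array(encoded, dtype=numpy.int64)` turns the result into an N×3 array, and the inverted lists can be written out as the separate lookup tables.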

Thanks so much again for all your work, and I hope this isn't a pain for you to answer.


monaalsh · 2018-07-15T10:00:51Z

Thanks for your interest.
Yes, every row represents a triple; please refer to the paper for details about how instances and ontology classes are represented.
As for the graph output, you can use the RDFWrapper.groovy script, which takes this input graph and outputs an edge list (which can be used to create a Python dictionary) and a mapping file that maps each URI to an integer ID.
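The exact output format of RDFWrapper.groovy isn't described in this thread, so the following Python sketch assumes a mapping file with `URI<TAB>id` lines and an edge list with whitespace-separated `source target` integer IDs; the file names in the usage comment are hypothetical. It loads both files into dictionaries.

```python
from collections import defaultdict


def load_mapping(lines):
    """Parse assumed 'URI<TAB>id' lines into an {id: URI} dict."""
    id_to_uri = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        uri, node_id = line.rsplit("\t", 1)
        id_to_uri[int(node_id)] = uri
    return id_to_uri


def load_edge_list(lines):
    """Parse assumed 'source target' integer lines into {source: [targets]}."""
    adjacency = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        adjacency[int(parts[0])].append(int(parts[1]))
    return dict(adjacency)


# Hypothetical file names; replace with whatever RDFWrapper.groovy writes.
# with open("mapping.tsv", encoding="utf-8") as f:
#     id_to_uri = load_mapping(f)
# with open("edgelist.txt", encoding="utf-8") as f:
#     adjacency = load_edge_list(f)
```

If the real mapping file puts the ID first, swapping the two names in `load_mapping` is the only change needed.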

sgfin · 2018-07-17T04:16:53Z

Thanks so much for your response. My apologies, but one more question:

Do you by any chance have a mapping file between URIs and either UMLS CUIs or their original source IDs (PubChem, GO, etc.)? The code seems to reference some of these files in a data folder (perhaps also used to evaluate performance grouped by class?), but I don't see them in the repo.

Alternatively, do you know of a Python package that would facilitate the URI -> original ID conversion, perhaps by following the links out to the ontologies? I am hoping to integrate with some other datasets, so I can't simply use custom integer values, and the URIs are not entirely trivial to parse into their original IDs, though it looks like I may be able to hand-engineer such a parser by inspection.
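A hand-written parser of that kind could start from the two URI styles visible in the example triples. In this Python sketch the OBO PURL rule (`.../obo/GO_0031424` → `GO:0031424`) follows the usual OBO prefix_underscore_ID convention, while the `NCBIGene` prefix and returning `None` for `aber-owl.net` instance nodes are assumptions; other sources such as PubChem would need rules added after inspecting their URIs in the file.

```python
import re

# Rules inferred only from the URI styles in the example triples.
OBO_RE = re.compile(r"^http://purl\.obolibrary\.org/obo/([A-Za-z]+)_(\w+)$")
NCBI_GENE_RE = re.compile(r"^http://www\.ncbi\.nlm\.nih\.gov/gene/(\d+)$")


def uri_to_curie(uri):
    """Return a CURIE such as 'GO:0031424', or None if no rule matches."""
    match = OBO_RE.match(uri)
    if match:
        return f"{match.group(1)}:{match.group(2)}"
    match = NCBI_GENE_RE.match(uri)
    if match:
        # Assumed prefix for NCBI Gene identifiers.
        return f"NCBIGene:{match.group(1)}"
    return None  # e.g. aber-owl.net instance nodes have no external ID


print(uri_to_curie("http://purl.obolibrary.org/obo/GO_0031424"))  # GO:0031424
print(uri_to_curie("http://www.ncbi.nlm.nih.gov/gene/448835"))    # NCBIGene:448835
```

Any URI that comes back as `None` flags a pattern that still needs a rule, which makes it easy to find the remaining cases by inspection.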

Thanks again
