Ricgraph scripts can be found in various places:
- Directory harvest: scripts for harvesting sources and inserting the results in Ricgraph. Documentation for these scripts.
- Directory import_export: scripts to export items from Ricgraph. Documentation for these scripts.
- Directory enhance: scripts for finding and enriching items from Ricgraph. Documentation for these scripts.
- The module code ricgraph.py can be found in directory ricgraph.
- The code for Ricgraph Explorer can be found in directory ricgraph_explorer. Documentation for Ricgraph Explorer.
- Documentation for writing your own scripts (this file).
All code is documented, and hints on how to use it can be found in the source files.
You can write your own harvesting script for your favorite source. The easiest way to do so is to start from one of the existing harvesting scripts. For example, in the script harvest_pure_to_ricgraph.py you will recognize three parts:
- Code for harvesting. This is done with harvest_json_and_write_to_file(), which also writes the harvested JSON data to a file. It gets data from a source.
- Code for parsing. This is done with parse_pure_persons(), parse_pure_organizations() and parse_pure_resout() for persons, organizations and research outputs from Pure. It does data processing to get harvested results in a "useful" shape for inserting nodes and edges in Ricgraph.
- Code for inserting the parsed results in Ricgraph. This is done with parsed_persons_to_ricgraph(), parsed_organizations_to_ricgraph() and parsed_resout_to_ricgraph(). It inserts the nodes and edges in Ricgraph.
You can adapt each of these parts as suits the source you would like to harvest.
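The three parts can be sketched in a self-contained way. This is a minimal illustration and not code taken from harvest_pure_to_ricgraph.py: the function names harvest_json(), parse_persons() and persons_to_nodepairs(), the sample records and their field names are all hypothetical, and the last step only builds node pairs instead of calling Ricgraph.

```python
import json

# Hypothetical raw response of a source system; a real script would fetch
# this from the source and usually also write it to a file.
SAMPLE_RESPONSE = """
[
  {"id": "p1", "name": "A. Example", "orcid": "0000-0000-0000-0001"},
  {"id": "p2", "name": "B. Example", "orcid": null}
]
"""


def harvest_json(raw: str) -> list:
    """Part 1, harvesting: get data from a source (here: a string)."""
    return json.loads(raw)


def parse_persons(harvested: list) -> list:
    """Part 2, parsing: keep only complete records, in a flat shape."""
    parsed = []
    for record in harvested:
        if not record.get("orcid"):
            continue  # skip records without the identifier we need
        parsed.append({"FULL_NAME": record["name"], "ORCID": record["orcid"]})
    return parsed


def persons_to_nodepairs(parsed: list) -> list:
    """Part 3, inserting: turn each row into a pair of (name, value) nodes.

    A real script would hand these to Ricgraph instead of returning them.
    """
    return [(("ORCID", row["ORCID"]), ("FULL_NAME", row["FULL_NAME"]))
            for row in parsed]


nodepairs = persons_to_nodepairs(parse_persons(harvest_json(SAMPLE_RESPONSE)))
print(nodepairs)
```

Keeping the three parts in separate functions means you can replace the harvesting part for a new source while reusing the shape of the other two.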
import ricgraph as rcg
rcg.open_ricgraph()
rcg.empty_ricgraph() # use this only if you need to empty the graph
# some things happen
rcg.close_ricgraph()
This structure is used in the programming examples in the directory harvest.
import ricgraph as rcg
rcg.open_ricgraph()
rcg.empty_ricgraph() # use this only if you need to empty the graph
# Harvesting code: code to get data from a system
# Parsing code: post process the data found, and put it in a format that
# can easily be processed in Python, e.g. in a DataFrame
# Code to store the post processed results in Ricgraph
rcg.close_ricgraph()
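If the code between opening and closing the graph can fail, you may want to make sure the graph is still closed. The sketch below is one possible way to do that with a context manager. The helper graph_session() is hypothetical and not part of Ricgraph, and the example uses stand-in functions so that it runs without a graph database.

```python
from contextlib import contextmanager


@contextmanager
def graph_session(open_graph, close_graph):
    """Call open_graph() on entry and close_graph() on exit, even on error."""
    open_graph()
    try:
        yield
    finally:
        close_graph()


# Stand-ins so the sketch runs on its own; with Ricgraph you would pass
# rcg.open_ricgraph and rcg.close_ricgraph instead (an assumption about
# how you would wire it up, not an existing Ricgraph feature).
events = []

with graph_session(lambda: events.append("open"),
                   lambda: events.append("close")):
    events.append("harvest, parse, store")

print(events)
```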
Ricgraph stores objects and relations between objects. Therefore, most calls that insert nodes take two nodes (or two sets of nodes) as parameters, and connect them. Examples of these calls are (without the opening, emptying and closing of the graph):
import ricgraph as rcg
# example 1
rcg.create_two_nodes_and_edge() # create two nodes and connect with one edge
# example 2
rcg.create_nodepairs_and_edges_df() # the same, now using a DataFrame to insert
# a number of node pairs and their edges in one go
# example 3
rcg.create_nodepairs_and_edges_params() # the same, now using Python Dicts to insert
# a number of node pairs and their edges in one go
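To make the idea behind the calls above concrete, the sketch below uses a small in-memory stand-in for a graph. It is not Ricgraph's implementation and does not use its parameters: the class TinyGraph, its methods and the sample values are hypothetical, and it ignores details such as the person-root node. It only shows that inserting a node pair creates both nodes if needed and connects them with one edge, and that inserting the same pair again changes nothing.

```python
class TinyGraph:
    """In-memory stand-in for a graph; not Ricgraph's implementation."""

    def __init__(self):
        self.nodes = set()   # each node is a (name, value) tuple
        self.edges = set()   # each edge is a frozenset of two nodes

    def add_pair(self, node1, node2):
        """Create both nodes if needed and connect them with one edge."""
        self.nodes.update((node1, node2))
        self.edges.add(frozenset((node1, node2)))

    def add_pairs(self, pairs):
        """Insert a number of node pairs and their edges in one go."""
        for node1, node2 in pairs:
            self.add_pair(node1, node2)


graph = TinyGraph()
graph.add_pair(("ORCID", "0000-0000-0000-0001"), ("FULL_NAME", "A. Example"))
graph.add_pairs([
    (("ORCID", "0000-0000-0000-0001"), ("SCOPUS_AUTHOR_ID", "12345")),
    (("ORCID", "0000-0000-0000-0001"), ("FULL_NAME", "A. Example")),  # repeat
])
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```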
Unification is the process of making sure that every personal identifier found for a certain person is connected to every other identifier of that person, via the person-root node. For example, if a person has four identifiers (ORCID, ISNI, FULL_NAME and SCOPUS_AUTHOR_ID), they have to be unified pairwise. There is a function call to make this easier:
import ricgraph as rcg
rcg.unify_personal_identifiers() # takes a DataFrame with all identifiers to be unified
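As an illustration of what "pairwise" means for the four identifiers mentioned above, the sketch below lists every pair that would have to be unified by hand. The identifier values are made up, and the code says nothing about how unify_personal_identifiers() works internally or which DataFrame layout it expects.

```python
from itertools import combinations

# Identifier names from the example above; the values are made up.
identifiers = {
    "ORCID": "0000-0000-0000-0001",
    "ISNI": "0000000000000001",
    "FULL_NAME": "A. Example",
    "SCOPUS_AUTHOR_ID": "12345",
}

# Every unordered pair of identifiers belonging to the same person.
pairs = list(combinations(identifiers.items(), 2))

for (name1, value1), (name2, value2) in pairs:
    print(f"{name1}={value1}  <->  {name2}={value2}")

print(len(pairs), "pairs")
```

Four identifiers already give six pairs, which is why a single call that takes all identifiers at once is convenient.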
Of course, there are function calls to create, read, update and delete (CRUD) nodes. "Read" is used as the term for "find" or "search".
import ricgraph as rcg
rcg.create_update_node() # create or update a node
rcg.read_node() # read (find) a node and return one
rcg.read_all_nodes() # read (find) nodes and return all nodes found
rcg.delete_node() # delete a node
There are several function calls to get the neighbors of a node. For a more extensive description of how to use these, see the code comments in file ricgraph.py in directory ricgraph, or the code examples in file ricgraph_explorer.py in directory ricgraph_explorer.
import ricgraph as rcg
rcg.get_personroot_node() # get a 'person-root' node starting from any 'person' node
rcg.get_all_personroot_nodes() # get all 'person-root' nodes (there should be only one)
rcg.get_all_neighbor_nodes() # get all neighbor nodes connected to a node.
# it is possible to restrict to nodes having
# a certain property 'name' or 'category'
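The restriction on 'name' or 'category' mentioned in the last comment can be pictured with plain Python data. This is only a sketch: the function filter_neighbors(), its parameters, the 'value' property and the sample nodes are hypothetical, and the real calls work on nodes in the graph, not on dicts.

```python
# Hypothetical neighbors of one 'person-root' node. Each node is a dict with
# the properties 'name', 'category' and 'value'; the data is made up.
neighbors = [
    {"name": "ORCID", "category": "person", "value": "0000-0000-0000-0001"},
    {"name": "FULL_NAME", "category": "person", "value": "A. Example"},
    {"name": "DOI", "category": "journal article", "value": "10.1234/example"},
]


def filter_neighbors(nodes, name=None, category=None):
    """Return the nodes matching the requested 'name' and/or 'category'.

    A filter that is left at None is not applied.
    """
    result = []
    for node in nodes:
        if name is not None and node["name"] != name:
            continue
        if category is not None and node["category"] != category:
            continue
        result.append(node)
    return result


persons = filter_neighbors(neighbors, category="person")
print([node["name"] for node in persons])
```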