A collection of tools for graph synthesis, processing and analysis

IntelCompH2020/GraphAnalysisToolbox

Graph Analysis Toolbox

The Graph Analysis Toolbox is a general-purpose software package for managing and processing an interrelated collection of graphs.

Functionality includes (but is not limited to):

  1. Similarity graphs: generated from node attributes, based on different similarity measures (Jensen-Shannon, Hellinger, L1, L2).
    • General implementations based on the neighbors module from scikit-learn.
    • Specific implementation for fast computation of Hellinger distances using Numba and CUDA.
  2. Community detection algorithms (Louvain, Walktrap, FastGreedy, Label Propagation).
  3. Bipartite graphs from attributes
  4. Transductive graphs: Graphs generated by connecting target nodes from a bipartite graph. Link weights are computed from the links of a graph connecting the source nodes.
  5. Transitive graphs, computed as the composition of two bipartite graphs.
  6. Analysis of graph partitions.
  7. Analysis of graph nodes (centrality measures, PageRank).
  8. Editing tools for the collection of graphs:
    • Create, add, remove graphs
    • Subsampling
    • Reduction to graphs of equivalence classes
  9. Tools for visualization:
    • Graph layout algorithms.
    • Export to GEXF format
    • Visualization of bipartite graphs (requires Halo, not included)
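As an illustration of two of the ideas listed above, the following is a minimal pure-Python sketch, not the toolbox's own code. It builds a thresholded similarity graph from Hellinger distances between node attribute vectors, and a transitive graph as the composition of two bipartite graphs. All function names (`hellinger`, `similarity_graph`, `compose_bipartite`) and the sample data are hypothetical. The similarity definition `1 - H(p, q)` and the product-sum weight rule for the composition are assumptions made for the example; the toolbox itself relies on scikit-learn and Numba for these computations and may define weights differently.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def similarity_graph(attributes, threshold):
    """Link every pair of nodes whose similarity 1 - H(p, q) reaches the threshold.

    `attributes` maps node -> probability vector; returns {(u, v): weight}.
    The similarity definition is an assumption made for this sketch.
    """
    edges = {}
    for u, v in combinations(sorted(attributes), 2):
        sim = 1.0 - hellinger(attributes[u], attributes[v])
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges


def compose_bipartite(ab, bc):
    """Compose bipartite graphs A-B and B-C into a weighted A-C graph.

    Weight rule (an assumption): w(a, c) = sum over b of w(a, b) * w(b, c),
    which is the product of the two biadjacency matrices.
    """
    # Index the second graph by its B-side node for fast lookup.
    by_b = {}
    for (b, c), w in bc.items():
        by_b.setdefault(b, []).append((c, w))
    ac = {}
    for (a, b), w_ab in ab.items():
        for c, w_bc in by_b.get(b, []):
            ac[(a, c)] = ac.get((a, c), 0.0) + w_ab * w_bc
    return ac


# Hypothetical node attributes: topic distributions of three documents.
docs = {
    "d1": [0.5, 0.5, 0.0],
    "d2": [0.5, 0.5, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
sim_edges = similarity_graph(docs, threshold=0.5)

# Hypothetical bipartite graphs: authors-documents and documents-topics.
authors_docs = {("alice", "d1"): 1.0, ("alice", "d2"): 1.0, ("bob", "d3"): 1.0}
docs_topics = {("d1", "t1"): 0.5, ("d2", "t1"): 0.5, ("d3", "t2"): 1.0}
authors_topics = compose_bipartite(authors_docs, docs_topics)
```

Because the Hellinger distance equals the Euclidean distance between square-rooted vectors divided by √2, the same graph can be obtained by applying a generic Euclidean nearest-neighbor search to the square roots of the attribute vectors, which is what makes a general neighbors-based implementation possible for large collections.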

Usage:

As an application:

The software includes two applications that can be used to generate and manipulate graphs through an interactive menu:

  • mainRDIgraphs.py: Provides access to the software functionality through an interactive menu. It reads the links to the source data from a configuration file (parameters.yaml); edit this file to use other data.
  • mainRDIlab.py: Uses the software functionality to carry out experiments for analysing RDI corpus collections.

Run

python mainRDIgraphs.py --h
python mainRDIlab.py --h

to see the available options.

As a software package:

The software includes several classes that can be used independently. Classes include (but are not limited to):

  • SimGraph: Generation of similarity graphs
  • CommunityPlus: Wrapper to community detection algorithms
  • DataGraph (requires SimGraph and CommunityPlus): provides tools for graph processing and analysis.
  • SuperGraph (requires DataGraph): provides tools for handling collections of DataGraph objects, including tools for the generation of new datagraphs.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004870. H2020-SC6-GOVERNANCE-2018-2019-2020 / H2020-SC6-GOVERNANCE-2020