Skip to content

Header-Only C++ Library for Graph Representation and Algorithms

License

Notifications You must be signed in to change notification settings

bob-feracho/CXXGraph

 
 

Repository files navigation

CXXGraph

DOI

DOI

codecov CodeFactor

GitHub license GitHub release

Generic badge Generic badge Generic badge

Generic badge Generic badge







Introduction

CXXGraph is a comprehensive C++ library that manages graph algorithms. This header-only library serves as an alternative to the Boost Graph Library (BGL).

We are Looking for...

We are looking for:

  • A Web Developer for the development of the CXXGraph website. All documentation is currently hosted on this GitHub page.
  • Developers and Contributors to provide input. If you are new to the open-source world, we will guide you step by step!

If you are interested, please contact us at zigrazor@gmail.com or contribute to this project. We are waiting for you!

Roadmap

Completed Description Date of Completition
✔️ Release 0.4.0 Oct 7, 2022
✔️ Release 0.5.0 Mar 23, 2023
✔️ First Stable Release 1.0.0 Mar 28, 2023
✔️ Release 1.0.1 May 7, 2023
✔️ Release 1.1.0 May 8, 2023
✔️ Stable Release 2.0.0 Jun 1, 2023
✔️ Stable Release 3.0.0 Nov 3, 2023
✔️ Release 3.1.0 Jan 9, 2023
📝 Introduce Hypergraph #122 TBD
📝 Stable Release 4.0.0 TBD

Table of Contents

Install and Uninstall

Install Linux Tarballs

To install on Unix/Linux systems, execute the following from the command line:

$ sudo tar xjf CXXGraph-{version}.tar.bz2

To uninstall:

$ sudo rm -f /usr/include/Graph.hpp /usr/include/CXXGraph*

Install RPM

To install on Fedora/CentOS/RedHat systems, execute the following from the command line:

$ sudo rpm -ivh CXXGraph-{version}.noarch.rpm

To uninstall:

$ sudo rpm -e CXXGraph-{version}

Install DEB

To install on Debian/Ubuntu systems, execute the following from the command line:

$ sudo dpkg -i CXXGraph_{version}.deb

To uninstall:

$ sudo apt-get remove CXXGraph

Install From Source

For self-compiled installations using CMake, execute the following from the command line once compilation is complete:

$ sudo make install

Classes Explanation

The Classes Explanation can be found in the Classes Section of the Doxygen Documentation

Prerequisites

  • The minimum C++ standard required is C++17
  • A GCC compiler version 7.3.0 and later OR a MSVC compiler that supports C++17

How to use

To use the library simply put the header file where you need it. It's that easy!

Example

Work in Progress

Unit-Test Execution

The Unit-Test requires CMake 3.9 and later, and the GoogleTest library.

Install GoogleTest

GoogleTest

git clone https://github.com/google/googletest.git
cd googletest        # Main directory of the cloned repository
mkdir -p build       # Create a directory to hold the build output
cd build
cmake ..             # Generate native build scripts for GoogleTest
make                 # Compile
sudo make install    # Install in /usr/local/ by default

How to Compile GoogleTest

From the base directory:

mkdir -p build       # Create a directory to hold the build output
cd build             # Enter the build folder
cmake -DTEST=ON ..   # Generate native build scripts for GoogleTest,
make                 # Compile

How to Run GoogleTest

After the build has compiled, run the "test_exe" executable in the "build" directory with the following command:

./test_exe

Benchmark Execution

The Benchmark requires CMake 3.9 and later, the GoogleTest library, and the Google Benchmark library.

Install Google Benchmark

Google Benchmark

# Check out the library
$ git clone https://github.com/google/benchmark.git
# Google Benchmark requires GoogleTest as a dependency. Add the source tree as a subdirectory
$ git clone https://github.com/google/googletest.git benchmark/googletest
# Go to the library's root directory
$ cd benchmark
# Make a build directory to place the build output
$ cmake -E make_directory "build"
# Generate the build system files with CMake
$ cmake -E chdir "build" cmake -DCMAKE_BUILD_TYPE=Release ../
# If starting with CMake 3.13, you can use the following:
# cmake -DCMAKE_BUILD_TYPE=Release -S . -B "build"
# Build the library
$ cmake --build "build" --config Release
# Install the library
$ sudo cmake --build "build" --config Release --target install

How to Compile Google Benchmark

From the base directory:

mkdir -p build             # Create a directory to hold the build output
cd build                   # Enter the build folder
cmake -DBENCHMARK=ON ..    # Generate native build scripts for Google Benchmark
make                       # Compile

How to Run Google Benchmark

After the build has compiled, run the "benchmark" executable in the "build" directory with the following command:

./benchmark

Benchmark Results

You can check the benchmark result using this link.

Packaging

Tarballs

To create a tarball package, execute the following from the command line:

# Enter Packaging Directory
$ cd packaging
# Execute the script to generate tarballs
$ ./tarballs.sh

RPM

(Fedora/CentOS/RedHat)

To create an RPM package, execute the following from the command line:

# Enter Packaging Directory
$ cd packaging/rpm
# Execute the script to generate tarballs
$ ./make_rpm.sh

DEB

(Debian/Ubuntu)

To create a deb package, execute the following from the command line:

# Enter Packaging Directory
$ cd packaging/deb
# Execute the script to generate tarballs
$ ./make_deb.sh

Algorithm Explanation

Dijkstra

Graph Dijkstras Shortest Path Algorithm(Dijkstra's Shortest Path) [Dijkstra's Algorithm](https://www.interviewbit.com/blog/find-shortest-path-dijkstras-algorithm/) is used to find the shortest path from a source node to all other reachable nodes in the graph. The algorithm initially assumes all the nodes are unreachable from the given source node so we mark the distances of all nodes as infinity. (infinity) from source node (INF / infinity denotes unable to reach).

Dial

Dial specialization of dijkstra’s algorithm.

When edge weights are small integers (bounded by a parameter C), specialized queues which take advantage of this fact can be used to speed up Dijkstra's algorithm. The first algorithm of this type was Dial's algorithm (Dial 1969) for graphs with positive integer edge weights, which uses a bucket queue to obtain a running time O(|E|+|V|C).(source wikipedia)

Below is complete algorithm:

  1. Maintains some buckets, numbered 0, 1, 2,…,wV.
  2. Bucket k contains all temporarily labeled nodes with distance equal to k.
  3. Nodes in each bucket are represented by list of vertices.
  4. Buckets 0, 1, 2,..wV are checked sequentially until the first non-empty bucket is found. Each node contained in the first non-empty bucket has the minimum distance label by definition.
  5. One by one, these nodes with minimum distance label are permanently labeled and deleted from the bucket during the scanning process.
  6. Thus operations involving vertex include:
    • Checking if a bucket is empty
    • Adding a vertex to a bucket
    • Deleting a vertex from a bucket.
  7. The position of a temporarily labeled vertex in the buckets is updated accordingly when the distance label of a vertex changes.
  8. Process repeated until all vertices are permanently labeled (or distances of all vertices are finalized).

At this link you can find a step-by-step illustrations.

Prim's Algorithm

Prim's Algorithm Prim's Algorithm is is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

Steps:

  1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
  2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
  3. Repeat step 2 (until all vertices are in the tree).

BFS

(Breadth First Search) Breadth First Search Algorithm(Breadth First Search) Breadth First Search, also quoted as BFS, is a Graph Traversal Algorithm. Time Complexity O(|V| + |E|) where V are the number of vertices and E are the number of edges in the graph. Applications of Breadth First Search are :

  1. Finding shortest path between two vertices say u and v, with path length measured by number of edges (an advantage over depth first search algorithm)
  2. Ford-Fulkerson Method for computing the maximum flow in a flow network.
  3. Testing bipartiteness of a graph.
  4. Cheney's Algorithm, Copying garbage collection.

And there are many more...

DFS

(Depth First Search) Depth First Search Algorithm (Depth First Search) Depth First Search, also quoted as DFS, is a Graph Traversal Algorithm. Time Complexity O(|V| + |E|) where V is number of vertices and E is number of edges in graph. Application of Depth First Search are:

  1. Finding connected components
  2. Finding 2-(edge or vertex)-connected components.
  3. Finding 3-(edge or vertex)-connected components.
  4. Finding the bridges of a graph.
  5. Generating words in order to plot the limit set of a group.
  6. Finding strongly connected components.

And there are many more...

Best First Search

Best First Search Best First Search is a class of search algorithms which traverses the graph by exploring the most promising node chosen according to an evaluation function. The worst-case time complexity is O(n * log n) where n is the number of nodes in the graph.

Cycle Detection

Cycle (graph theory)

The existence of a cycle in directed and undirected graphs can be determined by whether depth-first search (DFS) finds an edge that points to an ancestor of the current vertex (it contains a back edge). All the back edges which DFS skips over are part of cycles. In an undirected graph, the edge to the parent of a node should not be counted as a back edge, but finding any other already visited vertex will indicate a back edge. In the case of undirected graphs, only O(n) time is required to find a cycle in an n-vertex graph, since at most n − 1 edges can be tree edges.

Many topological sorting algorithms will detect cycles too, since those are obstacles for topological order to exist. Also, if a directed graph has been divided into strongly connected components, cycles only exist within the components and not between them, since cycles are strongly connected.

For directed graphs, distributed message based algorithms can be used. These algorithms rely on the idea that a message sent by a vertex in a cycle will come back to itself. Distributed cycle detection algorithms are useful for processing large-scale graphs using a distributed graph processing system on a computer cluster (or supercomputer).

Applications of cycle detection include the use of wait-for graphs to detect deadlocks in concurrent systems.

Bellman-Ford

Bellman-Ford Algorithm can be used to find the shortest distance between a source and a target node. Time Complexity O(|V| . |E|) where V is number of vertices and E is number of edges in graph which is higher than Dijkstra's shortest path algorithm. The time complexity of dijkstra's algorithm is O(|E| + |V| log |v| ). The advantage of bellman-ford over dijkstra is that it can handle graphs with negative edge weights. Further, if the graph contains a negative weight cycle then the algorithm can detect and report the presense of negative cycle.

This video gives a nice overview of the algorithm implementation. This MIT lecture gives a proof of Bellman-Ford's correctness & its ability to detect negative cycles. Applications:

  • Distance‐vector routing protocol
  • Routing Information Protocol (RIP)
  • Interior Gateway Routing Protocol (IGRP)

Floyd Warshall

Floyd Warshall Algorithm

We initialize the solution matrix same as the input graph matrix as a first step. Then we update the solution matrix by considering all vertices as an intermediate vertex. The idea is to one by one pick all vertices and updates all shortest paths which include the picked vertex as an intermediate vertex in the shortest path. When we pick vertex number k as an intermediate vertex, we already have considered vertices {0, 1, 2, .. k-1} as intermediate vertices. For every pair (i, j) of the source and destination vertices respectively, there are two possible cases.

  1. k is not an intermediate vertex in shortest path from i to j. We keep the value of dist[i][j] as it is.
  2. k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j] as dist[i][k] + dist[k][j] if dist[i][j] > dist[i][k] + dist[k][j]

Transitive Reduction

Transitive Reduction

This algorithm is used to construct a directed graph with the same reachability and satisfies transitive closure, with as few edges as possible. More concretely, it creates a minimum equivalent graph with as few edges as possible, removing "short-circuit" paths through the graph.

This is done by iterating through each node-pair, checking to see if two edges exist that leads out of the first node OR out of the last node, removing the node-pair edge if it exists.

In pseudocode: foreach x in graph.vertices foreach y in graph.vertices foreach z in graph.vertices delete edge xz if edges xy and yz exist

Our implementation has if gates that do early checking for edges in multiple places, which gives it a slightly faster runtime than the cubic pseudocode here.

Kruskal Algorithm

Kruskal Algorithm can be used to find the minimum spanning forest of an undirected edge-weighted graph. Time Complexity O(E log E) = O(E log V) where V is number of vertices and E is number of edges in graph. The main speed limitation for this algorithm is sorting the edges.

For a quick understanding of the algorithm procedure, check this video. Some of the real life applications are:

  • LAN/TV Network
  • Tour Operations
  • Water/gas pipe network
  • Electric grid

Other algorithms to find the minimum spanning forest are Prim's algorithm or Borůvka's algorithm.

Borůvka's Algorithm

Borůvka's Algorithm is a greedy algorithm that can be used for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected.

The algorithm begins by finding the minimum-weight edge incident to each vertex of the graph, and adding all of those edges to the forest. Then, it repeats a similar process of finding the minimum-weight edge from each tree constructed so far to a different tree, and adding all of those edges to the forest. Each repetition of this process reduces the number of trees, within each connected component of the graph, to at most half of this former value, so after logarithmically many repetitions the process finishes. When it does, the set of edges it has added forms the minimum spanning forest.

Borůvka's algorithm can be shown to take O(log V) iterations of the outer loop until it terminates, and therefore to run in time O(E log V), where E is the number of edges, and V is the number of vertices in G (assuming E ≥ V).

Graph Slicing based on connectivity

Mathematical definition of the problem: Let G be the set of nodes in a graph and n be a given node in that set. Let C be the non-strict subset of G containing both n and all nodes reachable from n, and let C' be its complement. There's a third set M, which is the non-strict subset of C containing all nodes that are reachable from any node in C'. The problem consists of finding all nodes that belong to C but not to M.

Currently implemented Algorithm:

  • Use DFS to find all nodes reachable from n. These are elements of set C.
  • Initialize C' to be complement of C (i.e. all nodes - nodes that are in C)
  • For all nodes in C', apply DFS and get the list of reachable nodes. This is set M.
  • Finally removes nodes from C that belong to M. This is our solution.

Application:

This algorithm is used in garbage collection systems to decide which other objects need to be released, given that one object is about to be released.

Ford-Fulkerson Algorithm

Ford-Fulkerson Algorithm is a greedy algorithm for finding a maximum flow in a flow network. The idea behind the algorithm is as follows: as long as there is a path from the source (start node) to the sink (end node), with available capacity on all edges in the path, we send flow along one of the paths. Then we find another path, and so on. A path with available capacity is called an augmenting path.

Kosaraju's Algorithm

Kosaraju's Algorithm is a linear time algorithm to find the strongly connected components of a directed graph. It is based on the idea that if one is able to reach a vertex v starting from vertex u, then one should be able to reach vertex u starting from vertex v and if such is the case, one can say that vertices u and v are strongly connected - they are in a strongly connected sub-graph. Following is an example:

1). Create an empty stack ‘S’ and do DFS traversal of a graph. In DFS traversal, after calling recursive DFS for adjacent vertices of a vertex, push the vertex to stack. 2). Reverse directions of all arcs to obtain the transpose graph. 3). One by one pop a vertex from S while S is not empty. Let the popped vertex be ‘v’. Take v as source and do DFS (call DFSUtil(v)). The DFS starting from v prints strongly connected component of v.

Kahn's Algorithm

Kahn's Algorithm finds topological ordering by iteratively removing nodes in the graph which have no incoming edges. When a node is removed from the graph, it is added to the topological ordering and all its edges are removed allowing for the next set of nodes with no incoming edges to be selected.

Welsh Powell Coloring Algorithms

Welsh Powell Coloring algorithm is a greedy vertex coloring algorithm. This algorithm is also used to find the chromatic number of a graph.

Welsh Powell Algorithm consists of following steps :

  1. Find the degree of each vertex.
  2. List the vertices in order of descending degrees.
  3. Colour the first vertex with color 1.
  4. Move down the list and color all the vertices not connected to the coloured vertex, with the same color.
  5. Repeat step 4 on all uncolored vertices with a new color, in descending order of degrees until all the vertices are coloured. Hi there, I'm creating a pull request to merge the Welsh Powell Coloring algorithm into the master branch.

The algorithm returns a std::map<Node, int> result that assigns each node to a color ordered by integers. Users can also query the minimum chromatic order of the graph by querying the highest value from the resulting map.

std::map<Node, int> result = graph.welshPowellColoring();
auto chromatic_color = std::max_element(result.begin(), result.end(),
                                [](const auto& lhs, const auto& rhs) {
                                    return lhs.second < rhs.second;
                                }

The minimum coloring starts from 1 instead of 0.

The algorithm assumes the graph to be undirected. All sources and inspirations are linked in the declaration of the algorithm and test cases.

Partition Algorithm Explanation

Vertex-Cut

A vertex-cut partitioning divides edges of a graph into equal size partitions. The vertices that hold the endpoints of an edge are also placed in the same partition as the edge itself. However, the vertices are not unique across partitions and might have to be replicated (cut), due to the distribution of their edge across different partitions.

Replication factor quantifies how many vertices are replicated over computers compared with the the number of vertices of the original input graph.

Edge Balanced Vertex-Cut

This Algorithm is a simple vertex-cut in Round-Robin fashion. It takes the original graph edges and assign them to the partitions, dividing it in equal(or similar) size. This algorithm does not take care of optimization in vertex replication ( Replication Factor) but only balance the edge in the partitions.

Greedy Vertex-Cut

Greedy partitioning algorithms uses the entire history of the edge assignments to make the next decision. The algorithm stores the set of partitions A(v) to which each already observed vertex v has been assigned and the current partition sizes. When processing edge e ∈ E connecting vertices vi, vj ∈ V , the greedy algorithm follows this simple set of rules:

  • Rule 1: If neither vi nor vj have been assigned to a partition, then e is placed in the partition with the smallest size in P.
  • Rule 2: If only one of the two vertices has been already assigned (without loss of generality assume that vi is the assigned vertex) then e is placed in the partition with the smallest size in A(vi).
  • Rule 3: If A(vi) ∩ A(vj ) 6= ∅, then edge e is placed in the partition with the smallest size in A(vi) ∩ A(vj).
  • Rule 4: If A(vi) != ∅, A(vj ) != ∅ and A(vi)∩A(vj ) = ∅, then e is placed in the partition with the smallest size in A(vi)∪A(vj) and a new vertex replica is created accordingly.

HDRF

High Degree (are) Replicated First(HDRF) Algorithm is a greedy vertex-cut algorithm as described by this paper. This Algorithm try to optimize Replication Factor by using the history of the edge assignements amd the incremental vertex degree. With a function that take in consideration this two factors calculate the best partition to assign the analyzed edge. The replica created are based on the degree of the verteices, and the vertices replicated are probably a so called "Hub-Node", which are the vertices with higher degree.

EBV

Efficient and Balanced Vertex-cut(EBV) is an offline vertex-cut algorithm as described by this paper. This algorithm try to balance the partitions with respect to the number of edges and vertices of each partitions and the Replication Factor. It apply a formula to evaluate the partition in which assigns the edge that take into consideration also the total number of edges and vertices of the graph. The evaluation formula is the following:

$$Eva(u,v)(i) =I(u ∈ keep[i]) + I(v ∈ keep[i]) +α * \frac{ecount[i]}{(|E|/p)} + β * \frac{vcount[i]}{(|V|/p)}$$

The lowest value is taken as partition Id.

Network Dynamics

Degree Matrix

The Degree Matrix is a square matrix that provides insights into the connectivity of nodes in a graph. For directed graphs, it reflects the number of incoming and outgoing edges for each node, while for undirected graphs, it represents the number of edges incident to each node.

Laplacian Matrix

The Laplacian Matrix is a square matrix derived from the adjacency matrix and degree matrix of a graph. It is instrumental in analyzing various properties of the graph, such as connectedness, the count of spanning trees, and other spectral characteristics.

Transition Matrix

The Transition Matrix is commonly used in the study of Markov Chains and stochastic processes. Within the context of a graph, it denotes the probabilities of transitioning from one node to another, often based on the edge weights or predetermined criteria. This matrix finds applications in various fields such as network analysis, machine learning, and optimization.

How to contribute

GitHub contributors If you want to give your support you can create a pull request GitHub pull-requests or report an issue GitHub issues. If you want to change the code, fix an issue, or implement a new feature please read our CONTRIBUTING Guide.

If you want to discuss new features or you have any questions or suggestions about the library, please open a Discussion or simply chat on Join the chat at https://gitter.im/CXXGraph-Community/community

Stars History

Star History Chart

Site

CXXGraph Site

Contact

E-mail : zigrazor@gmail.com

Join the chat at https://gitter.im/CXXGraph-Community/community

GitHub Profile Profile views

ZigRazor's github stats

Support

To support me, add a Star to the project GitHub stars or follow me GitHub followers

To stay updated, watch the project GitHub watchers

References

We are referenced by:

Credits

Thanks to the community of TheAlgorithms for some algorithm inspiration.

Thanks to GeeksForGeeks for some algorithm inspiration.

Contributors

Thank you to all the people who have already contributed to CXXGraph!

Contributors

Cite Us

If you use this software please follow the CITATION instructions. Thank you!

Hacktoberfest 2k21

We participated at Hacktoberfest 2021. Thank you to all the contributors!

Hacktoberfest 2k22

We participated at Hacktoberfest 2022. Thank you to all the contributors!

Hacktoberfest 2k23

We participated at Hacktoberfest 2023. Thank you to all the contributors!

Other Details

View the Estimated Value of the Project

Author


@ZigRazor

footer

About

Header-Only C++ Library for Graph Representation and Algorithms

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 97.8%
  • C 1.3%
  • Other 0.9%