nebula-algorithm is a Spark application based on GraphX. It currently provides the following algorithms:
Name | Use Case |
---|---|
PageRank | page ranking, key node mining |
Louvain | community mining, hierarchical clustering |
KCore | community detection, financial risk control |
LabelPropagation | community detection, information propagation, advertising recommendation |
ConnectedComponent | community detection, isolated island detection |
StronglyConnectedComponent | community detection |
ShortestPath | path planning, network planning |
TriangleCount | network structure analysis |
GraphTriangleCount | network structure and tightness analysis |
BetweennessCentrality | key node mining, node influence calculation |
DegreeStatic | graph structure analysis |
You can either submit the entire Spark application, or invoke the algorithms in the lib package to apply graph algorithms to a DataFrame.
- Build Nebula Algorithm

      $ git clone https://github.com/vesoft-inc/nebula-algorithm.git
      $ cd nebula-algorithm
      $ mvn clean package -Dgpg.skip -Dmaven.javadoc.skip=true -Dmaven.test.skip=true

  After the build completes, the target file nebula-algorithm-2.0.0.jar is placed under nebula-algorithm/target.
- Download from Maven repo
Alternatively, it could be downloaded from the following Maven repo:
https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/2.0.0/
Limitation: Nebula Algorithm does not encode string IDs, so the source and target of every edge must be of type Int during algorithm execution. (The vid_type of the Nebula space may be String, but the ID values themselves must be integers.)
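If a graph is keyed by arbitrary strings, one way to satisfy this limitation is to map each string ID to an integer before running an algorithm and to map the results back afterwards. The following is a minimal sketch in plain Scala for a small in-memory edge list. `VertexIdEncoder` and its methods are hypothetical names used only for illustration; they are not part of nebula-algorithm, and a large graph would need a distributed equivalent.

```scala
object VertexIdEncoder {
  // An edge as (source ID, target ID, weight), still keyed by strings.
  type StringEdge = (String, String, Double)
  // The same edge after encoding, keyed by integers.
  type IntEdge = (Int, Int, Double)

  // Assign each distinct string ID a dense integer code,
  // numbered in order of first appearance in the edge list.
  def buildDictionary(edges: Seq[StringEdge]): Map[String, Int] = {
    val ids = edges.flatMap { case (src, dst, _) => Seq(src, dst) }.distinct
    ids.zipWithIndex.toMap
  }

  // Rewrite every edge so that both endpoints are integer codes.
  def encode(edges: Seq[StringEdge], dict: Map[String, Int]): Seq[IntEdge] =
    edges.map { case (src, dst, weight) => (dict(src), dict(dst), weight) }

  // Reverse lookup table, used to translate algorithm output
  // back to the original string IDs.
  def invert(dict: Map[String, Int]): Map[Int, String] =
    dict.map(_.swap)
}
```

The dictionary has to be kept alongside the algorithm output, since the result refers to vertices only by their integer codes.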
- Option 1: Submit nebula-algorithm package
  - Configuration

    Refer to the configuration example.
  - Submit Spark Application

        ${SPARK_HOME}/bin/spark-submit --master <mode> --class com.vesoft.nebula.algorithm.Main nebula-algorithm-2.0.0.jar -p application.conf
- Option 2: Call nebula-algorithm interface

  The lib package of nebula-algorithm currently provides 10 algorithms, which can be invoked programmatically as follows:
  - Add the dependency to pom.xml.

        <dependency>
          <groupId>com.vesoft</groupId>
          <artifactId>nebula-algorithm</artifactId>
          <version>2.0.0</version>
        </dependency>
  - Instantiate the algorithm's config. Below is an example for PageRank.

        val prConfig = new PRConfig(5, 1.0)
        val pageRankResult = PageRankAlgo.apply(spark, data, prConfig, false)
For other algorithms, please refer to test cases.
Note: The first column of DataFrame in the application represents the source vertices, the second represents the target vertices and the third represents edges' weight.
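To make the expected column layout concrete, here is a minimal sketch that builds such an edge DataFrame by hand with the standard Spark SQL API. It assumes Spark is on the classpath and runs in local mode. The object name `EdgeFrameExample` and the column names `src`, `dst`, and `weight` are illustrative choices; according to the note above, it is the column order that matters.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EdgeFrameExample {
  // Build a three-column edge DataFrame:
  // column 1 = source vertex, column 2 = target vertex, column 3 = edge weight.
  // Vertex IDs are integer-valued, as required by the limitation described earlier.
  def buildEdges(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1L, 2L, 1.0),
      (2L, 3L, 0.5),
      (3L, 1L, 2.0)
    ).toDF("src", "dst", "weight")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("edge-frame-example")
      .getOrCreate()

    val data = buildEdges(spark)
    data.printSchema()

    // `data` now has the shape that the PageRank call shown earlier expects
    // as its second argument.
    spark.stop()
  }
}
```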
Nebula Algorithm Version | Nebula Version |
---|---|
2.0.0 | 2.0.0, 2.0.1 |
2.1.0 | 2.0.0, 2.0.1 |
2.5.0 | 2.5.0 |
2.5-SNAPSHOT | nightly |
Nebula Algorithm is open source, and you are more than welcome to contribute in the following ways:
- Discuss in the community via the forum or raise issues.
- Write or improve our documentation.
- Submit pull requests to help improve the code itself.