Plume is a code representation benchmarking library with options to extract the AST from Java bytecode and store the result in various graph databases.

plume-oss/plume

Plume is a language front-end to construct ASTs based on the code-property graphs schema from JVM bytecode. Plume is graph database agnostic and can store the graphs to multiple graph databases.

Important

Plume is the original implementation of jimple2cpg. The frontend in the Joern project is optimized around OverflowDB and is much more lightweight. This project focuses on experimenting with incremental dataflow analysis and comparing database backend performance.

Versions < 0.6.3 of Plume were Kotlin based but versions from 1.0.0 onwards have been moved to a Scala implementation for better interfacing with the CPG schema library.

If your project depends on Plume, I am happy to still provide maintenance and support, but I recommend that any new research begin on Joern, where I also spend time providing help and support.

Tip

A flatgraph-based fork is on dave/flatgraph. This is not merged into the default branch as the current flatgraph.DiffGraphBuilder API is more encapsulated than OverflowDB's.

Quickstart

One can run Plume from the plume binary, which uses OverflowDB as the graph database backend if no config is found. To configure another backend, use the CLI arguments:

Usage: plume [tinkergraph|overflowdb|neo4j|neo4j-embedded|tigergraph|neptune] [options] input-dir

An AST creator for comparing graph databases as static analysis backends.
  -h, --help
  input-dir                The target application to parse.
Command: tinkergraph [options]

  --import-path <value>    The TinkerGraph to import.
  --export-path <value>    The TinkerGraph export path to serialize the result to.
Command: overflowdb [options]

  --storage-location <value>
  --heap-percentage-threshold <value>
  --enable-serialization-stats
Command: neo4j [options]

  --hostname <value>
  --port <value>
  --username <value>
  --password <value>
  --tx-max <value>
Command: neo4j-embedded [options]

  --databaseName <value>
  --databaseDir <value>
  --tx-max <value>
Command: tigergraph [options]

  --hostname <value>
  --restpp-port <value>
  --gsql-port <value>
  --username <value>
  --password <value>
  --timeout <value>
  --tx-max <value>
  --scheme <value>
Command: neptune [options]

  --hostname <value>
  --port <value>
  --key-cert-chain-file <value>
  --tx-max <value>
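
As a minimal sketch of how the usage text above translates into a call, the snippet below assembles an invocation for the neo4j backend. It uses only flags listed in the usage text. The host, port, credentials, and input directory are hypothetical placeholders, and it assumes the plume binary is on the PATH. The command is printed first and only executed if the binary is found.

```shell
# Hypothetical values: replace with your own server details and input path.
NEO4J_HOST="localhost"
NEO4J_PORT="7687"
INPUT_DIR="./target/classes"

# Backend first, then its options, then the input directory,
# following the order given in the usage text.
set -- neo4j \
  --hostname "$NEO4J_HOST" \
  --port "$NEO4J_PORT" \
  --username "neo4j" \
  --password "changeme" \
  "$INPUT_DIR"

# Print the assembled command for review.
PLUME_CMD="plume $*"
echo "$PLUME_CMD"

# Run it only if a plume binary is actually available.
if command -v plume >/dev/null 2>&1; then
  plume "$@"
fi
```

Omitting the backend and its options should fall back to OverflowDB, as described above.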

For more documentation and basic guides, check out the project homepage or the ScalaDoc.

Important: If you are using the TigerGraph driver, you need to install the gsql_client.jar and add its path to an environment variable called GSQL_HOME. Instructions are here, e.g.:

curl https://docs.tigergraph.com/tigergraph-server/current/gsql-shell/_attachments/gsql_client.jar --output gsql_client.jar
export GSQL_HOME=`pwd`/gsql_client.jar

Remember to set the tgVersion correctly in the TigerGraphDriver.
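
Before starting a TigerGraph run, it can help to confirm that the variable actually points at the downloaded jar. The helper below is a hypothetical convenience, not part of Plume. The variable name follows the export line above.

```shell
# Hypothetical helper: returns 0 only when GSQL_HOME names an existing file.
check_gsql_home() {
  if [ -z "${GSQL_HOME:-}" ]; then
    echo "GSQL_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$GSQL_HOME" ]; then
    echo "GSQL_HOME does not point to a file: $GSQL_HOME" >&2
    return 1
  fi
  echo "Using GSQL client at $GSQL_HOME"
}
```

Calling `check_gsql_home` in the same shell session that will launch Plume catches a missing or mistyped path early.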

Warning

At the time of writing, TigerGraph ships Docker containers with licenses of restricted lifespans, thus killing access to older versions. Due to this, and some other moves towards becoming more proprietary, I have been unable to support newer versions of TigerGraph.

Benchmarks

Plume specifies a benchmark binary which orchestrates running JMH benchmarks for AST creation with various graph database backends. While the binary explains the available functions, the execution should be run within sbt, e.g.

Jmh/runMain com.github.plume.oss.Benchmark overflowdb testprogram -o output -r results --storage-location test.cpg

An automated script to run the benchmarks versus programs from the defects4j dataset is available under runBenchmarks.sc, which can be executed with:

scala runBenchmarks.sc

Known Bugs

  • Due to module encapsulation in Java 17, Kryo serialization for TinkerGraphDriver fails with serialization errors. There are ways around this with some additional config, however.
  • When running benchmarks, the classpath is sometimes in an abnormal state, and the mutated JMH classes are missing. This usually resolves itself after re-running the process.
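
Both issues above can be approached from the shell. The following is only a sketch under stated assumptions, not a verified fix for Plume. For the first issue, the standard JVM mechanism for relaxing module encapsulation is the `--add-opens` option. The package list here is a guess at common candidates, and whether your launcher reads `JAVA_OPTS` is also an assumption. For the second issue, the hypothetical `retry` helper simply re-runs a command a fixed number of times.

```shell
# Assumption: these java.base packages are common candidates for reflective
# access during serialization; the set Plume actually needs is not verified.
ADD_OPENS=""
for pkg in java.lang java.util java.lang.reflect; do
  ADD_OPENS="$ADD_OPENS --add-opens=java.base/$pkg=ALL-UNNAMED"
done

# Assumption: the launcher in use picks up JVM flags from JAVA_OPTS.
export JAVA_OPTS="${JAVA_OPTS:-}$ADD_OPENS"
echo "$JAVA_OPTS"

# Hypothetical helper: run a command up to N times, stopping at the first success.
retry() {
  attempts=$1
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}
```

For example, `retry 3 scala runBenchmarks.sc` would re-run the benchmark script up to three times before giving up.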

Logging

Plume uses SLF4J as the logging facade.

Sponsored by

Amazon Science