This is a backend for balboa that uses Apache Accumulo as a storage and query engine. It is quite basic in its feature set and should be considered a starting point or building block in a more refined setup, most likely involving multiple input consumer frontends feeding into multiple backend instances, all connecting to one Accumulo cluster.
Requirements:

- JDK 8 or later
- balboa-backend-java (Maven Central)
- accumulo-core API 2.0
- commons-cli
A self-contained jar can be built in the source directory like this:

```shell
$ mvn package
```

This should leave a `balboa-backend-accumulo-<VERSION>-jar-with-dependencies.jar` in the `target/` subdirectory. Dependencies are fetched automatically from Maven Central.
The jar takes a `-c` command-line parameter specifying the path to a properties file, which must contain at least the Accumulo client properties needed to connect to the cluster. For example, a simple development setup using Uno could be accessed with something along these lines:
```properties
instance.name=uno
instance.zookeepers=uno
auth.type=password
auth.principal=satta
auth.token=satta
balboa.port=4242
```
The `balboa.port` property defines the local port on which the backend listens for msgpack TCP connections from frontends.
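A minimal sketch of how the `-c` properties file could be loaded and checked before connecting, using only `java.util.Properties`. The `BackendConfig` class and its methods are hypothetical, not the backend's actual code; the required-key list simply mirrors the Uno example above, and treating `balboa.port` as mandatory is an assumption (the real backend may fall back to a default).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical loader for the file passed via -c; not the backend's real code.
final class BackendConfig {
    // Accumulo client keys taken from the Uno example above.
    static final String[] ACCUMULO_KEYS = {
        "instance.name", "instance.zookeepers",
        "auth.type", "auth.principal", "auth.token"
    };

    final Properties properties;
    final int port;

    private BackendConfig(Properties properties, int port) {
        this.properties = properties;
        this.port = port;
    }

    // Parses the properties and fails fast on missing keys or a bad port.
    static BackendConfig parse(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<String> missing = new ArrayList<>();
        for (String key : ACCUMULO_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        // ASSUMPTION: balboa.port is treated as mandatory in this sketch.
        String portValue = props.getProperty("balboa.port");
        if (portValue == null) {
            missing.add("balboa.port");
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("missing properties: " + missing);
        }

        // Properties keeps trailing whitespace in values, so trim before parsing.
        int port = Integer.parseInt(portValue.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("balboa.port out of range: " + port);
        }
        return new BackendConfig(props, port);
    }

    static BackendConfig load(String path) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get(path))) {
            return parse(in);
        }
    }

    public static void main(String[] args) throws IOException {
        BackendConfig cfg = load(args[0]);
        System.out.println("frontends connect to port " + cfg.port);
    }
}
```

Since the Accumulo keys use the standard client property names, the same `Properties` object should also be usable with accumulo-core 2.0's `Accumulo.newClient().from(...)` builder; the extra `balboa.port` key is application-specific.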
The observation data are stored in three tables, optimized for `rrname`, `rdata` and reverse `rrname` look-ups (the last one is used for suffix queries). Observations are stored redundantly to reduce the number of indirections.
Please make sure these tables exist and are readable and writable by the user specified in the connection details.
`rrname` table (`balboa_by_rrname`):

Row ID | Column Family | Column Qualifier | Visibility | Value |
---|---|---|---|---|
rrname-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
rrname-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
rrname-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |
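The row IDs concatenate the fields of an observation, and the rdata and reverse-rrname tables below use the same fields in a different order. A minimal sketch of building all three row IDs; the `RowKeys` helper is hypothetical, the NUL field delimiter is an assumption (the tables write the fields joined by `-`, and this README does not say which delimiter the backend actually uses), and so is character-by-character reversal for `rev(rrname)`.

```java
// Hypothetical helpers for the three row ID layouts; not the backend's code.
final class RowKeys {
    // ASSUMPTION: NUL as field delimiter, since it does not occur in
    // ordinary host names or addresses.
    static final String SEP = "\u0000";

    // ASSUMPTION: rev(rrname) reverses the name character by character.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // rrname table: rrname, sensorid, rdata, rrtype
    static String byRrname(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, rrname, sensorId, rdata, rrtype);
    }

    // rdata table: rdata, sensorid, rrname, rrtype
    static String byRdata(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, rdata, sensorId, rrname, rrtype);
    }

    // reverse rrname table: rev(rrname), sensorid, rdata, rrtype
    static String byReversedRrname(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, reverse(rrname), sensorId, rdata, rrtype);
    }

    public static void main(String[] args) {
        String row = byReversedRrname("www.jabra.com.de", "foo", "152.199.21.175", "A");
        // Show the NUL delimiter as "|" for readability.
        System.out.println(row.replace(SEP, "|")); // ed.moc.arbaj.www|foo|152.199.21.175|A
    }
}
```

A delimiter that cannot appear inside a field keeps the field boundaries unambiguous; with `-`, a host name such as `my-host.example` would blur them.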
We use combiners to aggregate identical observations:

```
setiter -class org.apache.accumulo.core.iterators.user.MaxCombiner -p 11 -t balboa_by_rrname -all # on seen:last
setiter -class org.apache.accumulo.core.iterators.user.MinCombiner -p 13 -t balboa_by_rrname -all # on seen:first
setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 12 -t balboa_by_rrname -all # on count:count
```

The comment after each command names the column that combiner applies to.
These combiners need to be set on the other two tables as well:
`rdata` table:

Row ID | Column Family | Column Qualifier | Visibility | Value |
---|---|---|---|---|
rdata-sensorid-rrname-rrtype | count | count | public | LONG VARLEN |
rdata-sensorid-rrname-rrtype | seen | first | public | LONG VARLEN |
rdata-sensorid-rrname-rrtype | seen | last | public | LONG VARLEN |

Reverse `rrname` table:

Row ID | Column Family | Column Qualifier | Visibility | Value |
---|---|---|---|---|
rev(rrname)-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
rev(rrname)-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
rev(rrname)-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |
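Depending on the scopes it is configured for, Accumulo may apply a combiner at scan time or during compactions, each time to whatever subset of a key's entries happens to be present, so every aggregation has to be associative and commutative (sum, min and max all are). A minimal sketch of the per-row result using a hypothetical `Observation` value class rather than Accumulo's iterator API: `count` is summed (`SummingCombiner`), `seen:first` keeps the minimum (`MinCombiner`) and `seen:last` keeps the maximum (`MaxCombiner`).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical value class modelling what the three combiners compute per row.
final class Observation {
    final long count;
    final long firstSeen;
    final long lastSeen;

    Observation(long count, long firstSeen, long lastSeen) {
        this.count = count;
        this.firstSeen = firstSeen;
        this.lastSeen = lastSeen;
    }

    // Merges two partial aggregates for the same row ID.
    Observation merge(Observation other) {
        return new Observation(
                count + other.count,                  // SummingCombiner on count:count
                Math.min(firstSeen, other.firstSeen), // MinCombiner on seen:first
                Math.max(lastSeen, other.lastSeen));  // MaxCombiner on seen:last
    }

    // Folds a non-empty list of partial aggregates; grouping and order do not matter.
    static Observation combine(List<Observation> parts) {
        Observation acc = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            acc = acc.merge(parts.get(i));
        }
        return acc;
    }

    @Override
    public String toString() {
        return "count=" + count + ", first_seen=" + firstSeen + ", last_seen=" + lastSeen;
    }

    public static void main(String[] args) {
        // Sample inputs: two sightings of the same record, a minute apart.
        List<Observation> parts = Arrays.asList(
                new Observation(1, 1598303837L, 1598303897L),
                new Observation(1, 1598303957L, 1598304017L));
        System.out.println(combine(parts)); // count=2, first_seen=1598303837, last_seen=1598304017
    }
}
```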
This example run uses balboa's `balboa-backend-console` to talk to the backend directly rather than going through the GraphQL frontend.
Full `rrname` query:

```shell
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r dns.google | head -n 1 | jq
```

```json
{
  "rrname": "dns.google",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "8.8.4.4",
  "count": 1,
  "first_seen": 1598303837,
  "last_seen": 1598303897
}
```
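The `first_seen` and `last_seen` values are Unix timestamps in seconds; in this record they are 60 seconds apart. A small illustrative sketch for turning them into readable UTC times with `java.time`; the `SeenWindow` class is hypothetical and not part of balboa.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative helper for the first_seen/last_seen fields of a query result.
final class SeenWindow {
    // Formats epoch seconds as an ISO-8601 UTC timestamp.
    static String toUtc(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // Time between the first and last sighting.
    static Duration span(long firstSeen, long lastSeen) {
        return Duration.ofSeconds(lastSeen - firstSeen);
    }

    public static void main(String[] args) {
        long first = 1598303837L;
        long last = 1598303897L;
        System.out.println(toUtc(first));      // 2020-08-24T21:17:17Z
        System.out.println(toUtc(last));       // 2020-08-24T21:18:17Z
        System.out.println(span(first, last)); // PT1M
    }
}
```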
`rrname` suffix query:

```shell
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r %.com.de | head -n 1 | jq
```

```json
{
  "rrname": "www.jabra.com.de",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "152.199.21.175",
  "count": 1,
  "first_seen": 1603348710,
  "last_seen": 1603348770
}
```
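A suffix query can be served from the reverse `rrname` table because reversing a name turns a shared suffix into a shared prefix, and since Accumulo keeps rows sorted, all matches lie in one contiguous row range. A minimal sketch, assuming character-by-character reversal and a `%` that only appears as a leading wildcard; `SuffixScan` is hypothetical, and with accumulo-core on the classpath `Range.prefix(...)` should produce the equivalent range.

```java
// Hypothetical translation of an rrname suffix query into a row range on
// the reverse-rrname table; not the backend's actual code.
final class SuffixScan {
    static final char WILDCARD = '%';

    // "%.com.de" -> "ed.moc."; returns null for a query without a leading wildcard.
    static String reversedPrefix(String query) {
        if (query.isEmpty() || query.charAt(0) != WILDCARD) {
            return null;
        }
        return new StringBuilder(query.substring(1)).reverse().toString();
    }

    // Exclusive upper bound for rows starting with prefix: the prefix with its
    // last character incremented. Returns null for "no upper bound".
    static String endOfPrefix(String prefix) {
        StringBuilder sb = new StringBuilder(prefix);
        while (sb.length() > 0) {
            int last = sb.length() - 1;
            char c = sb.charAt(last);
            if (c != Character.MAX_VALUE) {
                sb.setCharAt(last, (char) (c + 1));
                return sb.toString();
            }
            // Drop a char equal to Character.MAX_VALUE and carry into the previous one.
            sb.setLength(last);
        }
        return null;
    }

    // String order matches Accumulo's byte order for ASCII host names.
    static boolean inRange(String row, String start, String end) {
        return row.compareTo(start) >= 0 && (end == null || row.compareTo(end) < 0);
    }

    public static void main(String[] args) {
        String start = reversedPrefix("%.com.de"); // "ed.moc."
        String end = endOfPrefix(start);           // "ed.moc/"
        System.out.println(inRange("ed.moc.arbaj.www", start, end)); // true  (www.jabra.com.de)
        System.out.println(inRange("ed.moc", start, end));           // false (com.de itself)
    }
}
```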
`rdata` query:

```shell
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -d 9.9.9.10 | jq
```

```json
{
  "rrname": "dns10.quad9.net",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "9.9.9.10",
  "count": 1,
  "first_seen": 1603892361,
  "last_seen": 1603892421
}
```
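An exact `rdata` look-up maps naturally to a prefix scan on the rdata table, but the prefix should end with the field delimiter: a bare `9.9.9.10` prefix would also match rows for `9.9.9.100`. A minimal sketch, assuming the fields of a row ID are separated by a NUL character (an assumption; the delimiter is not documented here); `ExactPrefix` and the sample rows are hypothetical.

```java
// Hypothetical exact-match prefix for the rdata table; not the backend's code.
final class ExactPrefix {
    // ASSUMPTION: NUL separates the fields of a row ID.
    static final String SEP = "\u0000";

    // Including the delimiter stops "9.9.9.10" from matching "9.9.9.100".
    static String rdataPrefix(String rdata) {
        return rdata + SEP;
    }

    static boolean matches(String row, String rdata) {
        return row.startsWith(rdataPrefix(rdata));
    }

    public static void main(String[] args) {
        // rdata table layout: rdata, sensorid, rrname, rrtype
        String hit = String.join(SEP, "9.9.9.10", "foo", "dns10.quad9.net", "A");
        String miss = String.join(SEP, "9.9.9.100", "foo", "other.example", "A");
        System.out.println(matches(hit, "9.9.9.10"));    // true
        System.out.println(matches(miss, "9.9.9.10"));   // false
        System.out.println(miss.startsWith("9.9.9.10")); // true: the pitfall
    }
}
```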
Current limitations:

- Hard-coded table names and `public` visibility
- Wildcard support limited to `rrname` queries
- For `rrname` queries, additional `rdata` and `sensorid` constraints are matched anywhere in the row
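To illustrate the last limitation under one plausible reading: a constraint tested against the whole row ID can match inside the wrong field. A minimal sketch contrasting that behaviour with a field-aware check, assuming the rrname-table layout `rrname, sensorid, rdata, rrtype` and a NUL field separator; both the separator and the `RowFilter` helper are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical filters over an rrname-table row ID; not the backend's code.
final class RowFilter {
    // ASSUMPTION: NUL separates the fields of a row ID.
    static final String SEP = "\u0000";

    // Behaviour described in the limitation: the value may match anywhere.
    static boolean matchesAnywhere(String row, String value) {
        return row.contains(value);
    }

    // Field-aware alternative: split the row and compare one field exactly.
    private static String field(String row, int index) {
        String[] fields = row.split(Pattern.quote(SEP), -1);
        return fields.length == 4 ? fields[index] : null;
    }

    static boolean sensorEquals(String row, String sensorId) {
        return sensorId.equals(field(row, 1));
    }

    static boolean rdataEquals(String row, String rdata) {
        return rdata.equals(field(row, 2));
    }

    public static void main(String[] args) {
        String row = String.join(SEP, "dns.google", "foo", "8.8.4.4", "A");
        // A sensor constraint "goo" matches because "dns.google" contains it.
        System.out.println(matchesAnywhere(row, "goo")); // true
        System.out.println(sensorEquals(row, "goo"));    // false
        System.out.println(sensorEquals(row, "foo"));    // true
    }
}
```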