Skip to content

HowTo MissingMatchPathValueDetector

landauermax edited this page Aug 15, 2022 · 15 revisions

In contrast to the detectors from the Getting Started Tutorial which tell you if something unexpected happens, the MissingMatchPathValueDetector is a convenient way to get informed if something expected did not happen, such as a specific type of event with a specific value at a specific path or paths did not show up within a certain time period.

In this tutorial, we use again the same setup as in the Getting Started Tutorial. So, we are going to parse and process access logs from Apache2, to which we will issue some sorts of requests with wget.

TL;DR -- What do I learn here?

  • Set the AMiner and observed system to a defined state, i.e., clear AMiner’s persistency and clear Apache’s access logs.
  • Configure the MissingMatchPathValueDetector, i.e., pick the path /accesslog/useragent, set the timing intervals in seconds (check_interval=5, realert_interval=30) and enable learn mode.
  • Start the AMiner, issue some requests with wget with the user agent "alice" and let it learn.
  • Shutdown AMiner, disable learn mode, start the AMiner again.
  • Create a short script which issues GET requests with user agent "alice" in periodic time intervals (smaller than 5 seconds), and a second script which does the same for user agent "bob".
  • Kill the "alice script", recognize the alerts (and reoccurring alerts) from the AMiner – telling you that user agent "bob" was spotted, but "alice" expected.
  • Extended Version: Let the AMiner learn multiple values, e.g., "alice" and "bob", and alert you when any of these stop to show up.

Preparation

If you followed the Getting Started Tutorial – Great! We pick you up from there. We still use the same log data source (Apache access logs) and the same parser model (ApacheAccessModel). If you are unfamiliar with these, then we recommend you take a look into the Getting Started Tutorial.

We clean up a bit after our first experiments

  • shutdown the AMiner first with CTRL+C
  • we are making a fresh start to keep this example easy to follow, so clear the persistency: debian@debian10:~$ sudo rm -rf /var/lib/aminer/*
  • delete the old Apache logs: debian@debian10:~$ sudo rm -r /var/log/apache2/access.log
  • restart Apache, so that it creates a new empty logfile: debian@debian10:~$ sudo service apache2 restart

Learn values

Next we configure the AMiner by modifying the config.yml file. In particular we enable the MissingMatchPathValueDetector and set the AMiner to learn mode. Notice that of course you can combine different detectors and configurations, however, in order to demonstrate the power of this detector, we will just enable this one.

LearnMode: True # optional

LogResourceList:
        - 'file:///var/log/apache2/access.log'

Parser:
        - id: 'START'
          start: True
          type: ApacheAccessModel
          name: 'apache'

Input:
        timestamp_paths: "/accesslog/time"

Analysis:
        - type: "MissingMatchPathValueDetector"
          paths: 
            - "/accesslog/combined/combined/user_agent"
          check_interval: 5
          realert_interval: 30
          learn_mode: True

EventHandlers:
        - id: "stpe"
          type: "StreamPrinterEventHandler"

This detector requires three configuration parameters: the paths where to look for values, the interval in which pre-set or learned values need to appear (check_interval) and the interval in which the AMiner should alert us about missing token values (realert_interval). For example, you could use the AMiner to check for a specific value in /accesslog/useragent every 5 seconds and in case the access event with the learned value does not appear it drops an alert message. However, you would not want to be alerted every 5 seconds, but just once, and if the condition of the missing event is still met, you would probably want further alert messages, e.g. every 30 seconds. This is what the realert_interval is meant for, which you would usually set much higher than the check_interval.

Let’s try this out.

We request a webpage from the Apache webserver with a specific user agent "alice"

debian@debian10:~$ wget -qO- -U "alice" http://localhost

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
....

This request produces a log entry in the Apache access logs:

debian@debian10:~$ sudo cat /var/log/apache2/access.log
::1 - - [18/Jul/2020:19:43:18 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"

Now, fire up the AMiner and let it learn the specific value of the path /accesslog/useragent.

debian@debian10:~$ sudo aminer --config /etc/aminer/config.yml
2020-07-18 21:43:18 New path(es) detected
NewMatchPathDetector: "DefaultNewMatchPathDetector" (1 lines)
  /accesslog: ::1 - - [18/Jul/2020:19:43:18 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
  /accesslog/host: ::1
  /accesslog/sp0:  
  /accesslog/ident: -
  /accesslog/sp1:  
  /accesslog/user: -
  /accesslog/sp2:  
  /accesslog/time: 1595101398
  /accesslog/sp3: ] "
  /accesslog/fm/request: GET / HTTP/1.1
    /accesslog/fm/request/method: 0
    /accesslog/fm/request/sp5:  
    /accesslog/fm/request/request: /
    /accesslog/fm/request/sp6:  
    /accesslog/fm/request/version: HTTP/1.1
  /accesslog/sp6: " 
  /accesslog/status: 200
  /accesslog/sp7:  
  /accesslog/size: 11012
  /accesslog/combined:  "-" "alice"
    /accesslog/combined/combined:  "-" "alice"
      /accesslog/combined/combined/sp9:  "
      /accesslog/combined/combined/referer: -
      /accesslog/combined/combined/sp10: " "
      /accesslog/combined/combined/user_agent: alice
      /accesslog/combined/combined/sp11: "
['/accesslog', '/accesslog/host', '/accesslog/sp0', '/accesslog/ident', '/accesslog/sp1', '/accesslog/user', '/accesslog/sp2', '/accesslog/time', '/accesslog/sp3', '/accesslog/fm/request', '/accesslog/sp6', '/accesslog/status', '/accesslog/sp7', '/accesslog/size', '/accesslog/combined', '/accesslog/combined/combined', '/accesslog/combined/combined/sp9', '/accesslog/combined/combined/referer', '/accesslog/combined/combined/sp10', '/accesslog/combined/combined/user_agent', '/accesslog/combined/combined/sp11', '/accesslog/fm/request/method', '/accesslog/fm/request/sp5', '/accesslog/fm/request/request', '/accesslog/fm/request/sp6', '/accesslog/fm/request/version']
::1 - - [18/Jul/2020:19:43:18 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"

After a couple of seconds exit the AMiner with CTRL+C. Notice, the NewMatchPathDetector is always enabled by default. As in the Getting Started Tutorial we learned now which paths appeared in this single log line. However additionally we also learned unique values of /accesslog/useragent.

Let’s check this out:

debian@debian10:~$ sudo cat /var/lib/aminer/MissingMatchPathValueDetector/Default
{"string:['alice']": [1595101398, 5, 0, "string:['/accesslog/combined/combined/user_agent']"]}

If you are curious, feel free to also check which paths the AMiner has seen in the single access log line:

debian@debian10:~$sudo cat /var/lib/aminer/NewMatchPathDetector/Default
["string:/accesslog/size", "string:/accesslog/sp8", "string:/accesslog/version", "string:/accesslog/method", "string:/accesslog/sp2", "string:/accesslog/sp6", "string:/accesslog/host", "string:/accesslog/sp9", "string:/accesslog/sp1", "string:/accesslog/status", "string:/accesslog/ident", "string:/accesslog/sp5", "string:/accesslog/sp0", "string:/accesslog/useragent", "string:/accesslog/sp4", "string:/accesslog", "string:/accesslog/time", "string:/accesslog/user", "string:/accesslog/sp7", "string:/accesslog/request", "string:/accesslog/sp3"]root@debian10:/home/debian/TryItOut#

Now disable the learn mode again:

LearnMode: False # optional

We are now ready for some experiments.

Detect missing values

To make our lives easier, we create two scripts that periodically send GET requests to the Apache server and thus create some log lines as we experiment. First create a bash script that requests the index page every second with the user agent string "alice":

#!/bin/bash
while true; do
sleep 1
wget -qO- -U "alice" http://localhost
done

Use any text editor, such as vi, to create a file named getAlice.sh with the content above. After saving don’t forget to set the execute flag.

debian@debian10:~$ chmod +x getAlice.sh

Then start the script – one request every second is logged to the Apache access logs. Keep it running:

debian@debian10:~$ ./getAlice.sh

Now let’s see what our AMiner does with the new log lines. Start the AMiner in a separate terminal window while the getAlice.sh script is running:

debian@debian10:~$ sudo aminer --config /etc/aminer/config.yml

NO anomalies. Good! The MissingMatchPathValue detector expects the value "alice" at least every 5 seconds in Apache access log events at the path accesslog/useragent – and we give it exactly that.

Keep the AMiner running!

Create another bash script, now with the user agent "bob"...

#!/bin/bash
while true; do
sleep 1
wget -qO- -U "bob" http://localhost
done

and start it in a separate terminal window:

debian@debian10:~$ chmod +x getBob.sh
debian@debian10:~$ ./getBob.sh

Still no anomalies! Notice, the NewMatchPathValue detector would have notified us by now about a new value "bob", because just "alice" has been learned as user agent. However, we did not enable the NewMatchPathValueDetector. So everything is good.

Let’s check what happens if we stop the getAlice.sh script (Note: leave the script getBob.sh running, because only new arriving log lines trigger the processing pipeline). Press CTRL+C in the respective terminal window and switch to the AMiner output terminal:

debian@debian10:~$ sudo aminer --config /etc/aminer/config.yml
2020-07-18 22:28:13 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 1s (interval 5)

Good! Here is our anomaly: 2020-07-18 22:28:13 Interval too large between values. It further tells us what’s wrong:

MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 1s (interval 5)

It misses the value "alice" in /accesslog/combined/combined/user_agent. It even tells us how long this is overdue (since the first anomaly is thrown right after the check_interval has passed, this is virtually zero) and the detector also tells us what it got instead of "alice", namely "bob".

After the realert_interval has passed (remember, we configured 30 seconds in the beginning), we get a further alert (with an overdue time 30 seconds larger as before).

2020-07-18 22:28:43 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 31s (interval 5)

If we keep it this way, we will receive a new alert every 30 seconds.

2020-07-18 22:29:13 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 61s (interval 5)

In order to satisfy the MissingMatchPathValueDetector we can simply restart the getAlice.sh script. No new alerts are generated anymore.

debian@debian10:~$ ./getAlice.sh

Total silence

You might ask, what happens if we kill both scripts and no requests are recorded in Apache’s access log at all. Should not the AMiner alert us about the missing requests from useragent "alice"? The short answer is, yes it should, but it doesn't! The detector is designed to investigate each new log line that matches the configured model and check whether the path values therein comply to the path in the AMiner config (config.yml) and the learned values in the persistency respectively. However, if there are no new events at all, there is nothing to investigate and therefore no alerting takes place. Keep this in mind!

Kill both scripts by simply pressing CTRL+C in the two terminal windows of the getAlice.sh and getBob.sh scripts and wait for one or two minutes. Check the AMiner output – there is no new output, becaue the detector was not triggered anymore.

Finally, what if we send one further request to Apache and trigger the Aminer’s MissingMatchPathValueDetector?

Try it out:

debian@debian10:~$ wget -qO- -U "claire" http://localhost

We receive an alert notifying us that "alice" is overdue, just as expected. But look closely, we waited for 5 minutes after we stopped both scripts to send the request above, and now the overdue time is 302 seconds. So, the AMiner tells us exactly how long it was missing "alice", it just did not notify us until now because no new events matching the configured model were processed.

2020-07-18 22:29:53 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 101s (interval 5)

Learn more than one value

Until now we expected just one particular value "alice". But can we do the same with multiple values? Yes we can.

Let’s tidy up:

  • Stop the AMiner (CTRL+C in the respective terminal window)
  • Clear the persistency of the MissingMatchPathValueDetector: debian@debian10:~$ sudo rm -rf /var/lib/aminer/MissingMatchPathValueDetector/* You may keep the learned values of other detectors, such as the NewMatchPathDetector in /var/lib/aminer/NewMatchPathDetector/*
  • Clear apache logs: debian@debian10:~$ sudo rm -r /var/log/apache2/access.log
  • restart Apache, so that it creates a new empty logfile: debian@debian10:~$ sudo service apache2 restart
  • set LearnMode: True in the config.yml
  • start our getAlice.sh and getBob.sh scripts

Check the Apache logs. We can see requests in one second intervals from both user agents "alice" and "bob":

debian@debian10:~$ sudo cat /var/log/apache2/access.log

::1 - - [18/Jul/2020:20:28:37 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:28:38 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"
::1 - - [18/Jul/2020:20:28:38 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:28:39 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"
::1 - - [18/Jul/2020:20:28:39 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:28:40 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"
::1 - - [18/Jul/2020:20:28:40 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:28:41 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"

Let’s learn these values. So start the AMiner as before and let it run for some seconds. It will recognize the new values "alice" and "bob". If you deleted just the MissingMatchPathValueDetector persistency, but kept the NewMatchPathDetector persistency, you won’t get any alerts about new paths.

Not let’s stop the AMiner (CTRL+C) and let’s check what it learned. It recognized the strings "bob" and "alice", and additionally the timestamps when it has last seen these values at the configured path /accesslog/useragent:

debian@debian10:~$ sudo cat /var/lib/aminer/MissingMatchPathValueDetector/Default
{"string:['alice']": [1595104087, 5, 1595104141, "string:['/accesslog/combined/combined/user_agent']"], "string:['bob']": [1595104123, 5, 1595104138, "string:['/accesslog/combined/combined/user_agent']"]}

Disable learn mode in the config.yml file and start up the AMiner again. You won’t receive any alerts. The detector expects "alice" and "bob" to reappear at least every 5 seconds at the path /accesslog/useragent. But kill one of these scripts, and it will alert you about a missing value – either "alice" or "bob".

Finally, create a third script getClaire.sh with the user agent "claire" and let it run (Don’t forget to set the execute flag).

#!/bin/bash
while true; do
sleep 1
wget -qO- -U "claire" http://localhost
done

Check the Apache logs. We can see requests in one second intervals from all three user agents "alice", "bob" and "claire":

debian@debian10:~$ sudo cat /var/log/apache2/access.log

::1 - - [18/Jul/2020:20:43:09 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:43:09 +0000] "GET / HTTP/1.1" 200 11012 "-" "claire"
::1 - - [18/Jul/2020:20:43:10 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"
::1 - - [18/Jul/2020:20:43:10 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:43:10 +0000] "GET / HTTP/1.1" 200 11012 "-" "claire"
::1 - - [18/Jul/2020:20:43:11 +0000] "GET / HTTP/1.1" 200 11012 "-" "bob"
::1 - - [18/Jul/2020:20:43:11 +0000] "GET / HTTP/1.1" 200 11012 "-" "alice"
::1 - - [18/Jul/2020:20:43:11 +0000] "GET / HTTP/1.1" 200 11012 "-" "claire"

Stop getAlice.sh and getBob.sh -– what would you expect? (Notice, we need getClaire.sh to trigger the execution of the detector).

2020-07-18 20:45:23 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 31s (interval 5)

2020-07-18 20:45:26 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 31s (interval 5)

We receive two alerts, one for the missing request from "alice", another one for the missing request from "bob". What was the time difference between stopping the alice script and the bob script? 3 seconds – check the time stamps of the alert.

We expect to be realerted every 30 seconds, so here are the next events:

2020-07-18 20:45:53 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 61s (interval 5)

2020-07-18 20:45:56 Interval too large between values
MissingMatchPathValueDetector: "MissingMatchPathValueDetector2" (1 lines)
    ['/accesslog/combined/combined/user_agent']: "['alice']" overdue 61s (interval 5)

Finally, we showed how to trigger two types of alerts, one for "alice" and one for "bob" being overdue – with the same detector.

Conclusion

The MissingMatchPathValueDetector is a convenient way to trigger alerts if expected events, in more detail specific values at specific paths, do not occur. Since you might not know what values are "normal" (or rather "typical") for your observed system, you could let the AMiner learn these values. This way you get notified if an expected process did not run, such as a scanning process, a backup process, or a certificate renewal process and the like.