This repository contains all the relevant information referenced in Log Parsing Literature Survey.
To run the experiments, please navigate to Experiments.
To understand more about the search method of the survey, please navigate to Method.
This section contains detailed information regarding the Experiments section of Log Parsing Literature Survey.
There are two environments available for running the experiments, namely Python 2 and Python 3.
Based on the method that you would like to experiment with, please follow the appropriate setup.
Each method below has been run 10 times for each of the dataset sizes.
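The protocol of ten repetitions per dataset size can be sketched as a small timing harness. This is a minimal, hypothetical sketch and not the repository's actual runner: `benchmark`, `parse_fn`, `load_sample`, and the `SIZES` table are assumptions introduced for illustration, and only wall-clock time is recorded.

```python
import statistics
import time

# Dataset sizes used throughout the experiment lists (1k ... 1M log lines).
SIZES = {"1k": 1000, "2k": 2000, "4k": 4000, "10k": 10000, "20k": 20000,
         "50k": 50000, "100k": 100000, "200k": 200000, "300k": 300000,
         "500k": 500000, "1M": 1000000}

RUNS = 10  # each method is run 10 times per dataset size


def benchmark(parse_fn, load_sample, size_labels):
    """Time parse_fn over RUNS repetitions for every requested size.

    parse_fn and load_sample are hypothetical hooks: the first parses a list
    of raw log lines, the second returns the first n lines of a dataset.
    """
    results = {}
    for label in size_labels:
        lines = load_sample(SIZES[label])
        timings = []
        for _ in range(RUNS):
            start = time.perf_counter()
            parse_fn(lines)
            timings.append(time.perf_counter() - start)
        results[label] = {"mean_s": statistics.mean(timings),
                          "stdev_s": statistics.stdev(timings),
                          "runs": len(timings)}
    return results


if __name__ == "__main__":
    # Toy parser and toy data, standing in for a real method and dataset.
    toy = benchmark(lambda ls: [l.split() for l in ls],
                    lambda n: ["Receiving block blk_%d" % i for i in range(n)],
                    ["1k", "2k"])
    print(sorted(toy))
```

`time.perf_counter` only captures elapsed time; memory use or parsing accuracy would need to be measured separately.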
AEL
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- OpenSSH: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- Thunderbird: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- Windows: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
Spell
- BGL: [1k, ..., 300k] => NO 10k: [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- OpenSSH: [1k, ..., 500k] => NO 10k: [1k, 2k, 4k, 20k, 50k, 100k, 200k, 300k, 500k]
- Thunderbird: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Windows: [1k, ..., 20k] => NO 50k: [1k, 2k, 4k, 10k, 20k]
LogMine
- Android: [1k, ..., 20k] => NO 10k: [1k, 2k, 4k, 20k]
- BGL: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
- HDFS: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
- Thunderbird: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
- Windows: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
SLCT
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Thunderbird: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
- Windows: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
Drain
- Android: [1k, ..., 20k] => ALL: [1k, 2k, 4k, 10k, 20k]
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- OpenSSH: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- Thunderbird: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Windows: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
IPLoM
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- OpenSSH: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- Thunderbird: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Windows: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
LenMa
- Android: [1k, ..., 200k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- OpenSSH: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k]
- Thunderbird: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Windows: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
MoLFI
- BGL: [1k, ..., 50k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k]
- HDFS: [1k, ..., 100k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k]
- OpenSSH: [1k, ..., 50k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k]
- Thunderbird: [1k, ..., 50k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k]
- Windows: [1k, ..., 50k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k]
SHISO
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- OpenSSH: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- Thunderbird: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- Windows: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
LogCluster
- BGL: [1k, ..., 300k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k]
- HDFS: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- OpenSSH: [1k, ..., 500k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 500k]
- Thunderbird: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
- Windows: [1k, ..., 1M] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k, 300k, 500k, 1M]
LogSig
- OpenSSH: [1k, ..., 200k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]
- Thunderbird: [1k, ..., 200k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]
- Windows: [1k, ..., 200k] => ALL: [1k, 2k, 4k, 10k, 20k, 50k, 100k, 200k]
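All of the size lists above draw from a single grid, so a short check can recover which sizes were skipped for a given method and dataset. This is a minimal sketch: `FULL_GRID` is read off the lists above, while `expected_sizes` and `missing_sizes` are hypothetical helpers, not part of the repository. It assumes a run covers every grid size from 1k up to the upper bound of its range.

```python
# Full grid of dataset sizes appearing across the experiment lists.
FULL_GRID = ["1k", "2k", "4k", "10k", "20k", "50k",
             "100k", "200k", "300k", "500k", "1M"]


def expected_sizes(upper):
    """All grid sizes from 1k up to and including `upper`."""
    return FULL_GRID[: FULL_GRID.index(upper) + 1]


def missing_sizes(upper, completed):
    """Grid sizes up to `upper` that have no completed run, in grid order."""
    done = set(completed)
    return [size for size in expected_sizes(upper) if size not in done]


# Spell on BGL: range [1k, ..., 300k], reported as "NO 10k".
spell_bgl = ["1k", "2k", "4k", "20k", "50k", "100k", "200k", "300k"]
print(missing_sizes("300k", spell_bgl))  # ['10k']
```

The same call on LogMine's Android list (upper bound 20k) also reports `['10k']`, matching its "NO 10k" label.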
Python 2 methods
- AEL
- Drain
- IPLoM
- LenMa
- LFA
- LKE
- LogCluster
- LogMine
- LogSig
- SHISO
- SLCT
- Spell
Although implemented, methods with * are not scalable.
Python 3 methods
- MoLFI
- NuLog
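Because the methods are split across two interpreters, a small lookup can tell which environment to activate before launching a run. This is a minimal sketch, assuming method names are spelled exactly as in the two lists above; `required_python` and `check_interpreter` are hypothetical helpers. It avoids f-strings and type hints so that it runs unchanged under both Python 2 and Python 3.

```python
import sys

# Environment assignment copied from the method lists above.
PYTHON2_METHODS = {"AEL", "Drain", "IPLoM", "LenMa", "LFA", "LKE",
                   "LogCluster", "LogMine", "LogSig", "SHISO", "SLCT", "Spell"}
PYTHON3_METHODS = {"MoLFI", "NuLog"}


def required_python(method):
    """Return the major Python version (2 or 3) needed to run `method`."""
    if method in PYTHON2_METHODS:
        return 2
    if method in PYTHON3_METHODS:
        return 3
    raise ValueError("unknown method: %s" % method)


def check_interpreter(method):
    """True when the running interpreter matches the method's environment."""
    return sys.version_info[0] == required_python(method)


if __name__ == "__main__":
    for name in ["Drain", "MoLFI"]:
        print("%s -> Python %d (current interpreter OK: %s)"
              % (name, required_python(name), check_interpreter(name)))
```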
This section contains detailed information regarding the Method section of Log Parsing Literature Survey.
The queries used for the survey can be found under Queries.
Overall statistics can be found below.
Databases queried: Google Scholar, Scopus.
Number of queries (Google Scholar): 7
Number of queries (Scopus): 1
Number of papers selected after running queries (Google Scholar): 59
Number of papers selected after running queries (Scopus): 13
Number of papers selected after snowballing (Google Scholar): 34
Number of papers selected after snowballing (Scopus): 0
Total references checked while snowballing (Google Scholar): 1707
Total references checked while snowballing (Scopus): 344
Total references checked while snowballing: 2051
Total number of papers selected for survey: 93
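The snowballing figures above can be cross-checked with a few lines of arithmetic. This is an illustrative sketch only: the numbers are copied from the list above, and `total_references_checked` and `snowballing_yield` are hypothetical helpers; "yield" here simply means the percentage of inspected references that were selected.

```python
# Figures copied from the statistics listed above.
REFS_CHECKED = {"Google Scholar": 1707, "Scopus": 344}
SELECTED_AFTER_SNOWBALLING = {"Google Scholar": 34, "Scopus": 0}


def total_references_checked(refs):
    """Sum the references inspected across all databases."""
    return sum(refs.values())


def snowballing_yield(selected, refs):
    """Percentage of inspected references that were selected, per database."""
    return {db: round(100 * selected[db] / refs[db], 2) for db in refs}


print(total_references_checked(REFS_CHECKED))  # 2051
print(snowballing_yield(SELECTED_AFTER_SNOWBALLING, REFS_CHECKED))
# {'Google Scholar': 1.99, 'Scopus': 0.0}
```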
The queries used for the survey are listed below.
- [Query 1] log parsing
  - Tools and Benchmarks for Automated Log Parsing (57)
    - SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
    - DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning
    - Detecting Large-Scale System Problems by Mining Console Logs
    - A Data Clustering Algorithm for Mining Patterns From Event Logs
    - LogCluster - A Data Clustering and Pattern Mining Algorithm for Event Logs
    - Clustering Event Logs Using Iterative Partitioning
    - Length Matters: Clustering System Log Messages using Length of Words
    - LogMine: Fast Pattern Recognition for Log Analytics
    - Abstracting Log Lines to Log Event Types for Mining Software System Logs
    - LogSig: Generating System Events from Raw Textual Logs
    - Incremental Mining of System Log Format
    - Abstracting Execution Logs to Execution Events for Enterprise Applications (Short Paper)
  - Drain: An Online Log Parsing Approach with Fixed Depth Tree (35)
  - A Directed Acyclic Graph Approach to Online Log Parsing (41)
  - Improving Performances of Log Mining for Anomaly Prediction Through NLP-Based Log Parsing (19)
  - LPV: A Log Parser Based on Vectorization for Offline and Online Log Parsing (21)
  - An Efficient Log Parsing Algorithm Based on Heuristic Rules (30)
  - Paddy: An Event Log Parsing Approach using Dynamic Dictionary (21)
  - A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection (25)
  - Spell: Online Streaming Parsing of Large Unstructured System Logs (36)
  - A Confidence-Guided Evaluation for Log Parsers Inner Quality (48)
  - LogStamp: Automatic Online Log Parsing Based on Sequence Labelling (23)
  - A Review of Unstructured Data Analysis and Parsing Methods (33)
  - OLMPT: Research on Online Log Parsing Method Based on Prefix Tree (13)
  - Unsupervised Noise Detection in Unstructured data for Automatic Parsing (21)
- [Query 2] log parsing survey
- [Query 3] log abstraction
  - Symptom-based Problem Determination Using Log Data Abstraction (37)
  - Unsupervised Event Abstraction using Pattern Abstraction and Local Process Models (15)
  - Automatic Event Log Abstraction to Support Forensic Investigation (28)
  - Event-Log Abstraction using Batch Session Identification and Clustering (20)
  - Practical Multi-pattern Matching Approach for Fast and Scalable Log Abstraction (15)
- [Query 4] log abstraction survey
- [Query 5] event log parsing
  - Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics (69)
  - Experience Report: System Log Analysis for Anomaly Detection (49)
  - LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events (23)
  - A Search-based Approach for Accurate Identification of Log Message Formats (36)
- [Query 6] log signature extraction
- [Query 7] event log signature extraction
- [Query Scopus] TITLE-ABS-KEY(log AND parsing) OR ((logs OR log OR logging OR events OR "event log" OR "event logs" OR "event logs templates" OR "event log signatures" ) AND (abstraction OR parsing))
  - Log and Execution Trace Analytics System (26)
  - Virtual Knowledge Graphs for Federated Log Analysis (23)
  - The Use of Template Miners and Encryption in Log Message Compression (39)
  - LogEA: Log Extraction and Analysis Tool to Support Forensic Investigation of Linux-based System (27)
  - On Automatic Parsing of Log Records (36)
  - MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures (38)
  - An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples (34)
  - An Extensible Parsing Pipeline for Unstructured Data Processing (22)
  - A Dynamic Processing Algorithm for Variable Data in Intranet Security Monitoring (14)
  - METING: A Robust Log Parser Based on Frequent n-Gram Mining (19)
  - Log Parser with One-to-One Markup (36)
  - FastLogSim: A Quick Log Pattern Parser Scheme Based on Text Similarity (17)
  - AECID-PG: A Tree-Based Log Parser Generator To Enable Log Analysis (13)
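The numbers in parentheses for the 13 Scopus results add up to the 344 references reported as checked during Scopus snowballing, which suggests that each number is the count of references inspected for that paper. The sketch below checks that reading. The entries are copied from the list above; the regular expression and `parse_entries` are assumptions about the "Title (count)" layout, not a tool from the repository.

```python
import re

# Scopus results as listed above, in "Title (count)" form.
SCOPUS_RESULTS = """\
Log and Execution Trace Analytics System (26)
Virtual Knowledge Graphs for Federated Log Analysis (23)
The Use of Template Miners and Encryption in Log Message Compression (39)
LogEA: Log Extraction and Analysis Tool to Support Forensic Investigation of Linux-based System (27)
On Automatic Parsing of Log Records (36)
MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures (38)
An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples (34)
An Extensible Parsing Pipeline for Unstructured Data Processing (22)
A Dynamic Processing Algorithm for Variable Data in Intranet Security Monitoring (14)
METING: A Robust Log Parser Based on Frequent n-Gram Mining (19)
Log Parser with One-to-One Markup (36)
FastLogSim: A Quick Log Pattern Parser Scheme Based on Text Similarity (17)
AECID-PG: A Tree-Based Log Parser Generator To Enable Log Analysis (13)
"""

# A trailing "(N)" holds the count; everything before it is the title.
ENTRY = re.compile(r"^(?P<title>.+?)\s*\((?P<count>\d+)\)\s*$")


def parse_entries(text):
    """Return (title, count) pairs for every line ending in a "(N)" count."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((match.group("title"), int(match.group("count"))))
    return entries


entries = parse_entries(SCOPUS_RESULTS)
print(len(entries), sum(count for _, count in entries))  # 13 344
```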