🔴 IMPORTANT: The unsupervised Webhawk is now available as an independent project. Check it out at https://github.com/slrbl/unsupervised-learning-attack-detection-webhawk-catch
Machine learning based web attack detection.
Webhawk is an open source, machine learning powered web attack detection tool. It uses your web logs as training data, and it offers a REST API that makes it easy to integrate within your SOC ecosystem. To train a detection model and use it as an extra security layer in your organization, follow the steps below.
Create and activate a Python virtual environment, then install the requirements:
python -m venv webhawk_venv
source webhawk_venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Copy the settings_template.conf file to settings.conf and fill it with the required parameters as follows.
[MODEL]
model:MODELS/the_model_you_will_train.pkl
[FEATURES]
features:length,params_number,return_code,size,upper_cases,lower_cases,special_chars,url_depth
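The feature names above map to simple per-request attributes. As a rough illustration only (not Webhawk's actual feature extraction code), values like these could be derived from a requested URL and the response fields of a log line:

# Illustrative only: a rough idea of what the configured features could represent.
# Webhawk's own encoding may differ.
def illustrate_features(url: str, return_code: int, size: int) -> dict:
    return {
        "length": len(url),                                    # total URL length
        "params_number": url.count("=") if "?" in url else 0,  # rough query-parameter count
        "return_code": return_code,                            # HTTP status code
        "size": size,                                          # response size in bytes
        "upper_cases": sum(c.isupper() for c in url),          # upper-case characters in the URL
        "lower_cases": sum(c.islower() for c in url),          # lower-case characters in the URL
        "special_chars": sum(not c.isalnum() for c in url),    # non-alphanumeric characters
        "url_depth": url.split("?")[0].count("/"),             # path depth
    }

print(illustrate_features("/honeypot/Honeypot%20-%20Howto.pdf", 200, 1279418))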
Encoding is automatic in unsupervised mode; you only need to run the catch.py script. Take inspiration from this example:
python catch.py -l ./SAMPLE_DATA/raw-http-logs-samples/may_oct_2022.log -t apache -j 10000 -s 5
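How catch.py clusters the encoded logs is not detailed here, but the references at the end of this document (silhouette score, optimal epsilon, knee point) point towards a DBSCAN-style, density-based approach. The following is only a generic sketch of that idea with scikit-learn, assuming a CSV of numeric features; it is not Webhawk's actual implementation:

# Generic sketch: density-based outlier detection on encoded log features.
# 'encoded_logs.csv' is a hypothetical file of numeric feature columns.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

features = pd.read_csv("encoded_logs.csv")
X = StandardScaler().fit_transform(features.values)

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
suspicious = features[labels == -1]   # DBSCAN labels outliers with -1
print(f"{len(suspicious)} log lines flagged as outliers")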
To encode raw HTTP logs for the supervised mode, use the encode.py script as in the following example:
python encode.py -a -l ./SAMPLE_DATA/raw-http-logs-samples/aug_sep_oct_2021.log -d ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv
Please note that two already encoded data files are available in ./SAMPLE_DATA/labeled-encoded-data-samples/, in case you would like to move directly to the next step.
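If you want a quick look at the encoded data before training, a small pandas snippet is enough (assuming the sample files are standard CSVs containing the feature columns listed in settings.conf plus a label column):

# Quick inspection of a sample encoded file; the exact column names are an assumption.
import pandas as pd

df = pd.read_csv("./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv")
print(df.shape)
print(df.columns.tolist())
print(df.head())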
Use the HTTP log data from May to July 2021 to train a model, and test it with the data from August to October 2021:
python train.py -a 'dt' -t ./SAMPLE_DATA/labeled-encoded-data-samples/may_jun_jul_2021.csv -v ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv
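train.py wraps the whole training workflow for you. For intuition only, here is a hedged sketch of what a 'dt' (decision tree) run on the encoded CSVs could look like with scikit-learn and pickle; the 'label' column name and the exact pipeline are assumptions, not train.py's code:

# Hedged sketch of a decision-tree training run; not train.py's actual implementation.
import pickle
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["length", "params_number", "return_code", "size",
            "upper_cases", "lower_cases", "special_chars", "url_depth"]

train = pd.read_csv("./SAMPLE_DATA/labeled-encoded-data-samples/may_jun_jul_2021.csv")
test = pd.read_csv("./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv")

model = DecisionTreeClassifier().fit(train[FEATURES], train["label"])  # 'label' column is assumed
print("test accuracy:", model.score(test[FEATURES], test["label"]))

with open("MODELS/the_model_you_will_train.pkl", "wb") as f:
    pickle.dump(model, f)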
Once trained, the model can score a single raw log line with predict.py:
python predict.py -m 'MODELS/the_model_you_will_train.pkl' -t 'apache' -l '198.72.227.213 - - [16/Dec/2018:00:39:22 -0800] "GET /self.logs/access.log.2016-07-20.gz HTTP/1.1" 404 340 "-" "python-requests/2.18.4"'
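Under the hood, the prediction and the confidence reported further below could be derived from the pickled model roughly as follows; this is a hedged sketch (the encoded feature vector is hypothetical), since predict.py takes care of parsing and encoding the log line for you:

# Hedged sketch: loading the trained model and scoring an already-encoded log line.
import pickle

with open("MODELS/the_model_you_will_train.pkl", "rb") as f:
    model = pickle.load(f)

encoded_line = [[79, 0, 404, 340, 2, 60, 17, 3]]   # hypothetical encoded feature vector
prediction = model.predict(encoded_line)[0]
confidence = model.predict_proba(encoded_line).max()
print(prediction, confidence)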
In order to use the API, you first need to launch its server as follows:
python -m uvicorn api:app --reload --host 0.0.0.0 --port 8000
You can use the following code, based on the Python 'requests' library (the same as in test_api.py), to make a prediction through the REST API:
import requests
import json

# JSON request headers
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}

# Log type and the raw HTTP log line to classify
data = {
    'log_type': 'apache',
    'http_log_line': '187.167.57.27 - - [15/Dec/2018:03:48:45 -0800] "GET /honeypot/Honeypot%20-%20Howto.pdf HTTP/1.1" 200 1279418 "http://www.secrepo.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/61.0.3163.128 Safari/534.24 XiaoMi/MiuiBrowser/9.6.0-Beta"'
}

# Send the log line to the /predict endpoint and print the raw response
response = requests.post('http://127.0.0.1:8000/predict', headers=headers, data=json.dumps(data))
print(response.text)
It will return the following:
{"prediction":"0","confidence":"0.9975490196078431","log_line":"187.167.57.27 - - [15/Dec/2018:03:48:45 -0800] \"GET /honeypot/Honeypot%20-%20Howto.pdf HTTP/1.1\" 200 1279418 \"http://www.secrepo.com/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/61.0.3163.128 Safari/534.24 XiaoMi/MiuiBrowser/9.6.0-Beta\""}
To launch the prediction server using Docker:
docker compose build
docker compose up
The data you will find in the SAMPLE_DATA folder comes from the following sources:
https://www.secrepo.com
https://www.kaggle.com/datasets/eliasdabbas/web-server-access-logs
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3QBYB5
Details on how this tool is built can be found at http://enigmater.blogspot.fr/2017/03/intrusion-detection-based-on-supervised.html
Nothing for now.
Silhouette Efficiency
https://bioinformatics-training.github.io/intro-machine-learning-2017/clustering.html
Optimal Value of Epsilon
https://towardsdatascience.com/machine-learning-clustering-dbscan-determine-the-optimal-value-for-epsilon-eps-python-example-3100091cfbc
Max curvature point
https://towardsdatascience.com/detecting-knee-elbow-points-in-a-graph-d13fc517a63c
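For instance, the 'optimal value of epsilon' heuristic from the references above boils down to: compute each point's distance to its k-th nearest neighbour, sort those distances, and pick the value at the curve's point of maximum curvature (the knee). A small generic sketch of that idea:

# Sketch of the k-distance heuristic for choosing DBSCAN's eps (see the references above).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def suggest_eps(X, k=5):
    # Distance from every point to the farthest of its k nearest neighbours, sorted ascending.
    distances, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    k_distances = np.sort(distances[:, -1])
    # Crude knee estimate: largest vertical deviation from the straight line
    # joining the first and last points of the sorted k-distance curve.
    line = np.linspace(k_distances[0], k_distances[-1], len(k_distances))
    knee = int(np.argmax(np.abs(k_distances - line)))
    return k_distances[knee]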
All feedback, testing and contributions are very welcome! If you would like to contribute, fork the project, add your contribution and make a pull request.