The integration of information and communication technologies into the power generation, transmission and distribution system gives rise to a new concept called the Smart Grid (SG). The wide variety of devices connected to the SG communication infrastructure generates heterogeneous data with different Quality of Service (QoS) requirements and communication technologies. An Intrusion Detection System (IDS) is a surveillance system that monitors the traffic flowing over the network and looks for abnormal behaviour in order to detect possible intrusions or attacks against the SG system. This project therefore aims to design a robust IDS that handles anomalous SG data and applies a suitable defence algorithm to alert the network.
Data traffic is captured and passed through a pre-processing stage. First, it is normalized using either min-max normalization or the standard scaler technique. Second, non-numeric features are encoded using one-hot encoding. Finally, either feature selection or dimensionality reduction is applied. The next stage classifies the captured traffic as either normal or malicious, using either Spiking Neural Networks (SNN) or traditional machine learning techniques such as Decision Tree (DT), Random Forest (RF), Multi-layer Perceptron (MLP), Gradient-boosting Classifier (GBC) and K-nearest neighbors (K-NN). The last step is the evaluation of the classifier.
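The normalization and encoding steps can be sketched in plain Python as follows. This is an illustration only, not the code used in the notebooks: the function names and the toy `src_bytes` / `protocol` values are made up for the example. scikit-learn's `MinMaxScaler`, `StandardScaler` and `OneHotEncoder` perform the same operations on whole datasets.

```python
from statistics import mean, pstdev


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def standard_scale(column):
    """Shift a numeric column to zero mean and scale it to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(column))
    rows = [[1 if value == cat else 0 for cat in categories] for value in column]
    return categories, rows


# Hypothetical toy records, one numeric and one non-numeric feature.
src_bytes = [0, 5, 10]
protocol = ["tcp", "udp", "tcp"]

scaled = min_max_scale(src_bytes)
categories, encoded = one_hot(protocol)

# Final feature rows: scaled numeric value followed by the one-hot columns.
features = [[s] + row for s, row in zip(scaled, encoded)]
```

Either `min_max_scale` or `standard_scale` would be applied to the numeric columns, depending on which variant of the experiment is being run.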
SNNs are a kind of artificial neural network in which the traditional artificial neuron (e.g. McCulloch-Pitts) is replaced by a spiking neuron. A spiking neuron also produces a weighted sum of its inputs, but instead of passing the result through an activation function (e.g. sigmoid, ReLU), the sum contributes to the membrane potential U(t) of the neuron. When U(t) passes a pre-defined threshold θ, the neuron emits a spike to its successive connections. The figure above illustrates the architecture of a single spiking neuron. The left image shows the implicit recurrence (i.e. the decay part) and the explicit recurrence V, which is the product of Sout[t] and -θ. The right image shows an unrolled iteration of how the neuron operates.
Jason K. Eshraghian, Max Ward, Emre Neftci, Xinxin Wang, Gregor Lenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, and Wei D. Lu, “Training Spiking Neural Networks Using Lessons From Deep Learning,” arXiv preprint arXiv:2109.12894, September 2021.
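The behaviour of the spiking neuron described above can be sketched as a discrete-time update. This is a minimal illustration, assuming a leaky integrate-and-fire neuron with decay factor `beta` and a subtractive reset of size `theta` after a spike. The function names and numeric values are chosen for the example and are not taken from the notebooks.

```python
def lif_step(u_prev, s_prev, weighted_input, beta, theta):
    """Advance the neuron by one time step.

    beta * u_prev     -> implicit recurrence (decay of the membrane potential)
    -s_prev * theta   -> explicit recurrence (reset after the previous spike)
    """
    u = beta * u_prev + weighted_input - s_prev * theta
    s = 1 if u > theta else 0  # emit a spike when the threshold is passed
    return u, s


def simulate(weighted_inputs, beta=0.5, theta=1.0):
    """Unroll the neuron over a sequence of weighted input sums."""
    u, s = 0.0, 0
    potentials, spikes = [], []
    for current in weighted_inputs:
        u, s = lif_step(u, s, current, beta, theta)
        potentials.append(u)
        spikes.append(s)
    return potentials, spikes


# Constant input of 0.6 per step: the potential builds up, crosses the
# threshold, is reset, and builds up again.
potentials, spikes = simulate([0.6] * 7)
```

With these illustrative values the neuron spikes on the third and seventh steps. A smaller `beta` makes the potential decay faster, so the same input produces fewer spikes.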
Starting from the bottom up, the first level is the home area network (green region), where each household has its own smart meter (SM). Smart meters gather information about electricity consumption and communicate with other smart appliances installed in the home. The user can log in to the SM to view this information; however, an intruder can do the same. Introducing an IDS at this first level provides a first layer of defence against attacks.
Another layer of defence can be applied in the neighbourhood area network (NAN, blue region). Each NAN has a data concentrator that collects information from nearby SMs. An IDS can therefore be deployed at this level to analyse that information and detect intrusions that may have passed the SM IDS.
Building machine learning models requires training data, and gathering these data can be difficult. Specifically, training an IDS that will be deployed in an SM requires collecting large amounts of data traffic from many SMs in different physical locations. However, with growing awareness of user privacy and data security, new approaches are needed to build an IDS when the training data are stored as “data islands” across many physical locations.
For this dissertation the concept of horizontal federated learning (HFL) was used, where devices share the same feature space but hold different samples. The figure illustrates the HFL architecture and steps. Each device (client) has its own data, so at the start each device trains its own model. The weights of every model are encrypted and transmitted to the server, where they are combined. The server then sends the aggregated new model to every device. Finally, the devices update their models with the new weights.
There are multiple ways for the server to aggregate the weights. For this dissertation the FedAVG technique was used, which is the most common one. FedAVG calculates the average of the weights.
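The aggregation step can be sketched as follows. This is a simplified, dependency-free illustration and not the Flower implementation: the `fed_avg` name, the nested-list weight layout and the toy numbers are assumptions made for the example. It weights each client by its number of training samples, as in the usual formulation of FedAVG. With equal sample counts this reduces to a plain average.

```python
def fed_avg(client_weights, client_sizes):
    """Average model weights across clients, weighted by sample count.

    client_weights: one entry per client, each a list of layers,
                    each layer a flat list of floats.
    client_sizes:   number of training samples held by each client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    aggregated = []
    for layer in range(n_layers):
        layer_sum = [0.0] * len(client_weights[0][layer])
        for weights, size in zip(client_weights, client_sizes):
            for j, value in enumerate(weights[layer]):
                layer_sum[j] += value * size / total
        aggregated.append(layer_sum)
    return aggregated


# Two hypothetical clients, each with a single two-weight layer.
client_a = [[1.0, 2.0]]
client_b = [[3.0, 4.0]]

plain_mean = fed_avg([client_a, client_b], client_sizes=[1, 1])
weighted_mean = fed_avg([client_a, client_b], client_sizes=[1, 3])
```

The server would send the aggregated weights back to every client, which then continues training from them in the next round.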
HFL can be applied in neighbourhoods of SMs, since most smart meters produce data in the same feature space. In addition, no data are exchanged between the SMs, which is another advantage given that data privacy and protection are a main concern.
Q. Yang, Y. Liu, T. Chen and Y. Tong, “Federated Machine Learning: Concept and Applications,” ACM Transactions on Intelligent Systems and Technology, vol. 10, no. 2, pp. 1-19, 2019.
It is highly recommended to use Colaboratory (Colab) to run the notebooks, because it lets you write and execute Python code in a browser with:
- Zero configuration required
- Free access to GPUs and TPUs
- Most libraries pre-installed
- Only one requirement: a Google account
- Most common Machine Learning frameworks pre-installed and ready to use
Note: If you are not going to use Google Colab, make sure that you satisfy the requirements below:
- SNNtorch (>= 0.5.1)
- PyTorch (>= 1.11.0)
- Numpy (>= 1.21.6)
- Pandas (>= 1.3.5)
- Seaborn (>= 0.11.2)
- Matplotlib (>= 3.2.2)
- Sklearn (>= 1.0.2)
- Flwr (== 0.19.0)
- Openml (== 0.12.2)
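When running outside Colab, the list above can be checked with a short script such as the one below. This is a convenience sketch, not part of the repository. The PyPI distribution names (`torch` for PyTorch, `scikit-learn` for Sklearn) are assumptions about how the packages were installed, and the version comparison only looks at the first three numeric components.

```python
import re
from importlib import metadata

# Distribution name -> (operator, version) taken from the list above.
REQUIREMENTS = {
    "snntorch": (">=", "0.5.1"),
    "torch": (">=", "1.11.0"),
    "numpy": (">=", "1.21.6"),
    "pandas": (">=", "1.3.5"),
    "seaborn": (">=", "0.11.2"),
    "matplotlib": (">=", "3.2.2"),
    "scikit-learn": (">=", "1.0.2"),
    "flwr": ("==", "0.19.0"),
    "openml": ("==", "0.12.2"),
}


def version_tuple(text):
    """Turn '1.11.0+cu113' into (1, 11, 0); local build tags are ignored."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_requirements(requirements=REQUIREMENTS):
    """Return a dict of package name -> problem description (empty if all OK)."""
    problems = {}
    for name, (op, wanted) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        have, need = version_tuple(installed), version_tuple(wanted)
        ok = have >= need if op == ">=" else have == need
        if not ok:
            problems[name] = f"found {installed}, need {op} {wanted}"
    return problems
```

Calling `check_requirements()` returns an empty dictionary when every requirement is met, and otherwise lists the packages that are missing or have an unexpected version.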
In order to prepare your data follow the steps below:
- Download one of the following scripts, depending on the desired experiment: binary_classification_std_scaler, binary_classification_minmax_scaler, multiclass_classification_std_scaler or multiclass_classification_minmax_scaler.
Note: Alternatively launch the desired script using the launch button
- If you want to process the NSL-KDD dataset in a different way you can download it from here
- Open Colab and sign in to your Google account. If you do not have a Google account, you can create one here.
- Go to File > Upload notebook > Choose file and browse to find the downloaded notebook. If you have already uploaded the notebook to Colab you can open it with File > Open notebook and choose the desired notebook.
In order to train an SNN model follow the steps below:
- Download the spiking_neural_network.ipynb.
Note: Alternatively launch the spiking_neural_network.ipynb through the launch button
- Open Colab and sign in to your Google account. If you do not have a Google account, you can create one here.
- Go to File > Upload notebook > Choose file and browse to find the downloaded notebook file spiking_neural_network.ipynb. If you have already uploaded the notebook to Colab you can open it with File > Open notebook and choose spiking_neural_network.ipynb.
- Once the notebook is loaded, go to Runtime > Change runtime type and from the dropdown menu, under Hardware accelerator, choose GPU and click Save.
- Now you can begin the experiments. All you have to do is upload the dataset you want and set the parameters in the cell under the Datasets section.
- To train the model go to Runtime > Run all, or click on the first cell and use Shift + Enter to execute the cells one by one.
- The hyperparameters of the model can be modified in the cell under the Set Train Arguments section:
  - bsize: Batch size
  - nhidden: Number of hidden nodes
  - nsteps: Number of input time steps
  - b: Beta/decay factor of the membrane potential
  - learning_rate: Learning rate of the optimizer
  - nepochs: Number of training epochs
- Download either the binary or multiclass classification Python script.
- Set the correct paths to the test and train datasets.
- Execute the script.
Note: There is no need to assign values to the hyperparameters. The script uses GridSearchCV with two-fold cross-validation to find the best hyperparameters from a given list.
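To show what a grid search with two-fold cross-validation computes, here is a dependency-free sketch. It is not the code of the scripts, which rely on scikit-learn's `GridSearchCV`. The hand-written K-NN classifier, the interleaved fold split, the function names and the toy data are all assumptions made to keep the example self-contained.

```python
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def two_fold_accuracy(X, y, k):
    """Mean accuracy over two folds (even-indexed and odd-indexed samples)."""
    folds = [range(0, len(X), 2), range(1, len(X), 2)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, param_grid):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = two_fold_accuracy(X, y, **params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical one-feature dataset with two well-separated classes.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best_params, best_score = grid_search(X, y, {"k": [1, 3]})
```

Each candidate is trained on one half of the data and scored on the other, then the roles are swapped, and the combination with the highest mean score is kept.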
Note: For this part it is better to follow the documentation provided by Flower here
- Open the experiment folder you want to recreate: MLP, SNN or LogReg.
- Download the files of the experiment.
Note: The client and server scripts do not include RSA encryption. To use encryption as well, follow the documentation provided by Flower.
- Open the terminal and make sure you satisfy the requirements needed to run the experiments
- Set server variables:
  - Set global test path
  - MLP: network variables & batch_size
  - SNN: network variables & batch_size
  - LogReg: alter logistic regression parameters if you wish
- Set client variables:
  - Set local train and test paths
  - MLP: network variables & batch_size
  - SNN: network variables & batch_size
  - LogReg: alter logistic regression parameters if you wish
- One terminal is needed for the server and one terminal is needed for every client (alternatively a script can be created)
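As an alternative to opening one terminal per process, a small launcher script along the following lines could start everything at once. This is a sketch under assumptions: the file names `server.py` and `client.py`, the number of clients and the start-up delay are hypothetical and need to be adapted to the experiment folder in use.

```python
import subprocess
import sys
import time


def build_commands(n_clients, server_script="server.py", client_script="client.py"):
    """Return the server command and one command per client.

    The script names are placeholders; use the ones from the experiment folder.
    """
    server_cmd = [sys.executable, server_script]
    client_cmds = [[sys.executable, client_script] for _ in range(n_clients)]
    return server_cmd, client_cmds


def launch_all(n_clients, startup_delay=5.0):
    """Start the server, wait briefly, start the clients, then wait for all."""
    server_cmd, client_cmds = build_commands(n_clients)
    server = subprocess.Popen(server_cmd)
    time.sleep(startup_delay)  # give the server time to start listening
    clients = [subprocess.Popen(cmd) for cmd in client_cmds]
    for client in clients:
        client.wait()
    server.wait()
```

Calling `launch_all(2)` from the experiment folder would start the server and two clients as separate processes. If each client needs different local train and test paths, they would have to be passed to the client script in whatever way that script expects.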