Model Identification and Data Analysis course project at University of Pavia
Developed in collaboration with: @simoneghiazzi, @riccardocrescenti, @chiarabertocchi and @lucacolombo97
Goal: identification of an annual profile model for the long-term prediction of the Italian energy consumption time series
The provided dataset is composed of Italian energy consumption data for a two-year period.
Initial inspection of the dataset reveals periodic patterns at both the annual and the weekly scale. For this reason, the model is built on Fourier series.
The first step is to make the series stationary in the mean by detrending it.
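The detrending step can be illustrated with a minimal Python sketch. It assumes a linear trend fitted by least squares, which is only one possible choice: the project does not state which trend model was used, and the data below is synthetic.

```python
import numpy as np

def detrend_linear(y):
    """Fit a straight line by least squares and subtract it from the series.

    Assumption: the trend is linear. Returns (detrended series, trend).
    """
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, deg=1)
    trend = slope * t + intercept
    return y - trend, trend

# Synthetic two-year daily series: offset + slow drift + annual oscillation.
t = np.arange(730)
y = 50.0 + 0.01 * t + 5.0 * np.sin(2 * np.pi * t / 365)

residual, trend = detrend_linear(y)
print("mean of detrended series:", residual.mean())
```

Because the fitted line includes an intercept, the detrended series has zero mean up to rounding error, which is what "stationary in the mean" requires for a linear trend.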
For training and validation of the model, the dataset is split as follows:
- Training: energy consumption data from the first year
- Validation: energy consumption data from the second year
Two main models were developed, one for each periodicity detected in the data:
- Weekly periodicity model: Phi_settimanale, consisting of 6 harmonics with period 7
- Annual periodicity model: 12 candidate annual models were developed, with up to 24 harmonics
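As a rough illustration of how a Fourier regressor matrix (a "Phi") can be built and fitted, here is a minimal Python sketch. The function names `fourier_design` and `fit_least_squares`, the column layout (a constant followed by one cosine/sine pair per harmonic), and the synthetic signal are assumptions made for the example, not the project's actual code.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Regressor matrix with columns [1, cos(w1 t), sin(w1 t), cos(w2 t), ...].

    Assumed layout: one constant column plus a cos/sin pair per harmonic,
    where wk = 2*pi*k / period.
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    return np.column_stack(cols)

def fit_least_squares(Phi, y):
    """Ordinary least-squares estimate of the Fourier coefficients."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# Synthetic one-year signal containing the first two annual harmonics.
t = np.arange(365)
y = 3.0 * np.cos(2 * np.pi * t / 365) + 1.5 * np.sin(2 * np.pi * 2 * t / 365)

Phi = fourier_design(t, period=365.0, n_harmonics=3)
theta = fit_least_squares(Phi, y)
y_hat = Phi @ theta
print("max abs fitting error:", np.max(np.abs(y - y_hat)))
```

The same helper can be called with `period=7.0` on the day-of-week index to obtain a weekly regressor matrix.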
For validation, a new weekly Phi was built from the days of the week of the validation data. The 12 final models were then obtained by summing this weekly validation model with each of the annual models identified in the training phase.
The model that best represents the data is selected using the AIC and cross-validation tests.
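The selection procedure can be sketched in Python as follows. The sketch assumes one common form of the criterion, AIC = 2q/N + ln(SSR/N), with q the number of parameters and N the number of samples, and treats cross-validation as fitting on the training year and scoring on the validation year. The data is synthetic and the helper names are hypothetical.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Constant column plus a cos/sin pair per harmonic (assumed layout)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def aic(ssr, n_samples, n_params):
    """One common form of the Akaike criterion (assumed here)."""
    return 2.0 * n_params / n_samples + np.log(ssr / n_samples)

def signal(t):
    # Synthetic "true" profile with two annual harmonics.
    return 3.0 * np.cos(2 * np.pi * t / 365) + 1.5 * np.sin(4 * np.pi * t / 365)

rng = np.random.default_rng(0)
t = np.arange(365.0)
y_train = signal(t) + rng.normal(0.0, 0.5, t.size)  # stands in for year 1
y_val = signal(t) + rng.normal(0.0, 0.5, t.size)    # stands in for year 2

scores = []  # (n_harmonics, AIC on training data, MSE on validation data)
for m in range(1, 7):
    Phi = fourier_design(t, 365.0, m)
    theta, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    ssr_train = np.sum((y_train - Phi @ theta) ** 2)
    mse_val = np.mean((y_val - Phi @ theta) ** 2)
    scores.append((m, aic(ssr_train, t.size, Phi.shape[1]), mse_val))

best_by_aic = min(scores, key=lambda s: s[1])[0]
best_by_cv = min(scores, key=lambda s: s[2])[0]
print("best order by AIC:", best_by_aic)
print("best order by cross-validation:", best_by_cv)
```

AIC penalizes the number of parameters explicitly, while cross-validation penalizes overfitting through the error on unseen data, so the two tests complement each other.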
AIC Test:
Cross-Validation Test:
From the tests we see that the best annual model is model 10, consisting of 20 regressors.
For this model we calculated:
- MSE = 3.836955832
- RMSE = 1.958814904
Validation Data 3D Plot:
Final Model Surface:
The error histogram shows errors concentrated around zero. There is, however, an "anomalous" region between -6 and -10, which corresponds to the errors made on holidays.
As the graph shows, the largest fluctuations in the validation epsilon (which represents the magnitude of the error) occur around holidays, marked as follows:
- Blue: Easter
- Yellow: mid-August
- Red: Christmas
The final model was then retrained on the data without the holiday periods (the Christmas and mid-August holidays, which fall on fixed dates) to improve the prediction of "normal" days. The data values observed during these holiday periods were averaged, and the result was added to the final model as a "correction index".
With this correction, the validation metrics improve:
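One possible reading of this correction is sketched below in Python. It is an interpretation, not the project's code: the holiday windows (days 220 to 234 and 355 to 365), the synthetic data, and the choice to compute the correction as the mean residual over holiday days are all assumptions made for the example.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Constant column plus a cos/sin pair per harmonic (assumed layout)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

day = np.arange(1, 366)
# Hypothetical fixed-date holiday windows: mid-August and Christmas.
holiday = ((day >= 220) & (day <= 234)) | (day >= 355)

# Synthetic consumption with an artificial drop during holidays.
rng = np.random.default_rng(1)
y = 10.0 + 3.0 * np.cos(2 * np.pi * day / 365) + rng.normal(0.0, 0.3, day.size)
y[holiday] -= 8.0

# Fit only on "normal" days, then predict the whole year.
Phi = fourier_design(day, 365.0, 2)
theta, *_ = np.linalg.lstsq(Phi[~holiday], y[~holiday], rcond=None)
base = Phi @ theta

# Correction index: average gap between data and model on holiday days.
correction = np.mean(y[holiday] - base[holiday])
y_hat = base + np.where(holiday, correction, 0.0)

mse_plain = np.mean((y - base) ** 2)
mse_corrected = np.mean((y - y_hat) ** 2)
print("correction index:", correction)
print("MSE without / with correction:", mse_plain, mse_corrected)
```

Excluding the holidays from the fit keeps them from distorting the harmonic coefficients, and the constant offset then accounts for the systematic drop on those days.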
- SSR = 1.186772448087091e+03
- MSE = 3.251431364622168
- RMSE = 1.803172583149535
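The three metrics are linked by MSE = SSR / N and RMSE = sqrt(MSE). The short Python check below uses only the values reported above to recover the implied number of validation samples N, which comes out at about 365, consistent with one year of daily data.

```python
import math

# Values as reported for the holiday-corrected model.
ssr = 1.186772448087091e+03
mse = 3.251431364622168
rmse = 1.803172583149535

n_implied = ssr / mse              # MSE = SSR / N  =>  N = SSR / MSE
rmse_from_mse = math.sqrt(mse)     # RMSE = sqrt(MSE)

print("implied number of samples:", n_implied)
print("RMSE recomputed from MSE:", rmse_from_mse)
```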
The final function takes two scalars as input (day of the year, day of the week) and returns the predicted energy consumption. It consists of:
- Handling of missing (null) values
- Detrending: estimation of the trend over the 2 years
- Identification of the model on the full 2 years of supplied data
- Generation of the matrix containing every combination of day of the year and day of the week
- Trend extension: the last value of the trend is extended and added to the forecast data
The forecast is then read from the matrix using the 2 input indices.
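The lookup step can be sketched in Python as below. The names `build_forecast_table` and `predict`, the 1-based indexing of both inputs, and the placeholder coefficients are hypothetical. The sketch assumes the forecast is the sum of an annual profile, a weekly profile, and the last trend value held constant.

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Constant column plus a cos/sin pair per harmonic (assumed layout)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def build_forecast_table(theta_year, theta_week, n_h_year, n_h_week, last_trend):
    """365 x 7 table: entry [d-1, w-1] is the forecast for day of year d
    falling on day of week w (both 1-based)."""
    doy = np.arange(1, 366)
    dow = np.arange(1, 8)
    annual = fourier_design(doy, 365.0, n_h_year) @ theta_year  # shape (365,)
    weekly = fourier_design(dow, 7.0, n_h_week) @ theta_week    # shape (7,)
    # Broadcasting sums every (day of year, day of week) combination.
    return annual[:, None] + weekly[None, :] + last_trend

def predict(table, day_of_year, day_of_week):
    """Read the forecast for the given 1-based indices."""
    return table[day_of_year - 1, day_of_week - 1]

# Placeholder coefficients, for illustration only.
theta_year = np.array([0.0, 2.0, 0.0])  # constant, cos, sin
theta_week = np.array([0.0, 0.0, 1.0])  # constant, cos, sin
table = build_forecast_table(theta_year, theta_week, 1, 1, last_trend=40.0)

print("forecast for day 100, weekday 3:", predict(table, 100, 3))
```

Precomputing the table makes each prediction a constant-time lookup, at the cost of storing 365 x 7 values.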