GitHub - brpy/ashrae-energy-prediction: A kaggle case study to predict energy consumption.

ASHRAE - Great Energy predictor III

Overview:

Introduction:

Modern buildings use a lot of Electrical energy for various applications like heating, cooling, steam and electrical usage. Modern Electrical retrofits can improve the Electrical power efficiency and reduce Electrical consumption and cost. But these retrofits in the scale of a large building can be quite expensive. So, instead of paying the price for retrofits upfront; customers are billed as if the retrofits were not fit for a certain period; till the costs are recovered. Let's assume these variables to be true_cost and assumed_cost ; where true_cost is the actual electrical consumption after retrofits and assumed_cost is the cost that the building would consume if there were no retrofits. The problem at hand is we don't have the assumed_cost information. This could be due to not having enough historical data for the particular building.

"Retrofitting" - refers to the process of updating older equipment with new technology.

But we have historical data of electrical consumption for over 1000 buildings over a year's period. The task is to predict the assumed_cost for a new building using the historical data provided for 1000 buildings.

This competition (https://www.kaggle.com/c/ashrae-energy-prediction/overview) was conducted on Kaggle by ASHRAE (American Society of Heating and Air-Conditioning Engineers) on October 15 2019.

Business problem:

A accurate model would provide better incentives for customers to switch to retrofits. Predicting accurately will smoothen the customer onboarding process and will result in increase in customers. This is because a good model's predictions will result in customers paying the bill amount almost equal to what they would've payed if there were no retrofits; resulting in customers not having to change their expenditures. There are no strict business requirements other than model being highly accurate. Latency requirements are also not high.

Check full abstract here

Machine learning

For Eda, featurization, check full blog here or the notebook directory.

Due to lack of harware resources and practical reasons, I have decided not to use leak data and not use complex ensembles (ensembles of ensembles) that most top competitors have used.

I have used various models starting from Linear Regression, Decision Tree, Lightgbm, Adaboost and also a Stacking Regressor. Out of these lgbm gave the best performance on unseen test data. It resulted in RMSLE of 1.360 on private LB and 1.146 on public lb. For scores of other models check images directory.

Deployment

The best model lgbm is then deployed on GCP using Flask and waitress. For frontend, I used vue js for form validation and conditional dropdown option of Building Id. The predictions were quick (~200ms) even on 2vcpu, 1GB Ram Debian vm.

Blog

Link to complete blog:

https://brpy.github.io/blog/projects/ashrae-energy-prediction

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data		data
deployment		deployment
docs		docs
images		images
models		models
notebooks		notebooks
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ASHRAE - Great Energy predictor III

Overview:

Introduction:

Business problem:

Machine learning

Deployment

Blog

About

Languages

brpy/ashrae-energy-prediction

Folders and files

Latest commit

History

Repository files navigation

ASHRAE - Great Energy predictor III

Overview:

Introduction:

Business problem:

Machine learning

Deployment

Blog

About

Topics

Resources

Stars

Watchers

Forks

Languages