This repository provides regression models to 'estimate' the most likely values for imputation purposes. The models are statistically representative of the German non-domestic building stock and cover DIBS input variables with a larger probability of missing values. For a better understanding, the models' summaries are provided in the .txt files, while the R-based models are provided via .rds files.
Further, also the scripts used for the development of these estimation models are provided. BE_imputation.R is the main script, testing the individual variables vor missing values and describing the general approch of tackeling the imputation. This script utilises the scripts available in the einzel_imputationen folder. Those each include the deveolopment of the variabel esimators and the application for the imputation. Last the Determination-of-full-datasets folder holds the script used to analyse the overall extend of missing variables.
These models were developed using the interview data set of the ENOB:DataNWG Project https://datanwg.de. The development was undertaken within the PhD project of Julian Bischof. Details are published with the Ph.D. thesis (coming soon).
Due to data protection reasons, the full data set on which the model development is based can not be provided. However, the data set can be partly accessed and analysed to a certain extent via https://datanwg.de/forschungsdatenbank/. More information on the Dynamic ISO Building Simulator (DIBS) is available here: https://iwugermany.github.io/dibs/overview
The tables below summarise the applied imputation approaches, dependent on the type of variable, and a list of DIBS input and auxiliary variables with their imputation approach taken.
null | Imputation Type | Use Case | Scale | Pseudo-Code | Example of estimation algorithm in R |
---|---|---|---|---|---|
1 | Calculation based on physical or known statistical relations | Metric variables, directly quantifiable via physical relationships of available variables. | metrical | a = b + c | aw_fl = Fassendfläche(f_fl_geo_iwu) - Fensterfläche(fen_fl) |
2 | Linear (singular and multiple) weighted regression estimation | Metric variables, not directly quantifiable via physical relationships of available variables. | metrical | lm(x~a:b, weights = Y) | lm(n_og ~ as.factor(uk_geb) : geb_f_hoehe_mittel_iwu : as.factor(dachform_be), weights = (HRF/Sum_HRF_BE)*N_Clean) |
3 | Individual - quasi-binominal (quasi-logistic) - regression-based - dice - weighted | Nominal or ordinal scaled variables with only two characteristics. | nominal or ordinal | gml(x~a*b, weights = Y, family = quasibinomial) -> dice | glm(qh3_1 ~ as.factor(hk_geb) * as.factor(bak_grob), data = DB_BE_clean, family=quasibinomial, weights = (HRF/Sum_HRF_BE_clean)*N_Clean) |
4 | Individual - multinominal - binominal (logistic) - regression-based - dice - weighted | Nominal or ordinal scaled variables with more than two characteristics. | nominal or ordinal | multinom(y~a*b, weights = Y) -> dice | multinom (lampenart ~ as.factor(hk_geb) * as.factor(bak_grob), data = DB_BE_clean, weights = (HRF/Sum_HRF_BE_clean) * N_Clean, maxit = 1000, MaxNWts = 1000) |
5 | Cleaning up -8 if non-existent | Metric variables, where "Does not apply" equals a zero value. | metrical | x[x==-8] <- 0 | DB_BE$glasart_1[DB_BE$glasart_1==(-8)] <- 0 |
6 | Assumption of values based on related variable | Variables with available closely related imputed variables, that can only be estimated based on a small training data set, causing great uncertainties. | all | x[x==-7] <- y[x==-7] | DB_BE$unter_hoehe[DB_BE$unter_hoehe==(-7)] <- DB_BE$q66a_1[DB_BE$unter_hoehe==(-7)]/100 |
Interview Phase Variable Name | DIBS Variable Name | Description | Applied Imputation Type |
---|---|---|---|
scr_gebaeude_id | scr_gebaeude_id | Building-ID | None - No Missings |
plz | plz | Postal Code | None - No Missings |
hk_geb | hk_geb | Main Building-Usage-Category | None - No Missings |
uk_geb | uk_geb | Sub Building-Usage-Category | None - No Missings |
bak_grob | bak_grob | Time periode of building construction (rough) | None - No Missings |
geb_f_flaeche_n_iwu | geb_f_flaeche_n_iwu | Northern building facade area | None - No Missings |
geb_f_flaeche_o_iwu | geb_f_flaeche_o_iwu | Eastern building facade area | None - No Missings |
geb_f_flaeche_s_iwu | geb_f_flaeche_s_iwu | Southern building facade area | None - No Missings |
geb_f_flaeche_w_iwu | geb_f_flaeche_w_iwu | Western building facade area | None - No Missings |
geb_f_hoehe_mittel_iwu | building_height | Average building height | None - No Missings |
nrf_2 | net_room_area | Netto room area | None - No Missings |
ebf | energy_ref_area | Energy reference area | None - No Missings |
qd1 | Fen_ant | Window area share of building facade | None - No Missings |
d_fl_be | roof_area | Roof area | None - No Missings |
n_ug | n_UG | Average number of floors below ground | None - No Missings |
lampenart | lampenart_be | Predominant lamp type | 4 |
q25_1 | max_occupancy | Maximal number of building occupants | 2 |
aw_fl | wall_area_og | Building wall area above ground | 1 |
n_og | n_OG | Average number of floors above ground | 2 |
unteraw_fl | wall_area_ug | Building wall area below ground | 1 |
qf1 | qF1 | Predominant lighting type | 4 |
glasart_1 | glass_solar_transmittance | Energy transmittance rate of the glassing | 5 |
aw_konstr_1 | thermal_capacitance | Construction type of facade (thermal capacity) | 4 |
u_aw | u_walls | U-Value of facade (opak) | 2 |
u_fen | u_windows | U-Value of windows | 2 |
qh1 | qH1 | Type of ventilation | 4 |
qh3_1 | qH3 | Heat recovery efficiency of the ventilation system | 3 |
freie_kuehlung | night_flushing_flow | Usage of natural ventilation cooling during the night | 4 |
w_erz_art_et | heating_supply_system | Type of heating supply system | 4 |
qg13 | heating_emission_system | Type of heating emission system | 4 |
k_erz_art_rk | cooling_supply_system | Type of cooling supply system | 4 |
qi11 | cooling_emission_system | Type of cooling emission system | 4 |
u_ug | u_base | U-Value of base plate | 2 |
qd8 | qD8 | Energy transmittance rate of the glassing in case of active sun shading | 4 |
d_u_ges | u_roof | U-Value of roof (transparent + opak) | 2 |
qg21 | dhw_system | Type of hot water supply | 4 |
Interview Phase Variable Name | Used for Imputation of Variable | Description | Applied Imputation Type |
---|---|---|---|
geb_flaeche | bfg; d_fl_wueoa | Net area of the ground projection of the building | None - No Missings |
f_ant_beheizt | ebf | Share of conditioned (heated) building area | None - No Missings |
bak | aw_daemm_staerke_1; aw_flantgedges; u_fen; qh1; f_ant_belueftet; f_ant_gekuehlt; qg13; k_erz_art_rk; qi11; u_ug; d_u_ges | Time periode of building construction | None - No Missings |
geb_f_umfang | unteraw_fl | Circumference of building | None - No Missings |
f_fl_geo_iwu | aw_fl | Ground area covert by building | None - No Missings |
bgf | q25_1 | Gross room base area | None - No Missings |
fen_fl | aw_fl | Window area | 2 |
dachform_be | n_og; d_fl_wueoa | Roof type | 3 |
q66a_1 | unter_hoehe; qf1; | Clear room height ob above ground levels | 2 |
unter_hoehe | unteraw_fl | Clear room height of underground levels | 6 |
aw_daemm_staerke_1 | aw_flantgedges; u_aw | Primary wall construction insulation thickness | 2 |
aw_flantgedges | u_aw | Share of envelope wall insulated | 2 |
qi1 | f_ant_gekuehlt | Possesses the building a central cooling system | 4 |
f_ant_gekuehlt | qi11 | Share of building usage cooled by central cooling system | 2 |
w_erz_art | w_erz_art_et | Main heat generator | 4 |
energietraeger | w_erz_art_et | Main energy carrier | 4 |