Release v0.14.2 #504

Merged
merged 48 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
9f2ce4b
Merge pull request #476 from OpenEnergyPlatform/production
FlorianK13 Nov 23, 2023
a93ba5f
Fix and add URL of example projects in readme
nesnoj Jan 8, 2024
4f4dc43
Add unreleased section to changelog
nesnoj Jan 8, 2024
35a4b72
Update CHANGELOG.md
nesnoj Jan 8, 2024
b6b47fa
Merge pull request #481 from nesnoj/update-urls-example-projects
FlorianK13 Jan 9, 2024
cbd1fd5
Update CHANGELOG.md
chrwm Jan 17, 2024
ac18113
Merge branch 'develop' into hotfix-482-changed-data-types
chrwm Jan 17, 2024
0620a0e
Merge pull request #484 from OpenEnergyPlatform/hotfix-482-changed-da…
chrwm Jan 17, 2024
38da6e2
remove webscraping for URL
Johann150 Feb 24, 2024
e1bfe42
update changelog
Johann150 Feb 25, 2024
123f510
Correct PR number in Changelog
FlorianK13 Mar 6, 2024
db97681
Add a local testing file to gitignore #485
FlorianK13 Mar 8, 2024
247c060
Replace deprecated pandas function #485
FlorianK13 Mar 8, 2024
3a690eb
Update black as dependency #485
FlorianK13 Mar 8, 2024
ffcab27
Update changelog #485
FlorianK13 Mar 8, 2024
0440c93
Update pandas version #485
FlorianK13 Mar 8, 2024
1b55915
consider possibility of URL not being found
Johann150 Mar 9, 2024
13f099c
Merge pull request #488 from Johann150/develop
FlorianK13 Mar 12, 2024
2ad6ebe
Update python version #494
FlorianK13 Mar 12, 2024
5f73124
Correct error handling for wrong xml syntax #494
FlorianK13 Mar 12, 2024
d1cd342
Update Changelog #494
FlorianK13 Mar 12, 2024
354b08c
Merge branch 'develop' into bugfix-485-depcreation-pandasdataframeapp…
FlorianK13 Mar 12, 2024
7728eb4
Merge pull request #495 from OpenEnergyPlatform/bugfix-494-handle-xml…
FlorianK13 Mar 28, 2024
a051050
Merge branch 'develop' into bugfix-485-depcreation-pandasdataframeapp…
FlorianK13 Mar 28, 2024
e9aa3d8
Merge pull request #491 from OpenEnergyPlatform/bugfix-485-depcreatio…
FlorianK13 Mar 28, 2024
933ffa0
Add ReserveartNachDemEnWG
chrwm Apr 2, 2024
d23d4ef
Add DatumUeberfuehrungInReserve #496
chrwm Apr 2, 2024
3eb37f0
Add ReserveartNachDemEnWG #496
chrwm Apr 2, 2024
269bd18
Add DatumUeberfuehrungInReserve #496
chrwm Apr 2, 2024
8c33f84
Add DatumUeberfuehrungInReserve & ReserveartNachDemEnWG to all Extend…
chrwm Apr 2, 2024
39170fa
Rename InAnspruchGenommeneLandwirtschaftlichGenutzteFlaeche #496
chrwm Apr 2, 2024
8cb3fc6
Delete AnzeigeEinerStilllegung #496
chrwm Apr 2, 2024
99b71b7
Delete ArtDerStilllegung #496
chrwm Apr 2, 2024
ca98ae5
Delete DatumBeginnVorlaeufigenOderEndgueltigenStilllegung #496
chrwm Apr 2, 2024
533df9e
Rename ZugeordneteWirkleistungWechselrichter #496
chrwm Apr 2, 2024
acce3c6
Update CHANGELOG.md #496
chrwm Apr 2, 2024
28c5593
Add DatumEndgueltigeStilllegung #496
chrwm Apr 2, 2024
0a3da33
Add DatumEndgueltigeStilllegung #496
chrwm Apr 2, 2024
4443091
Delete unused docker file #500
FlorianK13 Apr 2, 2024
e0ec101
Add WebportalDesNetzbetreibers & RegisternummerPraefix #496
chrwm Apr 2, 2024
0e4baef
Update Changelog #500
FlorianK13 Apr 2, 2024
0997147
Update column translations #496
chrwm Apr 2, 2024
6113ad3
Merge pull request #501 from OpenEnergyPlatform/bugfix-500-remove-unu…
FlorianK13 Apr 2, 2024
bce6325
Merge pull request #499 from OpenEnergyPlatform/feature-496-review-we…
chrwm Apr 4, 2024
c806b9a
Bump to version & update release date
chrwm Apr 4, 2024
935350a
Update CHANGELOG.md
chrwm Apr 4, 2024
0dc9913
Update CHANGELOG.md
chrwm Apr 4, 2024
ef5dc7e
Update release date #504
FlorianK13 Apr 10, 2024

Files changed

2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.14.1
+current_version = 0.14.2
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)((?P<release>(a|na))+(?P<build>\d+))?
 serialize =
 {major}.{minor}.{patch}{release}{build}
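The `.bumpversion.cfg` hunk only changes `current_version`, but the `parse` pattern next to it defines how a version string is split into parts. The following minimal sketch applies that pattern with Python's `re` module. Both `PARSE` and `SERIALIZE` are copied from the hunk. The helper names are hypothetical. bump2version's own logic for choosing a `serialize` format when optional parts are missing is not reproduced: this sketch simply renders missing parts as empty strings.

```python
import re

# `parse` pattern copied verbatim from the .bumpversion.cfg hunk
PARSE = r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)((?P<release>(a|na))+(?P<build>\d+))?"
# the `serialize` format visible in the hunk
SERIALIZE = "{major}.{minor}.{patch}{release}{build}"


def parse_version(version: str) -> dict:
    """Split a version string into the named parts defined by PARSE."""
    match = re.fullmatch(PARSE, version)
    if match is None:
        raise ValueError(f"not a valid version: {version!r}")
    # groupdict() only contains the named groups; unmatched ones are None
    return match.groupdict()


def serialize_version(parts: dict) -> str:
    """Render parts with SERIALIZE; this sketch prints missing parts as ''."""
    return SERIALIZE.format(**{name: value or "" for name, value in parts.items()})
```

For a plain release such as `0.14.2`, the optional `release`/`build` group does not match, so both parts come back as `None`.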
2 changes: 1 addition & 1 deletion .github/workflows/ci-production.yml
@@ -31,7 +31,7 @@ jobs:
 - name: create package
 run: python setup.py sdist
 - name: import open-mastr
-run: python -m pip install ./dist/open_mastr-0.14.1.tar.gz
+run: python -m pip install ./dist/open_mastr-0.14.2.tar.gz
 - name: Create credentials file
 env:
 MASTR_TOKEN: ${{ secrets.MASTR_TOKEN }}
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
+# own testing files
+tmptest.py
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -3,4 +3,4 @@ repos:
 rev: 22.6.0
 hooks:
 - id: black
-language_version: python3.10
+language_version: python3.11
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -6,7 +6,17 @@ For each version important additions, changes and removals are listed here.
 The format is inspired from [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and the versioning aims to respect [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

-## [v0.14.1] Hotfix - 2023-01-17
+## [v0.14.2] Maintenance - 2024-04-10
+### Changed
+- Fix and add URLs of example projects in readme [#481](https://github.com/OpenEnergyPlatform/open-MaStR/pull/481)
+- No longer require web scraping for bulk download [#488](https://github.com/OpenEnergyPlatform/open-MaStR/pull/488)
+- Replace deprecated pandas map function [#491](https://github.com/OpenEnergyPlatform/open-MaStR/pull/491)
+- Fix the handling of corrupted xml syntax in the downloaded files [#494](https://github.com/OpenEnergyPlatform/open-MaStR/pull/494)
+- Implement relevant API WSDL Patchnotes V24.1.128 [#499](https://github.com/OpenEnergyPlatform/open-MaStR/pull/499)
+### Removed
+- Remove unused Docker File [#501](https://github.com/OpenEnergyPlatform/open-MaStR/pull/501)
+
+## [v0.14.1] Hotfix - 2024-01-17
 ### Changed
 - Change data type of NetzbetreiberpruefungStatus to string [#483](https://github.com/OpenEnergyPlatform/open-MaStR/pull/483)

4 changes: 2 additions & 2 deletions CITATION.cff
@@ -28,7 +28,7 @@ authors:
 title: "open-MaStR"
 type: software
 license: AGPL-3.0
-version: 0.14.1
+version: 0.14.2
 doi:
-date-released: 2024-01-17
+date-released: 2024-04-10
 url: "https://github.com/OpenEnergyPlatform/open-MaStR/"
4 changes: 2 additions & 2 deletions README.rst
@@ -110,8 +110,8 @@ changes in a `Pull Request <https://github.com/OpenEnergyPlatform/open-MaStR/pul

 - `PV- und Windflächenrechner <https://www.agora-energiewende.de/service/pv-und-windflaechenrechner/>`_
 - `Wasserstoffatlas <https://wasserstoffatlas.de/>`_
-- `EE-Status App <https://ee-status.herokuapp.com/>`_
-
+- `EE-Status App <https://ee-status.de/>`_
+- `Digiplan Anhalt <https://digiplan.rl-institut.de/>`_


 Collaboration
7 changes: 0 additions & 7 deletions open_mastr/utils/Dockerfile.postgis

This file was deleted.

6 changes: 5 additions & 1 deletion open_mastr/utils/constants.py
@@ -310,7 +310,7 @@
     "AuflagenAbschaltungTierschutz": "requirementShutdownAnimalProtection",
     "AnlagenkennzifferAnlagenregister_nv": "plantIdentificationNumberRegister_nv",
     "BiogasDatumLeistungserhoehung": "biogasCapacityIncreaseDate",
-    "InAnspruchGenommeneAckerflaeche": "areaOfAgriculturalLandInUse",
+    "InAnspruchGenommeneLandwirtschaftlichGenutzteFlaeche": "areaOfAgriculturalLandInUse",
     "Aktenzeichen": "fileReference",
     "NetzbetreiberpruefungStatus": "gridOperatorCheckStatus",
     "AnlageBetriebsstatus": "plantOperatingStatus",
@@ -511,4 +511,8 @@
     "MastrNummer": "mastrNumber",
     "Kuestenentfernung": "distanceToCoast",
     "eegAusschreibungZuschlag": "eegAuctionBidAward",
+    "DatumUeberfuehrungInReserve": "dateTransferToReserve",
+    "ReserveartNachDemEnWG": "typeOfReserveFromEnWG",
+    "WebportalDesNetzbetreibers": "webPortalGridOperator",
+    "RegisternummerPraefix": "registerNumberPrefix",
 }
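The `constants.py` hunks extend a dictionary that maps German MaStR column names to English ones. The following minimal sketch shows that kind of lookup. The name `COLUMN_TRANSLATION` and the helper `translate_columns` are hypothetical: the dictionary's real variable name is not visible in this diff, and only a few of its entries are copied here. Because the key `InAnspruchGenommeneAckerflaeche` was replaced rather than kept, a plain lookup would leave a column that still carries the old name untranslated.

```python
# Excerpt of the German-to-English column mapping from the constants.py hunks
# (COLUMN_TRANSLATION is a hypothetical name; the real dict has many more entries).
COLUMN_TRANSLATION = {
    "InAnspruchGenommeneLandwirtschaftlichGenutzteFlaeche": "areaOfAgriculturalLandInUse",
    "DatumUeberfuehrungInReserve": "dateTransferToReserve",
    "ReserveartNachDemEnWG": "typeOfReserveFromEnWG",
    "WebportalDesNetzbetreibers": "webPortalGridOperator",
    "RegisternummerPraefix": "registerNumberPrefix",
}


def translate_columns(columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Translate column names; names without an entry are kept unchanged."""
    return [mapping.get(name, name) for name in columns]
```

With pandas, `DataFrame.rename(columns=mapping)` behaves the same way for names that have no entry: by default it leaves them untouched.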
18 changes: 8 additions & 10 deletions open_mastr/utils/orm.py
@@ -134,6 +134,8 @@ class Extended(object):
     PraequalifiziertFuerRegelenergie = Column(Boolean)
     GenMastrNummer = Column(String)
     Netzbetreiberzuordnungen = Column(String)
+    ReserveartNachDemEnWG = Column(String)
+    DatumUeberfuehrungInReserve = Column(Date)
     # from bulk download
     Hausnummer_nv = Column(Boolean)
     Weic_nv = Column(Boolean)
@@ -185,7 +187,7 @@ class SolarExtended(Extended, ParentAllTables, Base):
     NebenausrichtungNeigungswinkel = Column(String)
     InAnspruchGenommeneFlaeche = Column(Float)
     ArtDerFlaeche = Column(String)
-    InAnspruchGenommeneAckerflaeche = Column(Float)
+    InAnspruchGenommeneLandwirtschaftlichGenutzteFlaeche = Column(Float)
     Nutzungsbereich = Column(String)
     Buergerenergie = Column(Boolean)
     EegMastrNummer = Column(String)
@@ -202,16 +204,12 @@ class BiomassExtended(Extended, ParentAllTables, Base):
     EegMastrNummer = Column(String)
     KwkMastrNummer = Column(String)

-
 class CombustionExtended(Extended, ParentAllTables, Base):
     __tablename__ = "combustion_extended"

     NameKraftwerk = Column(String)
     NameKraftwerksblock = Column(String)
     DatumBaubeginn = Column(Date)
-    AnzeigeEinerStilllegung = Column(Boolean)
-    ArtDerStilllegung = Column(String)
-    DatumBeginnVorlaeufigenOderEndgueltigenStilllegung = Column(Date)
     SteigerungNettonennleistungKombibetrieb = Column(Float)
     AnlageIstImKombibetrieb = Column(Boolean)
     MastrNummernKombibetrieb = Column(String)
@@ -230,7 +228,6 @@ class CombustionExtended(Extended, ParentAllTables, Base):
     Technologie = Column(String)
     AusschliesslicheVerwendungImKombibetrieb = Column(Boolean)

-
 class GsgkExtended(Extended, ParentAllTables, Base):
     __tablename__ = "gsgk_extended"

@@ -244,9 +241,6 @@ class HydroExtended(Extended, ParentAllTables, Base):

     NameKraftwerk = Column(String)
     ArtDerWasserkraftanlage = Column(String)
-    AnzeigeEinerStilllegung = Column(Boolean)
-    ArtDerStilllegung = Column(String)
-    DatumBeginnVorlaeufigenOderEndgueltigenStilllegung = Column(Date)
     MinderungStromerzeugung = Column(Boolean)
     BestandteilGrenzkraftwerk = Column(Boolean)
     NettonennleistungDeutschland = Column(Float)
@@ -274,7 +268,7 @@ class StorageExtended(Extended, ParentAllTables, Base):
     Notstromaggregat = Column(Boolean)
     BestandteilGrenzkraftwerk = Column(Boolean)
     NettonennleistungDeutschland = Column(Float)
-    ZugeordnenteWirkleistungWechselrichter = Column(Float)
+    ZugeordneteWirkleistungWechselrichter = Column(Float)
     NutzbareSpeicherkapazitaet = Column(Float)
     SpeMastrNummer = Column(String)
     EegMastrNummer = Column(String)
@@ -510,6 +504,7 @@ class GasStorageExtended(ParentAllTables, Base):
     DatumBeginnVoruebergehendeStilllegung = Column(Date)
     DatumDesBetreiberwechsels = Column(Date)
     DatumRegistrierungDesBetreiberwechsels = Column(Date)
+    DatumEndgueltigeStilllegung = Column(Date)


 class StorageUnits(ParentAllTables, Base):
@@ -570,6 +565,7 @@ class GasProducer(ParentAllTables, Base):
     FlurFlurstuecknummern = Column(String)
     GeplantesInbetriebnahmedatum = Column(Date)
     DatumBeginnVoruebergehendeStilllegung = Column(Date)
+    DatumEndgueltigeStilllegung = Column(Date)


 class GasConsumer(ParentAllTables, Base):
@@ -734,6 +730,8 @@ class MarketActors(ParentAllTables, Base):
     Stromgrosshaendler = Column(Boolean)
     MarktakteurVorname = Column(String)
     MarktakteurNachname = Column(String)
+    WebportalDesNetzbetreibers = Column(String)
+    RegisternummerPraefix = Column(String)


 class Grids(ParentAllTables, Base):
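The `orm.py` hunks add columns, for example `ReserveartNachDemEnWG` and `DatumUeberfuehrungInReserve` on the shared `Extended` mixin, and drop others. Tables in a database created with an earlier release may therefore lack the new columns. The diff shows that `utils_write_to_database.py` has a function named `add_missing_columns_to_table`, but not its body. The following is therefore only an independent sketch of that idea, written with Python's built-in `sqlite3` module. The function name `add_missing_columns` and the `NEW_COLUMNS` subset are hypothetical, and SQLite type names stand in for the SQLAlchemy column types.

```python
import sqlite3

# Hypothetical subset of the columns the updated `Extended` mixin declares,
# written as SQLite column types.
NEW_COLUMNS = {
    "ReserveartNachDemEnWG": "TEXT",
    "DatumUeberfuehrungInReserve": "DATE",
}


def add_missing_columns(con: sqlite3.Connection, table: str, expected: dict[str, str]) -> list[str]:
    """Add every column in `expected` that `table` lacks; return the names added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    existing = {row[1] for row in con.execute(f'PRAGMA table_info("{table}")')}
    added = []
    for name, sql_type in expected.items():
        if name not in existing:
            # identifiers come from trusted constants here, never from user input
            con.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')
            added.append(name)
    return added
```

Running it twice is harmless: the second call finds both columns and adds nothing. Columns that the PR removed, such as `AnzeigeEinerStilllegung`, are left in place by this sketch.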
2 changes: 1 addition & 1 deletion open_mastr/xml_download/utils_cleansing_bulk.py
@@ -44,7 +44,7 @@ def replace_mastr_katalogeintraege(
 .apply(lambda x: x.str.strip())
 .replace("", None)
 .astype("Int64")
-.applymap(katalogwerte.get)
+.map(katalogwerte.get)
 .agg(lambda d: ",".join(i for i in d if isinstance(i, str)), axis=1)
 .replace("", None)
 )
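pandas 2.1 deprecated `DataFrame.applymap` in favour of the equivalent `DataFrame.map`, which is why only that one call changes here. The following minimal sketch traces the chain for a single cell in plain Python. It assumes, as the comma join and the name `katalogwerte` suggest, that each cell holds comma-separated integer catalogue IDs. The step that splits the cells is outside the hunk. The function name `replace_catalog_ids` is hypothetical.

```python
from typing import Optional


def replace_catalog_ids(cell: Optional[str], katalogwerte: dict[int, str]) -> Optional[str]:
    """Map a comma-separated string of catalogue IDs to their labels."""
    if cell is None:
        return None
    labels = []
    for part in cell.split(","):
        part = part.strip()  # like .apply(lambda x: x.str.strip())
        if not part:  # like .replace("", None): empty parts count as missing
            continue
        label = katalogwerte.get(int(part))  # like .astype("Int64").map(katalogwerte.get)
        if isinstance(label, str):  # the .agg step keeps only string labels
            labels.append(label)
    return ",".join(labels) or None  # like the final .replace("", None)
```

Unknown IDs map to `None` through `dict.get` and are dropped by the join. A cell whose IDs are all unknown therefore ends up missing rather than empty.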
83 changes: 69 additions & 14 deletions open_mastr/xml_download/utils_download_bulk.py
@@ -5,30 +5,73 @@

 import numpy as np
 import requests
-from bs4 import BeautifulSoup
 from tqdm import tqdm

 # setup logger
 from open_mastr.utils.config import setup_logger

 log = setup_logger()

+def gen_version(when: time.struct_time = time.localtime()) -> str:
+    """
+    Generates the current version.
+
+    The version number is determined according to a fixed release cycle,
+    which by convention is kept in sync with changes to other German regulatory
+    frameworks of the energy sector, such as GeLi Gas and GPKE.
+
+    The release schedule is twice per year, on the 1st of April and October.
+    The version number is determined by the year of release and the running
+    number of the release, i.e. the release on April 1st is release 1,
+    while the release in October is release 2.

-def get_url_from_Mastr_website() -> str:
-    """Get the url of the latest MaStR file from markstammdatenregister.de.
+    Further, the release happens during the day, so on the day of the
+    changeover, the exported data will still be in the old version/format.

-    The file and the corresponding url are updated once per day.
-    The url has a randomly generated string appended, so it has to be
-    grabbed from the marktstammdatenregister.de homepage.
-    For further details visit https://www.marktstammdatenregister.de/MaStR/Datendownload
+    See <https://www.marktstammdatenregister.de/MaStRHilfe/files/webdienst/Release-Termine.pdf>
+
+    Examples:
+    2024-01-01 = version 23.2
+    2024-04-01 = version 23.2
+    2024-04-02 = version 24.1
+    2024-09-30 = version 24.1
+    2024-10-01 = version 24.1
+    2024-10-02 = version 24.2
+    2024-12-31 = version 24.2
     """

-    html = requests.get("https://www.marktstammdatenregister.de/MaStR/Datendownload")
-    soup = BeautifulSoup(html.text, "lxml")
-    # find the download button element on the website
-    element = soup.find_all("a", "btn btn-primary text-right")[0]
-    # extract the url from the html element
-    return str(element).split('href="')[1].split('" title')[0]
+    year = when.tm_year
+    release = 1
+
+    if when.tm_mon < 4 or (when.tm_mon == 4 and when.tm_mday == 1):
+        year = year - 1
+        release = 2
+    elif when.tm_mon > 10 or (when.tm_mon == 10 and when.tm_mday > 1):
+        release = 2
+
+    # only the last two digits of the year are used
+    year = str(year)[-2:]
+
+    return f'{year}.{release}'
+
+def gen_url(when: time.struct_time = time.localtime()) -> str:
+    """
+    Generates the download URL for the specified date.
+
+    Note that not all dates are archived on the website.
+    Normally only today's export is available. It is usually made
+    between 02:00 and 04:00, which means that before 04:00 the current data
+    may not yet be available and the download could fail.
+
+    Note also that this function cannot generate URLs for dates
+    before 2024, because a different URL scheme was used then, with random
+    data embedded in the file name to make automated downloads harder.
+    """
+
+    version = gen_version(when)
+    date = time.strftime("%Y%m%d", when)
+
+    return f'https://download.marktstammdatenregister.de/Gesamtdatenexport_{date}_{version}.zip'


 def download_xml_Mastr(
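`gen_version` and `gen_url` replace the web scraping with a URL computed from the date. From 2 April through 1 October the export uses release `YY.1`. From 2 October through 31 December it uses `YY.2` of the same year. From 1 January through 1 April it uses `YY.2` of the previous year. The following minimal sketch reproduces that logic with `datetime.date` instead of `time.struct_time`, and the date is a required argument. In the diff, the default `when=time.localtime()` is evaluated only once, when the module is imported, so a long-running process calling these functions without an argument would keep the import-time date.

```python
from datetime import date


def gen_version(when: date) -> str:
    """Return the MaStR release ("YY.1" or "YY.2") whose format applies on `when`."""
    year, release = when.year, 1
    # 1 January to 1 April: second release of the previous year
    if when.month < 4 or (when.month == 4 and when.day == 1):
        year, release = year - 1, 2
    # 2 October to 31 December: second release of the current year
    elif when.month > 10 or (when.month == 10 and when.day > 1):
        release = 2
    # only the last two digits of the year are used
    return f"{str(year)[-2:]}.{release}"


def gen_url(when: date) -> str:
    """Build the bulk-export URL following the scheme in the diff."""
    return (
        "https://download.marktstammdatenregister.de/"
        f"Gesamtdatenexport_{when:%Y%m%d}_{gen_version(when)}.zip"
    )
```

The docstring examples from the diff serve directly as test cases. For example, `gen_url(date(2024, 4, 10))` ends in `Gesamtdatenexport_20240410_24.1.zip`.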
@@ -69,9 +112,21 @@ def download_xml_Mastr(
         " You may want to download it another time."
     )
     print(print_message)
-    url = get_url_from_Mastr_website()
+
+    now = time.localtime()
+    url = gen_url(now)
+
     time_a = time.perf_counter()
     r = requests.get(url, stream=True)
+    if r.status_code == 404:
+        # presumably today's download is not ready yet, retry with yesterday's date
+        log.warning("Download file was not found. Assuming that the new file was not published yet and retrying with yesterday.")
+        now = time.localtime(time.mktime(now) - (24 * 60 * 60)) # subtract 1 day from the date
+        r = requests.get(url, stream=True)
+        if r.status_code == 404:
+            log.error("Could not download file: download URL not found")
+            return
+
     total_length = int(18000 * 1024 * 1024)
     with open(save_path, "wb") as zfile, tqdm(
         desc=save_path, total=(total_length / 1024 / 1024), unit=""
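In this hunk, the retry branch shifts `now` back by one day. It then calls `requests.get(url, stream=True)` again with the unchanged `url`, which was built from today's date. The fallback to yesterday's export only takes effect if the URL is rebuilt from the shifted date before the second request, for example with `url = gen_url(now)`. The following minimal sketch shows the intended control flow with the URL rebuilt for each candidate date. Network access is replaced by an injected `fetch_status` callable so the logic can be run offline. The names `find_available_url`, `build_url` and `fetch_status` are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def find_available_url(
    today: date,
    build_url: Callable[[date], str],
    fetch_status: Callable[[str], int],
) -> Optional[str]:
    """Try today's export first, then yesterday's; return the first URL that is not a 404."""
    for day in (today, today - timedelta(days=1)):
        url = build_url(day)  # rebuild the URL from the date actually being tried
        if fetch_status(url) != 404:
            return url
    return None  # like the diff, give up after the second 404
```

In real use, `fetch_status` would wrap the HTTP request. Using `timedelta` rather than subtracting 86400 seconds from a local timestamp also keeps the arithmetic on calendar days, including month boundaries.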
63 changes: 35 additions & 28 deletions open_mastr/xml_download/utils_write_to_database.py
@@ -12,6 +12,7 @@
 from open_mastr.utils.helpers import data_to_include_tables
 from open_mastr.utils.orm import tablename_mapping
 from open_mastr.xml_download.utils_cleansing_bulk import cleanse_bulk_data
+from io import StringIO


 def write_mastr_xml_to_database(
@@ -156,7 +157,7 @@ def preprocess_table_for_writing_to_database(
     try:
         df = pd.read_xml(data, encoding="UTF-16", compression="zip")
     except lxml.etree.XMLSyntaxError as err:
-        df = handle_xml_syntax_error(data, err)
+        df = handle_xml_syntax_error(data.decode("utf-16"), err)

     df = add_zero_as_first_character_for_too_short_string(df)
     df = change_column_names_to_orm_format(df, xml_tablename)
@@ -335,19 +336,19 @@ def add_missing_columns_to_table(
 )


-def delete_wrong_xml_entry(err: Error, df: pd.DataFrame) -> None:
+def delete_wrong_xml_entry(err: Error, df: pd.DataFrame) -> pd.DataFrame:
     delete_entry = str(err).split("«")[0].split("»")[1]
     print(f"The entry {delete_entry} was deleted due to its false data type.")
-    df = df.replace(delete_entry, np.nan)
+    return df.replace(delete_entry, np.nan)


-def handle_xml_syntax_error(data: bytes, err: Error) -> pd.DataFrame:
+def handle_xml_syntax_error(data: str, err: Error) -> pd.DataFrame:
     """Deletes entries that cause an xml syntax error and produces DataFrame.

     Parameters
     -----------
-    data : bytes
-        Unzipped xml data
+    data : str
+        Decoded xml file as one string
     err : ErrorMessage
         Error message that appeared when trying to use pd.read_xml on invalid xml file.

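`df.replace(...)` returns a new DataFrame by default, and assigning it to the local name `df` does not change the caller's object. The old `-> None` version of `delete_wrong_xml_entry` therefore discarded its result; the fixed version returns it. The following minimal sketch mirrors both parts in plain Python: the message slicing copied from the diff, and a function that returns modified data instead of rebinding a parameter. Lists of dicts stand in for the DataFrame and `None` stands in for `np.nan`. The example message and both function names are hypothetical.

```python
def extract_marked_value(message: str) -> str:
    """Return the text after the first '»' up to the first '«' (same slicing as the diff)."""
    return message.split("«")[0].split("»")[1]


def blank_out_value(rows: list[dict], value: str) -> list[dict]:
    """Return new rows in which every cell equal to `value` is replaced by None."""
    # build new dicts; the caller's rows are left unchanged
    return [{key: (None if cell == value else cell) for key, cell in row.items()} for row in rows]
```

The caller must use the return value, as in `rows = blank_out_value(rows, value)`. That mirrors `df = delete_wrong_xml_entry(err, df)`.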
@@ -356,25 +357,31 @@ def handle_xml_syntax_error(data: bytes, err: Error) -> pd.DataFrame:
     df : pandas.DataFrame
         DataFrame which is read from the changed xml data.
     """
-    wrong_char_position = int(str(err).split()[-4])
-    decoded_data = data.decode("utf-16")
-    loop_condition = True
-
-    shift = 0
-    while loop_condition:
-        evaluated_string = decoded_data[wrong_char_position + shift]
-        if evaluated_string == ">":
-            start_char = wrong_char_position + shift + 1
-            break
-        else:
-            shift -= 1
-    loop_condition_2 = True
-    while loop_condition_2:
-        evaluated_string = decoded_data[start_char]
-        if evaluated_string == "<":
-            break
-        else:
-            decoded_data = decoded_data[:start_char] + decoded_data[start_char + 1 :]
-    df = pd.read_xml(decoded_data)
-    print("One invalid xml expression was deleted.")
-    return df
+
+    def find_nearest_brackets(xml_string: str, position: int) -> tuple[int, int]:
+        left_bracket_position = xml_string.rfind(">", 0, position)
+        right_bracket_position = xml_string.find("<", position)
+        return left_bracket_position, right_bracket_position
+
+    data = data.splitlines()
+
+    for _ in range(100):
+        # check for a maximum of 100 syntax errors, otherwise raise an error
+        wrong_char_row, wrong_char_column = err.position
+        row_with_error = data[wrong_char_row - 1]
+
+        left_bracket, right_bracket = find_nearest_brackets(
+            row_with_error, wrong_char_column
+        )
+        data[wrong_char_row - 1] = (
+            row_with_error[: left_bracket + 1] + row_with_error[right_bracket:]
+        )
+        try:
+            print("One invalid xml expression was deleted.")
+            df = pd.read_xml(StringIO("\n".join(data)))
+            return df
+        except lxml.etree.XMLSyntaxError as e:
+            err = e
+            continue
+
+    raise Error("An error occurred when parsing the xml file. Maybe it is corrupted?")
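The rewritten `handle_xml_syntax_error` works as follows:

1. Take the (line, column) position of the syntax error.
2. Delete the text between the nearest `>` before it and the nearest `<` after it.
3. Parse again, repeating for at most 100 errors.

The following minimal sketch applies the same technique with Python's standard `xml.etree.ElementTree`, whose `ParseError.position` also holds a (line, column) pair. It uses the standard library instead of lxml and `pandas.read_xml`, returns an `Element` rather than a DataFrame, and `parse_with_repair` is a hypothetical name. It adds one guard the diff does not have: `str.rfind`/`str.find` return `-1` when no bracket is found, and the sketch raises instead of slicing with that value.

```python
import xml.etree.ElementTree as ET


def parse_with_repair(xml_text: str, max_repairs: int = 100) -> tuple[ET.Element, int]:
    """Parse `xml_text`, deleting the element text around each syntax error.

    Returns the root element and the number of repairs that were needed.
    """
    lines = xml_text.splitlines()
    repairs = 0
    while True:
        try:
            return ET.fromstring("\n".join(lines)), repairs
        except ET.ParseError as err:
            if repairs >= max_repairs:
                raise ValueError("too many syntax errors; the file may be corrupted") from err
            row, column = err.position  # 1-based line number, 0-based column
            line = lines[row - 1]
            # nearest '>' before the error and nearest '<' after it
            left, right = line.rfind(">", 0, column), line.find("<", column)
            if left == -1 or right == -1:
                raise ValueError("syntax error is not inside element text") from err
            # keep both tags, drop only the text between them
            lines[row - 1] = line[: left + 1] + line[right:]
            repairs += 1
```

For a row such as `<b>bad\x01value</b>`, where the control character is not allowed in XML 1.0, one repair leaves `<b></b>` and the rest of the document parses unchanged.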