Add climatology features #38

Open
wants to merge 19 commits into
base: main

Conversation

louisPoulain
Collaborator

No description provided.

@louisPoulain
Collaborator Author

Before merging, we will need to update the workflow to ensure this works.

@louisPoulain louisPoulain marked this pull request as ready for review November 14, 2024 10:31
@louisPoulain
Collaborator Author

@dnerini could you have a look at the code in its current state? I moved the calculation of the rolling mean to mlpp-features, as discussed.

Comment on lines +275 to +312
@pytest.fixture
def clim_dataset():
    """Create climatology dataset as if loaded from zarr files, still unprocessed."""

    def _data():

        variables = [
            "cloud_area_fraction",
        ]

        stations = _stations_dataframe()
        times = pd.date_range("2000-01-01T00", "2000-01-02T00", freq="1h")

        n_times = len(times)
        n_stations = len(stations)

        var_shape = (n_times, n_stations)
        ds = xr.Dataset(
            None,
            coords={
                "time": times,
                "station": stations.index,
                "longitude": ("station", stations.longitude),
                "latitude": ("station", stations.latitude),
                "height_masl": ("station", stations.height_masl),
                "owner_id": ("station", np.random.randint(1, 5, stations.shape[0])),
                "pole_height": ("station", np.random.randint(5, 15, stations.shape[0])),
                "roof_height": ("station", np.zeros(stations.shape[0])),
            },
        )
        for var in variables:
            measurements = np.random.randn(*var_shape)
            nan_idx = [np.random.randint(0, d, 60) for d in var_shape]
            measurements[nan_idx[0], nan_idx[1]] = np.nan
            ds[var] = (("time", "station"), measurements)
        return ds

    return _data
Collaborator Author

If we move to using the "obs" part of the variable dictionary, we likely won't need this anymore.

Comment on lines +331 to +344
try:
    rolling_mean_day = (
        rolling_mean_hour.where(
            rolling_mean_hour["dayofyear"].isin(days_range), drop=True
        )
        .groupby("time.hour")
        .mean()
    )
except ValueError as e:
    if "hour must not be empty" in str(e):
        days_list.remove(day)
        continue
    else:
        raise e
Collaborator Author

This is a workaround to make the pytest pass.
Without it, the test fails if one of the day windows is empty.
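
One possible alternative to matching on the error message, sketched here under assumptions rather than taken from the PR: check whether any time step falls inside the day window before grouping, and skip the day if none does. The helper name `hourly_window_mean`, the toy data, and the ±1 day window are hypothetical; only `where(..., drop=True)` and `groupby("time.hour")` mirror the snippet above.

```python
import numpy as np
import pandas as pd
import xarray as xr


def hourly_window_mean(da, days_range):
    """Mean per hour of day over time steps whose day of year is in days_range.

    Returns None when the window contains no time step, so the caller can
    skip that day instead of catching a ValueError from groupby.
    """
    in_window = da["time"].dt.dayofyear.isin(days_range)
    if not bool(in_window.any()):
        return None
    return da.where(in_window, drop=True).groupby("time.hour").mean()


# Toy data (assumption): two days of hourly values 0, 1, ..., 47.
times = pd.date_range("2000-01-01T00", "2000-01-02T23", freq="1h")
da = xr.DataArray(
    np.arange(times.size, dtype=float), coords={"time": times}, dims="time"
)

climatology = {}
for day in [1, 2, 200]:
    # Hypothetical window of one day on either side of the target day.
    days_range = [day - 1, day, day + 1]
    result = hourly_window_mean(da, days_range)
    if result is None:
        # No data in this window (day 200 here): skip it.
        continue
    climatology[day] = result
```

With this shape, the list of days does not need to be mutated inside the loop, and an unrelated `ValueError` is never swallowed by accident. Whether it fits the actual function depends on how `days_list` is used afterwards.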
