Isolate postprocessing data creation methods #638
Comments
Sounds good. We just need a way to trigger post-processing steps when running calliope from the command line.
I've been thinking about adding a post-processing step that users trigger themselves. It would allow the results to be cleaned/expanded in whatever way the user has configured. A question remains as to whether this should return a post-processed result array or update the results in-place. If updating in-place, the original results would still be available in an interactive session (they are extracted from optimisation backend objects, which we could just do again). We have had problems in the past with returning a result array that is independent of a Calliope object; namely, it makes saving the file with relevant metadata a pain.
Hmm... could you elaborate on what those saving issues were, @brynpickering? If we make sure that postprocessing is only for data transformations, then returning arrays which can be saved seems best to me. For example, calculating capacity factors does not really alter the data; it just depends on it. In my mind, these steps should always return the same post-processed result for the same inputs.
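As a minimal sketch of that idea (not Calliope's actual code; the variable names `flow_out`, `flow_cap` and the `timesteps` dimension are assumptions for illustration), such a step could be a pure function that only reads the results and always returns the same array for the same inputs:

```python
import xarray as xr


def capacity_factor(results: xr.Dataset) -> xr.DataArray:
    """Derive capacity factors without touching the input dataset.

    Assumes hypothetical variable names: per-timestep output ``flow_out``
    and installed capacity ``flow_cap``, over a ``timesteps`` dimension.
    Given the same results, the same array is always returned.
    """
    per_timestep = results["flow_out"] / results["flow_cap"]
    return per_timestep.mean("timesteps").rename("capacity_factor")
```

Because nothing is mutated, it makes no difference to correctness whether the caller attaches the returned array to the model or keeps it separate.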
So, the issue is that if you post-process the results and it returns an xarray dataset, that dataset won't have the config, math, or inputs attached. So you would have a dataset you could save, but it would be detached from the rest of the model. If we kept postprocessing to just generating auxiliary variables (LCOE etc.), then a user would probably still want to save that data along with the rest of their model, for completeness, which requires that data being available within the Calliope model object.
Seems like a "pick your poison" situation... but I think we should go for a solution that is pragmatic and does not over-complicate the software. I'm in favour of trusting users on the best way to save stuff (and lending them a hand if they don't). Holding all the post-processed data within the model object does not feel right to me, for a few reasons. How about agreeing on a set of requirements first? If you do not like this idea, we can go with the internal approach, but we have to be careful about how we go about it.
I just think that an xarray dataset (which you would save to NetCDF) that contains a few data variables from postprocessing (LCOE, capacity factors, maybe curtailment? #295) is a bit of a pain to deal with when you would then have another NetCDF with all the other information (inputs, direct results from the optimisation, applied math dictionary). So, by returning the result of postprocessing outside the model object, you are separating your data from each other in a way that can easily lead to them becoming out of sync (you save your data from postprocessing, then re-run the model with something changed but forget to post-process the new results, so now your post-processed data file is out of sync with your inputs + primary results file). Calling post-processing on the model itself keeps everything in one place.
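As a rough, self-contained sketch of that point (the dataset layout and variable names are invented for illustration): merging derived variables into the same dataset as inputs and primary results before writing a single NetCDF file means there is no separate post-processing file to drift out of sync.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the data a Calliope model would hold
# (inputs + primary optimisation results); names are illustrative only.
model_data = xr.Dataset(
    {
        "flow_cap": ("techs", [10.0, 5.0]),
        "flow_out": (("techs", "timesteps"), np.array([[6.0, 8.0], [1.0, 4.0]])),
    },
    coords={"techs": ["pv", "wind"], "timesteps": [0, 1]},
)

# A derived (post-processed) variable, computed from the results alone.
cap_factor = (
    (model_data["flow_out"] / model_data["flow_cap"])
    .mean("timesteps")
    .rename("capacity_factor")
)

# One merged dataset -> one NetCDF file containing inputs, primary
# results and derived variables together.
xr.merge([model_data, cap_factor]).to_netcdf("model_with_postprocessing.nc")
```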
I like the idea of separation: it's clearer down the line and easier to debug. One thing to settle, though: one of the ideas behind this PR is letting users call only the post-processing functions they desire, without extras, to avoid bloating the dataset with data they might not need.
Yeah, I'd keep an option for choosing which post-processing steps are run.
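Purely as a hedged sketch of what that opt-in switch could look like (the registry, the option names and the function signature are all hypothetical, not existing Calliope configuration):

```python
from typing import Callable, Dict

import xarray as xr

# Hypothetical registry: step name -> function that takes the model's
# dataset and returns a new, named DataArray (names are illustrative).
POSTPROCESS_STEPS: Dict[str, Callable[[xr.Dataset], xr.DataArray]] = {
    "capacity_factor": lambda ds: (ds["flow_out"] / ds["flow_cap"])
    .mean("timesteps")
    .rename("capacity_factor"),
}


def run_postprocessing(model_data: xr.Dataset, enabled: Dict[str, bool]) -> xr.Dataset:
    """Run only the steps the user switched on; the inputs stay untouched."""
    selected = [
        fn(model_data)
        for name, fn in POSTPROCESS_STEPS.items()
        if enabled.get(name, False)
    ]
    return xr.merge(selected) if selected else xr.Dataset()
```

What `enabled` defaults to (everything on, to match today's behaviour, or everything off) is the remaining design decision.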
Ok then. Here's a summary of the requirements/discussion. Please let me know if I missed anything!
I agree that this makes sense, though I think the default for this still needs some thought.
What can be improved?
At the moment, `model.solve` will automatically run several post-processing methods. However, this might not be desired in some circumstances, and it is a side effect. It is much more valuable to give users the option to run these steps on request instead of forcing them.
I recommend the following:
- Isolate `postprocessing` so that it's a module users can call and pass `Model` objects to (see the sketch below).
- Remove calls to `postprocessing` methods in `model.py`.
- Move whatever must stay with the model (`clean_results`?) into `model.py`.
This should result in leaner code and more value for users. Let me know what you think!
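To make the first recommendation concrete, here is a hedged sketch of the intended user-facing workflow; `calliope.postprocessing` as a standalone module and its `capacity_factor`/`lcoe` functions are assumptions for illustration, not the current API:

```python
import calliope

# Hypothetical standalone module; `postprocessing` and its functions are
# assumptions for illustration, not the current calliope API.
from calliope import postprocessing

model = calliope.examples.national_scale()
model.build()
model.solve()  # no implicit post-processing side effects

# Users opt in explicitly, passing the Model object to the steps they need.
cap_factor = postprocessing.capacity_factor(model)
lcoe = postprocessing.lcoe(model)
```

Whether the returned arrays are then merged back into the model's dataset before saving would be up to the user, per the discussion above.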
Version
v0.7