Use of native sparse array support in xarray / pandas / netCDF #634
Comments
We've investigated sparse arrays a few times in the past. The problem is the data types that can be handled sparsely. If it can now handle pyomo/gurobi objects then that's great!
Can you elaborate on the incompatibility?
I did some tests on this locally and it doesn't work out of the box. Operations between sparse arrays don't work when the data types are objects, e.g.:

import calliope
import xarray as xr

m = calliope.examples.urban_scale()
m.build(force=True)

# you can't transform an existing array into sparse, so this is a quick hack.
foo = xr.DataArray.from_series(m.backend.variables.flow_cap.to_series(), sparse=True)
bar = xr.DataArray.from_series(m.backend.parameters.flow_cap_max.to_series(), sparse=True)

foo * bar
[Out] TypeError: Implicit conversion of Pyomo numeric value (parameters[flow_cap_max][12]*variables[flow_cap][0]) to float is disabled.
This error is often the result of using Pyomo components as arguments to
one of the Python built-in math module functions when defining
expressions. Avoid this error by using Pyomo-provided math functions or
explicitly resolving the numeric value using the Pyomo value() function.

I've also tried setting the fill value of the sparse arrays, but that just raises other errors:

File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:542, in _Elemwise._get_fill_value(self)
540 # Store dtype separately if needed.
541 if self.dtype is not None:
--> 542 fill_value = fill_value.astype(self.dtype)
544 self.fill_value = fill_value
545 self.dtype = self.fill_value.dtype
AttributeError: 'float' object has no attribute 'astype'

File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:524, in _Elemwise._get_fill_value(self)
521 fill_value_array = self.func(*np.broadcast_arrays(*zero_args), **self.kwargs)
523 try:
--> 524 fill_value = fill_value_array[(0,) * fill_value_array.ndim]
525 except IndexError:
526 zero_args = tuple(
527 arg.fill_value if isinstance(arg, COO) else _zero_of_dtype(arg.dtype) for arg in self.args
528 )
AttributeError: 'float' object has no attribute 'ndim'

File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:545, in _Elemwise._get_fill_value(self)
542 fill_value = fill_value.astype(self.dtype)
544 self.fill_value = fill_value
--> 545 self.dtype = self.fill_value.dtype
AttributeError: 'float' object has no attribute 'dtype'

It works fine for operations on purely numeric data, just not when working with object arrays.
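For contrast, a minimal, self-contained sketch (made-up toy data, not Calliope model data) of the same pattern on purely numeric values, which does work:

import pandas as pd
import xarray as xr

# Two series covering only some techs/nodes combinations; missing combinations become the NaN fill value.
idx = pd.MultiIndex.from_tuples([("boiler", "X1"), ("chp", "X2")], names=["techs", "nodes"])
foo = xr.DataArray.from_series(pd.Series([1.0, 3.0], index=idx), sparse=True)
bar = xr.DataArray.from_series(pd.Series([2.0, 4.0], index=idx), sparse=True)

print(foo * bar)  # works: the underlying sparse.COO data has a float dtype and a NaN fill value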
@brynpickering after messing around with the backend, I see what you mean... this one will be tough. The big issue here is that Pyomo does support sparsity natively... but I do not yet know the backend well enough to know how/what needs to change. Algorithmically, a solution should be possible if we have full determinism when flattening matrices, i.e., we always know the order of dimensions and of the coordinates within each dimension.

Not sure if we currently have this, but it should allow us to lazily fill in sparse vectors and then drop them to the backend. At least in theory...
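As a rough sketch of what that deterministic flattening could mean in practice (all names here are hypothetical, not existing Calliope internals): with a fixed dimension order and a fixed coordinate order per dimension, every labelled combination maps to one flat position, so sparse entries could be filled lazily and handed to the backend in a known order.

import numpy as np

# Hypothetical fixed orderings; in practice these would come from the model's coordinate indexes.
dim_order = ["techs", "nodes"]
coords = {"techs": ["boiler", "chp", "pv"], "nodes": ["X1", "X2"]}
shape = tuple(len(coords[d]) for d in dim_order)

def flat_index(entry: dict) -> int:
    # Deterministic mapping from a labelled entry to a flat position.
    multi = tuple(coords[d].index(entry[d]) for d in dim_order)
    return int(np.ravel_multi_index(multi, shape))

print(flat_index({"techs": "chp", "nodes": "X2"}))  # always 3, regardless of insertion order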
We're already effectively using dense arrays as far as Pyomo is concerned. It's the application of operations across N dimensions (incl. broadcasting capabilities) that we benefit from by also representing those objects in NaN-filled arrays. We came from this full determinism in v0.6 to what we have now because it is very messy to ensure this in a generalised way, especially when math components don't share the exact same dimensions.
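To make that concrete, a small illustration (toy data, not the real backend objects) of the NaN-filled dense representation and the broadcasting it buys us:

import numpy as np
import xarray as xr

# NaN marks techs/nodes combinations that don't exist in the model.
flow_cap = xr.DataArray(
    [[1.0, np.nan], [2.0, 3.0]],
    coords={"techs": ["boiler", "chp"], "nodes": ["X1", "X2"]},
    dims=["techs", "nodes"],
)
node_factor = xr.DataArray([0.5, 2.0], coords={"nodes": ["X1", "X2"]}, dims=["nodes"])
print(flow_cap * node_factor)  # broadcast across "techs"; NaN entries stay NaN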
Yep, I understand that the issue is not in the backend. The increase in memory will only happen on our side...

import itertools

dims = ["techs", "nodes", "steps", "carriers", "foo", "bar", "perrito"]

all_unique_combinations_sorted = set()
for i in range(len(dims)):
    for group in itertools.combinations(dims, i + 1):
        all_unique_combinations_sorted.add(".".join(sorted(group)).lower())
all_unique_combinations_sorted.add("GLOBALS")  # no idea if this is even needed

This contains all possible combinations of our dimensions, always in order (127 of them for these 7 dimensions, plus the GLOBALS entry). Lookups are easy: just sort and join the dimensions you want. Similarly, the large number of keys does not matter too much: sparse data has little memory impact by design, and you can easily erase empty combinations if you wish. Would this work, or am I saying something silly?
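A quick sketch of the lookup side of this idea (key_for is a hypothetical helper, not an existing function; the set is rebuilt here only so the snippet is self-contained):

import itertools

dims = ["techs", "nodes", "steps", "carriers", "foo", "bar", "perrito"]
all_unique_combinations_sorted = {
    ".".join(sorted(group)).lower()
    for i in range(len(dims))
    for group in itertools.combinations(dims, i + 1)
}

def key_for(*requested_dims):
    # Sorting before joining makes the lookup independent of the order the dims are given in.
    return ".".join(sorted(requested_dims)).lower()

assert key_for("nodes", "techs") == key_for("techs", "nodes") == "nodes.techs"
assert key_for("carriers", "techs", "nodes") in all_unique_combinations_sorted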
I'm closing this as not an issue that we plan to address, given how small our datasets are (even with all their NaNs) compared to peak memory consumption in the optimisation step (see this comment).
@brynpickering I'm OK with closing this. Some closing remarks: my worry actually relates to the size of the optimisation problem.
However, I should've done more checks on this, so I will only re-open it if I identify it as a problem that reaches the actual optimisation.
What can be improved?
Calliope should be more memory efficient! <- Finally this one is applicable
While checking how to better support sparsity in our ecosystem, I found out about sparse. It is quite literally focused on the common use-case of super sparse data in ESOMs.

After some investigation, it seems like xarray either has, or is planning to, roll out support for this library: pydata/xarray#3213. pandas seems to have also rolled out support.

I propose to evaluate how to integrate this into calliope, with two key design goals in mind:
- … xarray setups.
- … dim itself in our constraints for cases where they are ordered. Two use cases (for pathways): …
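For context, a tiny illustration of what the sparse library provides (assuming it is installed; this is not Calliope code):

import numpy as np
import sparse

# A mostly-empty array: the COO representation only stores the non-fill entries.
dense = np.zeros((1000, 1000))
dense[0, 0] = 1.0
dense[123, 456] = 2.0
coo = sparse.COO.from_numpy(dense)
print(coo.nnz)                    # 2 stored entries
print(coo.nbytes, dense.nbytes)   # memory of COO vs dense storage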
Version
v0.7