Write data cube guide #26

m-mohr · 2021-01-06T14:03:57Z

It became obvious several times in openEO history that people are often not aware of how data cubes and their methods (reduce, apply, ...) work. So I was thinking that a guide on how to work with data cubes would help the understanding, step by step and with examples.

Discussion in Open-EO/openeo-processes#215 (comment) has shown that the document should say that it's usually not a good idea to change data types in apply/reduce/... and should probably also list other pitfalls and potential limitations.

jonathom · 2021-01-09T09:32:16Z

In the search for a good visual representation, here are some first ideas:

I like the way things are displayed in the R stars package:

It also holds a good representation of what vector cubes are:

(as mentioned before, images taken from https://r-spatial.github.io/stars/)

I have another idea that, in my view, is able to explain the sort of data that is held in DataCubes and therefore can show that DataCubes are n-dimensional (here: time, 3 bands and x, y). A first sketch (ignoring the structure on the bottom right):

Obviously this needs some improvement, e.g. the raster could be displayed as shown above, and the earth's surface could be depicted in more detail.

jonathom · 2021-01-11T08:16:12Z

Another possibility: Display as an actual cube, have z = time dimension and indicate different bands. However people might take the "Cube" too literally (as DataCubes can also contain 3, 5, etc. bands). With this it might be easier to graphically represent the cube operations.

edzer · 2021-01-11T08:58:03Z

Great sketches! That last one suggests that B2, B3, B4 and B8 are distributed over two dimensions, which is not very intuitive IMO, but showing that dimensions can be exchanged makes some sense. I put the R scripts that generated the figures above at https://gist.github.com/edzer/5f1b0faa3e93073784e01d5a4bb60eca

m-mohr · 2021-01-11T09:18:21Z

Yeah, I think your first sketch works very well with some more details. Spatial are x and y, z is the bands and could be visualized with different colors (e.g. different shades of the color per pixel, one band red, one band green, one band blue) and then have each timestamp be part of your timeline.
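A minimal sketch of the layout described above, assuming a cube indexed as `cube[t][band][y][x]`. The labels, values, and helper names (`make_cube`, `cube_shape`, `pixel_time_series`) are made up for illustration and are not part of any openEO API.

```python
# Illustrative only: a tiny data cube with dimensions (t, bands, y, x).
# All labels and pixel values below are invented for this sketch.
dimensions = {
    "t": ["2020-10-01", "2020-10-13", "2020-10-25"],
    "bands": ["red", "green", "blue"],
    "y": [0, 1],
    "x": [0, 1],
}


def make_cube(dims):
    """Build nested lists indexed as cube[t][band][y][x]."""
    return [
        [
            # Every pixel of a (timestamp, band) slice gets the value 10*t + band.
            [[10 * ti + bi for _ in dims["x"]] for _ in dims["y"]]
            for bi, _ in enumerate(dims["bands"])
        ]
        for ti, _ in enumerate(dims["t"])
    ]


def cube_shape(dims):
    """Number of labels along each dimension, in declaration order."""
    return tuple(len(labels) for labels in dims.values())


def pixel_time_series(cube, dims, band, y, x):
    """Values of one band at one pixel across all timestamps."""
    b = dims["bands"].index(band)
    return [cube[t][b][y][x] for t in range(len(dims["t"]))]


cube = make_cube(dimensions)
print(cube_shape(dimensions))
print(pixel_time_series(cube, dimensions, "green", 0, 1))
```

The point of the sketch is only that a "cube" here has four dimensions, and that picking one band at one pixel leaves a one-dimensional time series.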

Vector cubes in openEO are not really a thing at the moment so we could skip that part for now, but if you have good ideas, feel free to write them down anyway and we can have them in a separate markdown file for now.

jonathom · 2021-01-20T19:36:50Z

Thank you for the feedback @m-mohr and also for the code @edzer, here's a first implementation of the idea:

Still missing a representation of the surface (also not 100% sure if needed).

Please let me know any feedback.
Sketches and/or graphs on the processes will follow.

m-mohr · 2021-01-21T11:11:46Z

I like that a lot, well done! Could you change the pink color to yellow or so? I find it hard to distinguish from the red above... or change the order of the colors to not have red and pink directly after each other.

jonathom · 2021-02-26T18:25:01Z

This is a figure representing temporal resampling. I decided to not represent the resampling process itself (calculation of new time steps). Let me know if you disagree.
I have a question regarding the date "2020-09-28" in the upsampling process: I am guessing that the resulting datacube just doesn't contain an image for dates that lie before the first date of the original cube. Is that correct? Would it be appropriate to delete the entry for "2020-09-28" on the timeline for the "output" (but keeping it at the "resample" timeline to show the difference)?

m-mohr · 2021-03-01T11:15:55Z

Whether 2020-09-28 has data or not depends on the upsampling method you use. Would it make sense to just remove the empty timestamp, as it would likely not be in the resulting data cube (or would at least be there with no-data)?
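To illustrate how the result for a timestamp before the first input date depends on the method, here is a rough sketch. It assumes two hypothetical upsampling methods ("previous" and "nearest") and a mean-based downsampling; the function names, dates other than 2020-09-28, and values are invented and do not describe the behaviour of any particular openEO back-end.

```python
from datetime import date

# Invented input: one value per timestamp (think of a single pixel).
series = {
    date(2020, 10, 1): 1.0,
    date(2020, 10, 13): 3.0,
    date(2020, 10, 25): 5.0,
}


def upsample(series, targets, method="previous"):
    """Assign a value to each target date; None stands for no-data."""
    source_dates = sorted(series)
    out = {}
    for t in targets:
        if method == "previous":
            # Use the latest input at or before the target, if there is one.
            earlier = [d for d in source_dates if d <= t]
            out[t] = series[earlier[-1]] if earlier else None
        elif method == "nearest":
            # Use the input closest in time, regardless of direction.
            nearest = min(source_dates, key=lambda d: abs((d - t).days))
            out[t] = series[nearest]
        else:
            raise ValueError(f"unknown method: {method}")
    return out


def downsample_mean(series, targets):
    """Average all inputs whose nearest target date is the same."""
    groups = {t: [] for t in targets}
    for d, value in series.items():
        nearest = min(targets, key=lambda t: abs((t - d).days))
        groups[nearest].append(value)
    return {t: (sum(v) / len(v) if v else None) for t, v in groups.items()}


targets = [date(2020, 9, 28), date(2020, 10, 8), date(2020, 10, 20)]
print(upsample(series, targets, "previous"))
print(upsample(series, targets, "nearest"))
print(downsample_mean(series, [date(2020, 10, 7), date(2020, 10, 25)]))
```

With "previous", 2020-09-28 ends up as no-data because nothing precedes it; with "nearest", it takes the value of the first input.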

I think I'm fine with not giving more details on the resampling, but maybe it's easier to understand if you change the label "resample" to "resample to"?

All the images look the same, which may confuse some, but overall I like the image. 👍

jonathom · 2021-03-01T11:56:54Z

> it would likely not be in the resulting data cube (or at least would be there with no-data).

Yeah, this is the tricky part because I think if it's there with no data, then the current image is exactly right. But if this is dependent on the resampling function I will delete the point for the first date, it's more intuitive.

> "resample" to "resample to"?

sure! good idea.

> All the images look the same

Yes, I will change this. Downsampling method will then be "mean" if that's alright. EDIT: things won't look so different then I'm afraid. Ideas to change that?

2nd EDIT: input is actually already displaying different time steps. Is the difference too subtle at this scale?

jonathom · 2021-03-01T12:09:18Z

like so

m-mohr · 2021-03-01T12:55:21Z

Yeah, I now see that there's a subtle difference, but you need to look very closely to figure it out. Not sure whether that is actually an issue though. I guess we can leave it as it is for now. Changes in time series are often pretty subtle...

Other than that, the image looks good to me, thanks! 👍

jonathom · 2021-03-05T16:42:09Z

I have some questions about the spatial aggregation processes:

* The specification currently states that only a 3D cube (x,y + one other) can be processed. The topic is also discussed here: openeo/processes#126. Is this expected to change at some point? I would favor leaving this restriction out of the graphic if that's alright.
* Just out of curiosity, I don't really get what exactly aggregate_spatial_binary is doing. Instead of a list it only gets passed two values. Which two values and what's the advantage of that?

Regarding the previous discussion

> Changes in time series are often pretty subtle...

Indeed. I think that in most graphics these very subtle changes are ok (as you say, we can always change that later on). They also result from the fact that breaks are set automatically for each raster. In the case where this is important (apply graphics, looking at single pixel values), I manually set breaks (so far only for third graphic).

edit @m-mohr

m-mohr · 2021-03-09T16:46:46Z

> Is this expected to change at some point?

Not sure. I think not in the next 6 months at least.

> I would favor leaving this restriction out of the graphic if that's alright.

Yes, I think that is fine for me.

> Just out of curiosity, I don't really get what exactly aggregate_spatial_binary is doing.

It is basically the same, just the way it reduces the values is different.

> Instead of a list it only gets passed two values. Which two values and what's the advantage of that?

The binary variant uses a reducer (see e.g. the JS reduce operation) that works on two values at a time, which allows reducing very large lists that would otherwise exceed the available memory. The list variant (i.e. non-binary) works on a list directly. So it's mostly a way to optimize the operation for very large data.
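The difference between the two styles can be sketched with Python's `functools.reduce`, which folds a sequence with a two-argument function much like the JS reduce operation mentioned above. The names `sum_list`, `add_binary`, and `value_stream` are hypothetical and only serve the illustration.

```python
from functools import reduce


def sum_list(values):
    """List-style reducer: receives all values at once."""
    return sum(values)


def add_binary(x, y):
    """Binary reducer: combines the running result with one more value."""
    return x + y


def value_stream(n):
    """Yield 1..n lazily, standing in for values too large to hold in memory."""
    for i in range(1, n + 1):
        yield i


# The list variant needs the full list materialised first.
list_result = sum_list(list(value_stream(1000)))

# The binary variant consumes the stream one value at a time.
binary_result = reduce(add_binary, value_stream(1000))

print(list_result, binary_result)
```

Not every reduction fits the binary form as directly as a sum: a mean, for example, would need to carry a running sum and count along, and a median cannot be computed by a plain pairwise fold.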

m-mohr · 2021-04-12T13:59:38Z

@jonathom In this thread Open-EO/openeo-processes#215 (comment) we discussed that we should add some guidance that data cube (child) processes should be careful with data type changes. For example, if a reducer gets an array of numbers, it should also return a number and not e.g. a string or an array. Could you add that somewhere in the general data cube descriptions, please? cc for review: @soxofaan
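As a rough illustration of that guidance, the sketch below applies a reducer to per-pixel value lists and rejects results that are not plain numbers. The helper `reduce_pixels` and both reducers are hypothetical; openEO back-ends are not claimed to enforce the rule this way.

```python
from numbers import Number


def mean_reducer(values):
    """Numbers in, number out: keeps the data type of the cube."""
    return sum(values) / len(values)


def label_reducer(values):
    """Numbers in, string out: the kind of type change to avoid."""
    return "high" if max(values) > 0.5 else "low"


def reduce_pixels(series_per_pixel, reducer):
    """Apply a reducer to each pixel's values and require a numeric result."""
    result = []
    for values in series_per_pixel:
        out = reducer(values)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(out, bool) or not isinstance(out, Number):
            raise TypeError(
                f"reducer returned {type(out).__name__}, expected a number"
            )
        result.append(out)
    return result


print(reduce_pixels([[1, 2, 3], [4, 6]], mean_reducer))
```

Calling `reduce_pixels([[0.2, 0.9]], label_reducer)` raises a `TypeError` in this sketch, which makes the pitfall visible early instead of producing a cube with mixed types.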

jonathom · 2021-04-12T15:08:08Z

@m-mohr I'm not entirely sure if I understand what's going on, so let's discuss in next meeting. First thought: Maybe this is something for the cookbook (#16), since it is much more "how to do" instead of "how does it work"? Also, the cookbook could then just have a whole first section dedicated to "how to work with datacubes" to be a further reference after the datacube guide (not only because of this, just generally).

soxofaan · 2021-04-13T07:58:51Z

Nice diagrams!

Some feedback/ideas:

* on the downsample part: the resample to of "2020-10-29" results in an output for "2010-10-30". I guess this is a typo?
* about the very subtle differences between cubes at different time stamps: maybe you could add a clouded area in the middle input?
* I would add a bit more space between the band layers, it will make the structure a bit more legible I think (especially for small sizes)
* It looks a bit weird that there is no slice for 2020-09-28 in the upsample example. I understand that this depends on the upsampling technique, but I would ignore that implementation detail for the diagram. The diagram itself, without background info looks broken now.
* In the current diagram there is room to use full titles "Temporal Downsampling" and "Temporal Upsampling". It's probably nitpicking but the pyramid shape might suggest to some people that there is also spatial down/upsampling going on otherwise.

jonathom · 2021-04-13T11:21:44Z

Thank you for the feedback @soxofaan! The datacube guide with much more graphics is already online and a version with some of your corrections (type, title change) can be seen here. I'd be happy if you want to have a look and leave some more feedback!

Regarding two points from above:

* a clouded area is a good idea, however I think a lot of operations that are explained here wouldn't be executed on non-ARD. Might be confusing then.
* space between the layers: Because the graphics are about different sampling processes, visibility of the single cubes isn't in focus in these graphics. However if you think other graphics in the datacube guide could use space / enlargement etc., let me know!

soxofaan · 2021-04-13T12:40:21Z

these online docs look very pretty, nice improvement!

m-mohr · 2021-04-14T08:27:33Z

@jonathom We also forgot to remove the Data Cube description from the glossary: https://openeo.org/documentation/1.0/glossary.html

Another thing we should talk about in the "Dimensions" section is that the dimensions can have special characteristics, e.g. spatial and temporal are expected to have a natural order, temporal are by default Gregorian calendar, ...

jonathom · 2021-04-15T09:53:45Z

> We also forgot to remove the Data Cube description from the glossary

done, collecting these fixes in branch "dcguide". I added the old glossary datacube md in datacubes/.scripts for later reference.

additional note to myself: also forgot to talk about crs as dimension, as in old glossary

m-mohr · 2021-04-15T11:25:05Z

> I added the old glossary datacube md in datacubes/.scripts for later reference.

I don't think this is required, we have version control for this. Let's discuss later

m-mohr · 2021-04-20T08:57:52Z

This is all done, right @jonathom ? Feel free to close then.

clausmichele · 2021-06-03T07:38:43Z

> Thank you for the feedback @soxofaan! The datacube guide with much more graphics is already online and a version with some of your corrections (type, title change) can be seen here. I'd be happy if you want to have a look and leave some more feedback!
>
> Regarding two points from above:
> * a clouded area is a good idea, however I think a lot of operations that are explained here wouldn't be executed on non-ARD. Might be confusing then.
>
> * space between the layers: Because the graphics are about different sampling processes, visibility of the single cubes isn't in focus in these graphics. However if you think other graphics in the datacube guide could use space / enlargement etc., let me know!

Really nice guide! I've just seen it and it will be super useful for many others.

m-mohr mentioned this issue Jan 6, 2021

Adds return value schemas for child processes Open-EO/openeo-processes#215

Merged

m-mohr assigned m-mohr and jonathom Jan 8, 2021

m-mohr linked a pull request Mar 11, 2021 that will close this issue

Data Cube Documentation ✨ #31

Merged

4 tasks

m-mohr closed this as completed in #31 Mar 30, 2021

m-mohr reopened this Apr 12, 2021

m-mohr removed their assignment Apr 13, 2021

m-mohr closed this as completed Apr 21, 2021
