You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
tensionhead opened this issue
Mar 28, 2022
· 0 comments
Labels
DesignProposal to investigate/change code structure/layoutExploreExamine novel functionality/proposed changes etc. Does not necessarily involve coding things.
At the moment we have an overloaded time axis: holding both the actual samples of a trial which actually has a time axis, and the trials even for data which does not have a time axis:
The time axis of the spec object is not a real time axis, but just holds the trials, whereas for ad it does both.
This leads to an overkill of index gymnastics and therefore mental load which leads to error prone and not very readable/maintainable code. This is also reflected in our bug history, where the confusion of sample indices and trials led to numerous bugs, for example: #180, #207, #239 and #240
Proposed Solution
Using h5 Region References allows for clearly and elegantly decouple the time axis from the trials, yet still allows for the same stacking and overlapping in the backing hdf5 dataset:
importh5pymyfile=h5py.File("test", driver='core', mode='w')
myds=myfile.create_dataset('dset', (5, 2))
# define pseudo trials myds[:3] =1myds[2:] =2# indicate overlapping region myds[2:3] =12# define (overlapping) trials via region references trl1=myds.regionref[:3]
trl2=myds.regionref[2:]
Implementing this in our dataclasses would decrease code opaqueness, and vastly improve maintanability and the speed of feature additions. Also downstream from the pure dataclass implementations, streaming trials to the actual computations would be now almost trivial.
Implementation
Rather than creating these references only once, and then storing them along the actual data within the hdf containers, I suggest creating them on the fly from the trialdefinition provided by any Syncopy dataset. Something like this is anyways also happening in the current implementation to populate attributes like .trials. This would allow for a seamless integration into the current code base, and does not break compatibility to Syncopy data which is already saved somewhere.
I know that one argument for the current state is, that also without Syncopy the (raw) data is still somewhat directly accessible, IF ppl are able to parse the trialdefinition provided in the .info json file. I would argue that this isn't changing, as we still would provide the very same trialdefinitions in the respective .info files, and the actual arrays holding the data would not change a bit (pun intended) on disc compared to the current implementation :)
The text was updated successfully, but these errors were encountered:
tensionhead
added
Design
Proposal to investigate/change code structure/layout
Explore
Examine novel functionality/proposed changes etc. Does not necessarily involve coding things.
labels
Mar 28, 2022
DesignProposal to investigate/change code structure/layoutExploreExamine novel functionality/proposed changes etc. Does not necessarily involve coding things.
Current Situation
At the moment we have an overloaded
time
axis: holding both the actual samples of a trial which actually has a time axis, and the trials even for data which does not have a time axis:Now looking at the shapes we get:
The
time
axis of thespec
object is not a real time axis, but just holds the trials, whereas forad
it does both.This leads to an overkill of index gymnastics and therefore mental load which leads to error prone and not very readable/maintainable code. This is also reflected in our bug history, where the confusion of sample indices and trials led to numerous bugs, for example: #180, #207, #239 and #240
Proposed Solution
Using h5 Region References allows for clearly and elegantly decouple the time axis from the trials, yet still allows for the same stacking and overlapping in the backing hdf5 dataset:
Indexing is now straightforward:
Implementing this in our dataclasses would decrease code opaqueness, and vastly improve maintanability and the speed of feature additions. Also downstream from the pure dataclass implementations, streaming trials to the actual computations would be now almost trivial.
Implementation
Rather than creating these references only once, and then storing them along the actual data within the hdf containers, I suggest creating them on the fly from the
trialdefinition
provided by any Syncopy dataset. Something like this is anyways also happening in the current implementation to populate attributes like.trials
. This would allow for a seamless integration into the current code base, and does not break compatibility to Syncopy data which is already saved somewhere.I know that one argument for the current state is, that also without Syncopy the (raw) data is still somewhat directly accessible, IF ppl are able to parse the trialdefinition provided in the
.info
json file. I would argue that this isn't changing, as we still would provide the very sametrialdefinitions
in the respective.info
files, and the actual arrays holding the data would not change a bit (pun intended) on disc compared to the current implementation :)The text was updated successfully, but these errors were encountered: