Nexus transforms improvements #126
Conversation
May need revisiting later, but currently this is too much guesswork.
Force-pushed from 16955ab to 853544d
@@ -192,10 +192,10 @@ class Filename(sciline.Scope[RunType, Path], Path): ...

 @dataclass
-class PulseSelection(Generic[RunType]):
+class TimeInterval(Generic[RunType]):
     """Range of neutron pulses to load from NXevent_data or NXdata groups."""
Only pulses or also logs?
For the logs in NXtransformations it is loading the full log. The time interval is later used to determine which values are relevant.
The workflow is not loading any other logs currently.
But is your intention to use TimeInterval for other logs, too?
Not really at the moment, since the label-based slicing in Scipp and ScippNeXus does not do what we need (include the previous value). We thus want to load "more" than the naive slice says. Unless we get very large logs it seems easier to just load everything, and then move events to log values.
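To make the slicing concern concrete, here is a hedged numpy sketch (not code from the PR; the arrays and variable names are invented for illustration) of why a naive label-based slice over a time-series log misses the value that is actually in effect at the start of the interval:

```python
import numpy as np

# A log records *changes*: the value in effect at any time t is the last
# entry at or before t. A naive slice over [start, end] drops the entry
# that was set before `start` but still applies inside the interval.
log_times = np.array([0, 10, 20, 30])       # times at which the log changed
log_values = np.array([1.0, 2.0, 3.0, 4.0])

start, end = 12, 28                          # requested time interval
naive = (log_times >= start) & (log_times <= end)
print(log_values[naive])                     # [3.] -- misses 2.0, in effect at t=12

# Include the last entry at or before `start`: searchsorted gives the
# insertion point, stepping back one yields the preceding log entry.
first = max(np.searchsorted(log_times, start, side='right') - 1, 0)
last = np.searchsorted(log_times, end, side='right')
print(log_values[first:last])                # [2. 3.]
```

This is why loading the full log and selecting values afterwards is simpler than trying to express "the slice plus the previous value" up front.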
src/ess/reduce/nexus/workflow.py
Outdated
# "end" time in the files. We add a dummy end so we can use Scipp's label-
# based indexing for histogram data.
time = t.value.coords['time']
delta = sc.scalar(86_400_000, unit='s', dtype='int64').to(unit=time.unit)
Where does this number come from? Can't you just use np.iinfo('int64').max as the last value?
I think that is tricky, since we don't know the input dtype (could be signed or unsigned, or a datetime). There probably is a way (can you think of a simple one?), but just adding 1000 days seemed "safe".
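For reference, the constant decodes as 1000 days expressed in seconds; a short illustration (not part of the PR) of the arithmetic and of why np.iinfo('int64').max is risky here:

```python
import numpy as np

# 86_400_000 is simply 1000 days in seconds: a generous but finite dummy
# "end" offset that survives unit conversion without overflow.
SECONDS_PER_DAY = 24 * 60 * 60           # 86_400
delta_seconds = 1000 * SECONDS_PER_DAY
print(delta_seconds)                      # 86400000

# Using the int64 maximum instead would overflow as soon as the value is
# scaled to a finer unit (e.g. seconds -> nanoseconds multiplies by 1e9).
big = np.iinfo('int64').max
print(big * 10**9 > np.iinfo('int64').max)  # the product cannot fit in int64
```

So a fixed "1000 days" offset is the pragmatic choice when the input dtype (signed, unsigned, or datetime) is not known in advance.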
I think it is safe enough. I was more surprised by the concrete number and wondered whether it has some significance, because it is not simply 10**10 or something like that.
If one or more transformations in the chain are time-dependent, the time interval
is used to select a specific time point. If the interval is not a single time point,
an error is raised. This may be extended in the future to a more sophisticated
mechanism, e.g., averaging over the interval to remove noise.
The last sentence is not really usage documentation. If you want to track work on this, I would say it should be an issue.
I feel it kind of is usage documentation: Someone will look for a way of processing the time-series, and this tells them it is not implemented.
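The behaviour described by the docstring could be sketched roughly as follows (a hypothetical illustration with invented names, not the PR's implementation): resolve the log to the values covering the interval, and raise if the interval does not pin down a single value.

```python
import numpy as np

def resolve_transform(times, values, start, end):
    """Return the single transformation value in effect over [start, end].

    Raises if the interval spans more than one distinct value, mirroring
    the 'interval must be a single time point' rule in the docstring.
    """
    # Last log entry at or before `start` (the value in effect at `start`).
    first = max(np.searchsorted(times, start, side='right') - 1, 0)
    # One past the last entry before `end`.
    last = np.searchsorted(times, end, side='right')
    selected = values[first:last]
    if len(np.unique(selected)) != 1:
        raise ValueError("Time interval spans multiple transformation values")
    return selected[0]

times = np.array([0, 10, 20])
values = np.array([1.5, 1.5, 2.0])
print(resolve_transform(times, values, 2, 8))   # 1.5 -- unambiguous interval
```

Averaging over the interval, as mentioned in the docstring, would replace the error branch with a reduction over `selected`.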
# If the NXdetector in the file is not 1-D, we want to match the order of dims.
# zip_pixel_offsets otherwise yields a vector with dimensions in the order given
# by the x/y/z offsets.
offsets = snx.zip_pixel_offsets(da.coords).transpose(da.dims).copy()
Why copy?
I felt there was little to lose, whereas we still run into some Scipp operations that do not handle non-contiguous data.
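The contiguity concern can be shown with a minimal numpy analogue (an illustration, not the PR's code; Scipp's transpose behaves similarly in returning a view):

```python
import numpy as np

# Transposing returns a non-contiguous view; copy() materialises it as a
# fresh contiguous array. Operations that require contiguous memory can
# fail or slow down on the view, which is the stated reason for the
# defensive copy after transpose.
a = np.arange(6).reshape(2, 3)
t = a.T
print(t.flags['C_CONTIGUOUS'])         # False -- just a view of `a`
print(t.copy().flags['C_CONTIGUOUS'])  # True  -- freshly laid-out memory
```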
When I see copy() somewhere, my assumption is that it has some significance, e.g., that the result will be modified in place. So I went looking but didn't find anything. Essentially, it increases 'noise' for the reader. But leave it or remove it, whichever you prefer.
This collects a number of small necessary improvements I ran into when trying to use GenericNeXusWorkflow on NMX files. I recommend looking at the individual commit messages. Related: #96 (solving the simplest case).