Rethinking timeseries dimensionality #6090
Replies: 6 comments 15 replies
---
CC @pymc-devs/dev-core
---
I don't think I really have a good overview of this yet, but just a couple of thoughts so far...

```python
y = GRW(drift=1, shape=100)
logps = y.logp(np.random.randn(100))
```

What does …
---
(Drift) parameters aside, I have a very strong prior that we should stick to the BTS result-shape interpretation. This is in line with our univariate RVs, by placing the support dimensions at the end. Did I understand correctly that the reason for the inconvenient need to broadcast the parameters internally is the fact that NumPy treats …
---
From a pure programming point of view, since most of the more complex time series models rely on …
---
Do we want to consider what a mixture of timeseries would mean? That could give strong opinions on TBS vs BTS and on whether we should consider time core or not. If we can mix within time, it must not be core; otherwise we can only mix across the whole timeseries. Note that TBS really becomes problematic if time is core, because there's an implicit contract that core dimensions are always to the right. Although there was some discussion at Aesara of allowing this to be parametrized (for different reasons): aesara-devs/aesara#1040 (comment)
---
I am not sure how relevant this is to this specific discussion (I would guess it isn't), but given that the issue has been closed and I have been tagged, here are my five cents on pointwise logps for timeseries.

The initial post defines "batch", "time" and "support" dimensions. There is also mention of "core" dimensions, which I am not completely sure I understand. To me, core dimensions are purely conceptual and model dependent, mostly related to how we do model comparison. A timeseries-unrelated example: we could have a …

In time series, the time dimension can be a MvNormal or a GP, but it can also be an AR or even a simple linear regression. In time series the logical cross-validation strategy is leave-future-out (LFO), which can't be approximated as well with PSIS, but if we are able to compute the pointwise log likelihood conditioned on past data, we can estimate LFO with, say, 4 refits instead of the 200 that would be needed for brute-force LFO-CV. Reference: https://doi.org/10.1080/00949655.2020.1783262 (Table 1 has empirical results about the number of refits needed in multiple cases).

I want this data to be easily accessible (maybe even retrieved directly by the converter to InferenceData), because even if there are many cases where this can't be used, there are many where it can, and it allows for much faster and cheaper results.
---
If a single univariate gaussian random walk (GRW) with 100 time steps has a shape of `(100,)`, what is the shape of three such GRWs? `(3, 100)` or `(100, 3)`? PyMC (V4 at least) says `(3, 100)`.

What about a 2D multivariate gaussian random walk (MvGRW)? I assume it would have a shape of `(100, 2)`. And three of them? `(100, 3, 2)` or `(3, 100, 2)`? I think PyMC right now would say `(3, 100, 2)`, but we haven't refactored it yet.

Let's abbreviate the current PyMC approach as BTS (batched dimensions, time dimension, support dimensions) and the alternative one as TBS (time dimension, batched dimensions, support dimensions).
Why does it matter? Because broadcasting pads missing dimensions to the left, BTS leads one to consider the time dimension a support dimension, which means parameters cannot change over time. Let's see why.
The univariate case
Just to refresh, the simplest case is:
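Something like this, in the `GRW(drift=..., shape=...)` shorthand used elsewhere in the thread (illustrative pseudo-code, not the actual PyMC signature):

```python
# One GRW with 100 steps and a scalar drift; draws have shape (100,)
y = GRW(drift=1, shape=(100,))
```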
We can create batches by doing:
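In the same pseudo-notation:

```python
# Three GRWs with drifts 0, 1 and 2; in BTS, draws have shape (3, 100)
y = GRW(drift=[0, 1, 2], shape=(3, 100))
```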
Note that `[0, 1, 2]` can't broadcast to `(3, 100)` naturally, but because we consider time a core dimension we never attempt to do that. Under the hood we add a degenerate dimension to the right so that drift has shape `(3, 1)`:
`pymc/pymc/distributions/timeseries.py`, lines 209–213 @ 8f02bea
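Schematically, the linked lines amount to something like this (plain NumPy, for illustration):

```python
import numpy as np

drift = np.asarray([0, 1, 2])  # shape (3,)
drift = drift[..., None]       # shape (3, 1), which broadcasts against (3, 100)
```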
And we have to do that everywhere...
`pymc/pymc/distributions/timeseries.py`, lines 299–301 @ 8f02bea
`pymc/pymc/distributions/timeseries.py`, lines 318–320 @ 8f02bea
Why don't we let parameters change over time? If the following worked
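Presumably something like this (pseudo-notation, reconstructed from context), with the drift aligned against the time dimension:

```python
import numpy as np

# A drift that changes at every time step of a single 100-step GRW
y = GRW(drift=np.arange(100), shape=(100,))
```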
Then it would make batching with constant parameters cumbersome, or at least less intuitive. To obtain the same result as in the batching example above, the user would now need to manually add the degenerate dimension themselves:
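In pseudo-notation:

```python
import numpy as np

# Constant per-batch drifts now need an explicit degenerate time axis,
# so they don't get aligned with the time dimension
y = GRW(drift=np.array([0, 1, 2])[:, None], shape=(3, 100))  # drift is (3, 1)
```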
What about TBS? Due to broadcasting automatically to the left, both cases would be relatively intuitive:
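Again in pseudo-notation:

```python
import numpy as np

# Batching with constant parameters: the (3,) drift aligns with the
# rightmost (batch) dimension and broadcasts naturally to (100, 3)
y = GRW(drift=[0, 1, 2], shape=(100, 3))

# Time-varying parameters need no tricks either
y = GRW(drift=np.arange(100), shape=(100,))
```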
The multivariate case
Spoiler: it's the same as in the univariate case; you can skip this section if multivariates don't confuse you.
The advantage of TBS over BTS is similar in the MvGRW case.
Again the simplest case (which, for the user, is the same in BTS and TBS) is:
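In the same spirit, with a hypothetical `MvGRW(mu=..., cov=..., shape=...)` constructor (again pseudo-code):

```python
import numpy as np

# One 2D MvGRW with 100 steps; draws have shape (100, 2)
y = MvGRW(mu=np.zeros(2), cov=np.eye(2), shape=(100, 2))
```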
Assuming we don't allow parameters to change over the time dimension, creating batches in BTS is relatively intuitive.
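For example:

```python
# Three 2D MvGRWs with different means; BTS draws have shape (3, 100, 2)
y = MvGRW(mu=np.zeros((3, 2)), cov=np.eye(2), shape=(3, 100, 2))
```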
Note again that, under the hood, we have to add a degenerate dimension at `axis=-2` to be able to take `(3, 100, 2)` draws.

Instead, if we were to allow parameters to change over time like this:
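(pseudo-notation, reconstructed from context)

```python
# A mean that changes at every time step, aligned as (time, support)
y = MvGRW(mu=np.random.randn(100, 2), cov=np.eye(2), shape=(100, 2))
```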
Users would need to manually add the degenerate dimension for batching with constant drift:
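In pseudo-notation:

```python
# Constant per-batch means need an explicit degenerate time axis at axis=-2
mu = np.zeros((3, 2))[:, None, :]  # shape (3, 1, 2)
y = MvGRW(mu=mu, cov=np.eye(2), shape=(3, 100, 2))
```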
In contrast, in TBS the two cases would simply look like:
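(pseudo-notation)

```python
# TBS batching with constant parameters: mu (3, 2) broadcasts to (100, 3, 2)
y = MvGRW(mu=np.zeros((3, 2)), cov=np.eye(2), shape=(100, 3, 2))

# TBS time-varying parameters: mu (100, 2) matches (100, 2) directly
y = MvGRW(mu=np.random.randn(100, 2), cov=np.eye(2), shape=(100, 2))
```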
This applies to all timeseries
This distinction is also important for other timeseries. For instance, in `AR` we use Scan, which naturally accumulates results along the leftmost dimension. In order to keep BTS, we have to shuffle and add degenerate dimensions all over the place:

`pymc/pymc/distributions/timeseries.py`, lines 558–567 @ 8f02bea
`pymc/pymc/distributions/timeseries.py`, line 641 @ 8f02bea
`pymc/pymc/distributions/timeseries.py`, lines 622–632 @ 8f02bea
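To see why scan is naturally TBS, here is a small self-contained Aesara sketch of a batched AR(1) recursion (not PyMC's actual implementation, just an illustration of how scan stacks results):

```python
import numpy as np
import aesara
import aesara.tensor as at

innovations = at.matrix("innovations")  # (time, batch): TBS layout
rho = at.vector("rho")                  # one AR coefficient per batch

def step(eps_t, y_prev, rho):
    # y_t = rho * y_{t-1} + eps_t, vectorized over the batch dimension
    return rho * y_prev + eps_t

# scan iterates over the leftmost axis of `sequences` and stacks the
# per-step outputs along a new leftmost axis, so time ends up first
ys, _ = aesara.scan(
    fn=step,
    sequences=[innovations],
    outputs_info=[at.zeros_like(rho)],
    non_sequences=[rho],
)

f = aesara.function([innovations, rho], ys)
out = f(np.random.randn(100, 3), np.array([0.5, 0.9, 0.99]))
print(out.shape)  # (100, 3): time first, batch second
```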
With TBS, we could again allow for parameters (rho, sigma) to change over time without making batching with constant parameters more cumbersome.
Logp considerations
A slightly more technical consequence of BTS, when we consider time a support dimension (again, to avoid cumbersome batching with constant parameters), is that we must also collapse the logp across that dimension. Otherwise, other distributions that rely on this contract, like Mixture, would eventually fail.
But this means we never have access to the logp per time step, which may be useful for comparing timeseries models.
This also means we cannot simply use the logp derived automatically by Aeppl for timeseries graphs. If we followed TBS, the following hack in #6072 would not be needed:
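(The actual diff lives in #6072 and isn't reproduced here. Schematically, the BTS workaround boils down to reducing the pointwise logp over the time axis:)

```python
# Schematic only: collapse the per-time-step logp so that time behaves
# like a support dimension, as the BTS contract requires
logp = logp.sum(axis=-1)  # the per-step logp is lost at this point
```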
Steps parameter
Not so important, feel free to skip
The steps argument, on its own, is useless in TBS. We would not know whether it is supposed to match the first dimension of the parameters or to batch them. If it always matched the first dimension, batching with constant parameters would become cumbersome in TBS, because users would have to add a degenerate dimension again. If it never matched, we would lose support for time-varying parameters. In order to safely interpret steps, we need to know the explicit `shape`/`size`, in which case we no longer need steps, since it's simply `shape[0]`/`size[0]`.
Conclusion
TBS has some advantages, but the downside that 3 GRWs of 100 steps would now have a shape of `(100, 3)`, and 3 2D-MvGRWs of 100 steps would have a shape of `(100, 3, 2)`. @aseyboldt suggests it may also be less performant when computing the logp due to memory contiguity. @junpenglao mentions that scan-based timeseries will always need inputs to be TBS internally anyway.
Otherwise, we could keep using BTS but force users to always define the time dimension in the parameters (i.e. cumbersome/error-prone [citation needed] batching of constant parameters). That sounds reasonable as well, but it will hurt users at first. @aseyboldt thinks this is intuitive. I have changed my preference to this approach.
Or we could just keep BTS without the possibility of defining time-varying parameters, as we do now. That sounds unnecessarily restrictive.
What do you think?
This was brought up in #5741, #5972 and #6072 (comment)