[FR] Estimations of ε
[$100]
#67
Comments
Yes, we were also already discussing it in our group. We have some ideas and we can share them with you if you like.
… On 26.01.2019 at 15:53, George Datseris ***@***.*** wrote:
Would be great if we had a function with various methods to estimate ε, the threshold of a recurrence plot.
This could work similarly to estimate_delay, which gives delay times for timeseries.
Different methods are described in the book from N. Marwan and citations within, see chapter 1 (we cite the book in most docs). Some of these methods are actually very easy and straightforward and would be a great issue for a newcomer contributor.
|
Awesome, please do so! |
This issue now has a $100 bounty on it (which means anyone who solves it gets the $100). |
Note: The https://aip.scitation.org/doi/10.1063/1.5024914 paper is extremely wordy, and basically says that "counting the shortest N% of distances in the recurrence plot as recurring" is a good way to make some Recurrence Quantification Analysis (RQA) measures less sensitive to the embedding dimension. That is, keeping the recurrence rate constant as the embedding dimension increases allows other RQA measures to be more stable. |
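A minimal sketch of that idea, assuming a Euclidean metric and a plain delay embedding. The helper names `embed_series`, `pairwise_distances` and `epsilon_for_rate` are hypothetical and not part of the package. The threshold is taken as a quantile of the pairwise distances, so the recurrence rate stays (approximately) fixed while ε itself is recomputed for each embedding dimension:

```julia
using Statistics  # quantile

# Hypothetical helper: delay-embed a scalar series into `dim`-dimensional points.
embed_series(s, dim, τ) = [[s[i + k*τ] for k in 0:dim-1] for i in 1:length(s) - (dim-1)*τ]

# Distances between all distinct pairs of points (Euclidean metric assumed).
function pairwise_distances(pts)
    n = length(pts)
    d = Vector{Float64}(undef, n * (n - 1) ÷ 2)
    idx = 1
    for i in 1:n-1, j in i+1:n
        d[idx] = sqrt(sum(abs2, pts[i] .- pts[j]))
        idx += 1
    end
    return d
end

# ε such that approximately a fraction `rr` of the pairs count as recurrent.
epsilon_for_rate(pts, rr) = quantile(pairwise_distances(pts), rr)

s = sin.(0.1 .* (1:500))
for dim in 1:3
    ε = epsilon_for_rate(embed_series(s, dim, 15), 0.05)
    println("dim = $dim  ε ≈ $(round(ε, digits=3))")
end
```

The delay of 15 samples and the 5% rate are arbitrary choices for illustration only.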
I read chapter 1 of Webber and Marwan. It seems that this question is solved? Just pass kwarg to the integrator, and it's done. The only notable […] Thus possibly this bounty is outdated. |
@yuxiliu1995 can you please give more detail? What we are talking about here is having a method (a function) to which you give your dataset, and which then returns an optimal value for ε. |
I read Chapter 1 of Marwan, https://aip.scitation.org/doi/10.1063/1.5024914 , as well as many cited papers. It seems there is no good way to do this other than to try a few and see what works. It is just a collection of gut feelings, case studies, and appeal to numerical simulation. Not only is it unmathematical, it is also simple. There are only a few methods for choosing ε:
Relevant parts of Chapter 1 of Marwan:
Nothing new here. |
The first five methods are easily implemented in
Of these, the std and median are easy to implement, for example:
using Distances   # Metric, evaluate
using Statistics  # var, median

# Collect the distances between the points of x and y under `metric`.
function _computedistances(x, y, metric::Metric)
    if x === y
        # Same dataset: only the distinct pairs i < j are needed.
        distlist = Vector{Float64}(undef, length(x)*(length(x)-1)÷2)
        index = 1
        @inbounds for i in 1:length(x)-1, j in (i+1):length(x)
            distlist[index] = evaluate(metric, x[i], y[j])
            index += 1
        end
    else
        distlist = Vector{Float64}(undef, length(x)*length(y))
        index = 1
        @inbounds for xi in x, yj in y
            distlist[index] = evaluate(metric, xi, yj)
            index += 1
        end
    end
    return distlist
end
# Variance of the distances; take the square root if the std is wanted.
function _computescale(scale::typeof(var), x, y, metric::Metric)
    return Statistics.var(_computedistances(x, y, metric))
end
# Median of the distances.
function _computescale(scale::typeof(median), x, y, metric::Metric)
    return Statistics.median(_computedistances(x, y, metric))
end
This one requires the user to know the noise characteristics of their measurement device, so it cannot be done in general. However, if the user can assure the program that the measurement noise is significantly larger than the evolution of the system over a short timescale, the program can apply a high-pass filter to extract what is presumably pure measurement noise, then calculate
This one can be implemented in a standalone function. The "significant peaks" might be found by this peak finding algorithm.
This might be a bit risky to automate, but if the user wants to use this method, it would be useful to have a function that draws the |
I just implemented the last two, but they are very slow and don't improve the results (in fact, they give worse epsilons). I don't think they are worth it, but if you want, I'll put them into the pull request too. |
@pucicu are there any other methods we should consider here, besides what @yuxiliu1995 already wrote? @yuxiliu1995 yes, include everything and put a comment for methods that you think are not good enough after you've tested them. |
Yuxi, thanks for summarizing all these methods.
First, I would like to emphasize that the selection of the threshold depends on the research question. There is no “perfect rule” valid for everything. For example, whereas the estimation of dynamical invariants requires a very small threshold (but also very long time series), the reconstruction of a time series from the recurrence matrix or the creation of twin surrogates require much larger thresholds.
@ALL, I suggest not including all of these methods. The most suitable method for most research questions is to use a quantile of the distance matrix, which is the same as pre-selecting the recurrence rate. This approach is preferred in particular when working with sliding windows.
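A rough sketch of the sliding-window case, assuming a scalar series and absolute differences as distances. The function names `quantile_threshold` and `recurrence_rate` are made up for this illustration. Each window gets its own ε from the same quantile, so the recurrence rate stays close to the pre-selected value even though the amplitude of the signal changes:

```julia
using Statistics  # quantile

# Fraction of distinct pairs closer than ε (recurrence rate, diagonal excluded).
function recurrence_rate(x, ε)
    n = length(x)
    return count(abs(x[i] - x[j]) <= ε for i in 1:n-1 for j in i+1:n) / (n * (n - 1) ÷ 2)
end

# ε chosen as the `rr`-quantile of the pairwise distances of a scalar series.
function quantile_threshold(x, rr)
    n = length(x)
    return quantile([abs(x[i] - x[j]) for i in 1:n-1 for j in i+1:n], rr)
end

# A signal whose amplitude grows over time, so one global ε fits no window well.
x = [(1 + t / 200) * sin(0.2t) for t in 1:1000]
windows = [x[i:i+199] for i in 1:200:801]   # five non-overlapping windows

ε_windows = [quantile_threshold(w, 0.1) for w in windows]
rates = [recurrence_rate(w, ε) for (w, ε) in zip(windows, ε_windows)]
```

Here `ε_windows` varies from window to window while `rates` stays near 0.1, which is the property that makes the quantile rule convenient for windowed analyses.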
The next-to-last method in Yuxi’s list ("ε = argmin |1 − Np(ε)/Nn(ε)|") is also worth implementing.
The methods using the diameter or the log-log plot of RR vs. ε are less suitable. The first one is too sensitive to outliers. For the log-log plot method, it cannot really be justified why this criterion would make sense. A similar approach using the turning point of RR vs. ε was suggested, but critically reviewed by
R. V. Donner et al., Ambiguities in recurrence-based complex network representations of time series, Physical Review E, 81, 015101(R) (2010). DOI:10.1103/PhysRevE.81.015101 <http://doi.org/10.1103/PhysRevE.81.015101>
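The outlier sensitivity of the diameter rule can be illustrated with a small sketch. It assumes a scalar series, absolute differences as distances, and a threshold of 10% of the maximum distance; that fraction and the function names are assumptions made for this example only:

```julia
using Statistics  # quantile

# All distances between distinct pairs of a scalar series.
pairdists(x) = [abs(x[i] - x[j]) for i in 1:length(x)-1 for j in i+1:length(x)]

# Diameter rule: a fixed fraction of the largest distance (fraction is an assumption).
diameter_threshold(x; fraction = 0.1) = fraction * maximum(pairdists(x))

# Quantile rule: the distance below which a fraction `rr` of the pairs fall.
quantile_threshold(x; rr = 0.1) = quantile(pairdists(x), rr)

clean = [sin(0.2t) for t in 1:300]
spiked = copy(clean)
spiked[150] = 25.0   # a single outlier

# How much each rule's ε changes because of that one point.
ratio_diameter = diameter_threshold(spiked) / diameter_threshold(clean)
ratio_quantile = quantile_threshold(spiked) / quantile_threshold(clean)
```

A single spike inflates the diameter-based ε by roughly the ratio of the new to the old maximum distance, while the quantile-based ε barely moves, because only a small share of the pairs involves the outlier.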
An alternative method, still missing from Yuxi's list, is the one we published here:
D. Eroglu et al., Finding recurrence networks' threshold adaptively for a specific time series, Nonlinear Processes in Geophysics, 21, 1085–1092 (2014). DOI:10.5194/npg-21-1085-2014 <http://doi.org/10.5194/npg-21-1085-2014>
Although the focus is on recurrence networks, this approach gives the best threshold for reconstruction of a time series from the recurrence matrix (because the reconstruction is based on a graph).
I hope this helps.
… On 12.02.2020 at 15:58, George Datseris ***@***.*** wrote:
@pucicu <https://github.com/pucicu> are there any other methods we should consider here, besides what @yuxiliu1995 <https://github.com/yuxiliu1995> already wrote?
@yuxiliu1995 <https://github.com/yuxiliu1995> yes, include everything and put a comment for methods that you think are not good enough after you've tested them.
|
The next-to-last one would be really helpful. Most of the other methods can be easily implemented by the user of the package, but this one would be more difficult. Therefore, if the package could provide this, it would be great. |
Some work on this has started in #89; anyone willing, feel free to take over. |
There is a $100 open bounty on this issue. Add to the bounty at Bountysource.