Skip to content

Commit

Permalink
Feature - CFD-Autoperiod (#16)
Browse files Browse the repository at this point in the history
* Create CFDAutoperiod as a copy of Autoperiod and implement period hints density clustering

* Add tests to CFDAutoperiod

* Implement most of the periodicity hint validation, simplify detrending options, and tweak periodogram calls

* Finish the preliminary implementation of CFD-Autoperiod and update related tests

* Update CFDAutoperiod lowpass filter application logic and add more docstrings

* Update Autoperiod tests

* Refactor power throshold computation logic to tools package

* Update CFDAutoperiod tests

* Update CFDAutoperiod tests yet again

* Improve CFDAutoperiod performance significantly

* Improve CFDAutoperiod performance even further!

* Clean CFDAutoperiod a little further

* Remove some tests for Autoperiod

* Replace SciPy's linear regression with fitting a NumPy first degree Polynomial to improve performance

* Tweak some docstrings

* Improve Autoperiod performance even further!

* Fix indexing in Autoperiod hint validation

* Tweak ACF and FFT periodicity detector tests

* Simplify detrending options in ACFPeriodicityDetector and FFTPeriodicityDetector

* Clean tools code

* Remove unused code

* Address an possible edge case in hint clustering and make small tweaks in CFDAutoperiod

* Add more tests

* Update README

* Improve tests efficiency through the use of fixtures to load the data
  • Loading branch information
iskandergaba authored Oct 11, 2024
1 parent 23a8b21 commit 879641c
Show file tree
Hide file tree
Showing 14 changed files with 727 additions and 339 deletions.
31 changes: 13 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,8 @@
## About Pyriodicity
Pyriodicity provides intuitive and easy-to-use Python implementation for periodicity (seasonality) detection in univariate time series. Pyriodicity supports the following detection methods:
- [Autocorrelation Function (ACF)](https://otexts.com/fpp3/acf.html)
- [Autoperiod]( https://doi.org/10.1137/1.9781611972757.40)
- [Autoperiod](https://doi.org/10.1137/1.9781611972757.40)
- [CFD-Autoperiod](https://doi.org/10.1007/978-3-030-39098-3_4)
- [Fast Fourier Transform (FFT)](https://otexts.com/fpp3/useful-predictors.html#fourier-series)

## Installation
Expand All @@ -23,35 +24,28 @@ pip install pyriodicity
```

## Example
Start by loading a the `co2` timeseries emissions sample data from [`statsmodels`](https://www.statsmodels.org)
Start by loading a the `co2` emissions sample time series data from [`statsmodels`](https://www.statsmodels.org)
```python
from statsmodels.datasets import co2
data = co2.load().data
>>> from statsmodels.datasets import co2
>>> data = co2.load().data
```

You can then resample the data to whatever frequency you want. In this example, we downsample the data to a monthly frequency
```python
data = data.resample("ME").mean().ffill()
>>> data = data.resample("ME").mean().ffill()
```

Use `Autoperiod` to find the list of periods based in this data (if any).
```python
from pyriodicity import Autoperiod
autoperiod = Autoperiod(data)
periods = autoperiod.fit()
>>> from pyriodicity import Autoperiod
>>> autoperiod = Autoperiod(data)
>>> autoperiod.fit()
array([12])
```

There are multiple parameters you can play with should you wish to. For example, you can specify a lower percentile value for a more lenient detection
```python
autoperiod.fit(percentile=90)
```

Or increase the number of random data permutations for a better power threshold estimation
```python
autoperiod.fit(k=300)
```
The detected periodicity length is 12 which suggests a strong yearly seasonality given that the data has a monthly frequency.

Alternatively, you can use other periodicity detection methods such as `ACFPeriodicityDetector` and `FFTPeriodicityDetector` and compare results and performances.
You can also use `CFDAutoperiod` variant of `Autoperiod` or any other supported periodicity detection method such as `ACFPeriodicityDetector` and `FFTPeriodicityDetector` and compare results and performances.

## Development Environment Setup
This project is built and published using [Poetry](https://python-poetry.org). To setup a development environment for this project you can follow these steps:
Expand Down Expand Up @@ -85,3 +79,4 @@ poetry export --with test --output requirements-dev.txt
## References
- [1] Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. [OTexts.com/fpp3](https://otexts.com/fpp3). Accessed on 09-15-2024.
- [2] Vlachos, M., Yu, P., & Castelli, V. (2005). On periodicity detection and Structural Periodic similarity. Proceedings of the 2005 SIAM International Conference on Data Mining. [doi.org/10.1137/1.9781611972757.40](https://doi.org/10.1137/1.9781611972757.40).
- [3] Puech, T., Boussard, M., D'Amato, A., & Millerand, G. (2020). A fully automated periodicity detection in time series. In Advanced Analytics and Learning on Temporal Data: 4th ECML PKDD Workshop, AALTD 2019, Würzburg, Germany, September 20, 2019, Revised Selected Papers 4 (pp. 43-54). Springer International Publishing. [doi.org/10.1007/978-3-030-39098-3_4](https://doi.org/10.1007/978-3-030-39098-3_4).
8 changes: 7 additions & 1 deletion pyriodicity/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,13 @@
from .detectors import (
ACFPeriodicityDetector,
Autoperiod,
CFDAutoperiod,
FFTPeriodicityDetector,
)

__all__ = ["ACFPeriodicityDetector", "Autoperiod", "FFTPeriodicityDetector"]
__all__ = [
"ACFPeriodicityDetector",
"Autoperiod",
"CFDAutoperiod",
"FFTPeriodicityDetector",
]
2 changes: 2 additions & 0 deletions pyriodicity/detectors/__init__.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
from .acf import ACFPeriodicityDetector
from .autoperiod import Autoperiod
from .cfd_autoperiod import CFDAutoperiod
from .fft import FFTPeriodicityDetector

__all__ = [
"ACFPeriodicityDetector",
"Autoperiod",
"CFDAutoperiod",
"FFTPeriodicityDetector",
]
24 changes: 14 additions & 10 deletions pyriodicity/detectors/acf.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
from typing import Callable, Optional, Union
from typing import Optional, Union

from numpy.typing import ArrayLike, NDArray
from scipy.signal import argrelmax
from scipy.signal import argrelmax, detrend

from pyriodicity.tools import acf, apply_window, detrend, to_1d_array
from pyriodicity.tools import acf, apply_window, to_1d_array


class ACFPeriodicityDetector:
Expand Down Expand Up @@ -55,7 +55,7 @@ def __init__(self, endog: ArrayLike):
def fit(
self,
max_period_count: Optional[int] = None,
detrend_func: Optional[Union[str, Callable[[ArrayLike], NDArray]]] = "linear",
detrend_func: Optional[str] = "linear",
window_func: Optional[Union[str, float, tuple]] = None,
correlation_func: Optional[str] = "pearson",
) -> NDArray:
Expand All @@ -66,10 +66,9 @@ def fit(
----------
max_period_count : int, optional, default = None
Maximum number of periods to look for.
detrend_func : str, callable, default = None
The kind of detrending to be applied on the series. It can either be
'linear' or 'constant' if it the parameter is of 'str' type, or a
custom function that returns a detrended series.
detrend_func : str, default = 'linear'
The kind of detrending to be applied on the signal. It can either be
'linear' or 'constant'.
window_func : float, str, tuple optional, default = None
Window function to be applied to the time series. Check
'window' parameter documentation for scipy.signal.get_window
Expand Down Expand Up @@ -99,13 +98,18 @@ def fit(
List of detected periods.
"""
# Detrend data
self.y = self.y if detrend_func is None else detrend(self.y, detrend_func)
self.y = self.y if detrend_func is None else detrend(self.y, type=detrend_func)

# Apply window on data
self.y = self.y if window_func is None else apply_window(self.y, window_func)

# Compute the ACF
acf_arr = acf(self.y, len(self.y) // 2, correlation_func)
acf_arr = acf(
self.y,
lag_start=0,
lag_stop=len(self.y) // 2,
correlation_func=correlation_func,
)

# Find the local argmax of the first half of the ACF array
local_argmax = argrelmax(acf_arr)[0]
Expand Down
170 changes: 88 additions & 82 deletions pyriodicity/detectors/autoperiod.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,10 @@
from typing import Callable, Optional, Union
from typing import Optional, Union

import numpy as np
from numpy.typing import ArrayLike, NDArray
from scipy.signal import argrelmax, periodogram
from scipy.stats import linregress
from scipy.signal import argrelmax, detrend, periodogram

from pyriodicity.tools import acf, apply_window, detrend, to_1d_array
from pyriodicity.tools import acf, apply_window, power_threshold, to_1d_array


class Autoperiod:
Expand All @@ -22,9 +21,9 @@ class Autoperiod:
References
----------
.. [1] Vlachos, M., Yu, P., & Castelli, V. (2005).
On periodicity detection and Structural Periodic similarity.
Proceedings of the 2005 SIAM International Conference on Data Mining.
https://doi.org/10.1137/1.9781611972757.40
On periodicity detection and Structural Periodic similarity.
Proceedings of the 2005 SIAM International Conference on Data Mining.
https://doi.org/10.1137/1.9781611972757.40
Examples
--------
Expand Down Expand Up @@ -58,7 +57,7 @@ def fit(
self,
k: int = 100,
percentile: int = 95,
detrend_func: Optional[Union[str, Callable[[ArrayLike], NDArray]]] = "linear",
detrend_func: Optional[str] = "linear",
window_func: Optional[Union[str, float, tuple]] = None,
correlation_func: Optional[str] = "pearson",
) -> NDArray:
Expand All @@ -73,10 +72,9 @@ def fit(
percentile : int, optional, default = 95
Percentage for the percentile parameter used in computing the power
threshold. Value must be between 0 and 100 inclusive.
detrend_func : str, callable, default = None
The kind of detrending to be applied on the series. It can either be
'linear' or 'constant' if it the parameter is of 'str' type, or a
custom function that returns a detrended series.
detrend_func : str, default = 'linear'
The kind of detrending to be applied on the signal. It can either be
'linear' or 'constant'.
window_func : float, str, tuple optional, default = None
Window function to be applied to the time series. Check
'window' parameter documentation for scipy.signal.get_window
Expand Down Expand Up @@ -105,101 +103,104 @@ def fit(
List of detected periods.
"""
# Detrend data
self.y = self.y if detrend_func is None else detrend(self.y, detrend_func)
self.y = self.y if detrend_func is None else detrend(self.y, type=detrend_func)
# Apply window on data
self.y = self.y if window_func is None else apply_window(self.y, window_func)

# Compute the power threshold
p_threshold = self._power_threshold(self.y, k, percentile)
p_threshold = power_threshold(self.y, detrend_func, k, percentile)

# Find period hints
freq, power = periodogram(self.y, window=None, detrend=None)
period_hints = np.array(
freq, power = periodogram(self.y, window=None, detrend=False)
hints = np.array(
[
1 / f
for f, p in zip(freq, power)
if f >= 1 / len(freq) and p >= p_threshold
]
)

# Compute the ACF
length = len(self.y)
acf_arr = acf(self.y, nlags=length, correlation_func=correlation_func)

# Validate period hints
period_hints_valid = []
for p in period_hints:
q = length / p
start = np.floor((p + length / (q + 1)) / 2 - 1).astype(int)
end = np.ceil((p + length / (q - 1)) / 2 + 1).astype(int)

splits = [
self._split(np.arange(len(acf_arr)), acf_arr, start, end, i)
for i in range(start + 2, end)
]
line1, line2, _ = splits[
np.array([error for _, _, error in splits]).argmin()
]

if line1.slope > 0 > line2.slope:
period_hints_valid.append(p)
valid_hints = [
h for h in hints if self._is_hint_valid(self.y, h, correlation_func)
]

period_hints_valid = np.array(period_hints_valid)

# Return the closest ACF peak for each valid period hint
local_argmax = argrelmax(acf_arr)[0]
# Return the closest ACF peak to each valid period hint
length = len(self.y)
hint_ranges = [
np.arange(
np.floor((h + length / (length / h + 1)) / 2 - 1),
np.ceil((h + length / (length / h - 1)) / 2 + 1),
dtype=int,
)
for h in valid_hints
]
acf_arrays = [
acf(
self.y,
lag_start=r[0],
lag_stop=r[-1],
correlation_func=correlation_func,
)
for r in hint_ranges
]
return np.array(
list(
{
min(local_argmax, key=lambda x: abs(x - p))
for p in period_hints_valid
r[0] + min(argrelmax(arr)[0], key=lambda x: abs(x - h))
for h, r, arr in zip(valid_hints, hint_ranges, acf_arrays)
}
)
)

@staticmethod
def _power_threshold(y: ArrayLike, k: int, p: int) -> float:
def _is_hint_valid(
y: ArrayLike,
hint: float,
correlation_func: str,
) -> bool:
"""
Compute the power threshold as the p-th percentile of the maximum
power values of the periodogram of k permutations of the data.
Validate the period hint.
Parameters
----------
y : array_like
Data to be investigated. Must be squeezable to 1-d.
k : int
The number of times the data is randomly permuted to compute
the maximum power values.
p : int
The percentile value used to compute the power threshold.
It determines the cutoff point in the sorted list of the maximum
power values from the periodograms of the permuted data.
Value must be between 0 and 100 inclusive.
See Also
--------
scipy.signal.periodogram
Estimate power spectral density using a periodogram.
hint : float
The period hint to be validated.
correlation_func : str, default = 'pearson'
The correlation function to be used to calculate the ACF of the series
or the signal. Possible values are ['pearson', 'spearman', 'kendall'].
Returns
-------
float
Power threshold of the target data.
bool
Whether the period hint is valid.
"""
max_powers = []
while len(max_powers) < k:
_, power_p = periodogram(
np.random.permutation(y), window=None, detrend=None
)
max_powers.append(power_p.max())
max_powers.sort()
return np.percentile(max_powers, p)
length = len(y)
hint_range = np.arange(
np.floor((hint + length / (length / hint + 1)) / 2 - 1),
np.ceil((hint + length / (length / hint - 1)) / 2 + 1),
dtype=int,
)
acf_arr = acf(
y,
lag_start=hint_range[0],
lag_stop=hint_range[-1],
correlation_func=correlation_func,
)
splits = [
Autoperiod._split(hint_range, acf_arr, 0, len(hint_range), i)
for i in range(1, len(hint_range) - 1)
]
line1, line2, _ = splits[np.array([error for _, _, error in splits]).argmin()]
return line1.coef[-1] > 0 > line2.coef[-1]

@staticmethod
def _split(x: ArrayLike, y: ArrayLike, start: int, end: int, split: int) -> tuple:
"""
Approximate a function at [start, end] with two line segments at
[start, split - 1] and [split, end].
[start, split] and [split, end].
Parameters
----------
Expand All @@ -221,22 +222,27 @@ def _split(x: ArrayLike, y: ArrayLike, start: int, end: int, split: int) -> tupl
Returns
-------
linregress
numpy.polynomial.Polynomial
The first line segment.
linregress
numpy.polynomial.Polynomial
The second line segment.
float
The error of the approximation.
The approximation error.
"""
if not start < split < end:
raise ValueError(
"Invalid start, split, and end values ({}, {}, {})".format(
start, split, end
)
)
x1, y1, x2, y2 = (
x[start:split],
y[start:split],
x[split : end + 1],
y[split : end + 1],
)
line1 = linregress(x1, y1)
line2 = linregress(x2, y2)
error = np.sum(np.abs(y1 - (line1.intercept + line1.slope * x1))) + np.sum(
np.abs(y2 - (line2.intercept + line2.slope * x2))
x[start : split + 1],
y[start : split + 1],
x[split:end],
y[split:end],
)
return line1, line2, error
line1, stats1 = np.polynomial.Polynomial.fit(x1, y1, deg=1, full=True)
line2, stats2 = np.polynomial.Polynomial.fit(x2, y2, deg=1, full=True)
resid1 = 0 if len(stats1[0]) == 0 else stats1[0][0]
resid2 = 0 if len(stats2[0]) == 0 else stats2[0][0]
return line1.convert(), line2.convert(), resid1 + resid2
Loading

0 comments on commit 879641c

Please sign in to comment.