Detecting spikes and troughs in time series data
Spikes and troughs in time series data can be detected using a class of algorithms called change point detection algorithms [1], [2].
A Python package that implements several change point detection algorithms is ruptures.
If you want a simple and interpretable method before moving on to more advanced ones, the rolling z-score heuristic, which is closely related to the Shewhart individuals control chart, is a good place to start.
The rolling z-score heuristic works as follows:
- For each observation in the time series, compute the mean and standard deviation of the previous N observations, where N is the window size.
- If the observation is more than M (typically 3) standard deviations away from the mean, then report the observation as a change point.
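The two steps above can be sketched with the Python standard library alone. The function name `rolling_zscore_flags` and its parameters are hypothetical, chosen for this illustration, and the sketch assumes the sample standard deviation is the intended one.

```python
from statistics import mean, stdev


def rolling_zscore_flags(values, window=4, threshold=3.0):
    """Return (index, z-score) pairs for observations that lie more than
    `threshold` standard deviations from the mean of the previous `window`
    observations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # previous N observations only
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation
        if sigma == 0:
            continue  # z-score is undefined for a constant window
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, z))
    return flagged


values = [0.8, 0.7, 0.9, 0.6, 0.4, 32.2, 31.9, 32.7]
flags = rolling_zscore_flags(values)
print(flags)
```

On this series only the first observation after the jump (index 5) is flagged. Once the jump enters the window it inflates the standard deviation, so the heuristic reports the onset of a level shift and not every point that follows it.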
A Python implementation of the rolling z-score heuristic for detecting spikes and troughs is given below.
import polars as pl
df = pl.DataFrame({'value': [0.8, 0.7, 0.9, 0.6, 0.4, 32.2, 31.9, 32.7]})
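# Rolling mean and std over a window that ends at the current observation
window = 4
# Down-weight the current observation (last position in the window) so that
# it has little influence on its own baseline
weights = (window - 1) * [1] + [0.1]
df = df.with_columns([
    pl.col('value').rolling_mean(window_size=window, weights=weights).alias("mean"),
    pl.col('value').rolling_std(window_size=window, weights=weights).alias("std"),
])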
# Compute z-score: (x - mean) / std
df = df.with_columns(
    ((pl.col('value') - pl.col('mean')) / pl.col('std')).alias("zscore")
)
# Filter only spikes or troughs
z_thresh = 3
print(df.filter(pl.col('zscore').abs() > z_thresh))