ARAR Model¶

The ARAR model first applies a memory-shortening transformation to the time series $Y_{t}, t = 1, 2, \ldots, n$ if the underlying process is "long-memory", and then fits an autoregressive model to the transformed series.

Memory Shortening¶

The model follows five steps to classify $Y_{t}$ and take one of the following three actions:

L: declare $Y_{t}$ as long memory and form ${\tilde{Y}}_{t}$ by ${\tilde{Y}}_{t} = Y_{t} - \hat{ϕ} Y_{t - \hat{τ}}$
M: declare $Y_{t}$ as moderately long memory and form ${\tilde{Y}}_{t}$ by ${\tilde{Y}}_{t} = Y_{t} - {\hat{ϕ}}_{1} Y_{t - 1} - {\hat{ϕ}}_{2} Y_{t - 2}$
S: declare $Y_{t}$ as short memory.

If $Y_{t}$ is declared to be $L$ or $M$, the transformed series ${\tilde{Y}}_{t}$ is classified again. The process continues until the transformed series is classified as short memory. At most three transformations are applied; in practice it is very rare for a time series to require more than two.

1. For each $τ = 1, 2, \ldots, 15$, find the value $\hat{ϕ}(τ)$ of $ϕ$ that minimizes $ERR(ϕ, τ) = \frac{\sum_{t = τ + 1}^{n} [Y_{t} - ϕ Y_{t - τ}]^{2}}{\sum_{t = τ + 1}^{n} Y_{t}^{2}}$, then define $Err(τ) = ERR(\hat{ϕ}(τ), τ)$ and choose the lag $\hat{τ}$ to be the value of $τ$ that minimizes $Err(τ)$.
2. If $Err(\hat{τ}) \leq 8 / n$, $Y_{t}$ is a long-memory series (action L).
3. If $\hat{ϕ}(\hat{τ}) \geq 0.93$ and $\hat{τ} > 2$, $Y_{t}$ is a long-memory series (action L).
4. If $\hat{ϕ}(\hat{τ}) \geq 0.93$ and $\hat{τ} = 1$ or $2$, $Y_{t}$ is a moderately long-memory series (action M).
5. If $\hat{ϕ}(\hat{τ}) < 0.93$, $Y_{t}$ is a short-memory series (action S).
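The classification steps above can be sketched in a few lines of NumPy. This is only an illustration of the rules as stated in the text, not the skforecast implementation: the function names `shorten_once` and `shorten_memory` are hypothetical, the lag range of 15 and the cap of three passes are taken from the description above, and degenerate inputs (for example a constant series) are not handled.

```python
import numpy as np

def shorten_once(y, max_lag=15):
    """Classify y as 'L', 'M' or 'S' and return (label, transformed series)."""
    n = len(y)
    best_err, best_phi, best_tau = np.inf, 0.0, 0
    for tau in range(1, min(max_lag, n - 1) + 1):
        cur, lagged = y[tau:], y[:-tau]
        # Least-squares minimizer of ERR(phi, tau) for this lag
        phi = cur @ lagged / (lagged @ lagged)
        err = np.sum((cur - phi * lagged) ** 2) / np.sum(cur ** 2)
        if err < best_err:
            best_err, best_phi, best_tau = err, phi, tau

    # Steps 2 and 3: long memory
    if best_err <= 8.0 / n or (best_phi >= 0.93 and best_tau > 2):
        return "L", y[best_tau:] - best_phi * y[:-best_tau]
    # Step 4: moderately long memory (best_tau is 1 or 2 here)
    if best_phi >= 0.93:
        lags = np.column_stack([y[1:-1], y[:-2]])
        (phi1, phi2), *_ = np.linalg.lstsq(lags, y[2:], rcond=None)
        return "M", y[2:] - phi1 * y[1:-1] - phi2 * y[:-2]
    # Step 5: short memory, nothing to do
    return "S", y

def shorten_memory(y, max_passes=3):
    """Repeat the classification until 'S' is reached or max_passes is hit."""
    y = np.asarray(y, dtype=float)
    labels = []
    for _ in range(max_passes):
        label, y = shorten_once(y)
        labels.append(label)
        if label == "S":
            break
    return y, labels
```

Each L or M pass shortens the series by the lag used in the filter, which is why the memory-shortened series starts at $t = k + 1$ rather than $t = 1$.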

Subset Autoregressive Model:¶

The ARAR algorithm then fits an autoregressive process to the mean-corrected series $X_{t} = S_{t} - \bar{S}$, $t = k + 1, \ldots, n$, where $S_{t}$, $t = k + 1, \ldots, n$, is the memory-shortened version of $Y_{t}$ obtained from the five steps described above and $\bar{S}$ is the sample mean of $S_{k + 1}, \ldots, S_{n}$.

The fitted model has the following form:

$X_{t} = ϕ_{1} X_{t - 1} + ϕ_{l_{1}} X_{t - l_{1}} + ϕ_{l_{2}} X_{t - l_{2}} + ϕ_{l_{3}} X_{t - l_{3}} + Z_{t}$

where $Z_{t} \sim WN(0, σ^{2})$. The coefficients $ϕ_{j}$ and the white noise variance $σ^{2}$ can be derived from the Yule-Walker equations for given lags $l_{1}, l_{2},$ and $l_{3}$:

$[\begin{matrix} 1 & \hat{ρ} (l_{1} - 1) & \hat{ρ} (l_{2} - 1) & \hat{ρ} (l_{3} - 1) \\ \hat{ρ} (l_{1} - 1) & 1 & \hat{ρ} (l_{2} - l_{1}) & \hat{ρ} (l_{3} - l_{1}) \\ \hat{ρ} (l_{2} - 1) & \hat{ρ} (l_{2} - l_{1}) & 1 & \hat{ρ} (l_{3} - l_{2}) \\ \hat{ρ} (l_{3} - 1) & \hat{ρ} (l_{3} - l_{1}) & \hat{ρ} (l_{3} - l_{2}) & 1 \end{matrix}] \cdot [\begin{matrix} ϕ_{1} \\ ϕ_{l_{1}} \\ ϕ_{l_{2}} \\ ϕ_{l_{3}} \end{matrix}] = [\begin{matrix} \hat{ρ} (1) \\ \hat{ρ} (l_{1}) \\ \hat{ρ} (l_{2}) \\ \hat{ρ} (l_{3}) \end{matrix}]$

and $σ^{2} = \hat{γ} (0) [1 - ϕ_{1} \hat{ρ} (1) - ϕ_{l_{1}} \hat{ρ} (l_{1}) - ϕ_{l_{2}} \hat{ρ} (l_{2}) - ϕ_{l_{3}} \hat{ρ} (l_{3})]$, where $\hat{γ} (j)$ and $\hat{ρ} (j), j = 0, 1, 2, \ldots,$ are the sample autocovariances and autocorrelations of the series $X_{t}$.

The algorithm computes the coefficients $ϕ_{j}$ for each set of lags $1 < l_{1} < l_{2} < l_{3} \leq m$, where $m$ is chosen to be 13 or 26, and selects the model for which the Yule-Walker estimate of $σ^{2}$ is minimal.
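As a rough illustration of this search, the sketch below solves the four-lag Yule-Walker system for every admissible lag triple and keeps the one with the smallest $σ^{2}$. The helper names `sample_acvf` and `fit_subset_ar` are hypothetical, and the exhaustive loop is an assumption about how the search could be written, not a description of skforecast's internals.

```python
import itertools
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma(0..max_lag) using the 1/n convention."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([x[: n - h] @ x[h:] / n for h in range(max_lag + 1)])

def fit_subset_ar(x, m=13):
    """Return (sigma2, lags, phi) for the best lags (1, l1, l2, l3), l3 <= m."""
    gamma = sample_acvf(x, m)
    rho = gamma / gamma[0]
    best = None
    for l1, l2, l3 in itertools.combinations(range(2, m + 1), 3):
        lags = np.array([1, l1, l2, l3])
        # Entry (i, j) of the Yule-Walker matrix is rho(|lag_i - lag_j|)
        R = rho[np.abs(lags[:, None] - lags[None, :])]
        r = rho[lags]
        phi = np.linalg.solve(R, r)
        sigma2 = gamma[0] * (1.0 - phi @ r)
        if best is None or sigma2 < best[0]:
            best = (sigma2, tuple(lags.tolist()), phi)
    return best
```

With $m = 13$ there are 220 candidate lag triples and with $m = 26$ there are 2300, so an exhaustive search over 4x4 linear systems stays cheap.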

Forecasting¶

If the memory-shortening filter found in the first step has coefficients $Ψ_{0}, Ψ_{1}, \ldots, Ψ_{k}$ $(k \geq 0)$, where $Ψ_{0} = 1$, the transformed series can be expressed as $S_{t} = Ψ (B) Y_{t} = Y_{t} + Ψ_{1} Y_{t - 1} + \ldots + Ψ_{k} Y_{t - k},$ where $Ψ (B) = 1 + Ψ_{1} B + \ldots + Ψ_{k} B^{k}$ is a polynomial in the back-shift operator $B$.

If the subset autoregression found in the second step has coefficients $ϕ_{1}, ϕ_{l_{1}}, ϕ_{l_{2}}$ and $ϕ_{l_{3}}$, then the subset AR model for $X_{t} = S_{t} - \bar{S}$ is

$ϕ (B) X_{t} = Z_{t},$

where $Z_{t}$ is a white-noise series with zero mean and constant variance and $ϕ (B) = 1 - ϕ_{1} B - ϕ_{l_{1}} B^{l_{1}} - ϕ_{l_{2}} B^{l_{2}} - ϕ_{l_{3}} B^{l_{3}}$. Combining the expression for $S_{t}$ with the subset AR model, one obtains

$ξ (B) Y_{t} = ϕ (1) \bar{S} + Z_{t},$ where $ξ (B) = Ψ (B) ϕ (B)$ .
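Since $ξ (B)$ is a product of two polynomials in $B$, its coefficients are the convolution of the coefficient sequences of $Ψ (B)$ and $ϕ (B)$. The sketch below shows this with `numpy.convolve`; the function name `combined_filter` and its argument layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def combined_filter(psi, lags, phi):
    """Coefficients of xi(B) = Psi(B) * phi(B); entry j multiplies B**j.

    psi  : [1, Psi_1, ..., Psi_k], the memory-shortening filter
    lags : AR lags, e.g. (1, l1, l2, l3)
    phi  : AR coefficients, one per lag
    """
    phi_poly = np.zeros(max(lags) + 1)
    phi_poly[0] = 1.0
    for lag, coef in zip(lags, phi):
        phi_poly[lag] -= coef  # phi(B) = 1 - sum of phi_lag * B**lag
    return np.convolve(psi, phi_poly)

# Example: S_t = Y_t - 0.9 Y_{t-1} combined with X_t = 0.5 X_{t-1} + Z_t
xi = combined_filter([1.0, -0.9], lags=(1,), phi=(0.5,))
print(xi)  # coefficients of 1 - 1.4 B + 0.45 B^2
```

The resulting array has length $k + l_{3} + 1$, and $ϕ (1)$ is simply the sum of the coefficients of $ϕ (B)$.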

Assuming the fitted model above is appropriate and $Z_{t}$ is uncorrelated with $Y_{j}, j < t$, for all $t$, the minimum mean squared error linear predictors $P_{n} Y_{n + h}$ of $Y_{n + h}$ in terms of $1, Y_{1}, \ldots, Y_{n}$, for $n > k + l_{3}$, can be determined from the recursions

$P_{n} Y_{n + h} = - \sum_{j = 1}^{k + l_{3}} ξ_{j} P_{n} Y_{n + h - j} + ϕ (1) \bar{S}, h \geq 1,$ with the initial conditions $P_{n} Y_{n + h} = Y_{n + h}$, for $h \leq 0$.
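The recursion can be written as a short loop: each forecast is appended to the history and then reused as an input for the next step. The following is a minimal sketch under the assumption that the coefficients $ξ_{0}, \ldots, ξ_{k + l_{3}}$, the value $ϕ (1)$ and the mean $\bar{S}$ are already available; the function name `arar_forecast` is hypothetical.

```python
import numpy as np

def arar_forecast(y, xi, phi_at_one, s_bar, steps):
    """Point forecasts P_n Y_{n+1}, ..., P_n Y_{n+steps}.

    y          : observed series; needs at least len(xi) - 1 values
    xi         : [1, xi_1, ..., xi_q] with q = k + l3
    phi_at_one : phi(1), the sum of the coefficients of phi(B)
    s_bar      : sample mean of the memory-shortened series
    """
    xi = np.asarray(xi, dtype=float)
    q = len(xi) - 1
    hist = [float(v) for v in y]
    const = phi_at_one * s_bar
    for _ in range(steps):
        # Most recent value first: Y_{n+h-1}, ..., Y_{n+h-q}
        past = np.array(hist[-q:])[::-1]
        hist.append(const - xi[1:] @ past)
    return np.array(hist[-steps:])

# Toy check with xi(B) = 1 - 0.5 B, phi(1) = 0.5 and S-bar = 10
print(arar_forecast([20.0], [1.0, -0.5], 0.5, 10.0, steps=2))  # [15.  12.5]
```

In this toy case the forecasts decay from the last observation toward the mean of 10, which is the expected behaviour for a stationary AR(1) with coefficient 0.5.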

Ref: Brockwell, Peter J, and Richard A. Davis. Introduction to Time Series and Forecasting. Springer (2016)

ℹ️ Note

The Python implementation of the ARAR algorithm in skforecast is based on the Julia package Durbyn.jl, developed by Resul Akay.

Libraries and data¶

In [1]:

# Libraries
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.stats import Arar
from skforecast.recursive import ForecasterStats
from skforecast.model_selection import TimeSeriesFold, backtesting_stats
from skforecast.datasets import fetch_dataset
from skforecast.plot import set_dark_theme

In [2]:

# Download data
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=False)
data = data.loc[:'1990-01-01 00:00:00']
y = data['Gasolinas'].rename('y').rename_axis('date')
y

╭──────────────────────────────── fuel_consumption ────────────────────────────────╮
│ Description:                                                                     │
│ Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01.                 │
│                                                                                  │
│ Source:                                                                          │
│ Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and │
│ Corporación de Derecho Público tutelada por el Ministerio para la Transición     │
│ Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas            │
│                                                                                  │
│ URL:                                                                             │
│ https://raw.githubusercontent.com/skforecast/skforecast-                         │
│ datasets/main/data/consumos-combustibles-mensual.csv                             │
│                                                                                  │
│ Shape: 644 rows x 5 columns                                                      │
╰──────────────────────────────────────────────────────────────────────────────────╯

Out[2]:

date
1969-01-01    166875.2129
1969-02-01    155466.8105
1969-03-01    184983.6699
1969-04-01    202319.8164
1969-05-01    206259.1523
                 ...     
1989-09-01    687649.2852
1989-10-01    669889.1602
1989-11-01    601413.8867
1989-12-01    663568.1055
1990-01-01    610241.2461
Freq: MS, Name: y, Length: 253, dtype: float64

ARAR¶

Skforecast provides the class Arar to fit ARAR models in Python and produce forecasts from them.

In [3]:

# ARAR model
# ==============================================================================
model = Arar()
model.fit(y)

Out[3]:

Arar(max_ar_depth=26, max_lag=40)

Once the model is fitted, future observations can be forecast using the predict and predict_interval methods.

In [4]:

# Prediction
# ==============================================================================
model.predict(steps=10)

Out[4]:

array([576270.29065713, 711350.90294941, 645064.14251878, 699974.70526107,
       693641.4876215 , 813391.3131971 , 849840.34223407, 728834.11322404,
       698899.25161967, 640834.1450568 ])

In [5]:

# Prediction interval
# ==============================================================================
model.predict_interval(steps=10, level=[95])

Out[5]:

	mean	lower_95	upper_95
step
1	576270.290657	535285.283204	617255.298110
2	711350.902949	669971.979006	752729.826893
3	645064.142519	597182.272797	692946.012240
4	699974.705261	651641.922678	748307.487844
5	693641.487621	643148.019307	744134.955936
6	813391.313197	762568.607694	864214.018700
7	849840.342234	798207.532180	901473.152288
8	728834.113224	677001.238600	780666.987848
9	698899.251620	646741.565713	751056.937527
10	640834.145057	588566.126391	693102.163723

ForecasterStats¶

The previous section showed how to build an ARAR model directly. To use the model with the rest of skforecast's functionality, wrap it in a ForecasterStats object. The wrapper exposes the standard forecaster interface, so the model can be used with skforecast's prediction and backtesting tools.

In [6]:

# Create and fit ForecasterStats
# ==============================================================================
forecaster = ForecasterStats(estimator=Arar())
forecaster.fit(y=y)
forecaster

Out[6]:

ForecasterStats

General Information

Estimator: Arar
Window size: 1
Series name: y
Exogenous included: False
Creation date: 2025-11-26 15:05:12
Last fit date: 2025-11-26 15:05:12
Skforecast version: 0.19.0
Python version: 3.12.11
Forecaster id: None

Exogenous Variables

None

Data Transformations

Transformer for y: None
Transformer for exog: None

Training Information

Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1990-01-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS

Estimator Parameters

{'max_ar_depth': 26, 'max_lag': 40, 'safe': True}

Fit Kwargs

{}

In [7]:

# Feature importances
# ==============================================================================
forecaster.get_feature_importances()

Out[7]:

	feature	importance
0	lag_2	0.568527
1	lag_14	0.318155
2	lag_1	0.138978
3	lag_12	-0.351038

Prediction¶

In [8]:

# Predict
# ==============================================================================
predictions = forecaster.predict(steps=10)
predictions.head(3)

Out[8]:

1990-02-01    576270.290657
1990-03-01    711350.902949
1990-04-01    645064.142519
Freq: MS, Name: pred, dtype: float64

In [9]:

# Predict intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=36, alpha=0.05)
predictions.head(3)

Out[9]:

	pred	lower_bound	upper_bound
1990-02-01	576270.290657	535285.283204	617255.298110
1990-03-01	711350.902949	669971.979006	752729.826893
1990-04-01	645064.142519	597182.272797	692946.012240

Backtesting¶

ARAR and other statistical models, once integrated in a ForecasterStats object, can be evaluated using any of the backtesting strategies implemented in skforecast.

In [10]:

# Backtesting
# ==============================================================================
cv = TimeSeriesFold(
    initial_train_size = 150,
    steps              = 12,
    refit              = True,
)

metric, predictions = backtesting_stats(
    y               = y,
    forecaster      = forecaster,
    cv              = cv,
    interval        = [2.5, 97.5],
    metric          = 'mean_absolute_error',
    verbose         = False
)

In [11]:

# Backtest predictions
# ==============================================================================
predictions.head(4)

Out[11]:

	pred	lower_bound	upper_bound
1981-07-01	585006.456464	548872.543529	621140.369400
1981-08-01	632872.256680	596247.977571	669496.535788
1981-09-01	515431.057548	474418.134356	556443.980739
1981-10-01	523423.286271	481982.529292	564864.043250
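The backtest above requests a nominal 95% interval (`interval = [2.5, 97.5]`), so a natural follow-up check is the fraction of observed values that actually fall inside the bounds. The helper below is a hypothetical pandas-only sketch, shown on made-up toy numbers, and assumes the three series share the same index.

```python
import pandas as pd

def empirical_coverage(y_true, lower, upper):
    """Fraction of observations lying inside [lower, upper]."""
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())

# Toy data for illustration only (not taken from the fuel dataset)
idx = pd.date_range("2000-01-01", periods=4, freq="MS")
y_true = pd.Series([10.0, 12.0, 9.0, 15.0], index=idx)
lower = pd.Series([8.0, 11.0, 10.0, 13.0], index=idx)
upper = pd.Series([11.0, 14.0, 12.0, 16.0], index=idx)

print(empirical_coverage(y_true, lower, upper))  # 0.75
```

With the objects from this notebook, the same helper could be called as `empirical_coverage(y.loc[predictions.index], predictions['lower_bound'], predictions['upper_bound'])`. A coverage well below 0.95 would suggest the intervals are too narrow.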

In [12]:

# Plot predictions
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 4))
y.loc[predictions.index].plot(ax=ax, label='y')
predictions['pred'].plot(ax=ax, label='predictions')
ax.fill_between(
        predictions.index,
        predictions['lower_bound'],
        predictions['upper_bound'],
        label='prediction interval',
        color='gray',
        alpha=0.6,
        zorder=1
    )
plt.legend()
plt.show()

[Figure: observed values of y over the backtest period, the backtest predictions, and the shaded prediction interval.]