Basics of Prophetverse API

Prophetverse is a powerful tool for building customized and glass-box forecasting and mix models. In Prophetverse, we define each component of the model as a separate effect, making this library extremely flexible to attend your specific needs.

In this page, we will:

  1. Understand the structure of y (the target) and X (media & control variables) using the sktime interface.
  2. Understand the hyperparameters of Prophetverse
  3. Fit your first Bayesian MMM and generate forecasts.

1. Data Structures (y and X)

Prophetverse uses the sktime forecasting API. The essentials:

  • y: a pandas DataFrame indexed by a time index (pd.DatetimeIndex or pd.PeriodIndex). Single column for univariate MMM (e.g. revenue):
revenue
2020-01-01 0.284627
2020-01-02 0.652842
2020-01-03 0.860751
2020-01-04 0.802704
2020-01-05 0.285768

For panel datasets (e.g. in the case of multiple products or regions), use a MultiIndex, where the first index level is the entity (e.g. product or region) and the second level is the time.

revenue
product
product_a 2020-01-01 0.878795
2020-01-02 0.989028
2020-01-03 0.059886
2020-01-04 0.212564
2020-01-05 0.688541
... ... ...
product_c 2020-12-27 0.569344
2020-12-28 0.683621
2020-12-29 0.189618
2020-12-30 0.883356
2020-12-31 0.158255

1098 rows × 1 columns

  • X: a pandas DataFrame aligned on the same index containing exogenous variables (media spend, price, promotions, macro, etc.). Columns are arbitrary names.

The index type should always be the same for y and X, and every dataframe you use. After choosing Datetime or Period index for y, use the same type for X.

Example of dataset

Here we load a synthetic dataset:

from prophetverse.datasets._mmm.dataset1 import get_dataset

(y, X, *_) = get_dataset()


y.head()
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2000-01-01    10815512.0
2000-01-02    11120677.0
2000-01-03    11260387.0
2000-01-04    11322533.0
2000-01-05    11321180.0
Freq: D, dtype: float32

The X looks like this:

X.head()
ad_spend_search ad_spend_social_media
2000-01-01 89076.191178 98587.488958
2000-01-02 88891.993106 99066.321168
2000-01-03 89784.955064 97334.106903
2000-01-04 89931.220681 101747.300585
2000-01-05 89184.319596 93825.221809

We will split the dataset into training and testing sets.

from sktime.split import temporal_train_test_split

y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=0.2)

2. Prophetverse model

Think of Prophetverse as the conerstone of you MMM model. It is a flexible class that allows you to define the trend, seasonality, and custom exogenous effects of your model.

Simple Prophetverse model

We can use a simple Prophetverse model with Linear effects and a seasonality component:

from prophetverse import Prophetverse, LinearEffect, LinearFourierSeasonality

from prophetverse.utils.regex import starts_with, no_input_columns


seasonality_effect = LinearFourierSeasonality(
    sp_list=[365.25, 7],
    fourier_terms_list=[10, 3],
    prior_scale=0.1,
    freq="D",
    effect_mode="additive",
)

ad_spend_effect = LinearEffect()

model = Prophetverse(
    exogenous_effects=[
        ("ad_spend", ad_spend_effect, starts_with("ad")),
        ("seasonality", seasonality_effect, no_input_columns),
    ],
)

model.fit(y=y_train, X=X_train)
Prophetverse(exogenous_effects=[('ad_spend', LinearEffect(), '^(?:ad)'),
                                ('seasonality',
                                 LinearFourierSeasonality(fourier_terms_list=[10,
                                                                              3],
                                                          freq='D',
                                                          prior_scale=0.1,
                                                          sp_list=[365.25, 7]),
                                 '^$')])
Please rerun this cell to show the HTML repr or trust the notebook.

By default, the model will run a MCMC inference to obtain the parameters. We can, however, easily switch to a MAP inference by setting inference_engine=MAPInferenceEngine() in the model constructor. The MAP inference is generally faster but provides point estimates of the parameters.

To run in-sample and out-of-sample forecasts of total revenue, we can simply call predict. We need to pass a “forecasting horizon” (fh) object, that should preferably be an index of the type of our y and X’s index. Since we want to forecast for both train and test timepoints, we use y.index as fh, and pass the full X as exogenous variables.

fh = y.index

y_pred = model.predict(fh=fh, X=X)

y_pred
2000-01-01    14483120.0
2000-01-02    14349842.0
2000-01-03    14425632.0
2000-01-04    14234289.0
2000-01-05    14376405.0
                 ...    
2004-12-28    32365182.0
2004-12-29    32117918.0
2004-12-30    32005472.0
2004-12-31    32258116.0
2005-01-01    32360856.0
Freq: D, Length: 1828, dtype: float32
import matplotlib.pyplot as plt


def plot_forecasts(y_pred):
    fig, ax = plt.subplots(figsize=(10,5))

    ax.plot(y.index.to_timestamp(), y)
    ax.plot(y_pred.index, y_pred)
    ax.axvline(y_train.index.max().to_timestamp(), color="black", linestyle="--", label="Train/Test split")
    fig.show()

plot_forecasts(y_pred)

Getting the components

To obtain the contribution of each component, you can use the predict_components method:

components = model.predict_components(fh=fh, X=X)
components.head()
ad_spend mean obs seasonality trend
2000-01-01 1951345.875 14483120.0 14500401.0 614.704651 12531161.0
2000-01-02 1917232.500 14349842.0 14266047.0 -50484.121094 12483091.0
2000-01-03 2016962.625 14425632.0 14459742.0 -26353.626953 12435021.0
2000-01-04 1874735.000 14234289.0 14181142.0 -27398.041016 12386951.0
2000-01-05 2079018.125 14376405.0 14344686.0 -41493.980469 12338882.0

If you want to obtain all the sample to compute, for example, probabilistic intervals and measure the risk, you can use the predict_component_samples method:

samples = model.predict_component_samples(fh=fh, X=X)
samples
ad_spend mean obs seasonality trend
sample
0 2000-01-01 1182749.500 13707321.0 12335327.0 -15333.103516 12539905.0
2000-01-02 1158945.250 13661526.0 13223348.0 -23272.130859 12525853.0
2000-01-03 1249228.625 13742996.0 12519516.0 -18033.396484 12511801.0
2000-01-04 1125272.125 13619009.0 14577379.0 -4012.927979 12497750.0
2000-01-05 1321673.125 13799107.0 12858874.0 -6265.105469 12483698.0
... ... ... ... ... ... ...
999 2004-12-28 5269823.000 31613836.0 30836816.0 -16992.765625 26361004.0
2004-12-29 4986981.000 31321890.0 31899526.0 -45524.140625 26380432.0
2004-12-30 4852499.000 31175388.0 31748786.0 -76963.437500 26399854.0
2004-12-31 5050830.500 31449106.0 31127286.0 -21041.710938 26419318.0
2005-01-01 5125169.000 31548410.0 33430684.0 -15503.842773 26438742.0

1828000 rows × 5 columns