Basics of Prophetverse API

Prophetverse is a powerful tool for building customized, glass-box forecasting and marketing mix models (MMM). In Prophetverse, each component of the model is defined as a separate effect, which makes the library flexible enough to meet your specific needs.

On this page, we will:

  1. Understand the structure of y (the target) and X (media & control variables) using the sktime interface.
  2. Understand the hyperparameters of Prophetverse.
  3. Fit your first Bayesian MMM and generate forecasts.

1. Data Structures (y and X)

Prophetverse uses the sktime forecasting API. The essentials:

  • y: a pandas DataFrame indexed by a time index (pd.DatetimeIndex or pd.PeriodIndex). Single column for univariate MMM (e.g. revenue):
            revenue
2020-01-01  0.996449
2020-01-02  0.328868
2020-01-03  0.690418
2020-01-04  0.226343
2020-01-05  0.807851

For panel datasets (e.g. in the case of multiple products or regions), use a MultiIndex, where the first index level is the entity (e.g. product or region) and the second level is the time.

                       revenue
product
product_a  2020-01-01  0.317748
           2020-01-02  0.320420
           2020-01-03  0.531044
           2020-01-04  0.701452
           2020-01-05  0.533921
...        ...         ...
product_c  2020-12-27  0.615222
           2020-12-28  0.085124
           2020-12-29  0.656010
           2020-12-30  0.119299
           2020-12-31  0.843835

1098 rows × 1 columns

  • X: a pandas DataFrame aligned on the same index, containing the exogenous variables (media spend, price, promotions, macro, etc.). Column names are arbitrary.

y, X, and every other DataFrame you pass to the model must use the same index type. If y uses a DatetimeIndex, X must use one too; the same holds for a PeriodIndex.
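As a minimal sketch of these structures, the snippet below builds a univariate y, a panel y, and an X with pandas alone. The values are random, and the column and level names (revenue, ad_spend_search, product, time) are placeholders rather than anything shipped with Prophetverse.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Daily PeriodIndex; pd.date_range would give a DatetimeIndex instead.
time_index = pd.period_range("2020-01-01", periods=5, freq="D")

# Univariate target: one column, indexed by time.
y = pd.DataFrame({"revenue": rng.random(5)}, index=time_index)

# Exogenous variables: same index, arbitrary column names.
X = pd.DataFrame({"ad_spend_search": rng.random(5)}, index=time_index)

# Panel target: level 0 is the entity, level 1 is the time.
panel_index = pd.MultiIndex.from_product(
    [["product_a", "product_b"], time_index], names=["product", "time"]
)
y_panel = pd.DataFrame(
    {"revenue": rng.random(len(panel_index))}, index=panel_index
)

# y and X should share both the index type and the index values.
same_type = type(y.index) is type(X.index)
same_values = y.index.equals(X.index)
```

If your data arrives with a DatetimeIndex and you prefer periods, `df.index = df.index.to_period("D")` converts it; apply the same conversion to y and X so the types stay in sync.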

Example dataset

Here we load a synthetic dataset:

from prophetverse.datasets._mmm.dataset1 import get_dataset

(y, X, *_) = get_dataset()


y.head()
2000-01-01    10815512.0
2000-01-02    11120677.0
2000-01-03    11260387.0
2000-01-04    11322533.0
2000-01-05    11321180.0
Freq: D, dtype: float32

X looks like this:

X.head()
            ad_spend_search  ad_spend_social_media
2000-01-01  89076.191178     98587.488958
2000-01-02  88891.993106     99066.321168
2000-01-03  89784.955064     97334.106903
2000-01-04  89931.220681     101747.300585
2000-01-05  89184.319596     93825.221809

We will split the dataset into training and testing sets.

from sktime.split import temporal_train_test_split

y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=0.2)
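
To make the split concrete, here is a rough pandas-only sketch of what a temporal split with test_size=0.2 does: the last 20% of the time points become the test set and nothing is shuffled. The helper name manual_temporal_split is hypothetical, and the rounding of the cut-off is an assumption that may differ from sktime's implementation.

```python
import math

import pandas as pd


def manual_temporal_split(y, X, test_size=0.2):
    """Hypothetical helper: hold out the last `test_size` fraction of rows."""
    n_test = math.ceil(len(y) * test_size)  # rounding rule is an assumption
    cut = len(y) - n_test
    return y.iloc[:cut], y.iloc[cut:], X.iloc[:cut], X.iloc[cut:]


# Tiny stand-in data with the same index type as the dataset above.
idx = pd.period_range("2000-01-01", periods=10, freq="D")
y_demo = pd.Series(range(10), index=idx, dtype="float32")
X_demo = pd.DataFrame({"ad_spend_search": range(10)}, index=idx)

y_tr, y_te, X_tr, X_te = manual_temporal_split(y_demo, X_demo)
```

Because the cut is made by position on a time-ordered index, every training timestamp precedes every test timestamp, which is what keeps the evaluation free of look-ahead leakage.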

2. Prophetverse model

Think of Prophetverse as the cornerstone of your MMM. It is a flexible class that lets you define the trend, seasonality, and custom exogenous effects of your model.

Simple Prophetverse model

We can start with a simple Prophetverse model that combines linear effects with a seasonality component:

from prophetverse import Prophetverse, LinearEffect, LinearFourierSeasonality

from prophetverse.utils.regex import starts_with, no_input_columns


seasonality_effect = LinearFourierSeasonality(
    sp_list=[365.25, 7],
    fourier_terms_list=[10, 3],
    prior_scale=0.1,
    freq="D",
    effect_mode="additive",
)

ad_spend_effect = LinearEffect()

model = Prophetverse(
    exogenous_effects=[
        ("ad_spend", ad_spend_effect, starts_with("ad")),
        ("seasonality", seasonality_effect, no_input_columns),
    ],
)

model.fit(y=y_train, X=X_train)
Prophetverse(exogenous_effects=[('ad_spend', LinearEffect(), '^(?:ad)'),
                                ('seasonality',
                                 LinearFourierSeasonality(fourier_terms_list=[10,
                                                                              3],
                                                          freq='D',
                                                          prior_scale=0.1,
                                                          sp_list=[365.25, 7]),
                                 '^$')])
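
The third element of each tuple selects which columns of X are passed to the effect. Judging from the repr above, starts_with("ad") renders as the regex ^(?:ad) and no_input_columns as ^$. The sketch below applies those two patterns to a list of column names with Python's re module. That Prophetverse matches columns in exactly this way is an assumption, and the price column is made up for contrast.

```python
import re

# Column names from X, plus a made-up "price" column for contrast.
columns = ["ad_spend_search", "ad_spend_social_media", "price"]

starts_with_ad = re.compile(r"^(?:ad)")  # pattern shown for starts_with("ad")
no_columns = re.compile(r"^$")           # pattern shown for no_input_columns

ad_columns = [c for c in columns if starts_with_ad.match(c)]
seasonality_columns = [c for c in columns if no_columns.match(c)]

print(ad_columns)           # ['ad_spend_search', 'ad_spend_social_media']
print(seasonality_columns)  # []
```

The empty-string pattern matches no real column name, which fits an effect such as seasonality that is computed from the time index rather than from X.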

By default, the model runs MCMC inference to estimate the parameters. We can easily switch to MAP inference by setting inference_engine=MAPInferenceEngine() in the model constructor. MAP inference is generally faster, but it yields only point estimates of the parameters.

To produce in-sample and out-of-sample forecasts of total revenue, we simply call predict. It needs a “forecasting horizon” (fh), preferably an index of the same type as the index of y and X. Since we want forecasts for both the training and the test periods, we use y.index as fh and pass the full X as exogenous variables.

fh = y.index

y_pred = model.predict(fh=fh, X=X)

y_pred
2000-01-01    14443361.0
2000-01-02    14313327.0
2000-01-03    14390056.0
2000-01-04    14201023.0
2000-01-05    14345439.0
                 ...    
2004-12-28    32365100.0
2004-12-29    32114084.0
2004-12-30    32007728.0
2004-12-31    32247330.0
2005-01-01    32348978.0
Freq: D, Length: 1828, dtype: float32
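
The horizon does not have to cover the whole sample. As a sketch, assuming a daily PeriodIndex like the one above, the following builds one horizon for the last 20% of the time points and another that extends past the last observed date. Only pandas is used and the lengths are illustrative. Forecasting beyond the data would also require X values for those dates.

```python
import pandas as pd

# Stand-in for y.index: daily periods over the same date range as above.
index = pd.period_range("2000-01-01", "2005-01-01", freq="D")

# Horizon restricted to the last 20% of the time points.
n_test = len(index) // 5
fh_test = index[-n_test:]

# Horizon covering the 30 days after the last observed period.
fh_future = pd.period_range(index[-1] + 1, periods=30, freq="D")
```

Either object could then be passed as fh, together with an X that covers the same periods.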
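
import matplotlib.pyplot as plt


def plot_forecasts(y_pred):
    # y, y_train and y_pred use a PeriodIndex; convert to timestamps for plotting.
    fig, ax = plt.subplots(figsize=(10, 5))

    ax.plot(y.index.to_timestamp(), y, label="Observed")
    ax.plot(y_pred.index.to_timestamp(), y_pred, label="Forecast")
    ax.axvline(
        y_train.index.max().to_timestamp(),
        color="black",
        linestyle="--",
        label="Train/Test split",
    )
    ax.legend()  # without this call the labels are never displayed
    plt.show()


plot_forecasts(y_pred)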

Getting the components

To obtain the contribution of each component, you can use the predict_components method:

components = model.predict_components(fh=fh, X=X)
components.head()
            ad_spend     mean        obs         seasonality    trend
2000-01-01  1923415.375  14443361.0  14460642.0  -12597.905273  12532545.0
2000-01-02  1889635.250  14313327.0  14229537.0  -62331.843750  12486025.0
2000-01-03  1990429.000  14390056.0  14424170.0  -39879.089844  12439505.0
2000-01-04  1847318.375  14201023.0  14147878.0  -39278.902344  12392987.0
2000-01-05  2054552.375  14345439.0  14313717.0  -55579.269531  12346465.0
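
In the output above, the mean column appears to be the sum of the ad_spend, seasonality, and trend columns, which is what one would expect from an additive model. The check below copies the first two rows of that table into pandas and compares the two. It is a sanity check on the printed numbers, not a statement about every model configuration.

```python
import pandas as pd

# First two rows of the components table, copied from the output above.
components_demo = pd.DataFrame(
    {
        "ad_spend": [1923415.375, 1889635.250],
        "mean": [14443361.0, 14313327.0],
        "seasonality": [-12597.905273, -62331.843750],
        "trend": [12532545.0, 12486025.0],
    },
    index=pd.period_range("2000-01-01", periods=2, freq="D"),
)

# Sum the individual effects and compare with the reported mean.
reconstructed = components_demo[["ad_spend", "seasonality", "trend"]].sum(axis=1)
relative_gap = (reconstructed - components_demo["mean"]).abs() / components_demo["mean"]
```

The remaining gap is tiny relative to the magnitude of the series, which is consistent with float32 rounding in the printed values.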

If you want all the posterior samples, for example to compute probabilistic intervals and quantify risk, use the predict_component_samples method:

samples = model.predict_component_samples(fh=fh, X=X)
samples
                    ad_spend     mean        obs         seasonality    trend
sample
0       2000-01-01  1310761.000  13826540.0  12454546.0  -27482.208984  12543262.0
        2000-01-02  1286857.375  13774537.0  13336360.0  -36641.351562  12524322.0
        2000-01-03  1374479.375  13850060.0  12626579.0  -29800.537109  12505381.0
        2000-01-04  1255226.625  13722347.0  14680718.0  -19320.796875  12486441.0
        2000-01-05  1441231.000  13884066.0  12943833.0  -24664.720703  12467500.0
...     ...         ...          ...         ...         ...            ...
999     2004-12-28  5203957.000  31752398.0  30975378.0  -21350.986328  26569792.0
        2004-12-29  4921914.500  31459324.0  32036960.0  -52057.492188  26589464.0
        2004-12-30  4792451.000  31325616.0  31899014.0  -76011.125000  26609176.0
        2004-12-31  4990637.500  31583308.0  31261488.0  -36191.675781  26628860.0
        2005-01-01  5061657.000  31679072.0  33561344.0  -31159.398438  26648572.0

1828000 rows × 5 columns
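
With the samples in hand, intervals are a groupby away. The sketch below uses synthetic draws arranged like the output above (a MultiIndex with the sample number first and the time second) and computes a 90% interval for the obs column at each time point. The level names and the random data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in for the samples frame: 1000 draws for 5 time points.
time_index = pd.period_range("2000-01-01", periods=5, freq="D")
index = pd.MultiIndex.from_product(
    [range(1000), time_index], names=["sample", "time"]
)
samples_demo = pd.DataFrame(
    {"obs": rng.normal(100.0, 10.0, size=len(index))}, index=index
)

# Quantiles across samples, computed separately for each time point.
intervals = (
    samples_demo["obs"]
    .groupby(level="time")
    .quantile([0.05, 0.95])
    .unstack()
)
intervals.columns = ["lower", "upper"]
```

The same pattern applies to any other column, for instance ad_spend, to express the uncertainty of a single effect's contribution.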