| | revenue |
|---|---|
| 2020-01-01 | 0.996449 |
| 2020-01-02 | 0.328868 |
| 2020-01-03 | 0.690418 |
| 2020-01-04 | 0.226343 |
| 2020-01-05 | 0.807851 |
Basics of Prophetverse API
Prophetverse is a tool for building customized, glass-box forecasting and marketing mix models (MMM). In Prophetverse, each component of the model is defined as a separate effect, which makes the library flexible enough to fit your specific needs.
On this page, we will:
- Understand the structure of y (the target) and X (media & control variables) using the sktime interface.
- Understand the hyperparameters of Prophetverse.
- Fit your first Bayesian MMM and generate forecasts.
1. Data Structures (y and X)
Prophetverse uses the sktime forecasting API. The essentials:
y: a pandas DataFrame indexed by a time index (pd.DatetimeIndex or pd.PeriodIndex). It has a single column for a univariate MMM (e.g. revenue).
For panel datasets (e.g. in the case of multiple products or regions), use a MultiIndex, where the first index level is the entity (e.g. product or region) and the second level is the time.
| product | | revenue |
|---|---|---|
| product_a | 2020-01-01 | 0.317748 |
| | 2020-01-02 | 0.320420 |
| | 2020-01-03 | 0.531044 |
| | 2020-01-04 | 0.701452 |
| | 2020-01-05 | 0.533921 |
| ... | ... | ... |
| product_c | 2020-12-27 | 0.615222 |
| | 2020-12-28 | 0.085124 |
| | 2020-12-29 | 0.656010 |
| | 2020-12-30 | 0.119299 |
| | 2020-12-31 | 0.843835 |
1098 rows × 1 columns
X: a pandas DataFrame aligned on the same index, containing exogenous variables (media spend, price, promotions, macro, etc.). Column names are arbitrary.
The index type should always be the same for y, X, and every other DataFrame you use. Once you choose a DatetimeIndex or a PeriodIndex for y, use the same type for X.
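To make these requirements concrete, here is a minimal sketch that builds a y and an X by hand using only pandas and NumPy. The values are random, the column names mirror the ones used on this page, and the level name "time" is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

# Daily PeriodIndex shared by y and X. A DatetimeIndex would work the same
# way, as long as both frames use the same index type.
index = pd.period_range(start="2020-01-01", periods=5, freq="D")
rng = np.random.default_rng(0)

# Target: a single column, here called "revenue".
y = pd.DataFrame({"revenue": rng.random(len(index))}, index=index)

# Exogenous variables: arbitrary column names, same index as y.
X = pd.DataFrame(
    {
        "ad_spend_search": rng.random(len(index)),
        "ad_spend_social_media": rng.random(len(index)),
    },
    index=index,
)

# Panel target: first index level is the entity, second level is time.
# The level name "time" is an assumption made for this sketch.
panel_index = pd.MultiIndex.from_product(
    [["product_a", "product_b"], index], names=["product", "time"]
)
y_panel = pd.DataFrame(
    {"revenue": rng.random(len(panel_index))}, index=panel_index
)

print(y.index.equals(X.index))  # y and X share one index
print(y_panel.index.nlevels)    # entity level plus time level
```

Checking y.index.equals(X.index) before fitting is a cheap way to catch a mismatched index type or a misaligned date range.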
Example of dataset
Here we load a synthetic dataset:
from prophetverse.datasets._mmm.dataset1 import get_dataset
y, X, *_ = get_dataset()
y.head()
2000-01-01 10815512.0
2000-01-02 11120677.0
2000-01-03 11260387.0
2000-01-04 11322533.0
2000-01-05 11321180.0
Freq: D, dtype: float32
X looks like this:
X.head()
| | ad_spend_search | ad_spend_social_media |
|---|---|---|
| 2000-01-01 | 89076.191178 | 98587.488958 |
| 2000-01-02 | 88891.993106 | 99066.321168 |
| 2000-01-03 | 89784.955064 | 97334.106903 |
| 2000-01-04 | 89931.220681 | 101747.300585 |
| 2000-01-05 | 89184.319596 | 93825.221809 |
We will split the dataset into training and testing sets.
from sktime.split import temporal_train_test_split
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=0.2)
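For intuition, a temporal split keeps the earliest observations for training and the most recent ones for testing, without shuffling. The sketch below reproduces that idea with plain pandas on a toy series. The rounding rule for the test size is an assumption and may not match sktime exactly, and all names ending in _demo are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: 10 daily observations with a shared PeriodIndex.
index = pd.period_range(start="2000-01-01", periods=10, freq="D")
y_demo = pd.Series(np.arange(10, dtype=float), index=index)
X_demo = pd.DataFrame({"ad_spend_search": np.arange(10, dtype=float)}, index=index)

test_size = 0.2
# Assumed rounding: take the ceiling of the requested fraction.
n_test = int(np.ceil(test_size * len(y_demo)))
cutoff = len(y_demo) - n_test

# The first `cutoff` rows go to train, the rest to test. No shuffling.
y_train_demo, y_test_demo = y_demo.iloc[:cutoff], y_demo.iloc[cutoff:]
X_train_demo, X_test_demo = X_demo.iloc[:cutoff], X_demo.iloc[cutoff:]

print(len(y_train_demo), len(y_test_demo))
```

Because the split is by position in time, every test timestamp comes after every training timestamp, which is what makes the test set a fair proxy for forecasting.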
2. Prophetverse model
Think of Prophetverse as the cornerstone of your MMM. It is a flexible class that allows you to define the trend, seasonality, and custom exogenous effects of your model.
Simple Prophetverse model
We can use a simple Prophetverse model with Linear effects and a seasonality component:
from prophetverse import Prophetverse, LinearEffect, LinearFourierSeasonality
from prophetverse.utils.regex import starts_with, no_input_columns

# Yearly (365.25 days) and weekly (7 days) seasonality on daily data
seasonality_effect = LinearFourierSeasonality(
    sp_list=[365.25, 7],
    fourier_terms_list=[10, 3],
    prior_scale=0.1,
    freq="D",
    effect_mode="additive",
)
ad_spend_effect = LinearEffect()

# Each effect is a (name, effect, column selector) tuple
model = Prophetverse(
    exogenous_effects=[
        ("ad_spend", ad_spend_effect, starts_with("ad")),
        ("seasonality", seasonality_effect, no_input_columns),
    ],
)
model.fit(y=y_train, X=X_train)
Prophetverse(exogenous_effects=[('ad_spend', LinearEffect(), '^(?:ad)'), ('seasonality', LinearFourierSeasonality(fourier_terms_list=[10, 3], freq='D', prior_scale=0.1, sp_list=[365.25, 7]), '^$')])
PiecewiseLinearTrend()
LinearEffect()
LinearFourierSeasonality(fourier_terms_list=[10, 3], freq='D', prior_scale=0.1, sp_list=[365.25, 7])
MCMCInferenceEngine()
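To give a feel for what sp_list=[365.25, 7] and fourier_terms_list=[10, 3] describe, the sketch below builds the standard Fourier-series design matrix used in Prophet-style seasonality: one sine and one cosine column per term and per seasonal period. This is an illustration with NumPy only, not Prophetverse's internal implementation, and fourier_features is a hypothetical helper.

```python
import numpy as np


def fourier_features(t, period, n_terms):
    """Return an array of shape (len(t), 2 * n_terms) of sin/cos columns."""
    t = np.asarray(t, dtype=float)
    k = np.arange(1, n_terms + 1)                  # harmonic orders 1..n_terms
    angles = 2.0 * np.pi * np.outer(t, k) / period
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


t = np.arange(730)  # two years of daily time steps

yearly = fourier_features(t, period=365.25, n_terms=10)  # 20 columns
weekly = fourier_features(t, period=7, n_terms=3)        # 6 columns
features = np.hstack([yearly, weekly])

print(features.shape)
```

In a linear Fourier seasonality, the seasonal component is a weighted sum of such columns. More terms allow sharper seasonal shapes at the cost of more coefficients to estimate.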
By default, the model runs MCMC inference to estimate the parameters. We can, however, easily switch to MAP inference by setting inference_engine=MAPInferenceEngine() in the model constructor. MAP inference is generally faster, but it provides only point estimates of the parameters.
To run in-sample and out-of-sample forecasts of total revenue, we can simply call predict. We need to pass a "forecasting horizon" (fh) object, preferably an index of the same type as the index of y and X. Since we want to forecast for both train and test timepoints, we use y.index as fh and pass the full X as exogenous variables.
fh = y.index
y_pred = model.predict(fh=fh, X=X)
y_pred
2000-01-01 14443361.0
2000-01-02 14313327.0
2000-01-03 14390056.0
2000-01-04 14201023.0
2000-01-05 14345439.0
...
2004-12-28 32365100.0
2004-12-29 32114084.0
2004-12-30 32007728.0
2004-12-31 32247330.0
2005-01-01 32348978.0
Freq: D, Length: 1828, dtype: float32
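The call above uses the full index of y as the horizon. If only future timepoints are of interest, a horizon of the same index type can be built with pandas. The sketch below assumes a daily PeriodIndex, and the names train_index, fh_future, and fh_full are hypothetical.

```python
import pandas as pd

# Stand-in for y_train.index: 100 daily periods.
train_index = pd.period_range(start="2000-01-01", periods=100, freq="D")

# Out-of-sample horizon: the 30 daily periods right after the training data.
fh_future = pd.period_range(start=train_index.max() + 1, periods=30, freq="D")

# In-sample plus out-of-sample horizon, still a PeriodIndex.
fh_full = train_index.union(fh_future)

print(fh_future[0], fh_future[-1], len(fh_full))
```

Whichever horizon is used, the X passed to predict is expected to contain rows for every timepoint in it, which is why the full X is passed in the example above.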
import matplotlib.pyplot as plt

def plot_forecasts(y_pred):
    fig, ax = plt.subplots(figsize=(10, 5))
    # Observed series, then the forecast over the same horizon
    ax.plot(y.index.to_timestamp(), y)
    ax.plot(y_pred.index, y_pred)
    # Dashed vertical line at the end of the training period
    ax.axvline(y_train.index.max().to_timestamp(), color="black", linestyle="--", label="Train/Test split")
    ax.legend()
    fig.show()

plot_forecasts(y_pred)
Getting the components
To obtain the contribution of each component, you can use the predict_components method:
components = model.predict_components(fh=fh, X=X)
components.head()
| | ad_spend | mean | obs | seasonality | trend |
|---|---|---|---|---|---|
| 2000-01-01 | 1923415.375 | 14443361.0 | 14460642.0 | -12597.905273 | 12532545.0 |
| 2000-01-02 | 1889635.250 | 14313327.0 | 14229537.0 | -62331.843750 | 12486025.0 |
| 2000-01-03 | 1990429.000 | 14390056.0 | 14424170.0 | -39879.089844 | 12439505.0 |
| 2000-01-04 | 1847318.375 | 14201023.0 | 14147878.0 | -39278.902344 | 12392987.0 |
| 2000-01-05 | 2054552.375 | 14345439.0 | 14313717.0 | -55579.269531 | 12346465.0 |
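One way to read this table: in the rows shown, the mean column matches the point forecast returned by predict, and it is approximately the sum of the ad_spend, seasonality, and trend columns. The sketch below checks this on the five rows above, copied by hand into a DataFrame. The PeriodIndex and the name components_demo are assumptions made for the example.

```python
import pandas as pd

# First five rows of the components table, copied from the output above.
components_demo = pd.DataFrame(
    {
        "ad_spend": [1923415.375, 1889635.250, 1990429.000, 1847318.375, 2054552.375],
        "mean": [14443361.0, 14313327.0, 14390056.0, 14201023.0, 14345439.0],
        "seasonality": [-12597.905273, -62331.843750, -39879.089844, -39278.902344, -55579.269531],
        "trend": [12532545.0, 12486025.0, 12439505.0, 12392987.0, 12346465.0],
    },
    index=pd.period_range("2000-01-01", periods=5, freq="D"),
)

# Sum the additive components and compare against the reported mean.
reconstructed = components_demo[["ad_spend", "seasonality", "trend"]].sum(axis=1)
gap = (reconstructed - components_demo["mean"]).abs()

print(gap.max())
```

The remaining gap is a few units on values in the tens of millions, which is consistent with float32 rounding in the model output.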
If you want all the samples, for example to compute probabilistic intervals and measure risk, you can use the predict_component_samples method:
samples = model.predict_component_samples(fh=fh, X=X)
samples
| sample | | ad_spend | mean | obs | seasonality | trend |
|---|---|---|---|---|---|---|
| 0 | 2000-01-01 | 1310761.000 | 13826540.0 | 12454546.0 | -27482.208984 | 12543262.0 |
| | 2000-01-02 | 1286857.375 | 13774537.0 | 13336360.0 | -36641.351562 | 12524322.0 |
| | 2000-01-03 | 1374479.375 | 13850060.0 | 12626579.0 | -29800.537109 | 12505381.0 |
| | 2000-01-04 | 1255226.625 | 13722347.0 | 14680718.0 | -19320.796875 | 12486441.0 |
| | 2000-01-05 | 1441231.000 | 13884066.0 | 12943833.0 | -24664.720703 | 12467500.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 999 | 2004-12-28 | 5203957.000 | 31752398.0 | 30975378.0 | -21350.986328 | 26569792.0 |
| | 2004-12-29 | 4921914.500 | 31459324.0 | 32036960.0 | -52057.492188 | 26589464.0 |
| | 2004-12-30 | 4792451.000 | 31325616.0 | 31899014.0 | -76011.125000 | 26609176.0 |
| | 2004-12-31 | 4990637.500 | 31583308.0 | 31261488.0 | -36191.675781 | 26628860.0 |
| | 2005-01-01 | 5061657.000 | 31679072.0 | 33561344.0 | -31159.398438 | 26648572.0 |
1828000 rows × 5 columns
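A frame shaped like this one, with a (sample, time) MultiIndex, can be reduced to probabilistic intervals by taking quantiles across samples for each timepoint. The sketch below does this with pandas on synthetic draws rather than on real model output. The name samples_demo, the choice of the obs column, and the 5%/95% bounds are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the samples frame: 200 draws for 5 daily periods.
n_samples, n_days = 200, 5
time_index = pd.period_range("2000-01-01", periods=n_days, freq="D")
index = pd.MultiIndex.from_product(
    [range(n_samples), time_index], names=["sample", None]
)

rng = np.random.default_rng(0)
samples_demo = pd.DataFrame(
    {"obs": rng.normal(loc=100.0, scale=10.0, size=len(index))}, index=index
)

# Group by the last index level (time) and take quantiles across samples.
intervals = (
    samples_demo["obs"]
    .groupby(level=-1)
    .quantile([0.05, 0.95])
    .unstack()
)
intervals.columns = ["lower", "upper"]

print(intervals)
```

The same pattern applies to any other column, such as an individual component, if an interval on that component's contribution is needed.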