Basics of Prophetverse API
Prophetverse is a powerful tool for building custom, glass-box forecasting and marketing mix models. In Prophetverse, each component of the model is defined as a separate effect, which makes the library flexible enough to adapt to your specific needs.
In this page, we will:
- Understand the structure of y (the target) and X (media & control variables) using the sktime interface.
- Understand the hyperparameters of Prophetverse.
- Fit your first Bayesian MMM and generate forecasts.
1. Data Structures (y and X)
Prophetverse uses the sktime forecasting API. The essentials:
y: a pandas DataFrame indexed by a time index (pd.DatetimeIndex or pd.PeriodIndex). Single column for univariate MMM (e.g. revenue):
| | revenue |
|---|---|
| 2020-01-01 | 0.539791 |
| 2020-01-02 | 0.386945 |
| 2020-01-03 | 0.846947 |
| 2020-01-04 | 0.868224 |
| 2020-01-05 | 0.000316 |
For panel datasets (e.g. in the case of multiple products or regions), use a MultiIndex, where the first index level is the entity (e.g. product or region) and the second level is the time.
| product | | revenue |
|---|---|---|
| product_a | 2020-01-01 | 0.941121 |
| | 2020-01-02 | 0.087908 |
| | 2020-01-03 | 0.271994 |
| | 2020-01-04 | 0.859156 |
| | 2020-01-05 | 0.851053 |
| ... | ... | ... |
| product_c | 2020-12-27 | 0.542822 |
| | 2020-12-28 | 0.405629 |
| | 2020-12-29 | 0.159127 |
| | 2020-12-30 | 0.788819 |
| | 2020-12-31 | 0.252980 |
1098 rows × 1 columns
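As a rough illustration of the panel layout described above, the sketch below builds a frame with the same shape using pandas only. The name product_b and the random values are assumptions made for the example; only product_a and product_c are visible in the table.

```python
import numpy as np
import pandas as pd

# Hypothetical entity names: only product_a and product_c appear in the table above.
products = ["product_a", "product_b", "product_c"]
# Daily periods covering 2020 (a leap year, so 366 days).
time_index = pd.period_range("2020-01-01", "2020-12-31", freq="D")

# First index level is the entity, second level is the time.
index = pd.MultiIndex.from_product([products, time_index], names=["product", None])

# Random placeholder values, not real revenue.
rng = np.random.default_rng(0)
y_panel = pd.DataFrame({"revenue": rng.random(len(index))}, index=index)

print(y_panel.shape)
# Selecting one entity returns a frame indexed by time only.
print(y_panel.loc["product_a"].head())
```

Three entities times 366 days gives the 1098 rows reported above.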
X: a pandas DataFrame aligned on the same index containing exogenous variables (media spend, price, promotions, macro, etc.). Columns are arbitrary names.
The index type must be the same for y, X, and every other DataFrame you pass to the model. Once you choose a DatetimeIndex or PeriodIndex for y, use the same type for X.
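One possible way to enforce this before fitting is sketched below with toy data. The names y_toy and X_toy and the column ad_spend are made up for the example, and the conversion shown is just one option.

```python
import numpy as np
import pandas as pd

# Toy y with a PeriodIndex.
periods = pd.period_range("2020-01-01", periods=5, freq="D")
y_toy = pd.DataFrame({"revenue": np.arange(5, dtype=float)}, index=periods)

# Toy X that starts out with a DatetimeIndex (a common mismatch).
dates = pd.date_range("2020-01-01", periods=5, freq="D")
X_toy = pd.DataFrame({"ad_spend": np.ones(5)}, index=dates)

# The index types differ, so convert X to match y.
if type(X_toy.index) is not type(y_toy.index):
    X_toy.index = X_toy.index.to_period("D")

print(type(y_toy.index).__name__, type(X_toy.index).__name__)
print(X_toy.index.equals(y_toy.index))
```

Converting X to match y (rather than the other way around) keeps the target untouched; going from periods back to timestamps would use to_timestamp() instead.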
Example of dataset
Here we load a synthetic dataset:
from prophetverse.datasets._mmm.dataset1 import get_dataset
(y, X, *_) = get_dataset()
y.head()
2000-01-01 10815512.0
2000-01-02 11120677.0
2000-01-03 11260387.0
2000-01-04 11322533.0
2000-01-05 11321180.0
Freq: D, dtype: float32
The X looks like this:
X.head()
| | ad_spend_search | ad_spend_social_media |
|---|---|---|
| 2000-01-01 | 89076.191178 | 98587.488958 |
| 2000-01-02 | 88891.993106 | 99066.321168 |
| 2000-01-03 | 89784.955064 | 97334.106903 |
| 2000-01-04 | 89931.220681 | 101747.300585 |
| 2000-01-05 | 89184.319596 | 93825.221809 |
We will split the dataset into training and testing sets.
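Conceptually, a temporal split keeps the chronological order and holds out the most recent observations. The pandas-only sketch below illustrates the idea on a toy series; it is not the sktime implementation, and the rounding of the test size is an assumption that may differ from the sktime helper used next.

```python
import math

import numpy as np
import pandas as pd

# Toy daily series standing in for y (values are arbitrary).
idx = pd.period_range("2000-01-01", periods=100, freq="D")
y_toy = pd.Series(np.arange(100, dtype=float), index=idx)

test_fraction = 0.2
# Assumed rounding convention: round the test size up.
n_test = math.ceil(test_fraction * len(y_toy))

# No shuffling: the last n_test observations form the test set.
y_toy_train = y_toy.iloc[:-n_test]
y_toy_test = y_toy.iloc[-n_test:]

print(len(y_toy_train), len(y_toy_test))
print(y_toy_train.index.max(), y_toy_test.index.min())
```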
from sktime.split import temporal_train_test_split
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=0.2)
2. Prophetverse model
Think of Prophetverse as the cornerstone of your MMM. It is a flexible class that lets you define the trend, seasonality, and custom exogenous effects of your model.
Simple Prophetverse model
We can use a simple Prophetverse model with Linear effects and a seasonality component:
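from prophetverse import Prophetverse, LinearEffect, LinearFourierSeasonality
from prophetverse.utils.regex import starts_with, no_input_columns
# Yearly (365.25 days) and weekly (7 days) seasonality,
# with 10 and 3 Fourier terms respectively.
seasonality_effect = LinearFourierSeasonality(
    sp_list=[365.25, 7],
    fourier_terms_list=[10, 3],
    prior_scale=0.1,
    freq="D",
    effect_mode="additive",
)
ad_spend_effect = LinearEffect()
# Each effect is a (name, effect, column regex) tuple.
model = Prophetverse(
    exogenous_effects=[
        # Applied to every column of X whose name starts with "ad".
        ("ad_spend", ad_spend_effect, starts_with("ad")),
        # Seasonality uses no columns of X.
        ("seasonality", seasonality_effect, no_input_columns),
    ],
)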
model.fit(y=y_train, X=X_train)
Prophetverse(exogenous_effects=[('ad_spend', LinearEffect(), '^(?:ad)'),
                                ('seasonality',
                                 LinearFourierSeasonality(fourier_terms_list=[10,
                                                                              3],
                                                          freq='D',
                                                          prior_scale=0.1,
                                                          sp_list=[365.25, 7]),
                                 '^$')])
By default, the model runs MCMC inference to estimate the parameters. You can switch to MAP inference by passing inference_engine=MAPInferenceEngine() to the model constructor. MAP inference is generally faster, but it yields only point estimates of the parameters rather than full posterior samples.
To produce in-sample and out-of-sample forecasts of total revenue, call predict. It takes a “forecasting horizon” (fh), which should preferably be an index of the same type as the index of y and X. Since we want forecasts for both the training and test periods, we use y.index as fh and pass the full X as exogenous variables.
fh = y.index
y_pred = model.predict(fh=fh, X=X)
y_pred
2000-01-01 14511778.0
2000-01-02 14377447.0
2000-01-03 14452430.0
2000-01-04 14251792.0
2000-01-05 14395013.0
...
2004-12-28 32311316.0
2004-12-29 32069892.0
2004-12-30 31963730.0
2004-12-31 32207748.0
2005-01-01 32308128.0
Freq: D, Length: 1828, dtype: float32
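The call above reuses y.index as the horizon. For a horizon that lies entirely in the future, one option is to build an index of the same type that starts right after the last observed period. The sketch below uses pandas only; y_index is a stand-in for y.index reconstructed from the date range printed above, and the 30-day length is arbitrary. The X passed to predict would then presumably need to cover the same periods.

```python
import pandas as pd

# Stand-in for y.index: daily periods matching the range printed above.
y_index = pd.period_range("2000-01-01", "2005-01-01", freq="D")

# A 30-day horizon starting right after the last observed period.
future_fh = pd.period_range(start=y_index.max() + 1, periods=30, freq="D")

print(len(y_index))
print(future_fh[0], future_fh[-1])
# Same index type as the original index, as recommended above.
print(type(future_fh) is type(y_index))
```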
import matplotlib.pyplot as plt
def plot_forecasts(y_pred):
    fig, ax = plt.subplots(figsize=(10, 5))
    # y has a PeriodIndex, so convert it to timestamps for plotting.
    ax.plot(y.index.to_timestamp(), y, label="Observed")
    ax.plot(y_pred.index, y_pred, label="Forecast")
    ax.axvline(y_train.index.max().to_timestamp(), color="black", linestyle="--", label="Train/Test split")
    # Show the labels defined above.
    ax.legend()
    fig.show()
plot_forecasts(y_pred)
Getting the components
To obtain the contribution of each component, you can use the predict_components method:
components = model.predict_components(fh=fh, X=X)
components.head()
| | ad_spend | mean | obs | seasonality | trend |
|---|---|---|---|---|---|
| 2000-01-01 | 1981369.125 | 14511778.0 | 14529062.0 | -1219.737671 | 12531630.0 |
| 2000-01-02 | 1947268.500 | 14377447.0 | 14293655.0 | -52385.074219 | 12482567.0 |
| 2000-01-03 | 2046009.250 | 14452430.0 | 14486543.0 | -27080.095703 | 12433500.0 |
| 2000-01-04 | 1905338.875 | 14251792.0 | 14198645.0 | -37978.394531 | 12384432.0 |
| 2000-01-05 | 2106310.750 | 14395013.0 | 14363289.0 | -46668.078125 | 12335369.0 |
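In the rows shown above, trend, ad_spend and seasonality appear to add up to the mean column, up to float32 rounding. The sketch below checks this on values copied by hand from the table and derives the share of the expected value attributed to ad spend. This is an observation about this printed output for an additive model, not a documented guarantee.

```python
import numpy as np
import pandas as pd

# First five rows of the components table above, copied by hand.
components_head = pd.DataFrame(
    {
        "ad_spend": [1981369.125, 1947268.500, 2046009.250, 1905338.875, 2106310.750],
        "mean": [14511778.0, 14377447.0, 14452430.0, 14251792.0, 14395013.0],
        "seasonality": [-1219.737671, -52385.074219, -27080.095703, -37978.394531, -46668.078125],
        "trend": [12531630.0, 12482567.0, 12433500.0, 12384432.0, 12335369.0],
    },
    index=pd.period_range("2000-01-01", periods=5, freq="D"),
)

# Sum of the individual effects, to compare against the "mean" column.
effects_sum = components_head[["trend", "ad_spend", "seasonality"]].sum(axis=1)
# A relative tolerance absorbs the float32 rounding in the printed values.
print(np.allclose(effects_sum, components_head["mean"], rtol=1e-6))

# Share of the expected value attributed to ad spend on each day.
ad_share = components_head["ad_spend"] / components_head["mean"]
print(ad_share.round(3))
```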
If you want all the posterior samples, for example to compute probabilistic intervals and measure risk, you can use the predict_component_samples method:
samples = model.predict_component_samples(fh=fh, X=X)
samples
| sample | | ad_spend | mean | obs | seasonality | trend |
|---|---|---|---|---|---|---|
| 0 | 2000-01-01 | 1254598.500 | 13776006.0 | 12404012.0 | -19438.560547 | 12540847.0 |
| | 2000-01-02 | 1230125.625 | 13728084.0 | 13289906.0 | -24919.484375 | 12522878.0 |
| | 2000-01-03 | 1320823.250 | 13806783.0 | 12583303.0 | -18948.203125 | 12504908.0 |
| | 2000-01-04 | 1196263.750 | 13666452.0 | 14624822.0 | -16750.462891 | 12486939.0 |
| | 2000-01-05 | 1391929.125 | 13847936.0 | 12907703.0 | -12964.346680 | 12468970.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 999 | 2004-12-28 | 5246423.500 | 31560616.0 | 30783596.0 | -20673.238281 | 26334866.0 |
| | 2004-12-29 | 4964901.500 | 31274182.0 | 31851820.0 | -44955.566406 | 26354236.0 |
| | 2004-12-30 | 4830923.000 | 31137578.0 | 31710976.0 | -66945.757812 | 26373602.0 |
| | 2004-12-31 | 5028303.500 | 31400522.0 | 31078702.0 | -20773.695312 | 26392994.0 |
| | 2005-01-01 | 5102370.000 | 31495128.0 | 33377402.0 | -19621.417969 | 26412376.0 |
1828000 rows × 5 columns
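With a frame shaped like the one above (first index level the sample, second level the time), an interval can be obtained by taking quantiles across samples for each time point. The sketch below uses a small synthetic frame with the same index layout; the values, the name samples_toy, and the 5% and 95% levels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the samples frame: (sample, time) MultiIndex.
n_samples, n_times = 200, 10
times = pd.period_range("2000-01-01", periods=n_times, freq="D")
index = pd.MultiIndex.from_product([range(n_samples), times], names=["sample", None])

# Placeholder draws, not model output.
rng = np.random.default_rng(0)
samples_toy = pd.DataFrame({"obs": rng.normal(100.0, 10.0, size=len(index))}, index=index)

# Group by the time level (index level 1) and take quantiles across samples.
by_time = samples_toy["obs"].groupby(level=1)
interval = pd.DataFrame({"lower": by_time.quantile(0.05), "upper": by_time.quantile(0.95)})

print(interval.head())
```

The same grouping works for any column, for instance ad_spend, to express uncertainty about a single component.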