Factorial design with empirical Bayes and Thompson Sampling
This tutorial illustrates how to run a factorial experiment. In such an experiment, each parameter (factor) can be assigned one of multiple discrete values (levels). A full-factorial experiment design explores all possible combinations of factors and levels.
For instance, consider a banner with a title and an image. We are considering two different titles and three different images. A full-factorial experiment will compare all 2*3=6 possible combinations of title and image, to see which version of the banner performs the best.
In this example, we first run an exploratory batch to collect data on all possible combinations. Then we use empirical Bayes to model the data and shrink noisy estimates toward the mean. Next, we use Thompson Sampling to suggest a set of arms (combinations of factors and levels) on which to collect more data. We repeat the process until we have identified the best performing combination(s).
import sys
in_colab = 'google.colab' in sys.modules
if in_colab:
%pip install ax-platform
import numpy as np
import pandas as pd
import sklearn as skl
from typing import Dict, Optional, Tuple, Union
from ax import (
Arm,
ChoiceParameter,
Models,
ParameterType,
SearchSpace,
Experiment,
OptimizationConfig,
Objective,
)
from ax.plot.scatter import plot_fitted
from ax.utils.notebook.plotting import render, init_notebook_plotting
from ax.utils.stats.statstools import agresti_coull_sem
import plotly.io as pio
init_notebook_plotting()
if in_colab:
pio.renderers.default = "colab"
1. Define the search space
First, we define our search space. A factorial search space contains a ChoiceParameter for each factor, where the values of the parameter are its levels.
search_space = SearchSpace(
parameters=[
ChoiceParameter(
name="factor1",
parameter_type=ParameterType.STRING,
values=["level11", "level12", "level13"],
),
ChoiceParameter(
name="factor2",
parameter_type=ParameterType.STRING,
values=["level21", "level22"],
),
ChoiceParameter(
name="factor3",
parameter_type=ParameterType.STRING,
values=["level31", "level32", "level33", "level34"],
),
]
)
2. Define a custom metric
Second, we define a custom metric, which is responsible for computing the mean and standard error of a given arm.
In this example, each possible parameter value is given a coefficient. The higher the level, the higher the coefficient, and the higher the coefficients, the greater the mean.
The standard error of each arm is determined by the weight passed into the evaluation function, which represents the size of the population on which this arm was evaluated. The higher the weight, the greater the sample size, and thus the lower the standard error.
from ax import Data, Metric
from ax.utils.common.result import Ok
import pandas as pd
from random import random
one_hot_encoder = skl.preprocessing.OneHotEncoder(
categories=[par.values for par in search_space.parameters.values()],
)
class FactorialMetric(Metric):
def fetch_trial_data(self, trial):
records = []
for arm_name, arm in trial.arms_by_name.items():
params = arm.parameters
batch_size = 10000
noise_level = 0.0
weight = trial.normalized_arm_weights().get(arm, 1.0)
coefficients = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.1, 0.2, 0.3, 0.4])
features = np.array(list(params.values())).reshape(1, -1)
encoded_features = one_hot_encoder.fit_transform(features)
z = (
coefficients @ encoded_features.T
+ np.sqrt(noise_level) * np.random.randn()
)
p = np.exp(z) / (1 + np.exp(z))
plays = np.random.binomial(batch_size, weight)
successes = np.random.binomial(plays, p)
records.append(
{
"arm_name": arm_name,
"metric_name": self.name,
"trial_index": trial.index,
"mean": float(successes) / plays,
"sem": agresti_coull_sem(successes, plays),
}
)
return Ok(value=Data(df=pd.DataFrame.from_records(records)))