While Ax can be used as a fully automated service that generates and deploys candidates, it can also be used in a trial-by-trial fashion, allowing for human oversight.
Typically, human intervention in Ax is necessary when there are clear tradeoffs between multiple metrics of interest. Condensing multiple outcomes of interest into a single scalar quantity is often difficult. Instead, it can be useful to specify an objective and constraints, and to adjust these based on information from the experiment.
To facilitate this, Ax provides the following key features:
In this tutorial, we'll demonstrate how Ax enables users to explore these tradeoffs. With an understanding of the tradeoffs present in our data, we'll then make use of the constrained optimization utilities to generate candidates from multiple different optimization objectives, and create a combined batch that places all of these candidates together in one trial.
For this tutorial, we will assume our experiment has already been created.
from ax import Data, Metric, OptimizationConfig, Objective, OutcomeConstraint, ComparisonOp, load
from ax.modelbridge.cross_validation import cross_validate
from ax.modelbridge.factory import get_GPEI
from ax.plot.diagnostic import tile_cross_validation
from ax.plot.scatter import plot_multiple_metrics, tile_fitted
from ax.utils.notebook.plotting import render, init_notebook_plotting
import pandas as pd
init_notebook_plotting()
experiment = load('hitl_exp.json')
Bayesian Optimization experiments almost always begin with a set of random points. In this experiment, these points were chosen via a Sobol sequence, accessible via the ModelBridge factory.
A collection of points run and analyzed together forms a BatchTrial. A Trial object provides metadata pertaining to the deployment of these points, including details such as when they were deployed and the current status of their experiment.
Here, we see an initial experiment has finished running (COMPLETED status).
experiment.trials[0]
BatchTrial(experiment_name='human_in_the_loop_tutorial', index=0, status=TrialStatus.COMPLETED)
experiment.trials[0].time_created
datetime.datetime(2019, 3, 29, 18, 10, 6)
# Number of arms in the first trial, including status_quo
len(experiment.trials[0].arms)
65
# Sample arm configuration
experiment.trials[0].arms[0]
Arm(name='0_0', parameters={'x_excellent': 0.9715802669525146, 'x_good': 0.8615524768829346, 'x_moderate': 0.7668091654777527, 'x_poor': 0.34871453046798706, 'x_unknown': 0.7675797343254089, 'y_excellent': 2.900710028409958, 'y_good': 1.5137152910232545, 'y_moderate': 0.6775947093963622, 'y_poor': 0.4974367544054985, 'y_unknown': 1.0852564811706542, 'z_excellent': 517803.49761247635, 'z_good': 607874.5171427727, 'z_moderate': 1151881.2023103237, 'z_poor': 2927449.2621421814, 'z_unknown': 2068407.6935052872})
Optimization Config
An important construct for analyzing an experiment is an OptimizationConfig. An OptimizationConfig contains an objective and outcome constraints. Experiments can have a default OptimizationConfig, but models can also take an OptimizationConfig as input independent of the default.
Objective: A metric to optimize, along with a direction to optimize (default: maximize)
Outcome Constraint: A metric to constrain, along with a constraint direction (<= or >=), as well as a bound.
Let's start with a simple OptimizationConfig. By default, our objective metric will be maximized, but can be minimized by setting the minimize flag. Our outcome constraint will, by default, be evaluated as a relative percentage change. This percentage change is computed relative to the experiment's status quo arm.
experiment.status_quo
Arm(name='status_quo', parameters={'x_excellent': 0.0, 'x_good': 0.0, 'x_moderate': 0.0, 'x_poor': 0.0, 'x_unknown': 0.0, 'y_excellent': 1.0, 'y_good': 1.0, 'y_moderate': 1.0, 'y_poor': 1.0, 'y_unknown': 1.0, 'z_excellent': 1000000.0, 'z_good': 1000000.0, 'z_moderate': 1000000.0, 'z_poor': 1000000.0, 'z_unknown': 1000000.0})
objective_metric = Metric(name="metric_1")
constraint_metric = Metric(name="metric_2")
experiment.optimization_config = OptimizationConfig(
objective=Objective(objective_metric),
outcome_constraints=[
OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=5),
]
)
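To make the relative constraint concrete, here is a minimal sketch of the arithmetic in plain Python. It treats the constraint as a point-estimate comparison against the status quo mean. The helper names and the numbers are illustrative assumptions, and Ax's own relativization is more involved than this (it also deals with uncertainty, which the sketch does not attempt to reproduce).

```python
def relative_change_pct(arm_mean: float, status_quo_mean: float) -> float:
    """Percent change of an arm's metric mean relative to the status quo mean."""
    return 100.0 * (arm_mean - status_quo_mean) / abs(status_quo_mean)


def satisfies_leq(arm_mean: float, status_quo_mean: float, bound_pct: float) -> bool:
    """True if the arm's relative change is at most bound_pct (an LEQ constraint)."""
    return relative_change_pct(arm_mean, status_quo_mean) <= bound_pct


# Hypothetical means for metric_2; these numbers are illustrative only.
status_quo_mean = 20.0
print(relative_change_pct(20.8, status_quo_mean))  # roughly a 4% increase
print(satisfies_leq(20.8, status_quo_mean, 5.0))   # inside a 5% bound
print(satisfies_leq(21.4, status_quo_mean, 5.0))   # a 7% increase is outside it
```

Read this way, `bound=5` with `ComparisonOp.LEQ` asks that metric_2 rise by no more than 5% over the status quo arm.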
Data
Another critical piece of analysis is data itself! Ax data follows a standard format, shown below. This format is imposed upon the underlying data structure, which is a Pandas DataFrame.
A key set of fields is required for all data used with Ax models.
It's a good idea to double check our data before fitting models -- let's make sure all of our expected metrics and arms are present.
data = Data(pd.read_json('hitl_data.json'))
data.df.head()
| | arm_name | metric_name | mean | sem | trial_index | start_time | end_time | n |
|---|---|---|---|---|---|---|---|---|
| 0 | 0_1 | metric_1 | 495.763048 | 2.621641 | 0 | 2019-03-30 | 2019-04-03 | 1599994 |
| 1 | 0_23 | metric_1 | 524.367712 | 2.731647 | 0 | 2019-03-30 | 2019-04-03 | 1596356 |
| 2 | 0_14 | metric_2 | 21.460244 | 0.069457 | 0 | 2019-03-30 | 2019-04-03 | 1600182 |
| 3 | 0_53 | metric_2 | 21.437433 | 0.069941 | 0 | 2019-03-30 | 2019-04-03 | 1601081 |
| 4 | 0_53 | metric_1 | 548.387691 | 2.893486 | 0 | 2019-03-30 | 2019-04-03 | 1601081 |
data.df['arm_name'].unique()
array(['0_1', '0_23', '0_14', '0_53', '0_0', '0_54', '0_55', '0_56', '0_27', '0_57', '0_58', '0_13', '0_59', '0_6', '0_60', '0_61', '0_62', '0_63', '0_7', '0_28', '0_15', '0_16', '0_17', '0_18', '0_19', '0_29', '0_2', '0_20', '0_21', '0_22', '0_3', '0_30', '0_8', '0_10', '0_31', '0_24', '0_32', '0_33', '0_34', '0_35', '0_36', '0_37', '0_38', '0_9', '0_39', '0_4', '0_25', '0_11', '0_40', '0_41', '0_42', '0_43', '0_44', '0_45', 'status_quo', '0_46', '0_47', '0_48', '0_26', '0_49', '0_12', '0_5', '0_50', '0_51', '0_52'], dtype=object)
data.df['metric_name'].unique()
array(['metric_1', 'metric_2'], dtype=object)
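Eyeballing the unique values works for a small experiment, but the same check can be automated. The sketch below is one possible approach, not an Ax utility: `missing_pairs` is a hypothetical helper that reports every (arm, metric) combination with no row. It operates on plain dictionaries so that it runs without pandas, and the rows are the five shown in the table above.

```python
from itertools import product


def missing_pairs(rows, expected_arms, expected_metrics):
    """Return the (arm_name, metric_name) pairs that have no row in the data."""
    seen = {(row["arm_name"], row["metric_name"]) for row in rows}
    return sorted(set(product(expected_arms, expected_metrics)) - seen)


# The five rows displayed by data.df.head(), reduced to the columns used here.
rows = [
    {"arm_name": "0_1", "metric_name": "metric_1", "mean": 495.763048},
    {"arm_name": "0_23", "metric_name": "metric_1", "mean": 524.367712},
    {"arm_name": "0_14", "metric_name": "metric_2", "mean": 21.460244},
    {"arm_name": "0_53", "metric_name": "metric_2", "mean": 21.437433},
    {"arm_name": "0_53", "metric_name": "metric_1", "mean": 548.387691},
]

# Only the head of the frame is used, so a gap is expected for arm 0_1.
gaps = missing_pairs(rows, ["0_1", "0_53"], ["metric_1", "metric_2"])
print(gaps)
```

With the real data, the rows could be obtained with `data.df.to_dict("records")`, and an empty result would mean every arm has a value for every metric.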
Search Space
The final component necessary for human-in-the-loop optimization is a SearchSpace. A SearchSpace defines the feasible region for our parameters, as well as their types.
Here, we have both parameters and a set of constraints on those parameters.
Without a SearchSpace, our models are unable to generate new candidates. By default, the models will read the search space off of the experiment, when they are told to generate candidates. SearchSpaces can also be specified by the user at this time. Sometimes, the first round of an experiment is too restrictive--perhaps the experimenter was too cautious when defining their initial ranges for exploration! In this case, it can be useful to generate candidates from new, expanded search spaces, beyond that specified in the experiment.
experiment.search_space.parameters
{'x_excellent': RangeParameter(name='x_excellent', parameter_type=FLOAT, range=[0.0, 1.0]), 'x_good': RangeParameter(name='x_good', parameter_type=FLOAT, range=[0.0, 1.0]), 'x_moderate': RangeParameter(name='x_moderate', parameter_type=FLOAT, range=[0.0, 1.0]), 'x_poor': RangeParameter(name='x_poor', parameter_type=FLOAT, range=[0.0, 1.0]), 'x_unknown': RangeParameter(name='x_unknown', parameter_type=FLOAT, range=[0.0, 1.0]), 'y_excellent': RangeParameter(name='y_excellent', parameter_type=FLOAT, range=[0.1, 3.0]), 'y_good': RangeParameter(name='y_good', parameter_type=FLOAT, range=[0.1, 3.0]), 'y_moderate': RangeParameter(name='y_moderate', parameter_type=FLOAT, range=[0.1, 3.0]), 'y_poor': RangeParameter(name='y_poor', parameter_type=FLOAT, range=[0.1, 3.0]), 'y_unknown': RangeParameter(name='y_unknown', parameter_type=FLOAT, range=[0.1, 3.0]), 'z_excellent': RangeParameter(name='z_excellent', parameter_type=FLOAT, range=[50000.0, 5000000.0]), 'z_good': RangeParameter(name='z_good', parameter_type=FLOAT, range=[50000.0, 5000000.0]), 'z_moderate': RangeParameter(name='z_moderate', parameter_type=FLOAT, range=[50000.0, 5000000.0]), 'z_poor': RangeParameter(name='z_poor', parameter_type=FLOAT, range=[50000.0, 5000000.0]), 'z_unknown': RangeParameter(name='z_unknown', parameter_type=FLOAT, range=[50000.0, 5000000.0])}
experiment.search_space.parameter_constraints
[OrderConstraint(x_poor <= x_moderate), OrderConstraint(x_moderate <= x_good), OrderConstraint(x_good <= x_excellent), OrderConstraint(y_poor <= y_moderate), OrderConstraint(y_moderate <= y_good), OrderConstraint(y_good <= y_excellent)]
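The ranges and order constraints printed above fully determine whether a given parameterization is feasible. As a rough illustration, the following sketch re-implements that check in plain Python. The `RANGES` and `ORDER` tables are transcribed from the output above, `in_search_space` is a hypothetical helper rather than part of Ax, and the arm values are those of arm 0_0, rounded.

```python
# Bounds shared by every parameter with the given prefix (from the output above).
RANGES = {"x": (0.0, 1.0), "y": (0.1, 3.0), "z": (50000.0, 5000000.0)}
# Order constraints apply to the x_ and y_ families: poor <= moderate <= good <= excellent.
ORDER = ["poor", "moderate", "good", "excellent"]


def in_search_space(params: dict) -> bool:
    """Check the range bounds and the x_/y_ order constraints for one arm."""
    for name, value in params.items():
        low, high = RANGES[name.split("_")[0]]
        if not low <= value <= high:
            return False
    for prefix in ("x", "y"):
        values = [params[f"{prefix}_{level}"] for level in ORDER]
        if any(a > b for a, b in zip(values, values[1:])):
            return False
    return True


# Arm 0_0 from the first trial, rounded for brevity.
arm_0_0 = {
    "x_excellent": 0.9716, "x_good": 0.8616, "x_moderate": 0.7668,
    "x_poor": 0.3487, "x_unknown": 0.7676,
    "y_excellent": 2.9007, "y_good": 1.5137, "y_moderate": 0.6776,
    "y_poor": 0.4974, "y_unknown": 1.0853,
    "z_excellent": 517803.5, "z_good": 607874.5, "z_moderate": 1151881.2,
    "z_poor": 2927449.3, "z_unknown": 2068407.7,
}

print(in_search_space(arm_0_0))                      # feasible
print(in_search_space(dict(arm_0_0, x_poor=0.9)))    # breaks x_poor <= x_moderate
```

Expanding the search space, as described above, corresponds to widening the entries of `RANGES` in this toy version.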
Fitting BoTorch's GPEI will allow us to predict new candidates based on our first Sobol batch. Here, we make use of the default settings for GP-EI defined in the ModelBridge factory.
gp = get_GPEI(
experiment=experiment,
data=data,
)
We can validate the model fits using cross validation, shown below for each metric of interest. Here, our model fits leave something to be desired--the tail ends of each metric are hard to model. In this situation, there are three potential actions to take:
However, away from the tail effects, the fits do show a strong correlation, so we will proceed with candidate generation.
cv_result = cross_validate(gp)
render(tile_cross_validation(cv_result))
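For readers unfamiliar with the procedure, the sketch below shows the general shape of leave-one-out cross-validation: each point is held out in turn, predicted from the rest, and the predictions are compared with the observations. It assumes leave-one-out folds and swaps in a nearest-neighbor predictor for the GP so that it runs with the standard library alone. The data and all function names are made up for illustration.

```python
import math


def loo_predictions(xs, ys, predict):
    """Leave-one-out: predict each y from a model given all other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(train_x, train_y, xs[i]))
    return preds


def nearest_neighbor(train_x, train_y, x):
    """Stand-in model (not a GP): return the y of the closest training x."""
    j = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[j]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic one-dimensional data: y = x^2 on an evenly spaced grid.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

preds = loo_predictions(xs, ys, nearest_neighbor)
r = pearson(ys, preds)
errors = [abs(y - p) for y, p in zip(ys, preds)]
print(r, max(errors))
```

In this toy example the overall correlation is high while the largest error sits at the upper end of the range, where the held-out point has a neighbor on only one side.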
The parameters from the initial batch have a wide range of effects on the metrics of interest, as shown from the outcomes from our fitted GP model.
render(tile_fitted(gp, rel=True))
METRIC_X_AXIS = 'metric_1'
METRIC_Y_AXIS = 'metric_2'
render(plot_multiple_metrics(
gp,
metric_x=METRIC_X_AXIS,
metric_y=METRIC_Y_AXIS,
))
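One simple way to summarize a two-metric tradeoff like the one plotted above is to keep only the arms that no other arm beats on both metrics at once. The sketch below does this for the case where metric_1 should be high and metric_2 low. It is an illustrative supplement, not something the Ax plot computes, and the arm names and values are hypothetical.

```python
def pareto_front(points):
    """Keep arms not dominated by another arm (higher metric_1, lower metric_2)."""
    front = []
    for name, m1, m2 in points:
        dominated = any(
            o1 >= m1 and o2 <= m2 and (o1 > m1 or o2 < m2)
            for _, o1, o2 in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical relative effects: (arm, % change in metric_1, % change in metric_2).
points = [("a", 2.0, 1.0), ("b", 4.0, 6.0), ("c", 1.0, 3.0), ("d", 3.0, 2.0)]
front = pareto_front(points)
print(front)
```

Here arm "c" drops out because arm "a" is better on both metrics, while "a", "b", and "d" each represent a different compromise.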
With our fitted GPEI model, we can optimize EI (Expected Improvement) based on any optimization config. We can start from our initial optimization config, drop its outcome constraint, and aim simply to maximize the objective, metric_1 (playback smoothness), without worrying about the constraint on metric_2 (quality).
unconstrained = gp.gen(
n=3,
optimization_config=OptimizationConfig(
objective=Objective(objective_metric),
)
)
Let's plot the tradeoffs again, but with our new arms.
render(plot_multiple_metrics(
gp,
metric_x=METRIC_X_AXIS,
metric_y=METRIC_Y_AXIS,
generator_runs_dict={
'unconstrained': unconstrained,
}
))
With our unconstrained optimization, we generate some candidates which are pretty promising with respect to our objective! However, there is a clear regression in our constraint metric, above our initial 5% desired constraint. Let's add that constraint back in.
constraint_5 = OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=5)
constraint_5_results = gp.gen(
n=3,
optimization_config=OptimizationConfig(
objective=Objective(objective_metric),
outcome_constraints=[constraint_5]
)
)
This yields a GeneratorRun, which contains points according to our specified optimization config, along with metadata about how the points were generated. Let's plot the tradeoffs in these new points.
from ax.plot.scatter import plot_multiple_metrics
render(plot_multiple_metrics(
gp,
metric_x=METRIC_X_AXIS,
metric_y=METRIC_Y_AXIS,
generator_runs_dict={
'constraint_5': constraint_5_results
}
))
It is important to note that the treatment of constraints in GP EI is probabilistic. The acquisition function weights our objective by the probability that each constraint is feasible. Thus, we may allow points with a very small probability of violating the constraint to be generated, as long as the chance of the points increasing our objective is high enough.
You can see above that the point estimate for each point is significantly below a 5% increase in the constraint metric, but that there is uncertainty in our prediction, and the tail probabilities do include probabilities of small regressions beyond 5%.
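The weighting described above can be sketched with the textbook closed-form expressions for a normal posterior: expected improvement on the objective, multiplied by the probability that the constraint metric stays under its bound. This is a simplified illustration under that assumption and may differ from what Ax and BoTorch actually compute. All function names and numbers below are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Analytic EI for maximization under a normal posterior N(mu, sigma^2)."""
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def prob_feasible(mu_c: float, sigma_c: float, bound: float) -> float:
    """P(constraint metric <= bound) under a normal posterior."""
    return normal_cdf((bound - mu_c) / sigma_c)


def constrained_ei(mu, sigma, best, mu_c, sigma_c, bound):
    """EI on the objective, weighted by the probability of feasibility."""
    return expected_improvement(mu, sigma, best) * prob_feasible(mu_c, sigma_c, bound)


# Two hypothetical candidates with the same objective posterior. The first is
# predicted at +3% on the constraint metric, the second at +6%; the bound is 5%.
safe = constrained_ei(mu=3.0, sigma=1.0, best=2.0, mu_c=3.0, sigma_c=1.0, bound=5.0)
risky = constrained_ei(mu=3.0, sigma=1.0, best=2.0, mu_c=6.0, sigma_c=1.0, bound=5.0)
print(prob_feasible(3.0, 1.0, 5.0), prob_feasible(6.0, 1.0, 5.0))
print(safe, risky)
```

The first candidate still carries a small probability of violating the bound, but it keeps nearly all of its expected improvement, whereas the second is heavily discounted rather than excluded outright.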
constraint_1 = OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=1)
constraint_1_results = gp.gen(
n=3,
optimization_config=OptimizationConfig(
objective=Objective(objective_metric),
outcome_constraints=[constraint_1],
)
)
render(plot_multiple_metrics(
gp,
metric_x=METRIC_X_AXIS,
metric_y=METRIC_Y_AXIS,
generator_runs_dict={
"constraint_1": constraint_1_results,
}
))
Finally, let's view all three sets of candidates together.
render(plot_multiple_metrics(
gp,
metric_x=METRIC_X_AXIS,
metric_y=METRIC_Y_AXIS,
generator_runs_dict={
'unconstrained': unconstrained,
'loose_constraint': constraint_5_results,
'tight_constraint': constraint_1_results,
}
))
Having done the analysis and candidate generation for three different optimization configs, we can easily create a new BatchTrial which combines the candidates from these three different optimizations. Each set of candidates looks promising -- the point estimates are higher along both metric values than in the previous batch. However, there is still a good bit of uncertainty in our predictions. It is hard to choose between the different constraint settings without reducing this noise, so we choose to run a new trial with all three constraint settings. However, we're generally convinced that the tight constraint is too conservative. We'd still like to reduce our uncertainty in that region, but we'll only take one arm from that set.
# We can add entire generator runs, when constructing a new trial.
trial = experiment.new_batch_trial().add_generator_run(unconstrained).add_generator_run(constraint_5_results)
# Or, we can hand-pick arms.
trial.add_arm(constraint_1_results.arms[0])
BatchTrial(experiment_name='human_in_the_loop_tutorial', index=1, status=TrialStatus.CANDIDATE)
The arms are combined into a single trial, along with the status_quo arm. The generator runs that produced them can be accessed from the trial as well.
experiment.trials[1].arms
[Arm(name='1_0', parameters={'x_excellent': 0.6796446006388114, 'x_good': 0.309455588177183, 'x_moderate': 0.22656864512816965, 'x_poor': 0.10283508459738865, 'x_unknown': 0.756225112165778, 'y_excellent': 2.5536032309718744, 'y_good': 1.7626894812820295, 'y_moderate': 0.8260249058354127, 'y_poor': 0.2977495287133997, 'y_unknown': 1.9046422579563558, 'z_excellent': 4804333.513507008, 'z_good': 3665351.4463780685, 'z_moderate': 2515319.5985970176, 'z_poor': 1417823.4104254819, 'z_unknown': 2422771.531281581}), Arm(name='1_1', parameters={'x_excellent': 0.7235008940734583, 'x_good': 0.3701801354786402, 'x_moderate': 0.3220609328611533, 'x_poor': 0.12504353803447452, 'x_unknown': 0.5659965952696246, 'y_excellent': 2.5910129023476824, 'y_good': 1.9636215162247057, 'y_moderate': 1.0418116234361752, 'y_poor': 0.3122184832190004, 'y_unknown': 1.1384415894179978, 'z_excellent': 4906712.143641797, 'z_good': 2952818.1248956546, 'z_moderate': 3031866.831498524, 'z_poor': 2560395.95510768, 'z_unknown': 1685405.879485974}), Arm(name='1_2', parameters={'x_excellent': 0.7319843904942936, 'x_good': 0.2561379518851539, 'x_moderate': 0.1768925034414852, 'x_poor': 0.11754688651053978, 'x_unknown': 0.9417891397690281, 'y_excellent': 2.438255129181829, 'y_good': 1.7016319815938767, 'y_moderate': 0.5087601466637882, 'y_poor': 0.30987035812354036, 'y_unknown': 2.170822481561465, 'z_excellent': 4479488.062742735, 'z_good': 4214905.255781805, 'z_moderate': 2066164.9583266906, 'z_poor': 726863.6251030776, 'z_unknown': 3299792.7141880407}), Arm(name='1_3', parameters={'x_excellent': 0.7794520494833224, 'x_good': 0.546295138904969, 'x_moderate': 0.3404599863808334, 'x_poor': 0.0979924824575367, 'x_unknown': 0.690896450294075, 'y_excellent': 2.656340221051597, 'y_good': 1.7627824809161263, 'y_moderate': 0.8547932176462213, 'y_poor': 0.3125808544481243, 'y_unknown': 1.6743337829055007, 'z_excellent': 4912315.747733441, 'z_good': 3394268.904448871, 'z_moderate': 2966808.8470675526, 'z_poor': 1851958.386457277, 'z_unknown': 2334068.4358693194}), Arm(name='1_4', parameters={'x_excellent': 0.8238687151954194, 'x_good': 0.4967794660253842, 'x_moderate': 0.37905765773321926, 'x_poor': 0.17940571956619872, 'x_unknown': 0.6754543084692638, 'y_excellent': 2.6613722451373554, 'y_good': 2.0755672372804823, 'y_moderate': 1.043571274532471, 'y_poor': 0.5524376110930402, 'y_unknown': 1.665976165852893, 'z_excellent': 4696717.051127296, 'z_good': 4294690.88139889, 'z_moderate': 3763823.0086937556, 'z_poor': 2186028.4201497985, 'z_unknown': 3235914.5585285826}), Arm(name='1_5', parameters={'x_excellent': 0.6536956792158667, 'x_good': 0.5124643406569351, 'x_moderate': 0.379318194871776, 'x_poor': 0.04454310397587646, 'x_unknown': 0.48367669714273287, 'y_excellent': 2.461458818727877, 'y_good': 1.7222237556107427, 'y_moderate': 0.9312574223716872, 'y_poor': 0.1750424397155767, 'y_unknown': 1.335401303021846, 'z_excellent': 4609794.801494116, 'z_good': 2717486.5434763157, 'z_moderate': 2245787.0711005623, 'z_poor': 3231815.3496198165, 'z_unknown': 1390123.8968916142}), Arm(name='1_6', parameters={'x_excellent': 0.7861542369985202, 'x_good': 0.6944957994393497, 'x_moderate': 0.4457951891437471, 'x_poor': 0.03536765266590938, 'x_unknown': 0.7393821143713697, 'y_excellent': 2.720118138622808, 'y_good': 1.8460871647292574, 'y_moderate': 0.8005199179770008, 'y_poor': 0.14783627633242566, 'y_unknown': 1.409873656369108, 'z_excellent': 4474391.874313782, 'z_good': 2761986.470393988, 'z_moderate': 1951399.773361276, 'z_poor': 2812881.1084153103, 'z_unknown': 1665050.672500508}), Arm(name='status_quo', parameters={'x_excellent': 0.0, 'x_good': 0.0, 'x_moderate': 0.0, 'x_poor': 0.0, 'x_unknown': 0.0, 'y_excellent': 1.0, 'y_good': 1.0, 'y_moderate': 1.0, 'y_poor': 1.0, 'y_unknown': 1.0, 'z_excellent': 1000000.0, 'z_good': 1000000.0, 'z_moderate': 1000000.0, 'z_poor': 1000000.0, 'z_unknown': 1000000.0})]
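The composition of this trial (three unconstrained arms, three loosely constrained arms, one hand-picked tightly constrained arm, plus the status quo) can be mimicked with ordinary lists. The sketch below is a toy model of that bookkeeping, not Ax's implementation. `pool_arms` is a hypothetical helper, the parameter dictionaries are placeholders, and the `<trial>_<index>` naming is inferred from the arm names shown above.

```python
def pool_arms(trial_index, whole_runs, hand_picked, status_quo):
    """Name pooled arms '<trial>_<i>' in insertion order, then add status_quo."""
    pooled = [arm for run in whole_runs for arm in run] + list(hand_picked)
    batch = {f"{trial_index}_{i}": arm for i, arm in enumerate(pooled)}
    batch["status_quo"] = status_quo
    return batch


# Placeholder parameterizations; only the grouping matters here.
unconstrained = [{"id": "u0"}, {"id": "u1"}, {"id": "u2"}]
loose = [{"id": "l0"}, {"id": "l1"}, {"id": "l2"}]
tight = [{"id": "t0"}, {"id": "t1"}, {"id": "t2"}]

# Two whole runs plus a single arm taken from the tight-constraint run.
batch = pool_arms(1, [unconstrained, loose], [tight[0]], {"id": "sq"})
print(list(batch))
```

Under this reading, arms 1_0 to 1_2 would come from the unconstrained run, 1_3 to 1_5 from the 5% constraint, and 1_6 from the 1% constraint.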
The original GeneratorRuns can be accessed from within the trial as well. This is useful for later analyses, allowing introspection of the OptimizationConfig used for generation (as well as other information, e.g. the SearchSpace used for generation).
experiment.trials[1]._generator_run_structs
[GeneratorRunStruct(generator_run=GeneratorRun(3 arms, total weight 3.0), weight=1.0), GeneratorRunStruct(generator_run=GeneratorRun(3 arms, total weight 3.0), weight=1.0), GeneratorRunStruct(generator_run=GeneratorRun(1 arms, total weight 1.0), weight=1.0)]
Here, we can see the unconstrained set-up used for our first set of candidates.
experiment.trials[1]._generator_run_structs[0].generator_run.optimization_config
OptimizationConfig(objective=Objective(metric_name="metric_1", minimize=False), outcome_constraints=[])