Using Ax for Human-in-the-loop Experimentation

While Ax can be used as a fully automated service that generates and deploys candidates, it can also be used in a trial-by-trial fashion, allowing for human oversight.

Typically, human intervention in Ax is necessary when there are clear tradeoffs between multiple metrics of interest. Condensing multiple outcomes of interest into a single scalar quantity can be challenging. Instead, it can be useful to specify an objective and constraints, and tweak these based on the information from the experiment.

To facilitate this, Ax provides the following key features:

Constrained optimization
Interfaces for easily modifying optimization goals
Utilities for visualizing and deploying new trials composed of multiple optimizations.

In this tutorial, we'll demonstrate how Ax enables users to explore these tradeoffs. With an understanding of the tradeoffs present in our data, we'll then make use of the constrained optimization utilities to generate candidates from multiple different optimization objectives, and create a conglomerate batch that combines all of these candidates in one trial.

Experiment Setup

For this tutorial, we will assume our experiment has already been created.

In [1]:

import os

from ax import (
    Data,
    Metric,
    OptimizationConfig,
    Objective,
    OutcomeConstraint,
    ComparisonOp,
    json_load,
)
from ax.modelbridge.cross_validation import cross_validate
from ax.modelbridge.registry import Models
from ax.plot.diagnostic import tile_cross_validation
from ax.plot.scatter import plot_multiple_metrics, tile_fitted
from ax.utils.notebook.plotting import render, init_notebook_plotting

import pandas as pd

init_notebook_plotting()

[INFO 11-26 07:13:40] ax.utils.notebook.plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.

[INFO 11-26 07:13:40] ax.utils.notebook.plotting: Please see
    (https://ax.dev/tutorials/visualizations.html#Fix-for-plots-that-are-not-rendering)
    if visualizations are not rendering.

NOTE: The path below assumes the tutorial is being run either from the root directory of the Ax package or from the human_in_the_loop directory that this tutorial lives in. This is needed because Jupyter notebooks may change the active directory during runtime, making it tricky to find the file in a consistent way.

In [2]:

curr_dir = os.getcwd()
if "human_in_the_loop" not in curr_dir:
    curr_dir = os.path.join(curr_dir, "tutorials", "human_in_the_loop")
experiment = json_load.load_experiment(os.path.join(curr_dir, "hitl_exp.json"))

Initial Sobol Trial

Bayesian Optimization experiments almost always begin with a set of random points. In this experiment, these points were chosen via a Sobol sequence, accessible via the ModelBridge factory.
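To give a feel for what a quasi-random initial batch looks like, here is a minimal standard-library sketch. It uses a Halton sequence, a simpler relative of the Sobol sequence, rather than Sobol itself, and it ignores parameter constraints. The function names and the choice of three parameters are illustrative assumptions and are not part of Ax; the bounds are those of the corresponding parameters in this experiment's search space.

```python
from typing import Dict, List, Tuple

# First few primes; each parameter dimension gets its own base.
_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def van_der_corput(index: int, base: int) -> float:
    """Return element `index` of the van der Corput sequence in `base`."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_points(
    n: int, bounds: Dict[str, Tuple[float, float]]
) -> List[Dict[str, float]]:
    """Generate n low-discrepancy points scaled to the given bounds."""
    if len(bounds) > len(_PRIMES):
        raise ValueError("add more primes to cover every dimension")
    points = []
    for i in range(1, n + 1):  # skip index 0, which maps to the lower corner
        point = {}
        for (name, (low, high)), base in zip(bounds.items(), _PRIMES):
            point[name] = low + (high - low) * van_der_corput(i, base)
        points.append(point)
    return points


# Hypothetical subset of the experiment's parameters (illustration only).
example_bounds = {
    "x_excellent": (0.0, 1.0),
    "y_excellent": (0.1, 3.0),
    "z_excellent": (50000.0, 5000000.0),
}
initial_points = halton_points(4, example_bounds)
```

Compared with independent uniform draws, sequences of this kind tend to spread points more evenly over the space, which is why they are a common choice for the first batch.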

A collection of points run and analyzed together forms a BatchTrial. A Trial object provides metadata pertaining to the deployment of these points, including details such as when they were deployed, and the current status of their experiment.

Here, we see that the initial trial has finished running (COMPLETED status).

In [3]:

experiment.trials[0]

Out[3]:

BatchTrial(experiment_name='human_in_the_loop_tutorial', index=0, status=TrialStatus.COMPLETED)

In [4]:

experiment.trials[0].time_created

Out[4]:

datetime.datetime(2019, 3, 29, 18, 10, 6)

In [5]:

# Number of arms in first experiment, including status_quo
len(experiment.trials[0].arms)

Out[5]:

In [6]:

# Sample arm configuration
experiment.trials[0].arms[0]

Out[6]:

Arm(name='0_0', parameters={'x_excellent': 0.9715802669525146, 'x_good': 0.8615524768829346, 'x_moderate': 0.7668091654777527, 'x_poor': 0.34871453046798706, 'x_unknown': 0.7675797343254089, 'y_excellent': 2.900710028409958, 'y_good': 1.5137152910232545, 'y_moderate': 0.6775947093963622, 'y_poor': 0.4974367544054985, 'y_unknown': 1.0852564811706542, 'z_excellent': 517803.49761247635, 'z_good': 607874.5171427727, 'z_moderate': 1151881.2023103237, 'z_poor': 2927449.2621421814, 'z_unknown': 2068407.6935052872})

Experiment Analysis

Optimization Config

An important construct for analyzing an experiment is an OptimizationConfig. An OptimizationConfig contains an objective and a set of outcome constraints. Experiments can have a default OptimizationConfig, but models can also take an OptimizationConfig as input independent of the default.

Objective: A metric to optimize, along with a direction to optimize (default: maximize)

Outcome Constraint: A metric to constrain, along with a constraint direction (<= or >=), as well as a bound.

Let's start with a simple OptimizationConfig. By default, our objective metric will be maximized, but can be minimized by setting the minimize flag. Our outcome constraint will, by default, be evaluated as a relative percentage change. This percentage change is computed relative to the experiment's status quo arm.
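As a rough illustration of what a relative constraint means, the sketch below computes the percentage change of an arm's mean against the status quo and checks it against a bound. The helper names and the numbers are hypothetical, and the sketch works on point estimates only; Ax's own relativization also accounts for the uncertainty in both estimates.

```python
def relative_change_pct(arm_mean: float, status_quo_mean: float) -> float:
    """Percentage change of an arm's mean relative to the status quo mean."""
    return 100.0 * (arm_mean - status_quo_mean) / abs(status_quo_mean)


def satisfies_leq(arm_mean: float, status_quo_mean: float, bound_pct: float) -> bool:
    """Check a relative `<=` constraint such as 'at most a 5% increase'."""
    return relative_change_pct(arm_mean, status_quo_mean) <= bound_pct


# Hypothetical means for the constraint metric (not taken from the experiment data).
status_quo_mean = 20.0
within_bound = satisfies_leq(20.8, status_quo_mean, bound_pct=5.0)  # about +4%
over_bound = satisfies_leq(21.5, status_quo_mean, bound_pct=5.0)    # +7.5%
```

With `op=ComparisonOp.LEQ` and `bound=5`, the first hypothetical arm would count as satisfying the constraint and the second would not.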

In [7]:

experiment.status_quo

Out[7]:

Arm(name='status_quo', parameters={'x_excellent': 0, 'x_good': 0, 'x_moderate': 0, 'x_poor': 0, 'x_unknown': 0, 'y_excellent': 1, 'y_good': 1, 'y_moderate': 1, 'y_poor': 1, 'y_unknown': 1, 'z_excellent': 1000000, 'z_good': 1000000, 'z_moderate': 1000000, 'z_poor': 1000000, 'z_unknown': 1000000})

In [8]:

objective_metric = Metric(name="metric_1")
constraint_metric = Metric(name="metric_2")

experiment.optimization_config = OptimizationConfig(
    objective=Objective(objective_metric, minimize=False),
    outcome_constraints=[
        OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=5),
    ],
)

Data

Another critical piece of analysis is data itself! Ax data follows a standard format, shown below. This format is imposed upon the underlying data structure, which is a Pandas DataFrame.

A key set of fields is required for all data used with Ax models.

It's a good idea to double check our data before fitting models -- let's make sure all of our expected metrics and arms are present.

In [9]:

data = Data(pd.read_json(os.path.join(curr_dir, "hitl_data.json")))
data.df.head()

Out[9]:

	arm_name	metric_name	mean	sem	start_time	end_time	n
0	0_1	metric_1	495.763048	2.621641	2019-03-30	2019-04-03	1599994
1	0_23	metric_1	524.367712	2.731647	2019-03-30	2019-04-03	1596356
2	0_14	metric_2	21.460244	0.069457	2019-03-30	2019-04-03	1600182
3	0_53	metric_2	21.437433	0.069941	2019-03-30	2019-04-03	1601081
4	0_53	metric_1	548.387691	2.893486	2019-03-30	2019-04-03	1601081

In [10]:

data.df["arm_name"].unique()

Out[10]:

array(['0_1', '0_23', '0_14', '0_53', '0_0', '0_54', '0_55', '0_56',
       '0_27', '0_57', '0_58', '0_13', '0_59', '0_6', '0_60', '0_61',
       '0_62', '0_63', '0_7', '0_28', '0_15', '0_16', '0_17', '0_18',
       '0_19', '0_29', '0_2', '0_20', '0_21', '0_22', '0_3', '0_30',
       '0_8', '0_10', '0_31', '0_24', '0_32', '0_33', '0_34', '0_35',
       '0_36', '0_37', '0_38', '0_9', '0_39', '0_4', '0_25', '0_11',
       '0_40', '0_41', '0_42', '0_43', '0_44', '0_45', 'status_quo',
       '0_46', '0_47', '0_48', '0_26', '0_49', '0_12', '0_5', '0_50',
       '0_51', '0_52'], dtype=object)

In [11]:

data.df["metric_name"].unique()

Out[11]:

array(['metric_1', 'metric_2'], dtype=object)
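The visual checks above can also be automated. The following is a plain-Python sketch, assuming the data is available as a list of row dictionaries with `arm_name` and `metric_name` keys; the function name is hypothetical and not part of Ax. The sample rows mirror the five rows printed by `data.df.head()`, so the "missing" pair only reflects that small excerpt.

```python
from typing import Dict, Iterable, List, Tuple


def missing_observations(
    rows: Iterable[Dict[str, str]],
    expected_arms: Iterable[str],
    expected_metrics: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return every (arm, metric) pair that has no row in the data."""
    seen = {(row["arm_name"], row["metric_name"]) for row in rows}
    metrics = list(expected_metrics)
    return sorted(
        (arm, metric)
        for arm in expected_arms
        for metric in metrics
        if (arm, metric) not in seen
    )


# The arm/metric pairs from the first five rows shown above.
sample_rows = [
    {"arm_name": "0_1", "metric_name": "metric_1"},
    {"arm_name": "0_23", "metric_name": "metric_1"},
    {"arm_name": "0_14", "metric_name": "metric_2"},
    {"arm_name": "0_53", "metric_name": "metric_2"},
    {"arm_name": "0_53", "metric_name": "metric_1"},
]
gaps = missing_observations(sample_rows, ["0_1", "0_53"], ["metric_1", "metric_2"])
```

For the full data set, the rows could be obtained with `data.df.to_dict("records")`, and an empty result would indicate that every expected arm has an observation for every expected metric.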

Search Space

The final component necessary for human-in-the-loop optimization is a SearchSpace. A SearchSpace defines the feasible region for our parameters, as well as their types.

Here, we have both parameters and a set of constraints on those parameters.

Without a SearchSpace, our models are unable to generate new candidates. By default, the models read the search space from the experiment when they are told to generate candidates. A SearchSpace can also be specified by the user at this time. Sometimes, the search space used in the first round of an experiment is too restrictive--perhaps the experimenter was too cautious when defining their initial ranges for exploration! In this case, it can be useful to generate candidates from new, expanded search spaces, beyond that specified in the experiment.

In [12]:

experiment.search_space.parameters

Out[12]:

{'x_excellent': RangeParameter(name='x_excellent', parameter_type=FLOAT, range=[0.0, 1.0]),
 'x_good': RangeParameter(name='x_good', parameter_type=FLOAT, range=[0.0, 1.0]),
 'x_moderate': RangeParameter(name='x_moderate', parameter_type=FLOAT, range=[0.0, 1.0]),
 'x_poor': RangeParameter(name='x_poor', parameter_type=FLOAT, range=[0.0, 1.0]),
 'x_unknown': RangeParameter(name='x_unknown', parameter_type=FLOAT, range=[0.0, 1.0]),
 'y_excellent': RangeParameter(name='y_excellent', parameter_type=FLOAT, range=[0.1, 3.0]),
 'y_good': RangeParameter(name='y_good', parameter_type=FLOAT, range=[0.1, 3.0]),
 'y_moderate': RangeParameter(name='y_moderate', parameter_type=FLOAT, range=[0.1, 3.0]),
 'y_poor': RangeParameter(name='y_poor', parameter_type=FLOAT, range=[0.1, 3.0]),
 'y_unknown': RangeParameter(name='y_unknown', parameter_type=FLOAT, range=[0.1, 3.0]),
 'z_excellent': RangeParameter(name='z_excellent', parameter_type=FLOAT, range=[50000.0, 5000000.0]),
 'z_good': RangeParameter(name='z_good', parameter_type=FLOAT, range=[50000.0, 5000000.0]),
 'z_moderate': RangeParameter(name='z_moderate', parameter_type=FLOAT, range=[50000.0, 5000000.0]),
 'z_poor': RangeParameter(name='z_poor', parameter_type=FLOAT, range=[50000.0, 5000000.0]),
 'z_unknown': RangeParameter(name='z_unknown', parameter_type=FLOAT, range=[50000.0, 5000000.0])}

In [13]:

experiment.search_space.parameter_constraints

Out[13]:

[OrderConstraint(x_poor <= x_moderate),
 OrderConstraint(x_moderate <= x_good),
 OrderConstraint(x_good <= x_excellent),
 OrderConstraint(y_poor <= y_moderate),
 OrderConstraint(y_moderate <= y_good),
 OrderConstraint(y_good <= y_excellent)]
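To make the notion of a feasible region concrete, here is a small sketch that checks a parameterization against the ranges and order constraints listed above. It is a simplified stand-in written with plain dictionaries, not Ax's own membership check, and it covers only the constrained `x_*` and `y_*` parameters (the `*_unknown` and `z_*` parameters are omitted for brevity). The sample values are arm 0_0's parameters, rounded.

```python
from typing import Dict, List, Tuple

TIERS = ["poor", "moderate", "good", "excellent"]

# Ranges copied from the search space printed above.
BOUNDS: Dict[str, Tuple[float, float]] = {}
for tier in TIERS:
    BOUNDS[f"x_{tier}"] = (0.0, 1.0)
    BOUNDS[f"y_{tier}"] = (0.1, 3.0)

# Each pair (lower, upper) encodes the constraint lower <= upper.
ORDER_CONSTRAINTS: List[Tuple[str, str]] = [
    (f"{prefix}_{lower}", f"{prefix}_{upper}")
    for prefix in ("x", "y")
    for lower, upper in zip(TIERS, TIERS[1:])
]


def is_feasible(params: Dict[str, float]) -> bool:
    """True if every value is within its range and all order constraints hold."""
    in_bounds = all(
        low <= params[name] <= high for name, (low, high) in BOUNDS.items()
    )
    ordered = all(
        params[lower] <= params[upper] for lower, upper in ORDER_CONSTRAINTS
    )
    return in_bounds and ordered


# Arm 0_0 from the initial trial, rounded to four decimals.
arm_0_0 = {
    "x_poor": 0.3487, "x_moderate": 0.7668, "x_good": 0.8616, "x_excellent": 0.9716,
    "y_poor": 0.4974, "y_moderate": 0.6776, "y_good": 1.5137, "y_excellent": 2.9007,
}
feasible = is_feasible(arm_0_0)
# Raising x_poor above x_moderate breaks the first order constraint.
infeasible = is_feasible(dict(arm_0_0, x_poor=0.9))
```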

Model Fit

Fitting a Modular BoTorch Model will allow us to predict new candidates based on our first Sobol batch. Here, we make use of the default settings for BOTORCH_MODULAR defined in the ModelBridge registry (uses BoTorch's SingleTaskGP and qLogNoisyExpectedImprovement by default for single objective optimization).

In [14]:

gp = Models.BOTORCH_MODULAR(
    search_space=experiment.search_space,
    experiment=experiment,
    data=data,
)

We can validate the model fits using cross validation, shown below for each metric of interest. Here, our model fits leave something to be desired--the tail ends of each metric are hard to model. In this situation, there are three potential actions to take:

Increase the amount of traffic in this experiment, to reduce the measurement noise.
Increase the number of points run in the random batch, to assist the GP in covering the space.
Reduce the number of parameters tuned at one time.

However, away from the tail effects, the fits do show strong correlations, so we will proceed with candidate generation.

In [15]:

cv_result = cross_validate(gp)
render(tile_cross_validation(cv_result))
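Conceptually, cross validation holds out some observations, refits the model on the rest, and compares the predictions against the held-out values. The sketch below illustrates the leave-one-out variant with a deliberately simple nearest-neighbor predictor standing in for the GP; all names and data here are hypothetical and unrelated to Ax's implementation.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Predictor = Callable[[Sequence[Point], Sequence[float], Point], float]


def nearest_neighbor_predict(
    train_x: Sequence[Point], train_y: Sequence[float], x: Point
) -> float:
    """Predict the outcome of the closest training point (stand-in for a GP)."""
    distances = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in train_x]
    return train_y[distances.index(min(distances))]


def leave_one_out(
    xs: List[Point], ys: List[float], predict: Predictor
) -> List[Tuple[float, float]]:
    """Return (observed, predicted) pairs, each predicted without that point."""
    results = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        results.append((ys[i], predict(train_x, train_y, xs[i])))
    return results


# Toy one-dimensional data, for illustration only.
toy_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_y = [0.0, 1.0, 2.0, 3.0]
cv_pairs = leave_one_out(toy_x, toy_y, nearest_neighbor_predict)
```

Plotting the observed values against the predicted ones gives the kind of diagnostic shown by `tile_cross_validation`: points close to the diagonal indicate a model that generalizes well to unseen arms.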

The parameters from the initial batch have a wide range of effects on the metrics of interest, as shown by the predicted outcomes from our fitted GP model.

In [16]:

render(tile_fitted(gp, rel=True))

In [17]:

METRIC_X_AXIS = "metric_1"
METRIC_Y_AXIS = "metric_2"

render(
    plot_multiple_metrics(
        gp,
        metric_x=METRIC_X_AXIS,
        metric_y=METRIC_Y_AXIS,
    )
)

Candidate Generation

With our fitted GP model, we can optimize an Expected Improvement (EI) acquisition function under any optimization config. We start with the objective from our initial optimization config and simply aim to maximize metric_1, without worrying about the constraint on metric_2.

In [18]:

unconstrained = gp.gen(
    n=3,
    optimization_config=OptimizationConfig(
        objective=Objective(objective_metric, minimize=False),
    ),
)

Let's plot the tradeoffs again, but with our new arms.

In [19]:

render(
    plot_multiple_metrics(
        gp,
        metric_x=METRIC_X_AXIS,
        metric_y=METRIC_Y_AXIS,
        generator_runs_dict={
            "unconstrained": unconstrained,
        },
    )
)

Change Objectives

With our unconstrained optimization, we generate some candidates which are pretty promising with respect to our objective! However, there is a clear regression in our constraint metric, above our initial 5% desired constraint. Let's add that constraint back in.

In [20]:

constraint_5 = OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=5)
constraint_5_results = gp.gen(
    n=3,
    optimization_config=OptimizationConfig(
        objective=Objective(objective_metric, minimize=False),
        outcome_constraints=[constraint_5],
    ),
)

This yields a GeneratorRun, which contains points according to our specified optimization config, along with metadata about how the points were generated. Let's plot the tradeoffs in these new points.

In [21]:

from ax.plot.scatter import plot_multiple_metrics

render(
    plot_multiple_metrics(
        gp,
        metric_x=METRIC_X_AXIS,
        metric_y=METRIC_Y_AXIS,
        generator_runs_dict={"constraint_5": constraint_5_results},
    )
)

It is important to note that the treatment of constraints in GP EI is probabilistic. The acquisition function weights our objective by the probability that each constraint is feasible. Thus, we may allow points with a very small probability of violating the constraint to be generated, as long as the chance of the points increasing our objective is high enough.

You can see above that the point estimate for each point is significantly below a 5% increase in the constraint metric, but that there is uncertainty in our prediction, and the tail probabilities do include probabilities of small regressions beyond 5%.
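The weighting described above can be sketched with the classic analytic form of constrained Expected Improvement: EI on the objective multiplied by the probability that the constraint holds. This is a simplified illustration assuming independent normal posteriors for the objective and the constraint metric; it is not the qLogNoisyExpectedImprovement acquisition function used by the model in this tutorial, and all function names and numbers are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Analytic EI for maximization under a normal posterior."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def prob_feasible_leq(mean: float, std: float, bound: float) -> float:
    """Probability that a normally distributed outcome is <= bound."""
    if std <= 0.0:
        return 1.0 if mean <= bound else 0.0
    return normal_cdf((bound - mean) / std)


def constrained_ei(
    obj_mean: float, obj_std: float, best: float,
    con_mean: float, con_std: float, bound: float,
) -> float:
    """EI on the objective, weighted by the probability of feasibility."""
    return expected_improvement(obj_mean, obj_std, best) * prob_feasible_leq(
        con_mean, con_std, bound
    )


# Hypothetical candidate: predicted +3% on the constraint metric with a
# standard deviation of 1.5, evaluated against a 5% bound.
p_ok = prob_feasible_leq(mean=3.0, std=1.5, bound=5.0)
score = constrained_ei(10.0, 2.0, 9.0, con_mean=3.0, con_std=1.5, bound=5.0)
```

A candidate whose point estimate sits below the bound can still carry a non-trivial probability of violation when its predictive uncertainty is large, and that probability shrinks its score rather than excluding it outright.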

In [22]:

# Tighten the constraint bound from 5% to 1% and generate a new set of candidates.
constraint_1 = OutcomeConstraint(metric=constraint_metric, op=ComparisonOp.LEQ, bound=1)
constraint_1_results = gp.gen(
    n=3,
    optimization_config=OptimizationConfig(
        objective=Objective(objective_metric, minimize=False),
        outcome_constraints=[constraint_1],
    ),
)

In [23]:

render(
    plot_multiple_metrics(
        gp,
        metric_x=METRIC_X_AXIS,
        metric_y=METRIC_Y_AXIS,
        generator_runs_dict={
            "constraint_1": constraint_1_results,
        },
    )
)

Finally, let's view all three sets of candidates together.

In [24]:

render(
    plot_multiple_metrics(
        gp,
        metric_x=METRIC_X_AXIS,
        metric_y=METRIC_Y_AXIS,
        generator_runs_dict={
            "unconstrained": unconstrained,
            "loose_constraint": constraint_5_results,
            "tight_constraint": constraint_1_results,
        },
    )
)

Creating a New Trial

Having done the analysis and candidate generation for three different optimization configs, we can easily create a new BatchTrial which combines the candidates from these three different optimizations. Each set of candidates looks promising -- the point estimates are higher along both metric values than in the previous batch. However, there is still a good bit of uncertainty in our predictions. It is hard to choose between the different constraint settings without reducing this noise, so we choose to run a new trial with all three constraint settings. However, we're generally convinced that the tight constraint is too conservative. We'd still like to reduce our uncertainty in that region, but we'll only take one arm from that set.

In [25]:

# We can add entire generator runs, when constructing a new trial.
trial = (
    experiment.new_batch_trial()
    .add_generator_run(unconstrained)
    .add_generator_run(constraint_5_results)
)

# Or, we can hand-pick arms.
trial.add_arm(constraint_1_results.arms[0])

Out[25]:

BatchTrial(experiment_name='human_in_the_loop_tutorial', index=1, status=TrialStatus.CANDIDATE)

The arms are combined into a single trial, along with the status_quo arm. The generator runs that produced them can be accessed from the trial as well.

In [26]:

experiment.trials[1].arms

Out[26]:

[Arm(name='1_0', parameters={'x_excellent': 0.48826214165080095, 'x_good': 0.0, 'x_moderate': 0.0, 'x_poor': 0.0, 'x_unknown': 0.4668872365426297, 'y_excellent': 3.0, 'y_good': 1.3374200057255254, 'y_moderate': 1.337420005725662, 'y_poor': 0.539556297444796, 'y_unknown': 3.0, 'z_excellent': 5000000.0, 'z_good': 3740530.714798231, 'z_moderate': 3731779.997527201, 'z_poor': 3242062.279540094, 'z_unknown': 5000000.0}),
 Arm(name='1_1', parameters={'x_excellent': 0.20218087735270754, 'x_good': 0.0, 'x_moderate': 0.0, 'x_poor': 0.0, 'x_unknown': 0.9999999999999994, 'y_excellent': 2.211719399706172, 'y_good': 2.044611305619238, 'y_moderate': 0.10000000000000137, 'y_poor': 0.1, 'y_unknown': 3.0, 'z_excellent': 4999999.999999999, 'z_good': 2909894.0237980913, 'z_moderate': 50000.00000000079, 'z_poor': 1711888.9854065448, 'z_unknown': 1356682.73124646}),
 Arm(name='1_2', parameters={'x_excellent': 0.3675295918900718, 'x_good': 0.0, 'x_moderate': 0.0, 'x_poor': 1.490711812940942e-14, 'x_unknown': 1.0, 'y_excellent': 2.999999999999953, 'y_good': 2.3767691722649915, 'y_moderate': 2.3767691722659494, 'y_poor': 0.4575100130035388, 'y_unknown': 3.0, 'z_excellent': 5000000.0, 'z_good': 4188623.7415473685, 'z_moderate': 4999999.999999996, 'z_poor': 5000000.0, 'z_unknown': 5000000.0}),
 Arm(name='1_3', parameters={'x_excellent': 2.47350868652693e-13, 'x_good': 0.0, 'x_moderate': 5.696116065358675e-15, 'x_poor': 0.0, 'x_unknown': 0.4578159932180924, 'y_excellent': 0.8615763466689602, 'y_good': 0.1, 'y_moderate': 0.10000000000001733, 'y_poor': 0.1, 'y_unknown': 0.2615196743392876, 'z_excellent': 3097912.69374085, 'z_good': 1426724.0908285228, 'z_moderate': 5000000.0, 'z_poor': 4999999.9999998305, 'z_unknown': 3842543.121558456}),
 Arm(name='1_4', parameters={'x_excellent': 0.4901497351662576, 'x_good': 0.49014973516753, 'x_moderate': 0.030816411967883973, 'x_poor': 1.779231112666728e-14, 'x_unknown': 1.0, 'y_excellent': 2.9999999999999947, 'y_good': 2.0261481979067217, 'y_moderate': 0.33112445257887735, 'y_poor': 0.10000000008943544, 'y_unknown': 3.0, 'z_excellent': 5000000.0, 'z_good': 2738500.808637117, 'z_moderate': 50000.00011692844, 'z_poor': 4999999.9999942295, 'z_unknown': 3961174.5859369733}),
 Arm(name='1_5', parameters={'x_excellent': 0.4531672195357167, 'x_good': 0.45316721953571637, 'x_moderate': 0.29169575760236954, 'x_poor': 2.3668583842190956e-16, 'x_unknown': 1.0, 'y_excellent': 3.0, 'y_good': 0.752792073531142, 'y_moderate': 0.7527920735311411, 'y_poor': 0.7527920735311443, 'y_unknown': 2.9999999999999587, 'z_excellent': 5000000.0, 'z_good': 3526775.96558171, 'z_moderate': 5000000.0, 'z_poor': 5000000.0, 'z_unknown': 4999999.999999987}),
 Arm(name='1_6', parameters={'x_excellent': 0.0, 'x_good': 0.0, 'x_moderate': 1.3485147983048527e-15, 'x_poor': 0.0, 'x_unknown': 0.0, 'y_excellent': 0.10542159831986245, 'y_good': 0.10542159831937783, 'y_moderate': 0.10542159831941396, 'y_poor': 0.10542159831945444, 'y_unknown': 0.10000000000000193, 'z_excellent': 299258.46489304607, 'z_good': 1308857.2181188832, 'z_moderate': 4999999.99999923, 'z_poor': 4999999.999999946, 'z_unknown': 4951212.772547733})]

The original GeneratorRuns can be accessed from within the trial as well. This is useful for later analyses, allowing introspection of the OptimizationConfig used for generation (as well as other information, e.g. SearchSpace used for generation).

In [27]:

experiment.trials[1]._generator_run_structs

Out[27]:

[GeneratorRunStruct(generator_run=GeneratorRun(3 arms, total weight 3.0), weight=1.0),
 GeneratorRunStruct(generator_run=GeneratorRun(3 arms, total weight 3.0), weight=1.0),
 GeneratorRunStruct(generator_run=GeneratorRun(1 arms, total weight 1.0), weight=1.0)]

Here, we can see the unconstrained set-up used for our first set of candidates.

In [28]:

experiment.trials[1]._generator_run_structs[0].generator_run.optimization_config

Out[28]:

OptimizationConfig(objective=Objective(metric_name="metric_1", minimize=False), outcome_constraints=[])


Total runtime of script: 1 minutes, 30.27 seconds.