Closed-loop Optimization with Ax
Previously, we've demonstrated using Ax for ask-tell optimization, a paradigm in which we "ask" Ax for candidate configurations and "tell" Ax our observations. This can be effective in many scenarios, and it can be automated through the use of flow-control statements like `for` and `while` loops. However, there are some situations where it would be beneficial to let Ax orchestrate the entire optimization: deploying trials to external systems, polling their status, and reading their results. This is common in a number of real-world engineering tasks, including:
- Large scale machine learning experiments running workloads on high-performance computing clusters
- A/B tests conducted using an external experimentation platform
- Materials science optimizations utilizing a self-driving laboratory
Ax's `Client` can orchestrate automated adaptive experiments like this using its `run_trials` method. Users create custom classes implementing Ax's `IMetric` and `IRunner` protocols to handle data fetching and trial deployment, respectively. Then, users simply configure their `Client` as they normally would and call `run_trials`; Ax will deploy trials, fetch data, generate candidates, and repeat as necessary. Ax can manage complex orchestration tasks, including launching multiple trials in parallel while respecting a user-defined concurrency limit, and gracefully handling trial failures by allowing the experiment to continue even if some trials do not complete successfully or data fetching fails.
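Conceptually, `run_trials` automates a loop like the following. This is a simplified sketch for intuition only, not Ax's actual implementation; the real orchestration also handles concurrency limits, polling backoff, and failure tolerance.
# Schematic of the closed loop that run_trials automates (illustrative only):
#
# while the trial budget is not exhausted:
#     candidates = generate_candidates(observed_data)  # Ax's model proposes trials
#     for trial_index, parameters in candidates:
#         metadata[trial_index] = runner.run_trial(trial_index, parameters)  # deploy
#     for trial_index in running_trials:
#         if runner.poll_trial(trial_index, metadata[trial_index]) == TrialStatus.COMPLETED:
#             observed_data[trial_index] = metric.fetch(trial_index, metadata[trial_index])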
In this tutorial we will optimize the Hartmann6 function as before, but we will configure custom Runners and Metrics to mimic an external execution system. The Runner will calculate Hartmann6 with the appropriate parameters, write the result to a file, and tell Ax the trial is ready after 5 seconds. The Metric will find the appropriate file and report the results back to Ax.
Learning Objectives
- Learn when it can be appropriate and/or advantageous to run Ax in a closed loop
- Configure custom Runners and Metrics, allowing Ax to deploy trials and fetch data automatically
- Understand tradeoffs between parallelism and optimization performance
Prerequisites
- Understanding of adaptive experimentation and Bayesian optimization
- Familiarity with configuring and conducting experiments in Ax
Step 1: Import Necessary Modules
First, ensure you have all the necessary imports:
import os
import time
from typing import Any, Mapping
import numpy as np
from ax.preview.api.client import Client
from ax.preview.api.configs import (
    ExperimentConfig,
    OrchestrationConfig,
    ParameterType,
    RangeParameterConfig,
)
from ax.preview.api.protocols.metric import IMetric
from ax.preview.api.protocols.runner import IRunner, TrialStatus
from ax.preview.api.types import TParameterization
Step 2: Define our Custom Runner and Metric
As stated before, we will be creating custom Runner and Metric classes to mimic an external system. Let's start by defining our Hartmann6 function as before.
# Hartmann6 function
def hartmann6(x1, x2, x3, x4, x5, x6):
    alpha = np.array([1.0, 1.2, 3.0, 3.2])
    A = np.array([
        [10, 3, 17, 3.5, 1.7, 8],
        [0.05, 10, 17, 0.1, 8, 14],
        [3, 3.5, 1.7, 10, 17, 8],
        [17, 8, 0.05, 10, 0.1, 14],
    ])
    P = 10**-4 * np.array([
        [1312, 1696, 5569, 124, 8283, 5886],
        [2329, 4135, 8307, 3736, 1004, 9991],
        [2348, 1451, 3522, 2883, 3047, 6650],
        [4047, 8828, 8732, 5743, 1091, 381],
    ])

    outer = 0.0
    for i in range(4):
        inner = 0.0
        for j, x in enumerate([x1, x2, x3, x4, x5, x6]):
            inner += A[i, j] * (x - P[i, j]) ** 2
        outer += alpha[i] * np.exp(-inner)
    return -outer
hartmann6(0.1, 0.45, 0.8, 0.25, 0.552, 1.0)
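As a sanity check, we can evaluate the function at its known global minimizer: Hartmann6 attains its minimum of approximately -3.32237 at roughly x = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573). This gives a useful reference point for judging how close the optimization gets.
# The known global minimum of Hartmann6 is approximately -3.32237
hartmann6(0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)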
Next, we will define the `MockRunner`. The `MockRunner` requires two methods: `run_trial` and `poll_trial`.
`run_trial` deploys a trial to the external system with the given parameters. In this case, we will simply save a file containing the result of a call to the Hartmann6 function.
`poll_trial` queries the external system to see if the trial has completed, failed, or is still running. In this mock example, we will check how many seconds have elapsed since `run_trial` was called and only report a trial as completed once 5 seconds have elapsed.
Runners may also optionally implement a `stop_trial` method to terminate a trial's execution before it has completed. This is necessary for using early stopping in closed-loop experimentation, but we will skip this for now.
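For reference, a `stop_trial` in our mock setting might simply delete the result file. The sketch below is hypothetical, and its signature (mirroring `poll_trial`) is an assumption; consult the `IRunner` protocol for the exact interface.
# Hypothetical method that could be added to MockRunner (the signature is an
# assumption; check the IRunner protocol for the exact interface):
def stop_trial(
    self, trial_index: int, trial_metadata: Mapping[str, Any]
) -> dict[str, Any]:
    # Delete the result file so the "external system" abandons the trial
    os.remove(trial_metadata["file_name"])
    return {}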
class MockRunner(IRunner):
    def run_trial(
        self, trial_index: int, parameterization: TParameterization
    ) -> dict[str, Any]:
        # Name the file after the current Unix timestamp so poll_trial can
        # compute how long the trial has been "running"
        file_name = f"{int(time.time())}.txt"

        x1 = parameterization["x1"]
        x2 = parameterization["x2"]
        x3 = parameterization["x3"]
        x4 = parameterization["x4"]
        x5 = parameterization["x5"]
        x6 = parameterization["x6"]

        result = hartmann6(x1, x2, x3, x4, x5, x6)

        with open(file_name, "w") as f:
            f.write(f"{result}")

        return {"file_name": file_name}

    def poll_trial(
        self, trial_index: int, trial_metadata: Mapping[str, Any]
    ) -> TrialStatus:
        file_name = trial_metadata["file_name"]
        # Recover the deployment timestamp from the file name (strip ".txt")
        time_elapsed = time.time() - int(file_name[:-4])

        if time_elapsed < 5:
            return TrialStatus.RUNNING

        return TrialStatus.COMPLETED
It's worthwhile to instantiate your Runner and test that it is behaving as expected. Let's deploy a mock trial by manually calling `run_trial` and ensuring it creates a file.
runner = MockRunner()

trial_metadata = runner.run_trial(
    trial_index=-1,
    parameterization={
        "x1": 0.1,
        "x2": 0.45,
        "x3": 0.8,
        "x4": 0.25,
        "x5": 0.552,
        "x6": 1.0,
    },
)

os.path.exists(trial_metadata["file_name"])
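We can also check that `poll_trial` behaves as intended: it should report the trial as RUNNING at first, and as COMPLETED once the 5-second window has elapsed.
# Assuming fewer than 5 seconds have passed since run_trial, the trial
# should still be RUNNING...
print(runner.poll_trial(trial_index=-1, trial_metadata=trial_metadata))

# ...and after waiting past the 5-second threshold it should be COMPLETED
time.sleep(5)
print(runner.poll_trial(trial_index=-1, trial_metadata=trial_metadata))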
Now, we will implement the Metric. Metrics only need to implement a `fetch` method, which returns a progression value (i.e., a step in a time series) and an observation value. Note that the observation can be either a simple float or a (mean, SEM) pair if the external system can report observed noise.
In this case, we have neither a relevant progression value nor observed noise, so we will simply read the file and report `(0, value)`.
class MockMetric(IMetric):
    def fetch(
        self,
        trial_index: int,
        trial_metadata: Mapping[str, Any],
    ) -> tuple[int, float | tuple[float, float]]:
        file_name = trial_metadata["file_name"]

        with open(file_name, "r") as file:
            value = float(file.readline())

        return (0, value)
Again, let's validate the Metric created above by instantiating it and reporting the value from the file generated during testing of the Runner.
# Note: all Metrics must have a name. This will become relevant when attaching metrics to the Client
hartmann6_metric = MockMetric(name="hartmann6")
hartmann6_metric.fetch(trial_index=-1, trial_metadata=trial_metadata)
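If your external system also reports observed noise, `fetch` can return a (mean, SEM) pair instead of a bare float, as noted above. Here is a hypothetical variant of our Metric; the 0.1 SEM is invented purely for illustration.
# Hypothetical Metric for a system that reports observed noise: the
# observation is returned as a (mean, SEM) pair
class MockNoisyMetric(IMetric):
    def fetch(
        self,
        trial_index: int,
        trial_metadata: Mapping[str, Any],
    ) -> tuple[int, float | tuple[float, float]]:
        file_name = trial_metadata["file_name"]

        with open(file_name, "r") as file:
            value = float(file.readline())

        # SEM of 0.1 is a made-up value for demonstration purposes
        return (0, (value, 0.1))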
Step 3: Initialize the Client and Configure the Experiment
Finally, we can initialize the `Client` and configure the experiment as before. This will be familiar to readers of the Ask-tell optimization with Ax tutorial -- the only difference is that we will attach the previously defined Runner and Metric by calling `configure_runner` and `configure_metrics`, respectively.
Note that when initializing `hartmann6_metric` we set `name="hartmann6"`, matching the objective we set below in `configure_optimization`. The `configure_metrics` method uses this name to ensure that data fetched by this Metric is used correctly during the experiment. Be careful to correctly set the name of the Metric to reflect its use as an objective or outcome constraint.
client = Client()

# Define six float parameters for the Hartmann6 function
parameters = [
    RangeParameterConfig(
        name=f"x{i + 1}", parameter_type=ParameterType.FLOAT, bounds=(0, 1)
    )
    for i in range(6)
]

# Create an experiment configuration
experiment_config = ExperimentConfig(
    name="hartmann6_experiment",
    parameters=parameters,
    # The following arguments are optional
    description="Optimization of the Hartmann6 function",
    owner="developer",
)

# Apply the experiment configuration to the client
client.configure_experiment(experiment_config=experiment_config)

client.configure_optimization(objective="-hartmann6")

client.configure_runner(runner=runner)
client.configure_metrics(metrics=[hartmann6_metric])
Step 4: Run Trials
Once the `Client` has been configured, we can begin running trials.
Internally, Ax uses a class named `Scheduler` to orchestrate trial deployment, polling, data fetching, and candidate generation.
The `OrchestrationConfig` gives users control over various orchestration settings:
- `parallelism` defines the maximum number of trials that may run at once. If your external system supports multiple evaluations in parallel, increasing this number can significantly decrease experimentation time. However, note that as parallelism increases, optimization performance often decreases. This is because adaptive experimentation methods rely on previously observed data for candidate generation -- the more trials that have been observed before a new candidate is generated, the more accurate Ax's model will be when generating that candidate.
- `tolerated_trial_failure_rate` sets the proportion of trials that are allowed to fail before Ax raises an exception. Depending on how expensive a single trial is to evaluate, or how unreliable trials are expected to be, the experimenter may want to be notified as soon as a single trial fails, or may not care until more than half the trials are failing. Set this value as appropriate for your context.
- `initial_seconds_between_polls` sets the frequency at which a trial's status is checked and its results are fetched. Set this low for trials that are expected to complete quickly, or high for trials that are expected to take a long time.
orchestration_config = OrchestrationConfig(
    parallelism=3,
    tolerated_trial_failure_rate=0.1,
    initial_seconds_between_polls=1,
)
client.run_trials(maximum_trials=30, options=orchestration_config)
Step 5: Analyze Results
As before, Ax can compute the best parameterization observed and produce a number of analyses to help interpret the results of the experiment.
It is also worth noting that the experiment can be resumed at any time using Ax's storage functionality. When configured to use a SQL database, the `Client` saves a snapshot of itself at various points throughout the call to `run_trials`, making it easy to continue optimization after an unexpected failure. You can learn more about storage in Ax here.
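As a rough sketch of what resuming could look like, assuming JSON snapshot helpers on the `Client` (the method names `save_to_json_file` and `load_from_json_file` are assumptions here; see the storage documentation for the actual API):
# Hypothetical sketch of snapshotting and resuming (method names are
# assumptions; consult Ax's storage docs for the actual API)
client.save_to_json_file(filepath="snapshot.json")

# ...later, e.g. after an unexpected failure:
restored_client = Client.load_from_json_file(filepath="snapshot.json")
restored_client.run_trials(maximum_trials=30, options=orchestration_config)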
best_parameters, prediction, index, name = client.get_best_parameterization()
print("Best Parameters:", best_parameters)
print("Prediction (mean, variance):", prediction)
client.compute_analyses()
Parallel Coordinates for hartmann6: View arm parameterizations with their respective metric values
Interaction Analysis for hartmann6: Understand an Experiment's data as one- or two-dimensional additive components with sparsity. Important components are visualized through slice or contour plots.
Summary for hartmann6_experiment: High-level summary of the Trials in this Experiment
| | trial_index | arm_name | trial_status | generation_method | generation_node | hartmann6 | x1 | x2 | x3 | x4 | x5 | x6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0_0 | COMPLETED | Sobol | Sobol | -0.623644 | 0.363031 | 0.945295 | 0.340183 | 0.667994 | 0.603335 | 0.362126 |
1 | 1 | 1_0 | COMPLETED | Sobol | Sobol | -0.041114 | 0.65267 | 0.113505 | 0.992559 | 0.225429 | 0.006312 | 0.771671 |
2 | 2 | 2_0 | COMPLETED | Sobol | Sobol | -0.041114 | 0.834833 | 0.590151 | 0.091789 | 0.767372 | 0.384273 | 0.63648 |
3 | 3 | 3_0 | COMPLETED | Sobol | Sobol | -0.216396 | 0.180868 | 0.483954 | 0.743918 | 0.338968 | 0.975437 | 0.233522 |
4 | 4 | 4_0 | COMPLETED | Sobol | Sobol | -0.123834 | 0.047893 | 0.656932 | 0.766566 | 0.42441 | 0.831611 | 0.444829 |
5 | 5 | 5_0 | COMPLETED | BoTorch | MBM | -0.664324 | 0.262527 | 1 | 0.124567 | 0.859707 | 0.510453 | 0.174615 |
6 | 6 | 6_0 | COMPLETED | BoTorch | MBM | -1.25597 | 0.370259 | 1 | 0 | 0.839573 | 0.765258 | 0 |
7 | 7 | 7_0 | COMPLETED | BoTorch | MBM | -0.363433 | 0.32177 | 1 | 0.212705 | 0.893927 | 0.127175 | 0.29284 |
8 | 8 | 8_0 | COMPLETED | BoTorch | MBM | -0.001047 | 0.287625 | 1 | 0 | 0.992348 | 0.722755 | 0.689541 |
9 | 9 | 9_0 | COMPLETED | BoTorch | MBM | -1.92815 | 0.443413 | 1 | 0 | 0.731131 | 0.926524 | 0 |
10 | 10 | 10_0 | COMPLETED | BoTorch | MBM | -0.423365 | 0.425158 | 1 | 0.347627 | 1 | 0.880167 | 0 |
11 | 11 | 11_0 | COMPLETED | BoTorch | MBM | -2.10611 | 0.415318 | 1 | 0 | 0.429301 | 0.724823 | 0 |
12 | 12 | 12_0 | COMPLETED | BoTorch | MBM | -1.35167 | 0.593764 | 1 | 0 | 0.548924 | 1 | 0 |
13 | 13 | 13_0 | COMPLETED | BoTorch | MBM | -0.152752 | 0 | 1 | 0 | 0.542398 | 1 | 0 |
14 | 14 | 14_0 | COMPLETED | BoTorch | MBM | -0.006013 | 1 | 1 | 0 | 0.552464 | 1 | 0 |
15 | 15 | 15_0 | COMPLETED | BoTorch | MBM | -2.34749 | 0.447036 | 1 | 0 | 0.518058 | 1 | 0 |
16 | 16 | 16_0 | COMPLETED | BoTorch | MBM | -2.01618 | 0.445823 | 0.700704 | 0 | 0.519419 | 1 | 0 |
17 | 17 | 17_0 | COMPLETED | BoTorch | MBM | -0.000359 | 0.079446 | 0 | 0.236764 | 1 | 0 | 0 |
18 | 18 | 18_0 | COMPLETED | BoTorch | MBM | -1.55431 | 0.465478 | 1 | 0 | 0.464164 | 1 | 0.187063 |
19 | 19 | 19_0 | COMPLETED | BoTorch | MBM | -3.6e-05 | 0.473244 | 1 | 0 | 0.50927 | 0.349636 | 0 |
20 | 20 | 20_0 | COMPLETED | BoTorch | MBM | -3.6e-05 | 1 | 1 | 1 | 0 | 1 | 1 |
21 | 21 | 21_0 | COMPLETED | BoTorch | MBM | -2.28535 | 0.391745 | 1 | 0 | 0.473818 | 0.912146 | 0 |
22 | 22 | 22_0 | COMPLETED | BoTorch | MBM | -2.26593 | 0.432069 | 1 | 0 | 0.473391 | 0.898261 | 0 |
23 | 23 | 23_0 | COMPLETED | BoTorch | MBM | -2.27757 | 0.401187 | 1 | 0.256914 | 0.461642 | 0.908451 | 0 |
24 | 24 | 24_0 | COMPLETED | BoTorch | MBM | -0.162798 | 0.403391 | 1 | 0 | 0.048044 | 0.751597 | 0 |
25 | 25 | 25_0 | COMPLETED | BoTorch | MBM | -0.525992 | 0.233813 | 1 | 0 | 0.238657 | 0.537735 | 0 |
26 | 26 | 26_0 | COMPLETED | BoTorch | MBM | -0.084527 | 0.310534 | 1 | 0 | 0 | 0.594362 | 0 |
27 | 27 | 27_0 | COMPLETED | BoTorch | MBM | -2.5334 | 0.403422 | 1 | 0 | 0.55421 | 0.891376 | 0 |
28 | 28 | 28_0 | COMPLETED | BoTorch | MBM | -0.017965 | 0.075075 | 1 | 1 | 1 | 0 | 0.892241 |
29 | 29 | 29_0 | COMPLETED | BoTorch | MBM | -0.020962 | 0.47912 | 1 | 1 | 1 | 0 | 1 |
Cross Validation for hartmann6: Out-of-sample predictions using leave-one-out CV
Conclusion
This tutorial demonstrated how to use Ax's `Client` for closed-loop optimization, using the Hartmann6 function as an example. This style of optimization is useful in scenarios where trials are evaluated on an external system, when experimenters wish to take advantage of parallel evaluation and trial-failure handling, or simply when managing long-running optimization tasks without human intervention. You can define your own Runner and Metric classes to communicate with whatever external systems you wish to interface with, and control orchestration using the `OrchestrationConfig`.