
Automating Orchestration

Previously, we've demonstrated using Ax for ask-tell optimization, a paradigm in which we "ask" Ax for candidate configurations and "tell" Ax our observations. This can be effective in many scenarios, and it can be automated with flow control statements like for and while loops. However, there are some situations where it is beneficial to let Ax orchestrate the entire optimization: deploying trials to external systems, polling their status, and reading their results. This is common in a number of real-world engineering tasks, including:

Large scale machine learning experiments running workloads on high-performance computing clusters
A/B tests conducted using an external experimentation platform
Materials science optimizations utilizing a self-driving laboratory

Ax's Client can orchestrate automated adaptive experiments like this using its method run_trials. Users create custom classes which implement Ax's IMetric and IRunner protocols to handle data fetching and trial deployment respectively. Then, users simply configure their Client as they would normally and call run_trials; Ax will deploy trials, fetch data, generate candidates, and repeat as necessary. Ax can manage complex orchestration tasks including launching multiple trials in parallel while still respecting a user-defined concurrency limit, and gracefully handling trial failure by allowing the experiment to continue even if some trials do not complete successfully or data fetching fails.

In this tutorial we will optimize the Hartmann6 function as before, but we will configure custom Runners and Metrics to mimic an external execution system. The Runner will calculate Hartmann6 with the appropriate parameters, write the result to a file, and tell Ax the trial is ready after 5 seconds. The Metric will find the appropriate file and report the results back to Ax.

Learning Objectives

Learn when it can be appropriate and/or advantageous to run Ax in a closed loop
Configure custom Runners and Metrics, allowing Ax to deploy trials and fetch data automatically
Understand tradeoffs between parallelism and optimization performance

Prerequisites

Understanding of adaptive experimentation and Bayesian optimization
Familiarity with configuring and conducting experiments in Ax

Step 1: Import Necessary Modules

First, ensure you have all the necessary imports:

import os
import time
from typing import Any, Mapping

import numpy as np
from ax.api.client import Client
from ax.api.configs import RangeParameterConfig
from ax.api.protocols.metric import IMetric
from ax.api.protocols.runner import IRunner, TrialStatus
from ax.api.types import TParameterization

Step 2: Defining our custom Runner and Metric

As stated before, we will be creating custom Runner and Metric classes to mimic an external system. Let's start by defining our Hartmann6 function as before.

# Hartmann6 function
def hartmann6(x1, x2, x3, x4, x5, x6):
    alpha = np.array([1.0, 1.2, 3.0, 3.2])
    A = np.array([
        [10, 3, 17, 3.5, 1.7, 8],
        [0.05, 10, 17, 0.1, 8, 14],
        [3, 3.5, 1.7, 10, 17, 8],
        [17, 8, 0.05, 10, 0.1, 14]
    ])
    P = 10**-4 * np.array([
        [1312, 1696, 5569, 124, 8283, 5886],
        [2329, 4135, 8307, 3736, 1004, 9991],
        [2348, 1451, 3522, 2883, 3047, 6650],
        [4047, 8828, 8732, 5743, 1091, 381]
    ])

    outer = 0.0
    for i in range(4):
        inner = 0.0
        for j, x in enumerate([x1, x2, x3, x4, x5, x6]):
            inner += A[i, j] * (x - P[i, j])**2
        outer += alpha[i] * np.exp(-inner)
    return -outer

hartmann6(0.1, 0.45, 0.8, 0.25, 0.552, 1.0)

Output:
np.float64(-0.4878737485613134)
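
As an optional sanity check, the function can be evaluated at the commonly cited global minimizer of Hartmann6, where the value is approximately -3.32237. The sketch below re-implements the same formula using only the standard library so that it can be run on its own; the names hartmann6_pure and X_STAR are introduced here purely for illustration.

```python
import math

ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [
    [10, 3, 17, 3.5, 1.7, 8],
    [0.05, 10, 17, 0.1, 8, 14],
    [3, 3.5, 1.7, 10, 17, 8],
    [17, 8, 0.05, 10, 0.1, 14],
]
P = [
    [1e-4 * v for v in row]
    for row in [
        [1312, 1696, 5569, 124, 8283, 5886],
        [2329, 4135, 8307, 3736, 1004, 9991],
        [2348, 1451, 3522, 2883, 3047, 6650],
        [4047, 8828, 8732, 5743, 1091, 381],
    ]
]


def hartmann6_pure(*xs: float) -> float:
    """Same formula as the NumPy version, written with plain Python lists."""
    total = 0.0
    for alpha_i, a_row, p_row in zip(ALPHA, A, P):
        inner = sum(a * (x - p) ** 2 for a, x, p in zip(a_row, xs, p_row))
        total += alpha_i * math.exp(-inner)
    return -total


# Commonly cited global minimizer of Hartmann6 (illustrative constant).
X_STAR = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)

print(hartmann6_pure(*X_STAR))
print(hartmann6_pure(0.1, 0.45, 0.8, 0.25, 0.552, 1.0))
```

The first value gives a reference point for judging optimization results later on: an objective value near -3.32 is close to the best achievable. The second should agree with the output shown above up to floating-point rounding.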

Next, we will define the MockRunner. The MockRunner requires two methods: run_trial and poll_trial.

run_trial deploys a trial to the external system with the given parameters. In this case, we will simply save a file containing the result of a call to the Hartmann6 function.

poll_trial queries the external system to see if the trial has completed, failed, or is still running. In this mock example, we will check how many seconds have elapsed since run_trial was called and only report a trial as completed once 5 seconds have elapsed.

Runners may also optionally implement a stop_trial method to terminate a trial's execution before it has completed. This is necessary for using early stopping in closed-loop experimentation, but we will skip this for now.

class MockRunner(IRunner):
    def run_trial(
        self, trial_index: int, parameterization: TParameterization
    ) -> dict[str, Any]:
        file_name = f"{int(time.time())}.txt"

        x1 = parameterization["x1"]
        x2 = parameterization["x2"]
        x3 = parameterization["x3"]
        x4 = parameterization["x4"]
        x5 = parameterization["x5"]
        x6 = parameterization["x6"]

        result = hartmann6(x1, x2, x3, x4, x5, x6)

        with open(file_name, "w") as f:
            f.write(f"{result}")

        return {"file_name": file_name}

    def poll_trial(
        self, trial_index: int, trial_metadata: Mapping[str, Any]
    ) -> TrialStatus:
        file_name = trial_metadata["file_name"]
        # Strip the ".txt" suffix to recover the launch timestamp from the file name
        time_elapsed = time.time() - int(file_name[:-4])

        if time_elapsed < 5:
            return TrialStatus.RUNNING

        return TrialStatus.COMPLETED
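
The status logic in poll_trial can also be exercised without waiting on a real clock. The following standalone sketch uses hypothetical names (MockStatus, status_from_file_name, and the now argument are not part of Ax) and assumes the file name is a Unix timestamp followed by ".txt", as produced by run_trial. Unlike the tutorial's runner, it also reports a failure when the name cannot be parsed.

```python
import os
from enum import Enum


class MockStatus(Enum):
    """Hypothetical stand-in for Ax's TrialStatus, used only in this sketch."""

    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def status_from_file_name(
    file_name: str, now: float, wait_seconds: float = 5.0
) -> MockStatus:
    """Derive a trial status from a '<unix timestamp>.txt' file name."""
    # Drop any directory and the extension, leaving only the timestamp text.
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    try:
        launched_at = int(stem)
    except ValueError:
        # The name does not encode a launch time, so treat the trial as failed.
        return MockStatus.FAILED

    if now - launched_at < wait_seconds:
        return MockStatus.RUNNING
    return MockStatus.COMPLETED


print(status_from_file_name("1700000000.txt", now=1700000002.0))
print(status_from_file_name("1700000000.txt", now=1700000005.0))
print(status_from_file_name("not-a-timestamp.txt", now=1700000005.0))
```

Passing the current time in as an argument keeps the function deterministic, which makes it straightforward to unit test the running, completed, and failed branches separately.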

It's worthwhile to instantiate your Runner and test that it behaves as expected. Let's deploy a mock trial by manually calling run_trial and confirming that it creates a file.

runner = MockRunner()

trial_metadata = runner.run_trial(
    trial_index=-1,
    parameterization={
        "x1": 0.1,
        "x2": 0.45,
        "x3": 0.8,
        "x4": 0.25,
        "x5": 0.552,
        "x6": 1.0,
    },
)

os.path.exists(trial_metadata["file_name"])

Output:

True

Now, we will implement the Metric. Metrics only need to implement a fetch method, which returns a progression value (i.e. a step in a timeseries) and an observation value. Note that the observation can either be a simple float or a (mean, SEM) pair if the external system can report observed noise.

In this case, we have neither a relevant progression value nor observed noise so we will simply read the file and report (0, value).

class MockMetric(IMetric):
    def fetch(
        self,
        trial_index: int,
        trial_metadata: Mapping[str, Any],
    ) -> tuple[int, float | tuple[float, float]]:
        file_name = trial_metadata["file_name"]

        with open(file_name, 'r') as file:
            value = float(file.readline())
            return (0, value)

Again, let's validate the Metric created above by instantiating it and reporting the value from the file generated during testing of the Runner.

# Note: all Metrics must have a name. This will become relevant when attaching metrics to the Client
hartmann6_metric = MockMetric(name="hartmann6")

hartmann6_metric.fetch(trial_index=-1, trial_metadata=trial_metadata)

Output:
(0, -0.4878737485613134)
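
When the external system can repeat a measurement, the observation can be reported as a (mean, SEM) pair instead of a single float. The helper below is a hypothetical, standard-library-only sketch (summarize_replicates is not an Ax function) of how such a pair might be computed from raw replicates before returning it from fetch. It assumes independent replicates and uses the sample standard deviation; the replicate values are made up for illustration.

```python
import math
import statistics


def summarize_replicates(values: list[float]) -> tuple[int, tuple[float, float]]:
    """Return (progression, (mean, SEM)) in the shape a fetch method may report."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    # No meaningful progression in this setting, so report 0 as in MockMetric.
    return (0, (mean, sem))


# Illustrative replicate measurements of a single trial (made-up numbers).
progression, (mean, sem) = summarize_replicates([-0.52, -0.47, -0.49, -0.50])
print(progression, round(mean, 4), round(sem, 4))
```

Reporting the SEM lets the model weigh noisy observations appropriately, whereas a bare float leaves the noise level to be inferred.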

Step 3: Initialize the Client and Configure the Experiment

Finally, we can initialize the Client and configure the experiment as before. This will be familiar to readers of the Getting Started with Ax tutorial -- the only difference is we will attach the previously defined Runner and Metric by calling configure_runner and configure_metrics respectively.

Note that when initializing hartmann6_metric we set name="hartmann6", matching the metric named in the objective we now set in configure_optimization (the leading minus sign in "-hartmann6" indicates that the metric should be minimized). The configure_metrics method uses this name to ensure that data fetched by this Metric is used correctly during the experiment. Be careful to set the name of the Metric so that it matches its use as an objective or outcome constraint.

client = Client()
# Define six float parameters for the Hartmann6 function
parameters = [
    RangeParameterConfig(name=f"x{i + 1}", parameter_type="float", bounds=(0, 1))
    for i in range(6)
]

client.configure_experiment(
    parameters=parameters,
    # The following arguments are only necessary when saving to the DB
    name="hartmann6_experiment",
    description="Optimization of the Hartmann6 function",
    owner="developer",
)
client.configure_optimization(objective="-hartmann6")

client.configure_runner(runner=runner)
client.configure_metrics(metrics=[hartmann6_metric])

Step 5: Run trials

Once the Client has been configured, we can begin running trials.

Internally, Ax uses a class named Orchestrator (formerly named Scheduler, and identified as Orchestrator in the log output below) to handle trial deployment, polling, data fetching, and candidate generation.

Scheduler state machine
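
To make the deploy, poll, fetch, and generate cycle concrete, here is a deliberately simplified, standard-library-only sketch of such a loop. It is not Ax's implementation: the function orchestrate, its arguments, and the polls_to_finish stand-in for trial duration are all hypothetical, and real orchestration also handles failures, timeouts, and waiting between polls.

```python
import random
from typing import Callable


def orchestrate(
    generate: Callable[[dict[int, float]], dict[str, float]],
    evaluate: Callable[[dict[str, float]], float],
    max_trials: int,
    parallelism: int,
    polls_to_finish: int = 2,
) -> dict[int, float]:
    """Toy closed loop: deploy up to `parallelism` trials, poll, fetch, repeat."""
    # trial index -> (parameters, number of polls left before it completes)
    running: dict[int, tuple[dict[str, float], int]] = {}
    results: dict[int, float] = {}
    next_index = 0

    while len(results) < max_trials:
        # Deploy new trials while there is spare capacity.
        while len(running) < parallelism and next_index < max_trials:
            running[next_index] = (generate(results), polls_to_finish)
            next_index += 1

        # Poll every running trial and fetch data for those that finished.
        for index in list(running):
            params, polls_left = running[index]
            if polls_left <= 1:
                results[index] = evaluate(params)
                del running[index]
            else:
                running[index] = (params, polls_left - 1)

    return results


rng = random.Random(0)
results = orchestrate(
    generate=lambda observed: {"x": rng.random()},  # ignores data; random search
    evaluate=lambda params: (params["x"] - 0.3) ** 2,
    max_trials=6,
    parallelism=3,
)
print(sorted(results))
```

Note that generate receives only the results observed so far, so any candidate created while other trials are still running is chosen without their data. That is the root of the parallelism tradeoff.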

The run_trials method provides users with control over various orchestration settings as well as the total maximum number of trials to evaluate:

parallelism defines the maximum number of trials that may be run at once. If your external system supports multiple evaluations in parallel, increasing this number can significantly decrease experimentation time. However, it is important to note that as parallelism increases, optimization performance often decreases. This is because adaptive experimentation methods rely on previously observed data for candidate generation -- the more trials that have been observed prior to generation of a new candidate, the more accurate Ax's model will be for generation of that candidate.
tolerated_trial_failure_rate sets the proportion of trials that are allowed to fail before Ax raises an Exception. Depending on how expensive a single trial is to evaluate or how unreliable trials are expected to be, the experimenter may want to be notified as soon as a single trial fails or they may not care until more than half the trials are failing. Set this value as is appropriate for your context.
initial_seconds_between_polls sets the initial interval, in seconds, between checks of a trial's status and attempts to fetch its results. Set this to be low for trials that are expected to complete quickly or high for trials that are expected to take a long time.
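
To get a feel for the parallelism tradeoff before choosing a value, a back-of-the-envelope estimate can help. The sketch below is an idealized calculation, not something Ax computes: estimated_wall_clock_seconds is a hypothetical helper that assumes every trial takes the same time, trials run in full batches, and generation and polling overhead are negligible. The 30 trials and 5 seconds per trial mirror this tutorial's setup.

```python
import math


def estimated_wall_clock_seconds(
    max_trials: int, parallelism: int, seconds_per_trial: float
) -> float:
    """Idealized estimate: trials run in full batches and overhead is ignored."""
    batches = math.ceil(max_trials / parallelism)
    return batches * seconds_per_trial


for parallelism in (1, 3, 10, 30):
    total = estimated_wall_clock_seconds(
        max_trials=30, parallelism=parallelism, seconds_per_trial=5.0
    )
    # A candidate may be generated while up to parallelism - 1 other trials
    # are still pending, so their results cannot inform it.
    pending = parallelism - 1
    print(f"parallelism={parallelism:>2}: ~{total:.0f}s, up to {pending} pending trials per candidate")
```

At a parallelism of 30 the whole budget is spent in a single batch, so no candidate benefits from any earlier observation; at a parallelism of 1 every candidate sees all previous results, at the cost of the longest run time.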

client.run_trials(
    max_trials=30,
    parallelism=3,
    tolerated_trial_failure_rate=0.1,
    initial_seconds_between_polls=1,
)

Output:
[INFO 08-18 05:07:18] ax.api.client: GenerationStrategy(name='Center+Sobol+MBM:fast', nodes=[CenterGenerationNode(next_node_name='Sobol'), GenerationNode(node_name='Sobol', generator_specs=[GeneratorSpec(generator_enum=Sobol, model_key_override=None)], transition_criteria=[MinTrials(transition_to='MBM'), MinTrials(transition_to='MBM')]), GenerationNode(node_name='MBM', generator_specs=[GeneratorSpec(generator_enum=BoTorch, model_key_override=None)], transition_criteria=[])]) chosen based on user input and problem structure.
[INFO 08-18 05:07:18] Orchestrator: Orchestrator requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to True on experiment.
[INFO 08-18 05:07:18] Orchestrator: Running trials [0]...
[INFO 08-18 05:07:19] Orchestrator: Running trials [1]...
[INFO 08-18 05:07:20] Orchestrator: Running trials [2]...
[INFO 08-18 05:07:21] Orchestrator: Retrieved COMPLETED trials: 0 - 2.
[INFO 08-18 05:07:21] Orchestrator: Running trials [3]...
[INFO 08-18 05:07:22] Orchestrator: Running trials [4]...
[INFO 08-18 05:07:24] Orchestrator: Running trials [5]...
[INFO 08-18 05:07:25] Orchestrator: Retrieved COMPLETED trials: 3 - 5.
[INFO 08-18 05:07:26] Orchestrator: Running trials [6]...
[INFO 08-18 05:07:28] Orchestrator: Running trials [7]...
[INFO 08-18 05:07:30] Orchestrator: Running trials [8]...
[INFO 08-18 05:07:31] Orchestrator: Retrieved COMPLETED trials: 6 - 8.
[INFO 08-18 05:07:32] Orchestrator: Running trials [9]...
[INFO 08-18 05:07:34] Orchestrator: Running trials [10]...
[INFO 08-18 05:07:35] Orchestrator: Running trials [11]...
[INFO 08-18 05:07:36] Orchestrator: Retrieved COMPLETED trials: 9 - 11.
[INFO 08-18 05:07:37] Orchestrator: Running trials [12]...
[INFO 08-18 05:07:39] Orchestrator: Running trials [13]...
[INFO 08-18 05:07:41] Orchestrator: Running trials [14]...
[INFO 08-18 05:07:42] Orchestrator: Retrieved COMPLETED trials: 12 - 14.
[INFO 08-18 05:07:43] Orchestrator: Running trials [15]...
[INFO 08-18 05:07:45] Orchestrator: Running trials [16]...
[INFO 08-18 05:07:48] Orchestrator: Running trials [17]...
[INFO 08-18 05:07:49] Orchestrator: Retrieved COMPLETED trials: 15 - 17.
[INFO 08-18 05:07:50] Orchestrator: Running trials [18]...
[INFO 08-18 05:07:52] Orchestrator: Running trials [19]...
[INFO 08-18 05:07:54] Orchestrator: Running trials [20]...
[INFO 08-18 05:07:55] Orchestrator: Retrieved COMPLETED trials: 18 - 20.
[INFO 08-18 05:07:57] Orchestrator: Running trials [21]...
[INFO 08-18 05:07:59] Orchestrator: Running trials [22]...
[INFO 08-18 05:08:01] Orchestrator: Running trials [23]...
[INFO 08-18 05:08:02] Orchestrator: Retrieved COMPLETED trials: 21 - 23.
[INFO 08-18 05:08:03] Orchestrator: Running trials [24]...
[INFO 08-18 05:08:05] Orchestrator: Running trials [25]...
[INFO 08-18 05:08:07] Orchestrator: Running trials [26]...
[INFO 08-18 05:08:08] Orchestrator: Retrieved COMPLETED trials: 24 - 26.
[INFO 08-18 05:08:10] Orchestrator: Running trials [27]...
[INFO 08-18 05:08:12] Orchestrator: Running trials [28]...
[INFO 08-18 05:08:14] Orchestrator: Running trials [29]...
[INFO 08-18 05:08:15] Orchestrator: Retrieved COMPLETED trials: 27 - 29.

Step 6: Analyze Results

As before, Ax can compute the best parameterization observed and produce a number of analyses to help interpret the results of the experiment.

It is also worth noting that the experiment can be resumed at any time using Ax's storage functionality. When configured to use a SQL database, the Client saves a snapshot of itself at various points throughout the call to run_trials, making it easy to continue optimization after an unexpected failure. You can learn more about storage in the Ax documentation.

best_parameters, prediction, index, name = client.get_best_parameterization()
print("Best Parameters:", best_parameters)
print("Prediction (mean, variance):", prediction)

Output:
Best Parameters: {'x1': 0.39824467120760615, 'x2': 0.8872687099380296, 'x3': 1.0, 'x4': 0.5397953006871664, 'x5': 0.0, 'x6': 0.0}
Prediction (mean, variance): {'hartmann6': (np.float64(-3.0895225878605928), np.float64(0.003198371033371525))}
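
The prediction is reported as a (mean, variance) pair per metric. As a rough aid to interpretation, the sketch below converts such a pair into an approximate 95% interval, assuming the model's predictive distribution is roughly normal. approx_interval is a hypothetical helper rather than part of Ax, and the numbers are copied from the output above.

```python
import math


def approx_interval(
    mean: float, variance: float, z: float = 1.96
) -> tuple[float, float]:
    """Normal-approximation interval: mean +/- z * standard deviation."""
    half_width = z * math.sqrt(variance)
    return (mean - half_width, mean + half_width)


# Values copied from the tutorial output; the dict mirrors its structure.
prediction = {"hartmann6": (-3.0895225878605928, 0.003198371033371525)}

for metric_name, (mean, variance) in prediction.items():
    low, high = approx_interval(mean, variance)
    print(f"{metric_name}: {mean:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

Working in standard deviations rather than variances keeps the uncertainty in the same units as the metric, which makes it easier to compare against differences between trials.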

# display=True instructs Ax to sort then render the resulting analyses
cards = client.compute_analyses(display=True)

Modeled Arm Effects on hartmann6

Modeled effects on hartmann6. This plot visualizes predictions of the true metric changes for each arm based on Ax's model. This is the expected delta you would expect if you (re-)ran that arm. This plot helps in anticipating the outcomes and performance of arms based on the model's predictions. Note, flat predictions across arms indicate that the model predicts that there is no effect, meaning if you were to re-run the experiment, the delta you would see would be small and fall within the confidence interval indicated in the plot.

Observed Arm Effects on hartmann6

Observed effects on hartmann6. This plot visualizes the effects from previously-run arms on a specific metric, providing insights into their performance. This plot allows one to compare and contrast the effectiveness of different arms, highlighting which configurations have yielded the most favorable outcomes.

Summary for hartmann6_experiment

High-level summary of the Trial-s in this Experiment

	trial_index	arm_name	trial_status	generation_node	hartmann6	x1	x2	x3	x4	x5	x6
0	0	0_0	COMPLETED	CenterOfSearchSpace	-0.505315	0.5	0.5	0.5	0.5	0.5	0.5
1	1	1_0	COMPLETED	Sobol	-0.005011	0.090748	0.151117	0.014209	0.767552	0.924322	0.101407
2	2	2_0	COMPLETED	Sobol	-0.103405	0.779016	0.840932	0.846303	0.144846	0.464855	0.958485
3	3	3_0	COMPLETED	Sobol	-0.275222	0.541717	0.28781	0.372133	0.728051	0.209724	0.670789
4	4	4_0	COMPLETED	Sobol	-0.755523	0.355397	0.719807	0.516608	0.355643	0.679941	0.27762
5	5	5_0	COMPLETED	MBM	-0.143505	0.693978	0.319817	0.672351	0.507557	0.583086	0.425054
6	6	6_0	COMPLETED	MBM	-0.160829	0.326616	1	0.576313	0.188079	0.536092	0.454957
7	7	7_0	COMPLETED	MBM	-2.76621	0.364672	0.877363	0.557835	0.536094	1	0
8	8	8_0	COMPLETED	MBM	-0.154235	0.316758	0.474515	0.560073	0.003	0.628883	0.158524
9	9	9_0	COMPLETED	MBM	-0.488493	0.356741	1	0.578899	0.977651	1	0
10	10	10_0	COMPLETED	MBM	-0.059938	0.858832	1	0.537596	0.414069	1	0
11	11	11_0	COMPLETED	MBM	-0.078029	0	1	0.575738	0.387144	1	0.286465
12	12	12_0	COMPLETED	MBM	-2.58152	0.406559	1	0.644129	0.557165	1	0
13	13	13_0	COMPLETED	MBM	-2.83368	0.38309	0.882385	0.394339	0.556904	1	0
14	14	14_0	COMPLETED	MBM	-1.52658	0.345351	0.613724	0.649747	0.55691	1	0
15	15	15_0	COMPLETED	MBM	-2.3978	0.315452	0.920556	0.090658	0.608207	1	0
16	16	16_0	COMPLETED	MBM	-2.81374	0.430512	0.868223	0.861948	0.614334	1	0
17	17	17_0	COMPLETED	MBM	-2.91594	0.367823	0.90478	0.298906	0.599298	0.59356	0
18	18	18_0	COMPLETED	MBM	-3.06681	0.418949	0.877011	0.551655	0.610534	0.025063	0
19	19	19_0	COMPLETED	MBM	-0.458253	0.415256	0.887276	0.53062	0.630286	0.483662	0.418377
20	20	20_0	COMPLETED	MBM	-2.51259	0.489588	0.859149	0.352584	0.634652	0.778441	0
21	21	21_0	COMPLETED	MBM	-3.09031	0.398245	0.887269	1	0.539795	0	0
22	22	22_0	COMPLETED	MBM	-0.002505	0.39137	0.251688	0	0.015747	0.9317	0.842261
23	23	23_0	COMPLETED	MBM	-2.94459	0.409782	0.884525	0	0.525864	0	0
24	24	24_0	COMPLETED	MBM	-0.00042	0	0	1	1	1	1
25	25	25_0	COMPLETED	MBM	-0.504165	0	0.574153	1	1	0	1
26	26	26_0	COMPLETED	MBM	-0.025286	0	1	1	0	0	1
27	27	27_0	COMPLETED	MBM	-1.8e-05	1	1	0	1	0	1
28	28	28_0	COMPLETED	MBM	-0.126429	1	0	1	0	0	1
29	29	29_0	COMPLETED	MBM	-1.6e-05	1	0	0	0.606515	1	1

Sensitivity Analysis for hartmann6

Understand how each parameter affects hartmann6 according to a second-order sensitivity analysis.

x2 vs. hartmann6

The slice plot provides a one-dimensional view of predicted outcomes for hartmann6 as a function of a single parameter, while keeping all other parameters fixed at their status_quo value (or mean value if status_quo is unavailable). This visualization helps in understanding the sensitivity and impact of changes in the selected parameter on the predicted metric outcomes.

x2, x6 vs. hartmann6

The contour plot visualizes the predicted outcomes for hartmann6 across a two-dimensional parameter space, with other parameters held fixed at their status_quo value (or mean value if status_quo is unavailable). This plot helps in identifying regions of optimal performance and understanding how changes in the selected parameters influence the predicted outcomes. Contour lines represent levels of constant predicted values, providing insights into the gradient and potential optima within the parameter space.

x1, x6 vs. hartmann6

Cross Validation for hartmann6

The cross-validation plot displays the model fit for each metric in the experiment. It employs a leave-one-out approach, where the model is trained on all data except one sample, which is used for validation. The plot shows the predicted outcome for the validation set on the y-axis against its actual value on the x-axis. Points that align closely with the dotted diagonal line indicate a strong model fit, signifying accurate predictions. Additionally, the plot includes 95% confidence intervals that provide insight into the noise in observations and the uncertainty in model predictions. A horizontal, flat line of predictions indicates that the model has not picked up on sufficient signal in the data, and instead is just predicting the mean.

Conclusion

This tutorial demonstrates how to use Ax's Client for closed-loop optimization using the Hartmann6 function as an example. This style of optimization is useful in scenarios where trials are evaluated on some external system or when experimenters wish to take advantage of parallel evaluation, trial failure handling, or simply to manage long-running optimization tasks without human intervention. You can define your own Runner and Metric classes to communicate with whatever external systems you wish to interface with, and control optimization using the OrchestrationConfig.
