Version: 0.5.0

Configurable closed-loop optimization with Ax Scheduler

We recommend reading through the "Developer API" tutorial before getting started with the Scheduler, as using it in this tutorial will require an Ax Experiment and an understanding of the experiment's subcomponents like the search space and the runner.

Contents:

  1. Scheduler and external systems for trial evaluation –– overview of how the Scheduler works with an external system to run a closed-loop optimization.
  2. Set up a mock external system –– creating a dummy external system client, which will be used to illustrate a scheduler setup in this tutorial.
  3. Set up an experiment according to the mock external system –– set up a runner that deploys trials to the dummy external system from part 2 and a metric that fetches trial results from that system, then leverage that runner and metric to set up an experiment.
  4. Set up a scheduler, given an experiment.
    1. Set up a generation strategy using an auto-selection utility.
    2. Optional: define a plotting function.
  5. Running the optimization via Scheduler.run_n_trials.
  6. Leveraging SQL storage and experiment resumption –– resuming an experiment in one line of code.
  7. Configuring the scheduler –– overview of the many options the Scheduler provides to configure the closed-loop optimization in granular detail.
  8. Advanced functionality:
    1. Reporting results to an external system during the optimization.
    2. Using Scheduler.run_trials_and_yield_results to run the optimization via a generator method.

1. Scheduler and external systems for trial evaluation

Scheduler is a closed-loop manager class in Ax that continuously deploys trial runs to an arbitrary external system in an asynchronous fashion, polls their status from that system, and leverages known trial results to generate more trials.

Key features of the Scheduler:

  • Maintains user-set concurrency limits for trials run in parallel, keeps track of the tolerated level of failed trial runs, and 'oversees' the optimization in other ways,
  • Leverages an Ax Experiment for optimization setup (an optimization config with metrics, a search space, a runner for trial evaluations),
  • Uses an Ax GenerationStrategy for flexible specification of an optimization algorithm used to generate new trials to run,
  • Supports SQL storage and allows for easy resumption of stored experiments.

This diagram summarizes how the Scheduler interacts with any external system used to run trial evaluations:

[Figure: closed-loop interaction between the Ax Scheduler and an external system]
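
Conceptually, the Scheduler automates a loop like the following self-contained sketch. Every name in it (suggest_parameters, submit_job, job_is_done, job_result) is a hypothetical stand-in for the generation-strategy, runner, and metric calls described above, not an Ax API:

import random
import time


def suggest_parameters():          # stand-in for the generation strategy
    return {"x1": random.uniform(-5, 10), "x2": random.uniform(0, 15)}


def submit_job(parameters):        # stand-in for Runner.run / the external system
    return int(time.time() * 1e6)  # job ID


def job_is_done(job_id):           # stand-in for Runner.poll_trial_status
    return True


def job_result(job_id):            # stand-in for Metric.fetch_trial_data
    return random.random()


results = []
while len(results) < 5:                    # stand-in for a stopping criterion
    parameters = suggest_parameters()      # 1. generate a candidate trial
    job_id = submit_job(parameters)        # 2. deploy it to the external system
    while not job_is_done(job_id):         # 3. poll the system for completion
        time.sleep(1)
    results.append((parameters, job_result(job_id)))  # 4. feed the result back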

2. Set up a mock external execution system

An example of an 'external system' running trial evaluations could be a remote server executing scheduled jobs, a subprocess conducting ML training runs, an engine running physics simulations, etc. For the sake of example here, let us assume a dummy external system with the following client:

import sys

in_colab = 'google.colab' in sys.modules
if in_colab:
    %pip install ax-platform

from random import randint
from time import time
from typing import Any, Dict, NamedTuple, Union

from ax.core.base_trial import TrialStatus
from ax.utils.measurement.synthetic_functions import branin


class MockJob(NamedTuple):
    """Dummy class to represent a job scheduled on `MockJobQueue`."""

    id: int
    parameters: Dict[str, Union[str, float, int, bool]]


class MockJobQueueClient:
    """Dummy class to represent a job queue where the Ax `Scheduler` will
    deploy trial evaluation runs during optimization.
    """

    jobs: Dict[int, MockJob] = {}

    def schedule_job_with_parameters(
        self, parameters: Dict[str, Union[str, float, int, bool]]
    ) -> int:
        """Schedules an evaluation job with given parameters and returns job ID."""
        # Code to actually schedule the job and produce an ID would go here;
        # using timestamp in microseconds as dummy ID for this example.
        job_id = int(time() * 1e6)
        self.jobs[job_id] = MockJob(job_id, parameters)
        return job_id

    def get_job_status(self, job_id: int) -> TrialStatus:
        """Get status of the job by a given ID. For simplicity of the example,
        return an Ax `TrialStatus`.
        """
        job = self.jobs[job_id]
        # Instead of randomizing trial status, code to check actual job status
        # would go here.
        if randint(0, 3) > 0:
            return TrialStatus.COMPLETED
        return TrialStatus.RUNNING

    def get_outcome_value_for_completed_job(self, job_id: int) -> Dict[str, float]:
        """Get evaluation results for a given completed job."""
        job = self.jobs[job_id]
        # In a real external system, this would retrieve real relevant outcomes and
        # not a synthetic function value.
        return {"branin": branin(job.parameters.get("x1"), job.parameters.get("x2"))}


MOCK_JOB_QUEUE_CLIENT = MockJobQueueClient()


def get_mock_job_queue_client() -> MockJobQueueClient:
    """Obtain the singleton job queue instance."""
    return MOCK_JOB_QUEUE_CLIENT
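
For instance, the dummy client above can be exercised directly, before any Ax wiring (the printed values are illustrative, since job IDs are timestamps and statuses are randomized):

client = get_mock_job_queue_client()
job_id = client.schedule_job_with_parameters({"x1": 0.0, "x2": 0.0})
print(client.get_job_status(job_id))                       # e.g. TrialStatus.COMPLETED
print(client.get_outcome_value_for_completed_job(job_id))  # {'branin': ~55.6}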

3. Set up an experiment according to the mock external system

As mentioned above, using a Scheduler requires a fully set up experiment with metrics and a runner. Refer to the "Building Blocks of Ax" tutorial to learn more about those components, as here we assume familiarity with them.

The following runner and metric set up interactions between the Scheduler and the mock external system defined above:

from collections import defaultdict
from typing import Iterable, Set

from ax.core.base_trial import BaseTrial
from ax.core.runner import Runner
from ax.core.trial import Trial


class MockJobRunner(Runner):  # Deploys trials to external system.
    def run(self, trial: BaseTrial) -> Dict[str, Any]:
        """Deploys a trial based on custom runner subclass implementation.

        Args:
            trial: The trial to deploy.

        Returns:
            Dict of run metadata from the deployment process.
        """
        if not isinstance(trial, Trial):
            raise ValueError("This runner only handles `Trial`.")

        mock_job_queue = get_mock_job_queue_client()
        job_id = mock_job_queue.schedule_job_with_parameters(
            parameters=trial.arm.parameters
        )
        # This run metadata will be attached to trial as `trial.run_metadata`
        # by the base `Scheduler`.
        return {"job_id": job_id}

    def poll_trial_status(
        self, trials: Iterable[BaseTrial]
    ) -> Dict[TrialStatus, Set[int]]:
        """Checks the status of any non-terminal trials and returns their
        indices as a mapping from TrialStatus to a list of indices. Required
        for runners used with Ax ``Scheduler``.

        NOTE: Does not need to handle waiting between polling calls while trials
        are running; this function should just perform a single poll.

        Args:
            trials: Trials to poll.

        Returns:
            A dictionary mapping TrialStatus to a list of trial indices that have
            the respective status at the time of the polling. This does not need to
            include trials that at the time of polling already have a terminal
            (ABANDONED, FAILED, COMPLETED) status (but it may).
        """
        status_dict = defaultdict(set)
        for trial in trials:
            mock_job_queue = get_mock_job_queue_client()
            status = mock_job_queue.get_job_status(
                job_id=trial.run_metadata.get("job_id")
            )
            status_dict[status].add(trial.index)

        return status_dict
import pandas as pd

from ax.core.metric import Metric, MetricFetchResult, MetricFetchE
from ax.core.base_trial import BaseTrial
from ax.core.data import Data
from ax.utils.common.result import Ok, Err


class BraninForMockJobMetric(Metric):  # Pulls data for trial from external system.
    def fetch_trial_data(self, trial: BaseTrial) -> MetricFetchResult:
        """Obtains data via fetching it from `MockJobQueueClient` for a given trial."""
        if not isinstance(trial, Trial):
            raise ValueError("This metric only handles `Trial`.")

        try:
            mock_job_queue = get_mock_job_queue_client()

            # Here we leverage the "job_id" metadata created by `MockJobRunner.run`.
            branin_data = mock_job_queue.get_outcome_value_for_completed_job(
                job_id=trial.run_metadata.get("job_id")
            )
            df_dict = {
                "trial_index": trial.index,
                "metric_name": "branin",
                "arm_name": trial.arm.name,
                "mean": branin_data.get("branin"),
                # Can be set to 0.0 if function is known to be noiseless
                # or to an actual value when SEM is known. Setting SEM to
                # `None` results in Ax assuming unknown noise and inferring
                # noise level from data.
                "sem": None,
            }
            return Ok(value=Data(df=pd.DataFrame.from_records([df_dict])))
        except Exception as e:
            return Err(
                MetricFetchE(message=f"Failed to fetch {self.name}", exception=e)
            )

Now we can set up the experiment using the runner and metric we defined. This experiment will have a single-objective optimization config, minimizing the Branin function, and the search space that corresponds to that function.

from ax import *


def make_branin_experiment_with_runner_and_metric() -> Experiment:
    parameters = [
        RangeParameter(
            name="x1",
            parameter_type=ParameterType.FLOAT,
            lower=-5,
            upper=10,
        ),
        RangeParameter(
            name="x2",
            parameter_type=ParameterType.FLOAT,
            lower=0,
            upper=15,
        ),
    ]

    objective = Objective(metric=BraninForMockJobMetric(name="branin"), minimize=True)

    return Experiment(
        name="branin_test_experiment",
        search_space=SearchSpace(parameters=parameters),
        optimization_config=OptimizationConfig(objective=objective),
        runner=MockJobRunner(),
        is_test=True,  # Marking this experiment as a test experiment.
    )


experiment = make_branin_experiment_with_runner_and_metric()
Out:

[INFO 02-03 18:39:26] ax.core.experiment: The is_test flag has been set to True. This flag is meant purely for development and integration testing purposes. If you are running a live experiment, please set this flag to False
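
Before handing the experiment to a Scheduler, the runner and metric can also be exercised by hand on a single trial, which makes the job_id handshake between them explicit. This is only a sketch of what the Scheduler automates for you; demo_experiment is a throwaway copy created here so that the experiment used in the rest of the tutorial stays untouched:

from ax.modelbridge.registry import Models

demo_experiment = make_branin_experiment_with_runner_and_metric()
sobol = Models.SOBOL(search_space=demo_experiment.search_space)
trial = demo_experiment.new_trial(generator_run=sobol.gen(n=1))
trial.run()                 # MockJobRunner.run stores {"job_id": ...} as run metadata
print(trial.run_metadata)
trial.mark_completed()      # in the closed loop, poll_trial_status drives this instead
print(demo_experiment.fetch_data().df)  # calls BraninForMockJobMetric.fetch_trial_data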

4. Setting up a Scheduler

4A. Auto-selecting a generation strategy

A Scheduler requires an Ax GenerationStrategy specifying the algorithm to use for the optimization. Here we use the choose_generation_strategy utility that auto-picks a generation strategy based on the search space properties. To construct a custom generation strategy instead, refer to the "Generation Strategy" tutorial.

Importantly, a generation strategy in Ax limits the allowed parallelism level for each generation step it contains. If you would like the Scheduler to enforce parallelism limitations, set max_parallelism on each generation step in your generation strategy.
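
For example, a hand-built strategy could cap per-step parallelism explicitly as in the sketch below (shown only to illustrate where max_parallelism lives; this tutorial uses the choose_generation_strategy utility instead):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

custom_generation_strategy = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,        # quasi-random initialization trials
            max_parallelism=5,   # all five Sobol trials may run at once
        ),
        GenerationStep(
            model=Models.BOTORCH_MODULAR,
            num_trials=-1,       # no limit on subsequent Bayesian optimization trials
            max_parallelism=3,   # at most three BO trials in flight at a time
        ),
    ]
)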

from ax.modelbridge.dispatch_utils import choose_generation_strategy

generation_strategy = choose_generation_strategy(
    search_space=experiment.search_space,
    max_parallelism_cap=3,
)
Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: Using Models.BOTORCH_MODULAR since there is at least one ordered parameter and there are no unordered categorical parameters.

Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=2 num_trials=None use_batch_trials=False

Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=5

Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=5

Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: verbose, disable_progbar, and jit_compile are not yet supported when using choose_generation_strategy with ModularBoTorchModel, dropping these arguments.

Out:

[INFO 02-03 18:39:26] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+BoTorch', steps=[Sobol for 5 trials, BoTorch for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.

Now we have all the components needed to start the scheduler:

from ax.service.scheduler import Scheduler, SchedulerOptions


scheduler = Scheduler(
    experiment=experiment,
    generation_strategy=generation_strategy,
    options=SchedulerOptions(),
)
Out:

[INFO 02-03 18:39:26] Scheduler: Scheduler requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to True on experiment.

4B. Optional: Defining a plotting function

import numpy as np
from ax.plot.trace import optimization_trace_single_method
from ax.utils.notebook.plotting import render, init_notebook_plotting
import plotly.io as pio

init_notebook_plotting()
if in_colab:
    pio.renderers.default = "colab"


def get_plot():
    best_objectives = np.array(
        [[trial.objective_mean for trial in scheduler.experiment.trials.values()]]
    )
    best_objective_plot = optimization_trace_single_method(
        y=np.minimum.accumulate(best_objectives, axis=1),
        title="Model performance vs. # of iterations",
        ylabel="Y",
    )
    return best_objective_plot
Out:

[INFO 02-03 18:39:26] ax.utils.notebook.plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.

Out:

[INFO 02-03 18:39:26] ax.utils.notebook.plotting: Please see (https://ax.dev/tutorials/visualizations.html#Fix-for-plots-that-are-not-rendering) if visualizations are not rendering.

5. Running the optimization

Once the Scheduler instance is set up, the user can execute run_n_trials as many times as needed, and each execution will add up to the specified max_trials trials to the experiment. The number of trials actually run may be less than max_trials if the optimization concludes early (e.g. when there are no more points left in the search space).

scheduler.run_n_trials(max_trials=3)
Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:26] Scheduler: Running trials [0]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:27] Scheduler: Running trials [1]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:28] Scheduler: Running trials [2]...

Out:

[INFO 02-03 18:39:29] Scheduler: Retrieved COMPLETED trials: 0 - 2.

Out:

OptimizationResult()

best_objective_plot = get_plot()
render(best_objective_plot)

We can examine the experiment to see that it now has three trials:

from ax.service.utils.report_utils import exp_to_df

exp_to_df(experiment)
   trial_index  arm_name  trial_status  generation_method   branin        x1        x2
0            0       0_0     COMPLETED              Sobol  121.707   6.26635   11.2042
1            1       1_0     COMPLETED              Sobol  48.3815  -2.59373    4.1723
2            2       2_0     COMPLETED              Sobol  56.2477  -1.23428   14.7245

Now we can call run_n_trials again to add three more trials to the experiment.

scheduler.run_n_trials(max_trials=3)
Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:31] Scheduler: Running trials [3]...

Out:

/home/runner/work/Ax/Ax/ax/core/data.py:295: FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:32] Scheduler: Running trials [4]...

Out:

/home/runner/work/Ax/Ax/ax/core/data.py:295: FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

Out:

[INFO 02-03 18:39:34] Scheduler: Running trials [5]...

Out:

[INFO 02-03 18:39:35] Scheduler: Retrieved COMPLETED trials: 3 - 4.

Out:

[INFO 02-03 18:39:35] Scheduler: Done submitting trials, waiting for remaining 1 running trials...

Out:

[INFO 02-03 18:39:35] Scheduler: Waiting for completed trials (for 1 sec, currently running trials: 1).

Out:

[INFO 02-03 18:39:36] Scheduler: Waiting for completed trials (for 1.5 sec, currently running trials: 1).

Out:

[INFO 02-03 18:39:37] Scheduler: Retrieved COMPLETED trials: [5].

Out:

OptimizationResult()

best_objective_plot = get_plot()
render(best_objective_plot)

Examining the experiment, we now see six trials, one of which was produced by Bayesian optimization (BoTorch):

exp_to_df(experiment)
   trial_index  arm_name  trial_status  generation_method   branin        x1        x2
0            0       0_0     COMPLETED              Sobol  121.707   6.26635   11.2042
1            1       1_0     COMPLETED              Sobol  48.3815  -2.59373    4.1723
2            2       2_0     COMPLETED              Sobol  56.2477  -1.23428   14.7245
3            3       3_0     COMPLETED              Sobol  13.0865   4.90668  0.190758
4            4       4_0     COMPLETED              Sobol   112.26   4.25119   11.8902
5            5       5_0     COMPLETED            BoTorch  17.5083        -5        15

For each call to run_n_trials, one can specify a timeout; if run_n_trials has been running for too long without finishing its max_trials, the operation will exit gracefully:

scheduler.run_n_trials(max_trials=3, timeout_hours=0.00001)
Out:

[INFO 02-03 18:39:39] Scheduler: Running trials [6]...

Out:

[ERROR 02-03 18:39:39] Scheduler: Optimization timed out (timeout hours: 1e-05)!

Out:

[INFO 02-03 18:39:39] Scheduler: should_abort_optimization is True, not running more trials.

Out:

[INFO 02-03 18:39:39] Scheduler: Retrieved COMPLETED trials: [6].

Out:

[ERROR 02-03 18:39:39] Scheduler: Optimization timed out (timeout hours: 1e-05)!

Out:

OptimizationResult()

best_objective_plot = get_plot()
render(best_objective_plot)

6. Leveraging SQL storage and experiment resumption

When a scheduler is SQL-enabled, it automatically saves all updates it makes to the experiment in the course of the optimization, so the experiment can be resumed in the event of a crash or after a pause. The scheduler itself is stateless, so it is not saved to the database.

To store the state of the optimization to an SQL backend, first follow the setup instructions on the Ax website. Having set up the SQL backend, pass DBSettings to the Scheduler on instantiation (note that the SQLAlchemy dependency will need to be installed; for installation, refer to the optional dependencies on the Ax website):

from ax.storage.registry_bundle import RegistryBundle
from ax.storage.sqa_store.db import (
    create_all_tables,
    get_engine,
    init_engine_and_session_factory,
)
from ax.storage.sqa_store.decoder import Decoder
from ax.storage.sqa_store.encoder import Encoder
from ax.storage.sqa_store.sqa_config import SQAConfig
from ax.storage.sqa_store.structs import DBSettings

bundle = RegistryBundle(
    metric_clss={BraninForMockJobMetric: None}, runner_clss={MockJobRunner: None}
)

# URL is of the form "dialect+driver://username:password@host:port/database".
# Instead of URL, can provide a `creator function`; can specify custom encoders/decoders if necessary.
db_settings = DBSettings(
    url="sqlite:///foo.db",
    encoder=bundle.encoder,
    decoder=bundle.decoder,
)

# The following lines are only necessary because it is the first time we are using this database;
# in practice, you will not need to run these lines every time you initialize your scheduler.
init_engine_and_session_factory(url=db_settings.url)
engine = get_engine()
create_all_tables(engine)

stored_experiment = make_branin_experiment_with_runner_and_metric()
generation_strategy = choose_generation_strategy(search_space=stored_experiment.search_space)

scheduler_with_storage = Scheduler(
    experiment=stored_experiment,
    generation_strategy=generation_strategy,
    options=SchedulerOptions(),
    db_settings=db_settings,
)
Out:

[INFO 02-03 18:39:40] ax.core.experiment: The is_test flag has been set to True. This flag is meant purely for development and integration testing purposes. If you are running a live experiment, please set this flag to False

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: Using Models.BOTORCH_MODULAR since there is at least one ordered parameter and there are no unordered categorical parameters.

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=2 num_trials=None use_batch_trials=False

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=5

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=5

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: verbose, disable_progbar, and jit_compile are not yet supported when using choose_generation_strategy with ModularBoTorchModel, dropping these arguments.

Out:

[INFO 02-03 18:39:40] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+BoTorch', steps=[Sobol for 5 trials, BoTorch for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.

Out:

[INFO 02-03 18:39:40] Scheduler: Scheduler requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to True on experiment.

Out:

[INFO 02-03 18:39:40] ax.service.utils.with_db_settings_base: Experiment branin_test_experiment is not yet in DB, storing it.

Out:

[ERROR 02-03 18:39:40] ax.storage.sqa_store.encoder: ATTENTION: The Ax team is considering deprecating SQLAlchemy storage. If you are currently using SQLAlchemy storage, please reach out to us via GitHub Issues here: https://github.com/facebook/Ax/issues/2975

Out:

[INFO 02-03 18:39:40] ax.core.experiment: The is_test flag has been set to True. This flag is meant purely for development and integration testing purposes. If you are running a live experiment, please set this flag to False

Out:

[INFO 02-03 18:39:40] ax.service.utils.with_db_settings_base: Generation strategy Sobol+BoTorch is not yet in DB, storing it.

To resume a stored experiment:

reloaded_experiment_scheduler = Scheduler.from_stored_experiment(
    experiment_name="branin_test_experiment",
    options=SchedulerOptions(),
    # `DBSettings` are also required here so scheduler has access to the
    # database, from which it needs to load the experiment.
    db_settings=db_settings,
)
Out:

[INFO 02-03 18:39:41] ax.service.utils.with_db_settings_base: Loading experiment and generation strategy (with reduced state: True)...

Out:

[INFO 02-03 18:39:41] ax.core.experiment: The is_test flag has been set to True. This flag is meant purely for development and integration testing purposes. If you are running a live experiment, please set this flag to False

Out:

[INFO 02-03 18:39:41] ax.service.utils.with_db_settings_base: Loaded experiment branin_test_experiment & 0 trials in 0.02 seconds.

Out:

[INFO 02-03 18:39:41] ax.service.utils.with_db_settings_base: Loaded generation strategy for experiment branin_test_experiment in 0.01 seconds.

With the newly reloaded experiment, the Scheduler can continue the optimization:

reloaded_experiment_scheduler.run_n_trials(max_trials=3)
Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:41] Scheduler: Running trials [0]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:41] Scheduler: Running trials [1]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:41] Scheduler: Running trials [2]...

Out:

[INFO 02-03 18:39:41] Scheduler: Retrieved COMPLETED trials: [0, 2].

Out:

[INFO 02-03 18:39:41] Scheduler: Done submitting trials, waiting for remaining 1 running trials...

Out:

[INFO 02-03 18:39:41] Scheduler: Waiting for completed trials (for 1 sec, currently running trials: 1).

Out:

[INFO 02-03 18:39:42] Scheduler: Retrieved COMPLETED trials: [1].

Out:

OptimizationResult()

7. Configuring the scheduler with SchedulerOptions, like early stopping

Scheduler exposes many options to configure the exact behavior of the closed-loop optimization. A few notable ones are listed here (a short configuration sketch follows this list):

  • trial_type –– currently only Trial and not BatchTrial is supported, but support for BatchTrial-s will follow,
  • tolerated_trial_failure_rate and min_failed_trials_for_failure_rate_check –– together these two settings control how the scheduler monitors the failure rate among the trial runs it deploys. Once at least min_failed_trials_for_failure_rate_check trials have failed, the scheduler starts checking whether the ratio of failed to total trials exceeds tolerated_trial_failure_rate; if it does, the scheduler exits the optimization with a FailureRateExceededError,
  • ttl_seconds_for_trials –– sometimes a failure in a trial run means that it will be difficult to query its status (e.g. due to a crash). If this setting is specified, the Ax Experiment will automatically mark trials that have been running for too long (more than their 'time-to-live' (TTL) seconds) as failed,
  • run_trials_in_batches –– if True, the scheduler will attempt to run trials not by calling Scheduler.run_trial in a loop, but by calling Scheduler.run_trials on all ready-to-deploy trials at once. This can save time in cases where the deployment operation has a large overhead and deploying many trials at once is more efficient than sequential deployment. Note that using this option successfully will require your scheduler subclass to implement MySchedulerSubclass.run_trials and MySchedulerSubclass.poll_available_capacity.
  • early_stopping_strategy -- determines whether a trial should be stopped given the current state of the experiment, so that less promising trials can be terminated quickly. For more on this, see the Trial-Level Early Stopping tutorial: https://ax.dev/tutorials/early_stopping/early_stopping.html
  • global_stopping_strategy -- determines whether the full optimization should be stopped or not, so that the run terminates when little progress is being made. A global_stopping_strategy instance can be passed to SchedulerOptions just as it is passed to AxClient, as illustrated in the tutorial on Global Stopping Strategy with AxClient: https://ax.dev/tutorials/gss.html
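
For instance, several of the options above can be combined into a single SchedulerOptions configuration along these lines (a minimal sketch; the values are arbitrary illustrations and the keyword names are the attribute names from the docstring printed below):

options_example = SchedulerOptions(
    max_pending_trials=10,                       # cap on trials STAGED/RUNNING at once
    tolerated_trial_failure_rate=0.2,            # halt if more than 20% of trials fail...
    min_failed_trials_for_failure_rate_check=5,  # ...but only check after 5 failures
    ttl_seconds_for_trials=3600,                 # mark trials FAILED after 1 hour of running
    init_seconds_between_polls=10,               # initial poll interval, grows with backoff
)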

The rest of the options are described in the docstring below:

print(SchedulerOptions.__doc__)
Out:

Settings for a scheduler instance.

Attributes:
    max_pending_trials: Maximum number of pending trials the scheduler
        can have STAGED or RUNNING at once, required. If looking
        to use Runner.poll_available_capacity as a primary guide for
        how many trials should be pending at a given time, set this limit
        to a high number, as an upper bound on number of trials that
        should not be exceeded.
    trial_type: Type of trials (1-arm Trial or multi-arm Batch
        Trial) that will be deployed using the scheduler. Defaults
        to 1-arm Trial. NOTE: use BatchTrial only if need to
        evaluate multiple arms *together*, e.g. in an A/B-test
        influenced by data nonstationarity. For cases where just
        deploying multiple arms at once is beneficial but the trials
        are evaluated *independently*, implement run_trials method
        in scheduler subclass, to deploy multiple 1-arm trials at
        the same time.
    batch_size: If using BatchTrial the number of arms to be generated and
        deployed per trial.
    total_trials: Limit on number of trials a given Scheduler
        should run. If no stopping criteria are implemented on
        a given scheduler, exhaustion of this number of trials
        will be used as default stopping criterion in
        Scheduler.run_all_trials. Required to be non-null if
        using Scheduler.run_all_trials (not required for
        Scheduler.run_n_trials).
    tolerated_trial_failure_rate: Fraction of trials in this
        optimization that are allowed to fail without the whole
        optimization ending. Expects value between 0 and 1.
        NOTE: Failure rate checks begin once
        min_failed_trials_for_failure_rate_check trials have
        failed; after that point if the ratio of failed trials
        to total trials ran so far exceeds the failure rate,
        the optimization will halt.
    min_failed_trials_for_failure_rate_check: The minimum number
        of trials that must fail in Scheduler in order to start
        checking failure rate.
    log_filepath: File, to which to write optimization logs.
    logging_level: Minimum level of logging statements to log,
        defaults to logging.INFO.
    ttl_seconds_for_trials: Optional TTL for all trials created
        within this Scheduler, in seconds. Trials that remain
        RUNNING for more than their TTL seconds will be marked
        FAILED once the TTL elapses and may be re-suggested by
        the Ax optimization models.
    init_seconds_between_polls: Initial wait between rounds of
        polling, in seconds. Relevant if using the default wait-
        for-completed-runs functionality of the base Scheduler
        (if wait_for_completed_trials_and_report_results is not
        overridden). With the default waiting, every time a poll
        returns that no trial evaluations completed, wait
        time will increase; once some completed trial evaluations
        are found, it will reset back to this value. Specify 0
        to not introduce any wait between polls.
    min_seconds_before_poll: Minimum number of seconds between
        beginning to run a trial and the first poll to check
        trial status.
    timeout_hours: Number of hours after which the optimization will abort.
    seconds_between_polls_backoff_factor: The rate at which the poll
        interval increases.
    run_trials_in_batches: If True and poll_available_capacity is
        implemented to return non-null results, trials will be dispatched
        in groups via run_trials instead of one-by-one via run_trial.
        This allows to save time, IO calls or computation in cases where
        dispatching trials in groups is more efficient then sequential
        deployment. The size of the groups will be determined as
        the minimum of self.poll_available_capacity() and the number
        of generator runs that the generation strategy is able to produce
        without more data or reaching its allowed max paralellism limit.
    debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
    early_stopping_strategy: A BaseEarlyStoppingStrategy that determines
        whether a trial should be stopped given the current state of
        the experiment. Used in should_stop_trials_early.
    global_stopping_strategy: A BaseGlobalStoppingStrategy that determines
        whether the full optimization should be stopped or not.
    suppress_storage_errors_after_retries: Whether to fully suppress SQL
        storage-related errors if encountered, after retrying the call
        multiple times. Only use if SQL storage is not important for the given
        use case, since this will only log, but not raise, an exception if
        it's encountered while saving to DB or loading from it.
    wait_for_running_trials: Whether the scheduler should wait for running trials
        or exit.
    fetch_kwargs: Kwargs to be used when fetching data.
    validate_metrics: Whether to raise an error if there is a problem with the
        metrics attached to the experiment.
    status_quo_weight: The weight of the status quo arm. This is only used
        if the scheduler is using a BatchTrial. This requires that the status_quo
        be set on the experiment.
    enforce_immutable_search_space_and_opt_config: Whether to enforce that the
        search space and optimization config are immutable. If true, will add
        "immutable_search_space_and_opt_config": True to experiment properties
    mt_experiment_trial_type: Type of trial to run for MultiTypeExperiments. This
        is currently required for MultiTypeExperiments. This is ignored for
        "regular" or single type experiments. If you don't know what a single type
        experiment is, you don't need this.
    force_candidate_generation: Whether to force candidate generation even if the
        generation strategy is not ready to generate candidates, meaning one of the
        transition criteria with block_gen_if_met is met.
        **This is not yet implemented.**

8. Advanced functionality

8a. Reporting results to an external system

The Scheduler can report the optimization result to an external system each time there are new completed trials, if the user-implemented subclass implements MySchedulerSubclass.report_results to do so. For example, the following method:

class MySchedulerSubclass(Scheduler):
    ...

    def report_results(self, force_refit: bool = False):
        write_to_external_database(len(self.experiment.trials))
        # Returns optimization success status and optional dict of outputs.
        return (True, {})

could be used to record the number of trials in the experiment so far in an external database.

Since report_results is an instance method, it has access to self.experiment and self.generation_strategy, which contain all the information about the state of the optimization thus far.

8b. Using run_trials_and_yield_results generator method

In some systems it's beneficial to have greater control over Scheduler.run_n_trials instead of just starting it and needing to wait for it to run all the way to completion before having access to its output. For this purpose, the Scheduler implements a generator method run_trials_and_yield_results, which yields the output of Scheduler.report_results each time there are new completed trials and can be used like so:

class ResultReportingScheduler(Scheduler):
    def report_results(self, force_refit: bool = False):
        return True, {
            "trials so far": len(self.experiment.trials),
            "currently producing trials from generation step": self.generation_strategy._curr.model_name,
            "running trials": [t.index for t in self.running_trials],
        }


experiment = make_branin_experiment_with_runner_and_metric()
scheduler = ResultReportingScheduler(
    experiment=experiment,
    generation_strategy=choose_generation_strategy(
        search_space=experiment.search_space,
        max_parallelism_cap=3,
    ),
    options=SchedulerOptions(),
)

for reported_result in scheduler.run_trials_and_yield_results(max_trials=6):
    print("Reported result: ", reported_result)
Out:

[INFO 02-03 18:39:44] ax.core.experiment: The is_test flag has been set to True. This flag is meant purely for development and integration testing purposes. If you are running a live experiment, please set this flag to False

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: Using Models.BOTORCH_MODULAR since there is at least one ordered parameter and there are no unordered categorical parameters.

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=2 num_trials=None use_batch_trials=False

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=5

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=5

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: verbose, disable_progbar, and jit_compile are not yet supported when using choose_generation_strategy with ModularBoTorchModel, dropping these arguments.

Out:

[INFO 02-03 18:39:44] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+BoTorch', steps=[Sobol for 5 trials, BoTorch for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.

Out:

[INFO 02-03 18:39:44] ResultReportingScheduler: Scheduler requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to True on experiment.

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:44] ResultReportingScheduler: Running trials [0]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:44] ResultReportingScheduler: Running trials [1]...

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:45] ResultReportingScheduler: Running trials [2]...

Out:

[INFO 02-03 18:39:45] ResultReportingScheduler: Generated all trials that can be generated currently. Max parallelism currently reached.

Out:

[INFO 02-03 18:39:45] ResultReportingScheduler: Retrieved COMPLETED trials: 0 - 2.

Out:

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:45] ResultReportingScheduler: Running trials [3]...

Out:

Reported result: (True, {'trials so far': 3, 'currently producing trials from generation step': 'Sobol', 'running trials': []})

Out:

/home/runner/work/Ax/Ax/ax/core/data.py:295: FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

/home/runner/work/Ax/Ax/ax/modelbridge/cross_validation.py:439: UserWarning:

Encountered exception in computing model fit quality: RandomModelBridge does not support prediction.

[INFO 02-03 18:39:46] ResultReportingScheduler: Running trials [4]...

Out:

/home/runner/work/Ax/Ax/ax/core/data.py:295: FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

Out:

[INFO 02-03 18:39:48] ResultReportingScheduler: Running trials [5]...

Out:

[INFO 02-03 18:39:49] ResultReportingScheduler: Retrieved COMPLETED trials: 3 - 5.

Out:

Reported result: (True, {'trials so far': 6, 'currently producing trials from generation step': 'BoTorch', 'running trials': []})

Reported result: (True, {'trials so far': 6, 'currently producing trials from generation step': 'BoTorch', 'running trials': []})

# Clean up to enable running the tutorial repeatedly with
# the same results. You wouldn't do this if you wanted to
# keep adding data to the same experiment.
from ax.storage.sqa_store.delete import delete_experiment

delete_experiment("branin_test_experiment")
Out:

[INFO 02-03 18:39:49] ax.storage.sqa_store.delete: You are deleting branin_test_experiment and all its associated data from the database.