ax.benchmark

Benchmark

Benchmark Method

class ax.benchmark.benchmark_method.BenchmarkMethod(*, name: str = 'DEFAULT', generation_strategy: GenerationStrategy, timeout_hours: float = 4.0, distribute_replications: bool = False, use_model_predictions_for_best_point: bool = False, batch_size: int = 1, run_trials_in_batches: bool = False, max_pending_trials: int = 1, early_stopping_strategy: BaseEarlyStoppingStrategy | None = None)[source]

Bases: Base

Benchmark method, represented in terms of Ax generation strategy (which tells us which models to use when) and scheduler options (which tell us extra execution information like maximum parallelism, early stopping configuration, etc.).

Parameters:
  • name – String description.

  • generation_strategy – The GenerationStrategy to use.

  • timeout_hours – Number of hours after which to stop a benchmark replication.

  • distribute_replications – Indicates whether the replications should be run in a distributed manner. Ax itself does not use this attribute.

  • use_model_predictions_for_best_point – Whether to use model predictions with get_pareto_optimal_parameters (if multi-objective) or BestPointMixin._get_best_trial (if single-objective). However, note that if multi-objective, best-point selection is not currently supported and get_pareto_optimal_parameters will raise a NotImplementedError.

  • batch_size – Number of arms per trial. If greater than 1, trials are `BatchTrial`s; otherwise, they are `Trial`s. Defaults to 1. This and the following arguments are passed to `SchedulerOptions`.

  • run_trials_in_batches – Passed to SchedulerOptions.

  • max_pending_trials – Passed to SchedulerOptions.

batch_size: int = 1
distribute_replications: bool = False
early_stopping_strategy: BaseEarlyStoppingStrategy | None = None
generation_strategy: GenerationStrategy
get_best_parameters(experiment: Experiment, optimization_config: OptimizationConfig, n_points: int) list[dict[str, None | str | bool | float | int]][source]

Get n_points promising points. NOTE: Only SOO with n_points = 1 is supported.

The expected use case is that these points will be evaluated against an oracle for hypervolume (if multi-objective) or for the value of the best parameter (if single-objective).

For multi-objective cases, n_points > 1 is needed. For SOO, n_points > 1 reflects setups where we can choose some points which will then be evaluated noiselessly or at high fidelity and then use the best one.

Parameters:
  • experiment – The experiment to get the data from. This should contain values that would be observed in a realistic setting and not contain oracle values.

  • optimization_config – The optimization_config for the corresponding BenchmarkProblem.

  • n_points – The number of points to return.

max_pending_trials: int = 1
name: str = 'DEFAULT'
run_trials_in_batches: bool = False
timeout_hours: float = 4.0
use_model_predictions_for_best_point: bool = False
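
Example

A minimal sketch of constructing a BenchmarkMethod with a quasi-random Sobol GenerationStrategy; exact import paths may differ across Ax versions.

>>> from ax.benchmark.benchmark_method import BenchmarkMethod
>>> from ax.modelbridge.generation_strategy import (
...     GenerationStep,
...     GenerationStrategy,
... )
>>> from ax.modelbridge.registry import Models
>>> gs = GenerationStrategy(
...     name="Sobol",
...     steps=[GenerationStep(model=Models.SOBOL, num_trials=-1)],
... )
>>> method = BenchmarkMethod(
...     name="Sobol",
...     generation_strategy=gs,
...     distribute_replications=False,
... )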

Benchmark Metric

Metric classes for Ax benchmarking.

Metrics vary on two dimensions: Whether they are `MapMetric`s or not, and whether they are available while running or not.

There are four Metric classes:

  • BenchmarkMetric: For when outputs should be Data (not MapData) and data is not available while running.

  • BenchmarkMapMetric: For when outputs should be MapData (not Data) and data is available while running.

  • BenchmarkTimeVaryingMetric: For when outputs should be Data and the metric is available while running.

  • BenchmarkMapUnavailableWhileRunningMetric: For when outputs should be MapData and the metric is not available while running.

Any of these can be used with or without a simulator. However, BenchmarkMetric.fetch_trial_data cannot take in data with multiple time steps; such data would not be used, so passing it is assumed to be an error. The table below enumerates the use cases.

Benchmark Metrics Table

| # | Metric                                    | Map | Available while running | Simulator | Reason/use case                                                       |
|---|-------------------------------------------|-----|-------------------------|-----------|-----------------------------------------------------------------------|
| 1 | BenchmarkMetric                           | No  | No                      | No        | Vanilla                                                               |
| 2 | BenchmarkMetric                           | No  | No                      | Yes       | Asynchronous, data read only at end                                   |
| 3 | BenchmarkTimeVaryingMetric                | No  | Yes                     | No        | Behaves like #1 because it will never be RUNNING                      |
| 4 | BenchmarkTimeVaryingMetric                | No  | Yes                     | Yes       | Scalar data that changes over time                                    |
| 5 | BenchmarkMapUnavailableWhileRunningMetric | Yes | No                      | No        | MapData that returns immediately; could be used for getting baseline  |
| 6 | BenchmarkMapUnavailableWhileRunningMetric | Yes | No                      | Yes       | Asynchronicity with MapData read only at end                          |
| 7 | BenchmarkMapMetric                        | Yes | Yes                     | No        | Behaves same as #5                                                    |
| 8 | BenchmarkMapMetric                        | Yes | Yes                     | Yes       | Early stopping                                                        |
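
Example

As a minimal sketch of the table above, rows 1 and 8 correspond to the following metric instantiations (the metric names are arbitrary):

>>> from ax.benchmark.benchmark_metric import BenchmarkMapMetric, BenchmarkMetric
>>> # Case 1: plain metric, read once upon trial completion.
>>> objective_metric = BenchmarkMetric(name="objective", lower_is_better=True)
>>> # Case 8: MapData available while running, e.g. to enable early stopping.
>>> curve_metric = BenchmarkMapMetric(name="training_loss", lower_is_better=True)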

class ax.benchmark.benchmark_metric.BenchmarkMapMetric(name: str, lower_is_better: bool, observe_noise_sd: bool = True)[source]

Bases: MapMetric, BenchmarkMetricBase

MapMetric for benchmarking. It is available while running.

classmethod is_available_while_running() bool[source]

Whether metrics of this class are available while the trial is running. Metrics that are not available while the trial is running are assumed to be available only upon trial completion. For such metrics, data is assumed to never change once the trial is completed.

NOTE: If this method returns False, data-fetching via experiment.fetch_data will return the data cached on the experiment (for the metrics of the given class) whenever it is available. Data is cached on the experiment when attached via experiment.attach_data.

map_key_info: MapKeyInfo[int] = <ax.core.map_data.MapKeyInfo object>
class ax.benchmark.benchmark_metric.BenchmarkMapUnavailableWhileRunningMetric(name: str, lower_is_better: bool, observe_noise_sd: bool = True)[source]

Bases: MapMetric, BenchmarkMetricBase

map_key_info: MapKeyInfo[int] = <ax.core.map_data.MapKeyInfo object>
class ax.benchmark.benchmark_metric.BenchmarkMetric(name: str, lower_is_better: bool, observe_noise_sd: bool = True)[source]

Bases: BenchmarkMetricBase

Non-map Metric for benchmarking that is not available while running.

It cannot process data with multiple time steps, as it would only return one value – the value it has at completion time – regardless.

class ax.benchmark.benchmark_metric.BenchmarkMetricBase(name: str, lower_is_better: bool, observe_noise_sd: bool = True)[source]

Bases: Metric

fetch_trial_data(trial: BaseTrial, **kwargs: Any) Result[Data, MetricFetchE][source]
Parameters:
  • trial – The trial from which to fetch data.

  • kwargs – Unsupported and will raise an exception.

Returns:

A MetricFetchResult containing the data for the requested metric.

class ax.benchmark.benchmark_metric.BenchmarkTimeVaryingMetric(name: str, lower_is_better: bool, observe_noise_sd: bool = True)[source]

Bases: BenchmarkMetricBase

Non-Map Metric for benchmarking that is available while running.

It can produce different values at different times depending on when it is called, using the time on a BackendSimulator.

classmethod is_available_while_running() bool[source]

Whether metrics of this class are available while the trial is running. Metrics that are not available while the trial is running are assumed to be available only upon trial completion. For such metrics, data is assumed to never change once the trial is completed.

NOTE: If this method returns False, data-fetching via experiment.fetch_data will return the data cached on the experiment (for the metrics of the given class) whenever it is available. Data is cached on the experiment when attached via experiment.attach_data.

Benchmark Problem

class ax.benchmark.benchmark_problem.BenchmarkProblem(*, name: str, optimization_config: ~ax.core.optimization_config.OptimizationConfig, num_trials: int, test_function: ~ax.benchmark.benchmark_test_function.BenchmarkTestFunction, noise_std: float | ~collections.abc.Sequence[float] | ~collections.abc.Mapping[str, float] = 0.0, optimal_value: float, baseline_value: float, search_space: ~ax.core.search_space.SearchSpace, report_inference_value_as_trace: bool = False, n_best_points: int = 1, step_runtime_function: ~ax.benchmark.benchmark_step_runtime_function.TBenchmarkStepRuntimeFunction | None = None, target_fidelity_and_task: ~collections.abc.Mapping[str, None | str | bool | float | int] = <factory>, status_quo_params: dict[str, None | str | bool | float | int] | None = None)[source]

Bases: Base

Problem against which different methods can be benchmarked.

Defines how data is generated, the objective (via the OptimizationConfig), and the SearchSpace.

Parameters:
  • name – Can be generated programmatically with _get_name.

  • optimization_config – Defines the objective of optimization. Metrics must be `BenchmarkMetric`s.

  • num_trials – Number of optimization iterations to run. BatchTrials count as one trial.

  • optimal_value – The best ground-truth objective value, used for scoring optimization results on a scale from 0 to 100, where achieving the optimal_value receives a score of 100. The optimal_value should be a hypervolume for multi-objective problems. If the best value is not known, it is conventional to set it to a value that is almost certainly better than the best value, so that a benchmark’s score will not exceed 100%.

  • baseline_value – Similar to optimal_value, but a not-so-good value which benchmarks are expected to do better than. A baseline value can be derived using the function compute_baseline_value_from_sobol, which takes the best of five quasi-random Sobol trials.

  • search_space – The search space.

  • test_function – A BenchmarkTestFunction, which will generate noiseless data. This will be used by a BenchmarkRunner.

  • noise_std – Describes how noise is added to the output of the test_function. If a float, IID random normal noise with that standard deviation is added. A list of floats, or a dict whose keys match test_function.outcome_names, sets different noise standard deviations for the different outcomes produced by the test_function. This will be used by a BenchmarkRunner.

  • report_inference_value_as_trace – Whether the optimization_trace on a BenchmarkResult should use the oracle_trace (if False, default) or the inference_trace. See BenchmarkResult for more information. Currently, this is only supported for single-objective problems.

  • n_best_points – Number of points for a best-point selector to recommend. Currently, only n_best_points=1 is supported.

  • step_runtime_function – Optionally, a function that takes in params (typically dictionaries mapping strings to `TParamValue`s) and returns the runtime of a step. If step_runtime_function is left as None, each step will take one simulated second. (When data is not time-series, the whole trial consists of one step.)

baseline_value: float
property is_moo: bool

Whether the problem is multi-objective.

n_best_points: int = 1
name: str
noise_std: float | Sequence[float] | Mapping[str, float] = 0.0
num_trials: int
optimal_value: float
optimization_config: OptimizationConfig
report_inference_value_as_trace: bool = False
search_space: SearchSpace
status_quo_params: dict[str, None | str | bool | float | int] | None = None
step_runtime_function: TBenchmarkStepRuntimeFunction | None = None
target_fidelity_and_task: Mapping[str, None | str | bool | float | int]
test_function: BenchmarkTestFunction
ax.benchmark.benchmark_problem.create_problem_from_botorch(*, test_problem_class: type[BaseTestProblem], test_problem_kwargs: dict[str, Any], noise_std: float | list[float] = 0.0, num_trials: int, baseline_value: float | None = None, name: str | None = None, lower_is_better: bool = True, observe_noise_sd: bool = False, search_space: SearchSpace | None = None, report_inference_value_as_trace: bool = False, step_runtime_function: TBenchmarkStepRuntimeFunction | None = None, status_quo_params: dict[str, None | str | bool | float | int] | None = None) BenchmarkProblem[source]

Create a BenchmarkProblem from a BoTorch BaseTestProblem.

The resulting BenchmarkProblem’s test_function is constructed from the BaseTestProblem class (test_problem_class) and its arguments (test_problem_kwargs). All other fields are passed to BenchmarkProblem if they are specified and populated with reasonable defaults otherwise. num_trials, however, must be specified.

Parameters:
  • test_problem_class – The BoTorch test problem class which will be used to define the search_space, optimization_config, and runner.

  • test_problem_kwargs – Keyword arguments used to instantiate the test_problem_class. This should not include noise_std or negate, since these are handled through Ax benchmarking (as the noise_std and lower_is_better arguments to BenchmarkProblem).

  • noise_std – Standard deviation of synthetic noise added to outcomes. If a float, the same noise level is used for all objectives.

  • lower_is_better – Whether this is a minimization problem. For MOO, this applies to all objectives.

  • num_trials – Simply the num_trials of the BenchmarkProblem created.

  • baseline_value – If not provided, will be looked up from BOTORCH_BASELINE_VALUES.

  • name – Will be passed to BenchmarkProblem if specified and populated with reasonable defaults otherwise.

  • observe_noise_sd – Whether the standard deviation of the observation noise is observed or not (in which case it must be inferred by the model). This is separate from whether synthetic noise is added to the problem, which is controlled by the noise_std of the test problem.

  • search_space – If provided, the search_space of the BenchmarkProblem. Otherwise, a SearchSpace with all `RangeParameter`s is created from the bounds of the test problem.

  • report_inference_value_as_trace – If True, indicates that the optimization_trace on a BenchmarkResult ought to be the inference_trace; otherwise, it will be the oracle_trace. See BenchmarkResult for more information.

  • status_quo_params – The status quo parameters for the problem.

  • step_runtime_function – Optionally, a function that takes in params (typically dictionaries mapping strings to `TParamValue`s) and returns the runtime of a step. If step_runtime_function is left as None, each step will take one simulated second. (When data is not time-series, the whole trial consists of one step.)

Example

>>> from ax.benchmark.benchmark_problem import create_problem_from_botorch
>>> from botorch.test_functions.synthetic import Branin
>>> problem = create_problem_from_botorch(
...    test_problem_class=Branin,
...    test_problem_kwargs={},
...    noise_std=0.1,
...    num_trials=10,
...    observe_noise_sd=True,
...    step_runtime_function=lambda params: 1 / params["fidelity"],
... )
ax.benchmark.benchmark_problem.get_continuous_search_space(bounds: list[tuple[float, float]]) SearchSpace[source]
ax.benchmark.benchmark_problem.get_moo_opt_config(*, outcome_names: Sequence[str], ref_point: Sequence[float], num_constraints: int = 0, lower_is_better: bool = True, observe_noise_sd: bool = False, use_map_metric: bool = False) MultiObjectiveOptimizationConfig[source]

Create a MultiObjectiveOptimizationConfig, potentially with constraints.

Parameters:
  • outcome_names – Names of the outcomes. If num_constraints is greater than zero, the last num_constraints elements of outcome_names will become the names of `BenchmarkMetric`s on constraints, and the others will correspond to the objectives.

  • ref_point – Objective thresholds for the objective metrics. Note: Although this method requires providing a threshold for each objective, this is not required in general and could be enabled for this method.

  • num_constraints – Number of constraints.

  • lower_is_better – Whether the objectives are lower-is-better. Applies to all objectives and not to constraints. For constraints, higher is better (feasible). Note: Ax allows different metrics to have different values of lower_is_better; that isn’t enabled for this method, but could be.

  • observe_noise_sd – Whether the standard deviation of the observation noise is observed. Applies to all objectives and constraints.

ax.benchmark.benchmark_problem.get_soo_opt_config(*, outcome_names: Sequence[str], lower_is_better: bool = True, observe_noise_sd: bool = False, use_map_metric: bool = False) OptimizationConfig[source]

Create a single-objective OptimizationConfig, potentially with constraints.

Parameters:
  • outcome_names – Names of the outcomes. If outcome_names has more than one element, constraints will be created. The first element of outcome_names will be the name of the BenchmarkMetric for the objective, and the others (if present) will be for constraints.

  • lower_is_better – Whether the objective is a minimization problem. This only affects objectives, not constraints; for constraints, higher is better (feasible).

  • observe_noise_sd – Whether the standard deviation of the observation noise is observed. Applies to all objectives and constraints.

  • use_map_metric – Whether to use a BenchmarkMapMetric.
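
Example

A minimal sketch; the outcome names are arbitrary, and the second name becomes a constraint metric as described above.

>>> from ax.benchmark.benchmark_problem import get_soo_opt_config
>>> opt_config = get_soo_opt_config(
...     outcome_names=["objective", "constraint"],
...     lower_is_better=True,
...     observe_noise_sd=False,
... )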

Benchmark Result

class ax.benchmark.benchmark_result.AggregatedBenchmarkResult(name: str, results: list[BenchmarkResult], optimization_trace: DataFrame, score_trace: DataFrame, fit_time: list[float], gen_time: list[float])[source]

Bases: Base

The result of a benchmark test, or series of replications. Scalar data present in the BenchmarkResult is here represented as (mean, sem) pairs.

fit_time: list[float]
classmethod from_benchmark_results(results: list[BenchmarkResult]) AggregatedBenchmarkResult[source]

Aggregates a list of BenchmarkResults. For various reasons (timeout, errors, etc.), each BenchmarkResult may have a different number of trials; aggregated traces and statistics are computed after truncating each replication to the minimum trial count, so that every replication is included.

gen_time: list[float]
name: str
optimization_trace: DataFrame
results: list[BenchmarkResult]
score_trace: DataFrame
class ax.benchmark.benchmark_result.BenchmarkResult(name: str, seed: int, oracle_trace: ndarray[Any, dtype[_ScalarType_co]], inference_trace: ndarray[Any, dtype[_ScalarType_co]], optimization_trace: ndarray[Any, dtype[_ScalarType_co]], score_trace: ndarray[Any, dtype[_ScalarType_co]], fit_time: float, gen_time: float, experiment: Experiment | None = None, experiment_storage_id: str | None = None)[source]

Bases: Base

The result of a single optimization loop from one (BenchmarkProblem, BenchmarkMethod) pair.

Parameters:
  • name – Name of the benchmark. Should make it possible to determine the problem and the method.

  • seed – Seed used for determinism.

  • oracle_trace – For single-objective problems, element i of the oracle trace is the best oracle value of the arms evaluated after the first i trials. For multi-objective problems, element i of the oracle trace is the hypervolume of the oracle values of the arms in the first i trials (which may be `BatchTrial`s). Oracle values are typically ground-truth (rather than noisy) and evaluated at the target task and fidelity.

  • inference_trace

    Inference trace comes from choosing a “best” point based only on data that would be observable in realistic settings and then evaluating the oracle value of that point. For multi-objective problems, we find a Pareto set and evaluate its hypervolume.

    There are several ways of specifying the “best” point: One could pick the point with the best observed value, or the point with the best model prediction, and could consider the whole search space, the set of trials completed so far, etc. How the inference trace is computed is specified by a best-point selector, which is an attribute of the BenchmarkMethod.

    Note: This is not “inference regret”, which is a lower-is-better value that is relative to the best possible value. The inference value trace is higher-is-better if the problem is a maximization problem or if the problem is multi-objective (in which case hypervolume is used). Hence, it is signed the same as oracle_trace and optimization_trace. score_trace is higher-is-better and relative to the optimum.

  • optimization_trace – Either the oracle_trace or the inference_trace, depending on whether the BenchmarkProblem specifies report_inference_value_as_trace. Having optimization_trace specified separately is useful when we need just one value to evaluate how well the benchmark went.

  • score_trace – The scores associated with the problem, typically either the optimization_trace or inference_value_trace normalized to a 0-100 scale for comparability between problems.

  • fit_time – Total time spent fitting models.

  • gen_time – Total time spent generating candidates.

  • experiment – If not None, the Ax experiment associated with the optimization that generated this data. Either experiment or experiment_storage_id must be provided.

  • experiment_storage_id – Pointer to location where experiment data can be read.

experiment: Experiment | None = None
experiment_storage_id: str | None = None
fit_time: float
gen_time: float
inference_trace: ndarray[Any, dtype[_ScalarType_co]]
name: str
optimization_trace: ndarray[Any, dtype[_ScalarType_co]]
oracle_trace: ndarray[Any, dtype[_ScalarType_co]]
score_trace: ndarray[Any, dtype[_ScalarType_co]]
seed: int

Benchmark

Module for benchmarking Ax algorithms.

Key terms used:

  • Replication: one run of an optimization loop for a (BenchmarkProblem, BenchmarkMethod) pair.

  • Test: multiple replications, run for statistical significance.

  • Full run: multiple tests on many (BenchmarkProblem, BenchmarkMethod) pairs.

  • Method: (one of) the algorithm(s) being benchmarked.

  • Problem: a synthetic function, a surrogate surface, or an ML model, on which to assess the performance of algorithms.

ax.benchmark.benchmark.benchmark_multiple_problems_methods(problems: Iterable[BenchmarkProblem], methods: Iterable[BenchmarkMethod], seeds: Iterable[int]) list[AggregatedBenchmarkResult][source]

For each (problem, method) pair in the Cartesian product of problems and methods, run a replication for each seed in seeds, aggregate the results into an AggregatedBenchmarkResult, and return the list of AggregatedBenchmarkResults.

ax.benchmark.benchmark.benchmark_one_method_problem(problem: BenchmarkProblem, method: BenchmarkMethod, seeds: Iterable[int]) AggregatedBenchmarkResult[source]
ax.benchmark.benchmark.benchmark_replication(problem: BenchmarkProblem, method: BenchmarkMethod, seed: int, strip_runner_before_saving: bool = True) BenchmarkResult[source]

Run one benchmarking replication (equivalent to one optimization loop).

After each trial, the method gets the best parameter(s) found so far, as evaluated based on empirical data. After all trials are run, the problem gets the oracle values of each “best” parameter; this yields the inference trace. The cumulative maximum of the oracle value of each parameterization tested is the oracle_trace.

Parameters:
  • problem – The BenchmarkProblem to test against (can be synthetic or real)

  • method – The BenchmarkMethod to test

  • seed – The seed to use for this replication.

  • strip_runner_before_saving – Whether to strip the runner from the experiment before saving it. This enables serialization.

Returns:

BenchmarkResult object.
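
Example

A minimal sketch of running a single replication; the registry key "hartmann6" is assumed to exist in BENCHMARK_PROBLEM_REGISTRY.

>>> from ax.benchmark.benchmark import benchmark_replication
>>> from ax.benchmark.methods.sobol import get_sobol_benchmark_method
>>> from ax.benchmark.problems.registry import get_problem
>>> problem = get_problem(problem_key="hartmann6")  # assumed registry key
>>> method = get_sobol_benchmark_method(distribute_replications=False)
>>> result = benchmark_replication(problem=problem, method=method, seed=0)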

ax.benchmark.benchmark.compute_baseline_value_from_sobol(optimization_config: OptimizationConfig, search_space: SearchSpace, test_function: BenchmarkTestFunction, target_fidelity_and_task: Mapping[str, None | str | bool | float | int] | None = None, n_repeats: int = 50) float[source]

Compute the baseline_value that will be assigned to a BenchmarkProblem.

Computed by taking the best of five quasi-random Sobol trials, repeated n_repeats times (50 by default). The value is evaluated at the ground truth (noiseless and at the target task and fidelity).

Parameters:
  • optimization_config – Typically, the optimization_config of a BenchmarkProblem (or that will later be used to define a BenchmarkProblem).

  • search_space – Similarly, the search_space of a BenchmarkProblem.

  • test_function – Similarly, the test_function of a BenchmarkProblem.

  • target_fidelity_and_task – Typically, the target_fidelity_and_task of a BenchmarkProblem.

  • n_repeats – Number of times to repeat the five Sobol trials.

ax.benchmark.benchmark.compute_score_trace(optimization_trace: ndarray[Any, dtype[_ScalarType_co]], baseline_value: float, optimal_value: float) ndarray[Any, dtype[_ScalarType_co]][source]

Compute a score trace from the optimization trace.

Score is expressed as a percentage of possible improvement over a baseline. A higher score is better.

Element i of the score trace is optimization_trace[i] - baseline_value, expressed as a percentage of optimal_value - baseline_value. It can exceed 100 if values better than optimal_value are attained, or fall below 0 if values worse than baseline_value are attained.

Parameters:
  • optimization_trace – Objective values. Can be either higher- or lower-is-better.

  • baseline_value – Value to use as a baseline. Any values that are not better than the baseline will receive negative scores.

  • optimal_value – The best possible value of the objective; when the optimization_trace equals the optimal_value, the score is 100.
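
Example

A worked illustration of the formula above with hypothetical numbers (not a call to the Ax function): for a minimization problem with baseline_value=10.0 and optimal_value=0.0, a best-so-far trace of [10.0, 5.0, 1.0] scores [0.0, 50.0, 90.0].

>>> import numpy as np
>>> optimization_trace = np.array([10.0, 5.0, 1.0])
>>> baseline_value, optimal_value = 10.0, 0.0
>>> # score[i] = 100 * (trace[i] - baseline) / (optimal - baseline)
>>> score_trace = (
...     100
...     * (optimization_trace - baseline_value)
...     / (optimal_value - baseline_value)
... )
>>> # score_trace == array([ 0., 50., 90.])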

ax.benchmark.benchmark.get_benchmark_runner(problem: BenchmarkProblem, max_concurrency: int = 1) BenchmarkRunner[source]

Construct a BenchmarkRunner for the given problem and concurrency.

If max_concurrency > 1 or a step_runtime_function is present on the BenchmarkProblem, construct a SimulatedBackendRunner to track when trials start and stop.

Parameters:
  • problem – The BenchmarkProblem; provides a BenchmarkTestFunction (used to generate data) and step_runtime_function (used to determine timing for the simulator).

  • max_concurrency – The maximum number of trials that can be run concurrently. Typically, max_pending_trials from SchedulerOptions, which are stored on the BenchmarkMethod.

ax.benchmark.benchmark.get_benchmark_scheduler_options(method: BenchmarkMethod, include_sq: bool = False) SchedulerOptions[source]

Get the SchedulerOptions for the given BenchmarkMethod.

Parameters:
  • method – The BenchmarkMethod.

  • include_sq – Whether to include the status quo in each trial.

Returns:

SchedulerOptions

ax.benchmark.benchmark.get_oracle_experiment_from_experiment(problem: BenchmarkProblem, experiment: Experiment) Experiment[source]

Get an Experiment that is the same as the original experiment but has metrics evaluated at oracle values (noiseless ground-truth values evaluated at the target task and fidelity).

ax.benchmark.benchmark.get_oracle_experiment_from_params(problem: BenchmarkProblem, dict_of_dict_of_params: Mapping[int, Mapping[str, Mapping[str, None | str | bool | float | int]]]) Experiment[source]

Get a new experiment with the same search space and optimization config as those belonging to this problem, but with parameterizations evaluated at oracle values (noiseless ground-truth values evaluated at the target task and fidelity).

Parameters:
  • problem – BenchmarkProblem from which to take a test function for generating metrics, as well as a search space and optimization config for generating an experiment.

  • dict_of_dict_of_params – Keys are trial indices, values are Mappings (e.g. dicts) that map arm names to parameterizations.

Example

>>> get_oracle_experiment_from_params(
...     problem=problem,
...     dict_of_dict_of_params={
...         0: {
...            "0_0": {"x0": 0.0, "x1": 0.0},
...            "0_1": {"x0": 0.3, "x1": 0.4},
...         },
...         1: {"1_0": {"x0": 0.0, "x1": 0.0}},
...     }
... )

Benchmark Runner

class ax.benchmark.benchmark_runner.BenchmarkRunner(*, test_function: BenchmarkTestFunction, noise_std: float | Sequence[float] | Mapping[str, float] = 0.0, step_runtime_function: TBenchmarkStepRuntimeFunction | None = None, max_concurrency: int = 1)[source]

Bases: Runner

A Runner that produces both observed and ground-truth values.

Observed values equal ground-truth values plus noise, with the noise added according to the standard deviations returned by get_noise_stds().

This runner does require that every benchmark has a ground truth, which won’t necessarily be true for real-world problems. Such problems fall into two categories:

  • If they are deterministic, they can be used with this runner by viewing them as noiseless problems where the observed values are the ground truth. The observed values will be used for tracking the progress of optimization.

  • If they are not deterministic, they are not supported. It is not conceptually clear how to benchmark such problems, so we decided not to over-engineer for that before such a use case arrives.

If max_concurrency is left as default (1), trials run serially and complete immediately. Otherwise, a SimulatedBackendRunner is constructed to track the status of trials.

Parameters:
  • test_function – A BenchmarkTestFunction from which to generate deterministic data before adding noise.

  • noise_std – The standard deviation of the noise added to the data. Can be a list or dict to be per-metric.

  • step_runtime_function – A function that takes in parameters (in TParameterization format) and returns the runtime of a step.

  • max_concurrency – The maximum number of trials that can be running at a given time. Typically, this is max_pending_trials from the scheduler_options on the BenchmarkMethod.
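
Example

A minimal sketch of constructing a BenchmarkRunner directly; in practice, get_benchmark_runner builds one from a BenchmarkProblem. The Hartmann test problem and outcome name are assumptions.

>>> from botorch.test_functions.synthetic import Hartmann
>>> from ax.benchmark.benchmark_runner import BenchmarkRunner
>>> from ax.benchmark.benchmark_test_functions.botorch_test import (
...     BoTorchTestFunction,
... )
>>> runner = BenchmarkRunner(
...     test_function=BoTorchTestFunction(
...         outcome_names=["objective"],
...         botorch_problem=Hartmann(dim=6),
...     ),
...     noise_std=0.1,
... )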

classmethod deserialize_init_args(args: dict[str, Any], decoder_registry: dict[str, type[T] | Callable[[...], T]] | None = None, class_decoder_registry: dict[str, Callable[[dict[str, Any]], Any]] | None = None) dict[str, Any][source]

It is tricky to use SerializationMixin with instances that have Ax objects as attributes, as BenchmarkRunners do. Therefore, serialization is not supported.

get_Y_true(params: Mapping[str, None | str | bool | float | int]) ndarray[Any, dtype[_ScalarType_co]][source]

Evaluates the test problem.

Returns:

An array of ground truth (noiseless) evaluations, with shape (len(outcome_names), n_intervals) if is_map is True, and (len(outcome_names), 1) otherwise.

get_noise_stds() dict[str, float][source]
max_concurrency: int = 1
noise_std: float | Sequence[float] | Mapping[str, float] = 0.0
property outcome_names: Sequence[str]

The names of the outcomes.

poll_trial_status(trials: Iterable[BaseTrial]) dict[TrialStatus, set[int]][source]

Checks the status of any non-terminal trials and returns their indices as a mapping from TrialStatus to a list of indices. Required for runners used with Ax Scheduler.

NOTE: Does not need to handle waiting between polling calls while trials are running; this function should just perform a single poll.

Parameters:

trials – Trials to poll.

Returns:

A dictionary mapping TrialStatus to a list of trial indices that have the respective status at the time of the polling. This does not need to include trials that at the time of polling already have a terminal (ABANDONED, FAILED, COMPLETED) status (but it may).

run(trial: BaseTrial) dict[str, BenchmarkTrialMetadata][source]

Run the trial by evaluating its parameterization(s).

Parameters:

trial – The trial to evaluate.

Returns:

A dictionary {"benchmark_metadata": metadata}, where metadata is a BenchmarkTrialMetadata.

Return type:

dict[str, BenchmarkTrialMetadata]

classmethod serialize_init_args(obj: Any) dict[str, Any][source]

It is tricky to use SerializationMixin with instances that have Ax objects as attributes, as BenchmarkRunners do. Therefore, serialization is not supported.

simulated_backend_runner: SimulatedBackendRunner | None
step_runtime_function: TBenchmarkStepRuntimeFunction | None = None
stop(trial: BaseTrial, reason: str | None = None) dict[str, Any][source]

Stop a trial based on custom runner subclass implementation.

Optional method.

Parameters:
  • trial – The trial to stop.

  • reason – A message containing information why the trial is to be stopped.

Returns:

A dictionary of run metadata from the stopping process.

test_function: BenchmarkTestFunction
ax.benchmark.benchmark_runner.get_total_runtime(trial: BaseTrial, step_runtime_function: TBenchmarkStepRuntimeFunction | None, n_steps: int) float[source]

Get the total runtime of a trial.

Benchmark Test Function

class ax.benchmark.benchmark_test_function.BenchmarkTestFunction(*, outcome_names: Sequence[str], n_steps: int = 1)[source]

Bases: ABC

The basic Ax class for generating deterministic data to benchmark against.

(Noise - if desired - is added by the runner.)

Parameters:
  • outcome_names – Names of the outcomes.

  • n_steps – Number of data points produced per metric and per evaluation. 1 if data is not time-series. If data is time-series, this will eventually become the number of values on a MapMetric for evaluations that run to completion.

abstract evaluate_true(params: Mapping[str, None | str | bool | float | int]) Tensor[source]

Evaluate noiselessly.

Returns:

A 2d tensor of shape (len(self.outcome_names), self.n_steps).

n_steps: int = 1
outcome_names: Sequence[str]
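
Example

A hypothetical sketch of a custom subclass; the Sphere function and its outcome name are not part of Ax. evaluate_true returns a tensor of shape (len(outcome_names), n_steps), as required.

>>> import torch
>>> from ax.benchmark.benchmark_test_function import BenchmarkTestFunction
>>> class Sphere(BenchmarkTestFunction):
...     # Hypothetical test function: sum of squared parameter values.
...     def evaluate_true(self, params):
...         value = sum(float(v) ** 2 for v in params.values())
...         return torch.full(
...             (len(self.outcome_names), self.n_steps),
...             value,
...             dtype=torch.double,
...         )
>>> sphere = Sphere(outcome_names=["objective"])
>>> sphere.evaluate_true({"x0": 1.0, "x1": 2.0})  # tensor([[5.]]), shape (1, 1)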

Benchmark Step Runtime Function

class ax.benchmark.benchmark_step_runtime_function.TBenchmarkStepRuntimeFunction(*args, **kwargs)[source]

Bases: Protocol
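
Example

Any callable that takes a parameterization and returns the runtime of a step (in simulated seconds) conforms to this protocol, as described for step_runtime_function on BenchmarkProblem. A hypothetical sketch, assuming a parameter named "fidelity":

>>> def fidelity_runtime(params):
...     # Hypothetical: higher-fidelity steps take longer (simulated seconds).
...     return 1.0 + 10.0 * float(params.get("fidelity", 1.0))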

Benchmark Trial Metadata

class ax.benchmark.benchmark_trial_metadata.BenchmarkTrialMetadata(*, dfs: Mapping[str, DataFrame], backend_simulator: BackendSimulator | None = None)[source]

Bases: object

Data pertaining to one trial evaluation.

Parameters:
  • dfs – A dict mapping each metric name to a Pandas DataFrame with columns ["metric_name", "arm_name", "mean", "sem", and "step"]. The "sem" column is always present in this DataFrame even if noise levels are unobserved; BenchmarkMetric and BenchmarkMapMetric hide that data if it should not be observed, and `BenchmarkMapMetric`s drop data from time periods that are not observed based on the (simulated) trial progression.

  • backend_simulator – Optionally, the backend simulator that is tracking the trial’s status.

backend_simulator: BackendSimulator | None = None
dfs: Mapping[str, DataFrame]

Benchmark Methods Modular BoTorch

ax.benchmark.methods.modular_botorch.get_sobol_botorch_modular_acquisition(model_cls: type[Model], acquisition_cls: type[AcquisitionFunction], distribute_replications: bool, name: str | None = None, num_sobol_trials: int = 5, model_gen_kwargs: dict[str, Any] | None = None, use_model_predictions_for_best_point: bool = False, batch_size: int = 1) BenchmarkMethod[source]

Get a BenchmarkMethod that uses Sobol followed by MBM.

Parameters:
  • model_cls – BoTorch model class, e.g. SingleTaskGP

  • acquisition_cls – Acquisition function class, e.g. qLogNoisyExpectedImprovement.

  • distribute_replications – Whether to use multiple machines

  • name – Name that will be attached to the GenerationStrategy.

  • num_sobol_trials – Number of Sobol trials; if `BatchTrial`s are used (batch_size > 1), this refers to the number of `BatchTrial`s.

  • model_gen_kwargs – Passed to the BoTorch GenerationStep and ultimately to the BoTorch Model.

  • use_model_predictions_for_best_point – Passed to the created BenchmarkMethod.

  • batch_size – Passed to the created BenchmarkMethod.

Example

>>> # A simple example
>>> from ax.benchmark.methods.modular_botorch import (
...     get_sobol_botorch_modular_acquisition
... )
>>> from botorch.models.gp_regression import SingleTaskGP
>>> from botorch.acquisition.logei import qLogNoisyExpectedImprovement
>>>
>>> method = get_sobol_botorch_modular_acquisition(
...     model_cls=SingleTaskGP,
...     acquisition_cls=qLogNoisyExpectedImprovement,
...     distribute_replications=False,
... )
>>> # Pass sequential=False to BoTorch's optimize_acqf
>>> batch_method = get_sobol_botorch_modular_acquisition(
...     model_cls=SingleTaskGP,
...     acquisition_cls=qLogNoisyExpectedImprovement,
...     distribute_replications=False,
...     batch_size=5,
...     model_gen_kwargs={
...         "model_gen_options": {
...             "optimizer_kwargs": {"sequential": False}
...         }
...     },
...     num_sobol_trials=1,
... )
ax.benchmark.methods.modular_botorch.get_sobol_mbm_generation_strategy(model_cls: type[Model], acquisition_cls: type[AcquisitionFunction], name: str | None = None, num_sobol_trials: int = 5, model_gen_kwargs: dict[str, Any] | None = None, batch_size: int = 1) GenerationStrategy[source]

Get a GenerationStrategy that uses Sobol followed by MBM.

Parameters:
  • model_cls – BoTorch model class, e.g. SingleTaskGP

  • acquisition_cls – Acquisition function class, e.g. qLogNoisyExpectedImprovement.

  • name – Name that will be attached to the GenerationStrategy.

  • num_sobol_trials – Number of Sobol trials; if `BatchTrial`s are used (batch_size > 1), this refers to the number of `BatchTrial`s.

  • model_gen_kwargs – Passed to the BoTorch GenerationStep and ultimately to the BoTorch Model.

Example

>>> # A simple example
>>> from ax.benchmark.methods.modular_botorch import (
...     get_sobol_mbm_generation_strategy
... )
>>> from botorch.models.gp_regression import SingleTaskGP
>>> from botorch.acquisition.logei import qLogNoisyExpectedImprovement
>>> gs = get_sobol_mbm_generation_strategy(
...     model_cls=SingleTaskGP,
...     acquisition_cls=qLogNoisyExpectedImprovement,
... )

Benchmark Methods Sobol

ax.benchmark.methods.sobol.get_sobol_benchmark_method(distribute_replications: bool, batch_size: int = 1) BenchmarkMethod[source]
ax.benchmark.methods.sobol.get_sobol_generation_strategy() GenerationStrategy[source]
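
Example

A minimal sketch using the documented signature above.

>>> from ax.benchmark.methods.sobol import get_sobol_benchmark_method
>>> sobol_method = get_sobol_benchmark_method(distribute_replications=False)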

Benchmark Problems Registry

class ax.benchmark.problems.registry.BenchmarkProblemRegistryEntry(factory_fn: collections.abc.Callable[..., ax.benchmark.benchmark_problem.BenchmarkProblem], factory_kwargs: dict[str, Any])[source]

Bases: object

factory_fn: Callable[[...], BenchmarkProblem]
factory_kwargs: dict[str, Any]
ax.benchmark.problems.registry.get_problem(problem_key: str, registry: Mapping[str, BenchmarkProblemRegistryEntry] | None = None, **additional_kwargs: Any) BenchmarkProblem[source]

Generate a benchmark problem from a key, registry, and additional arguments.

Parameters:
  • problem_key – The key by which a BenchmarkProblemRegistryEntry is looked up in the registry; a problem will then be generated from that entry and additional_kwargs. Note that this is not necessarily the same as the name attribute of the problem, and that one problem_key can generate several different `BenchmarkProblem`s by passing additional_kwargs. However, it is a good practice to maintain a 1:1 mapping between problem_key and the name.

  • registry – If not provided, defaults to BENCHMARK_PROBLEM_REGISTRY, which contains problems defined within Ax.

  • additional_kwargs – Additional kwargs to pass to the factory function of the BenchmarkProblemRegistryEntry.
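
Example

A minimal sketch; "hartmann6" is assumed to be a key in BENCHMARK_PROBLEM_REGISTRY.

>>> from ax.benchmark.problems.registry import get_problem
>>> # Additional kwargs, if any, are forwarded to the entry's factory function.
>>> problem = get_problem(problem_key="hartmann6")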

Benchmark Problems High Dimensional Embedding

ax.benchmark.problems.hd_embedding.embed_higher_dimension(problem: TProblem, total_dimensionality: int) TProblem[source]

Return a new BenchmarkProblem with enough `RangeParameter`s added to the search space to make its total dimensionality equal to total_dimensionality, and add total_dimensionality to its name.

The search space of the original problem is within the search space of the new problem, and the constraints are copied from the original problem.
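
Example

A minimal sketch; the "hartmann6" registry key is an assumption.

>>> from ax.benchmark.problems.hd_embedding import embed_higher_dimension
>>> from ax.benchmark.problems.registry import get_problem
>>> # Pad a 6D problem with inert parameters up to a total of 30 dimensions.
>>> problem_30d = embed_higher_dimension(
...     problem=get_problem(problem_key="hartmann6"), total_dimensionality=30
... )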

Benchmark Problems Mixed Integer Synthetic

Mixed integer extensions of some common synthetic test functions. These are adapted from [Daulton2022bopr].

References

[Daulton2022bopr]

S. Daulton, X. Wan, D. Eriksson, M. Balandat, M. A. Osborne, E. Bakshy. Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization. Advances in Neural Information Processing Systems 35, 2022.

ax.benchmark.problems.synthetic.discretized.mixed_integer.get_discrete_ackley(num_trials: int = 50, observe_noise_sd: bool = False, bounds: list[tuple[float, float]] | None = None) BenchmarkProblem[source]

13D Ackley problem where first 10 dimensions are discretized.

This also restricts Ackley evaluation bounds to [0, 1].

ax.benchmark.problems.synthetic.discretized.mixed_integer.get_discrete_hartmann(num_trials: int = 50, observe_noise_sd: bool = False, bounds: list[tuple[float, float]] | None = None) BenchmarkProblem[source]

6D Hartmann problem where first 4 dimensions are discretized.

ax.benchmark.problems.synthetic.discretized.mixed_integer.get_discrete_rosenbrock(num_trials: int = 50, observe_noise_sd: bool = False, bounds: list[tuple[float, float]] | None = None) BenchmarkProblem[source]

10D Rosenbrock problem where first 6 dimensions are discretized.
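
Example

A minimal sketch using the documented signatures above.

>>> from ax.benchmark.problems.synthetic.discretized.mixed_integer import (
...     get_discrete_hartmann,
... )
>>> problem = get_discrete_hartmann(num_trials=50, observe_noise_sd=False)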

Benchmark Problems Jenatton

class ax.benchmark.problems.synthetic.hss.jenatton.Jenatton(*, outcome_names: Sequence[str], n_steps: int = 1)[source]

Bases: BenchmarkTestFunction

Jenatton test function for hierarchical search spaces.

evaluate_true(params: Mapping[str, float | int | None]) Tensor[source]

Evaluate noiselessly.

Returns:

A 2d tensor of shape (len(self.outcome_names), self.n_steps).

ax.benchmark.problems.synthetic.hss.jenatton.get_jenatton_benchmark_problem(num_trials: int = 50, observe_noise_sd: bool = False, noise_std: float = 0.0) BenchmarkProblem[source]
ax.benchmark.problems.synthetic.hss.jenatton.get_jenatton_search_space() HierarchicalSearchSpace[source]
ax.benchmark.problems.synthetic.hss.jenatton.jenatton_test_function(x1: int | None = None, x2: int | None = None, x3: int | None = None, x4: float | None = None, x5: float | None = None, x6: float | None = None, x7: float | None = None, r8: float | None = None, r9: float | None = None) float[source]

Jenatton test function for hierarchical search spaces.

This function is taken from:

R. Jenatton, C. Archambeau, J. González, and M. Seeger. Bayesian optimization with tree-structured dependencies. ICML 2017.

Benchmark Problems PyTorchCNN

Benchmark Problems PyTorchCNN TorchVision

class ax.benchmark.problems.hpo.torchvision.CNN[source]

Bases: Module

forward(x: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class ax.benchmark.problems.hpo.torchvision.PyTorchCNNTorchvisionBenchmarkTestFunction(*, outcome_names: list[str] = <factory>, n_steps: int = 1, name: str, device: torch.device = <factory>, train_loader: dataclasses.InitVar[torch.utils.data.dataloader.DataLoader | None] = None, test_loader: dataclasses.InitVar[torch.utils.data.dataloader.DataLoader | None] = None)[source]

Bases: BenchmarkTestFunction

device: device
evaluate_true(params: Mapping[str, int | float]) Tensor[source]

Evaluate noiselessly.

Returns:

A 2d tensor of shape (len(self.outcome_names), self.n_steps).

name: str
outcome_names: list[str]
test_loader: dataclasses.InitVar[torch.utils.data.dataloader.DataLoader | None] = None
train_loader: dataclasses.InitVar[torch.utils.data.dataloader.DataLoader | None] = None
ax.benchmark.problems.hpo.torchvision.get_pytorch_cnn_torchvision_benchmark_problem(name: str, num_trials: int) BenchmarkProblem[source]
ax.benchmark.problems.hpo.torchvision.train_and_evaluate(lr: float, momentum: float, weight_decay: float, step_size: int, gamma: float, device: device, train_loader: DataLoader, test_loader: DataLoader) float[source]

Return the fraction of correctly classified test examples.

Benchmark Problems Runtime Functions

ax.benchmark.problems.runtime_funcs.int_from_params(params: Mapping[str, None | str | bool | float | int], n_possibilities: int = 10) int[source]

Get an int between 0 and n_possibilities - 1, using a hash of the parameters.

Benchmark Test Functions: BoTorch Test

class ax.benchmark.benchmark_test_functions.botorch_test.BoTorchTestFunction(*, outcome_names: Sequence[str], n_steps: int = 1, botorch_problem: BaseTestProblem, modified_bounds: Sequence[tuple[float, float]] | None = None)[source]

Bases: BenchmarkTestFunction

Class for generating data from a BoTorch BaseTestProblem.

Parameters:
  • outcome_names – Names of outcomes. Should have the same length as the number of outputs of the test problem (objectives plus constraints).

  • botorch_problem – The BoTorch BaseTestProblem.

  • modified_bounds – The bounds that are used by the Ax search space while optimizing the problem. If different from the bounds of the test problem, we project the parameters into the test problem bounds before evaluating the test problem. For example, if the test problem is defined on [0, 1] but the Ax search space is integers in [0, 10], an Ax parameter value of 5 will correspond to 0.5 while evaluating the test problem. If modified bounds are not provided, the test problem will be evaluated using the raw parameter values.

botorch_problem: BaseTestProblem
evaluate_true(params: Mapping[str, float | int]) Tensor[source]

Evaluate noiselessly.

Returns:

A 2d tensor of shape (len(self.outcome_names), self.n_steps).

modified_bounds: Sequence[tuple[float, float]] | None = None
outcome_names: Sequence[str]
tensorize_params(params: Mapping[str, int | float]) Tensor[source]
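
Example

A minimal sketch showing modified_bounds; the outcome name is arbitrary. Branin is defined on [-5, 10] x [0, 15], so unit-cube parameter values are projected into those bounds before evaluation.

>>> from botorch.test_functions.synthetic import Branin
>>> from ax.benchmark.benchmark_test_functions.botorch_test import (
...     BoTorchTestFunction,
... )
>>> test_function = BoTorchTestFunction(
...     outcome_names=["branin"],
...     botorch_problem=Branin(),
...     modified_bounds=[(0.0, 1.0), (0.0, 1.0)],
... )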

Benchmark Test Functions: Surrogate

class ax.benchmark.benchmark_test_functions.surrogate.SurrogateTestFunction(*, outcome_names: list[str], n_steps: int = 1, name: str, _surrogate: TorchModelBridge | None = None, get_surrogate: None | Callable[[], TorchModelBridge] = None)[source]

Bases: BenchmarkTestFunction

Data-generating function for surrogate benchmark problems.

Parameters:
  • name – The name of the runner.

  • outcome_names – Names of outcomes to return in evaluate_true, if the surrogate produces more outcomes than are needed.

  • _surrogate – Either None, or a TorchModelBridge surrogate to use for generating observations. If None, get_surrogate must not be None and will be used to generate the surrogate when it is needed.

  • get_surrogate – Function that returns the surrogate, to allow for lazy construction. If get_surrogate is not provided, surrogate must be provided and vice versa.

evaluate_true(params: Mapping[str, None | str | bool | float | int]) Tensor[source]

Evaluate noiselessly.

Returns:

A 2d tensor of shape (len(self.outcome_names), self.n_steps).

get_surrogate: None | Callable[[], TorchModelBridge] = None
name: str
outcome_names: list[str]
property surrogate: TorchModelBridge