ax.benchmark¶
Benchmark¶
Benchmark Problem¶
-
class
ax.benchmark.benchmark_problem.
BenchmarkProblem
(search_space: ax.core.search_space.SearchSpace, optimization_config: ax.core.optimization_config.OptimizationConfig, name: Optional[str] = None, optimal_value: Optional[float] = None, evaluate_suggested: bool = True)[source]¶ Bases:
ax.utils.common.base.Base
Benchmark problem, represented in terms of Ax search space and optimization config. Useful to represent complex problems that involve constraints, non-range parameters, etc.
Note: if this problem is computationally intensive, consider setting evaluate_suggested argument to False.
- Parameters
search_space – Problem domain.
optimization_config – Problem objective and constraints. Note that by default an Objective in the OptimizationConfig has minimize set to False, so by default the OptimizationConfig describes a maximization problem.
name – Optional name of the problem; defaults to the name of the objective metric (e.g., “Branin” or “Branin_constrained” if constraints are present). The name of the problem is reflected in the names of the benchmarking experiments (e.g. “Sobol_on_Branin”).
optimal_value – Optional target objective value for the optimization.
evaluate_suggested – Whether the model-predicted best value should be evaluated when benchmarking on this problem. Note that in practice, this means that for every model-generated trial, an extra point will be evaluated. This extra point is often different from the model-generated trials, since those trials aim to both explore and exploit, so the aim is not usually to suggest the current model-predicted optimum.
-
optimization_config
: ax.core.optimization_config.OptimizationConfig¶
-
search_space
: ax.core.search_space.SearchSpace¶
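Example (an illustrative sketch, not taken from the Ax source): constructing a BenchmarkProblem for the Branin function out of standard Ax core classes. The use of BraninMetric, the parameter bounds, and the optimal value are assumptions made for illustration.
from ax.benchmark.benchmark_problem import BenchmarkProblem
from ax.core.objective import Objective
from ax.core.optimization_config import OptimizationConfig
from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.metrics.branin import BraninMetric

# Standard Branin domain: x1 in [-5, 10], x2 in [0, 15].
search_space = SearchSpace(
    parameters=[
        RangeParameter(name="x1", parameter_type=ParameterType.FLOAT, lower=-5.0, upper=10.0),
        RangeParameter(name="x2", parameter_type=ParameterType.FLOAT, lower=0.0, upper=15.0),
    ]
)
optimization_config = OptimizationConfig(
    objective=Objective(
        metric=BraninMetric(name="branin", param_names=["x1", "x2"]),
        minimize=True,
    )
)
branin_problem = BenchmarkProblem(
    search_space=search_space,
    optimization_config=optimization_config,
    name="Branin",
    optimal_value=0.397887,  # known global minimum of the Branin function
)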
-
class
ax.benchmark.benchmark_problem.
SimpleBenchmarkProblem
(f: Union[ax.utils.measurement.synthetic_functions.SyntheticFunction, function], name: Optional[str] = None, domain: Optional[List[Tuple[float, float]]] = None, optimal_value: Optional[float] = None, minimize: bool = False, noise_sd: float = 0.0, evaluate_suggested: bool = True)[source]¶ Bases:
ax.benchmark.benchmark_problem.BenchmarkProblem
Benchmark problem, represented in terms of simplified constructions: a callable function, a domain that consists of ranges, etc. This problem does not support parameter or outcome constraints.
Note: if this problem is computationally intensive, consider setting evaluate_suggested argument to False.
- Parameters
f – Ax SyntheticFunction or an ad-hoc callable that evaluates points represented as nd-arrays. Input to the callable should be an (n x d) array, where n is the number of points to evaluate, and d is the dimensionality of the points. Returns a float or a (1 x n) array. Used as the problem objective.
name – Optional name of the problem; defaults to the name of the objective metric (e.g., “Branin” or “Branin_constrained” if constraints are present). The name of the problem is reflected in the names of the benchmarking experiments (e.g. “Sobol_on_Branin”).
domain – Problem domain as list of tuples. Parameter names will be derived from the length of this list, as {“x1”, …, “xN”}, where N is the length of this list.
optimal_value – Optional target objective value for the optimization.
minimize – Whether this is a minimization problem, defaults to False.
noise_sd – Measure of the noise that will be added to the observations during the optimization. During the evaluation phase, true values will be extracted to measure a method’s performance. Only applicable when using a known SyntheticFunction as the f argument.
evaluate_suggested – Whether the model-predicted best value should be evaluated when benchmarking on this problem. Note that in practice, this means that for every model-generated trial, an extra point will be evaluated. This extra point is often different from the model-generated trials, since those trials aim to both explore and exploit, so the aim is not usually to suggest the current model-predicted optimum.
-
domain_as_ax_client_parameters
() → List[Dict[str, Union[str, bool, float, int, None, List[Optional[Union[str, bool, float, int]]]]]][source]¶
-
f
: Union[ax.utils.measurement.synthetic_functions.SyntheticFunction, function]¶
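Example (an illustrative sketch): wrapping the built-in branin synthetic function, and an ad-hoc callable with an explicit two-dimensional domain. The ad-hoc function, its name, and its domain are hypothetical.
import numpy as np

from ax.benchmark.benchmark_problem import SimpleBenchmarkProblem
from ax.utils.measurement.synthetic_functions import branin

# Known SyntheticFunction: the domain and optimal value come from the function itself,
# and noise_sd is applied to the observations during optimization.
branin_simple = SimpleBenchmarkProblem(f=branin, minimize=True, noise_sd=0.1)

# Ad-hoc callable: a domain must be supplied; parameters are named x1, ..., xN.
sum_of_squares = SimpleBenchmarkProblem(
    f=lambda X: np.sum(np.atleast_2d(X) ** 2, axis=-1),  # one value per evaluated point
    name="sum_of_squares",
    domain=[(-1.0, 1.0), (-1.0, 1.0)],
    minimize=True,
)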
Benchmark Result¶
-
class
ax.benchmark.benchmark_result.
BenchmarkResult
(true_performance: Dict[str, numpy.ndarray], fit_times: Dict[str, List[float]], gen_times: Dict[str, List[float]], optimum: Optional[float] = None, model_transitions: Optional[Dict[str, Optional[List[int]]]] = None, is_multi_objective: bool = False, pareto_frontiers: Optional[Dict[str, ax.plot.pareto_utils.ParetoFrontierResults]] = None)[source]¶ Bases:
object
-
pareto_frontiers
: Optional[Dict[str, ax.plot.pareto_utils.ParetoFrontierResults]] = None¶
-
ax.benchmark.benchmark_result.
aggregate_problem_results
(runs: Dict[str, List[ax.core.experiment.Experiment]], problem: ax.benchmark.benchmark_problem.BenchmarkProblem, model_transitions: Optional[Dict[str, List[int]]] = None, is_asynchronous: bool = False, **kwargs) → ax.benchmark.benchmark_result.BenchmarkResult[source]¶
-
ax.benchmark.benchmark_result.
extract_optimization_trace
(experiment: ax.core.experiment.Experiment, problem: ax.benchmark.benchmark_problem.BenchmarkProblem, is_asynchronous: bool, **kwargs) → numpy.ndarray[source]¶ Extract outcomes of an experiment: best cumulative objective as a numpy ND-array, and total model-fitting time and candidate generation time as floats.
-
ax.benchmark.benchmark_result.
generate_report
(benchmark_results: Dict[str, ax.benchmark.benchmark_result.BenchmarkResult], errors_encountered: Optional[List[str]] = None, include_individual_method_plots: bool = False, notebook_env: bool = False) → str[source]¶
-
ax.benchmark.benchmark_result.
make_plots
(benchmark_result: ax.benchmark.benchmark_result.BenchmarkResult, problem_name: str, include_individual: bool) → List[ax.plot.base.AxPlotConfig][source]¶
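Example (an illustrative sketch of how these pieces fit together): aggregating replications produced by ax.benchmark.benchmark.benchmark_test (documented below) and rendering an HTML report. The branin_problem and the two generation strategies (sobol_strategy, sobol_gpei) are assumed to be defined as in the sketches elsewhere on this page.
from ax.benchmark.benchmark import benchmark_test
from ax.benchmark.benchmark_result import aggregate_problem_results, generate_report

# One list of replication experiments per method name.
runs = {
    "Sobol": benchmark_test(problem=branin_problem, method=sobol_strategy, num_trials=20, num_replications=5),
    "Sobol+GPEI": benchmark_test(problem=branin_problem, method=sobol_gpei, num_trials=20, num_replications=5),
}
result = aggregate_problem_results(runs=runs, problem=branin_problem)
report_html = generate_report(benchmark_results={branin_problem.name: result})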
Benchmark¶
Module for benchmarking Ax algorithms.
Key terms used:
Trial –– a usual Ax Trial or BatchTrial, one execution of a given arm or group of arms.
Replication –– one run of an optimization loop; 1 method + problem combination.
Test –– multiple replications, run for statistical significance.
Full run –– multiple tests: run all methods with all problems.
Method –– (one of) the algorithm(s) being benchmarked.
Problem –– a synthetic function, a surrogate surface, or an ML model, on which to assess the performance of algorithms.
-
class
ax.benchmark.benchmark.
AsyncBenchmarkOptions
(scheduler_options: Optional[ax.service.scheduler.SchedulerOptions] = None, backend_options: Optional[ax.utils.testing.backend_simulator.BackendSimulatorOptions] = None, sample_runtime_func: Optional[Callable[[ax.core.base_trial.BaseTrial], float]] = None, timeout_hours: Optional[int] = None, max_pending_trials: int = 10, early_stopping_strategy: Optional[ax.early_stopping.strategies.BaseEarlyStoppingStrategy] = None)[source]¶ Bases:
object
Options used in an async, Scheduler-based benchmark:
- Parameters
scheduler_options – Options passed to the AsyncSimulatedBackendScheduler.
backend_options – Options passed to the BackendSimulator.
sample_runtime_func – A method to sample a runtime given a trial.
timeout_hours – The number of hours to run before timing out, passed to the AsyncSimulatedBackendScheduler.
max_pending_trials – The maximum number of pending trials, which is passed to the AsyncSimulatedBackendScheduler.
early_stopping_strategy – The early stopping strategy.
-
backend_options
: Optional[ax.utils.testing.backend_simulator.BackendSimulatorOptions] = None¶
-
early_stopping_strategy
: Optional[ax.early_stopping.strategies.BaseEarlyStoppingStrategy] = None¶
-
sample_runtime_func
: Optional[Callable[[ax.core.base_trial.BaseTrial], float]] = None¶
-
scheduler_options
: Optional[ax.service.scheduler.SchedulerOptions] = None¶
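Example (an illustrative sketch using only the arguments documented above): cap the number of pending trials, set a timeout, and sample a constant simulated runtime per trial.
from ax.benchmark.benchmark import AsyncBenchmarkOptions

async_options = AsyncBenchmarkOptions(
    max_pending_trials=4,
    timeout_hours=1,
    sample_runtime_func=lambda trial: 60.0,  # every trial "runs" for 60 simulated seconds
)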
-
exception
ax.benchmark.benchmark.
NonRetryableBenchmarkingError
[source]¶ Bases:
ValueError
Error that indicates an issue with the benchmarking setup (e.g. unexpected problem setup, a benchmarking function called incorrectly, etc.) –– something that prevents the benchmarking suite itself from running, rather than an error that occurs during the runs of the benchmarking trials, replications, or tests.
-
ax.benchmark.benchmark.
benchmark_minimize_callable
(problem: ax.benchmark.benchmark_problem.BenchmarkProblem, num_trials: int, method_name: str, replication_index: Optional[int] = None) → Tuple[ax.core.experiment.Experiment, Callable[[List[float]], float]][source]¶ An interface for evaluating external methods on Ax benchmark problems. The arms run and performance will be tracked by Ax, so the external method can be evaluated alongside Ax methods.
It is designed around methods that implement an interface like scipy.optimize.minimize. This function will return a callable evaluation function that takes in an array of parameter values and returns a float objective value. The evaluation function should always be minimized: if the benchmark problem is a maximization problem, then the value returned by the evaluation function will be negated so it can be used directly by methods that minimize. This callable can be given to an external minimization function, and Ax will track all of the calls made to it and the arms that were evaluated.
This will also return an Experiment object that will track the arms evaluated by the external method in the same way as done for Ax internal benchmarks. This function should thus be used for each benchmark replication.
- Parameters
problem – The Ax benchmark problem to be used to construct the evaluation function.
num_trials – The maximum number of trials for a benchmark run.
method_name – Name of the method being tested.
replication_index – Replicate number, if multiple replicates are being run.
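Example (an illustrative sketch): benchmarking scipy.optimize.minimize on an Ax benchmark problem. The branin_problem name refers to the problem constructed in the sketches above; the choice of Nelder-Mead and its starting point are assumptions for illustration.
from scipy.optimize import minimize

from ax.benchmark.benchmark import benchmark_minimize_callable

experiment, f = benchmark_minimize_callable(
    problem=branin_problem,
    num_trials=25,
    method_name="scipy_nelder_mead",
)
# f always returns a value to be minimized; Ax records every call as an arm on `experiment`.
minimize(f, x0=[0.0, 5.0], method="Nelder-Mead", options={"maxfev": 25})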
-
ax.benchmark.benchmark.
benchmark_replication
(problem: ax.benchmark.benchmark_problem.BenchmarkProblem, method: ax.modelbridge.generation_strategy.GenerationStrategy, num_trials: int, replication_index: Optional[int] = None, batch_size: int = 1, raise_all_exceptions: bool = False, benchmark_trial: function = <function benchmark_trial>, verbose_logging: bool = True, failed_trials_tolerated: int = 5, async_benchmark_options: Optional[ax.benchmark.benchmark.AsyncBenchmarkOptions] = None) → ax.core.experiment.Experiment[source]¶ Runs one benchmarking replication (equivalent to one optimization loop).
- Parameters
problem – Problem to benchmark on.
method – Method to benchmark, represented as a generation strategy.
num_trials – Number of trials in each test experiment.
batch_size – Batch size for this replication, defaults to 1.
raise_all_exceptions – If set to True, any encountered exception will be raised; otherwise, failure tolerance thresholds apply: up to failed_trials_tolerated trials can fail before a replication is considered failed.
benchmark_trial – Function that runs a single trial. Defaults to benchmark_trial in this module and must have the same signature.
verbose_logging – Whether logging level should be set to INFO.
failed_trials_tolerated – How many trials can fail before a replication is considered failed and aborted. Defaults to 5.
async_benchmark_options – Options to use for the case of an async, Scheduler-based benchmark. If omitted, a synchronous benchmark (possibly with batch sizes greater than one) is run without using a Scheduler.
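Example (an illustrative sketch): one replication of a Sobol-then-GPEI generation strategy on the Branin problem from the sketches above. GenerationStrategy and Models are standard Ax constructs; note that in older Ax versions the GenerationStep argument num_trials was named num_arms.
from ax.benchmark.benchmark import benchmark_replication
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

sobol_gpei = GenerationStrategy(
    name="Sobol+GPEI",
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=5),
        GenerationStep(model=Models.GPEI, num_trials=-1),  # -1: use for all remaining trials
    ],
)
experiment = benchmark_replication(problem=branin_problem, method=sobol_gpei, num_trials=20)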
-
ax.benchmark.benchmark.
benchmark_test
(problem: ax.benchmark.benchmark_problem.BenchmarkProblem, method: ax.modelbridge.generation_strategy.GenerationStrategy, num_trials: int, num_replications: int = 20, batch_size: int = 1, raise_all_exceptions: bool = False, benchmark_replication: function = <function benchmark_replication>, benchmark_trial: function = <function benchmark_trial>, verbose_logging: bool = True, failed_trials_tolerated: int = 5, failed_replications_tolerated: int = 3, async_benchmark_options: Optional[ax.benchmark.benchmark.AsyncBenchmarkOptions] = None) → List[ax.core.experiment.Experiment][source]¶ Runs one benchmarking test (equivalent to one problem-method combination), which translates into num_replications replications, run for statistical significance of the results.
- Parameters
problem – Problem to benchmark on.
method – Method to benchmark, represented as a generation strategy.
num_replications – Number of times to run each test (each problem-method combination), for an aggregated result.
num_trials – Number of trials in each test experiment, defaults to 20.
batch_size – Batch size for this test, defaults to 1.
raise_all_exceptions – If set to True, any encountered exception will be raised; otherwise, failure tolerance thresholds apply: up to failed_trials_tolerated trials can fail before a replication is considered failed, and up to failed_replications_tolerated replications can fail before a benchmarking test is considered failed.
benchmark_replication – Function that runs a single benchmarking replication. Defaults to benchmark_replication in this module and must have the same signature.
benchmark_trial – Function that runs a single trial. Defaults to benchmark_trial in this module and must have the same signature.
verbose_logging – Whether logging level should be set to INFO.
failed_trials_tolerated – How many trials can fail before a replication is considered failed and aborted. Defaults to 5.
failed_replications_tolerated – How many replications can fail before a test is considered failed and aborted. Defaults to 3.
async_benchmark_options – Options to use for the case of an async, Scheduler-based benchmark. If omitted, a synchronous benchmark (possibly with batch sizes greater than one) is run without using a Scheduler.
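Example (an illustrative sketch, reusing branin_problem and sobol_gpei from the sketches above):
from ax.benchmark.benchmark import benchmark_test

experiments = benchmark_test(
    problem=branin_problem,
    method=sobol_gpei,
    num_trials=20,
    num_replications=10,
)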
-
ax.benchmark.benchmark.
benchmark_trial
(parameterization: Optional[numpy.ndarray] = None, evaluation_function: Optional[Union[ax.utils.measurement.synthetic_functions.SyntheticFunction, function]] = None, experiment: Optional[ax.core.experiment.Experiment] = None, trial_index: Optional[int] = None) → Union[Tuple[float, float], ax.core.abstract_data.AbstractDataFrameData][source]¶ Evaluates one trial of a benchmarking replication (an Ax trial or batched trial). Evaluation requires either the parameterization and evaluation_function parameters or the experiment and trial_index parameters.
Note: the evaluation function relies on the ordering of items in the parameterization nd-array.
- Parameters
parameterization – The parameterization to evaluate.
evaluation_function – The evaluation function for the benchmark objective.
experiment – Experiment, for a trial on which to fetch data.
trial_index – Index of the trial, for which to fetch data.
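Example (an illustrative sketch of the synthetic-function evaluation path; the point used is one of Branin's known minimizers, and the (mean, sem) tuple return is assumed for this path):
import numpy as np

from ax.benchmark.benchmark import benchmark_trial
from ax.utils.measurement.synthetic_functions import branin

# Element order in the array must match the problem's parameter order (here x1, x2).
mean, sem = benchmark_trial(
    parameterization=np.array([np.pi, 2.275]),
    evaluation_function=branin,
)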
-
ax.benchmark.benchmark.
full_benchmark_run
(problem_groups: Optional[Dict[str, Union[List[ax.benchmark.benchmark_problem.BenchmarkProblem], List[str]]]] = None, method_groups: Optional[Dict[str, Union[List[ax.modelbridge.generation_strategy.GenerationStrategy], List[str]]]] = None, num_trials: Union[int, List[List[int]]] = 20, num_replications: int = 20, batch_size: Union[int, List[List[int]]] = 1, raise_all_exceptions: bool = False, benchmark_test: function = <function benchmark_test>, benchmark_replication: function = <function benchmark_replication>, benchmark_trial: function = <function benchmark_trial>, verbose_logging: bool = True, failed_trials_tolerated: int = 5, failed_replications_tolerated: int = 3, async_benchmark_options: Optional[ax.benchmark.benchmark.AsyncBenchmarkOptions] = None) → Dict[str, Dict[str, List[ax.core.experiment.Experiment]]][source]¶ Full run of the benchmarking suite. To distribute benchmarking at the level of a test, a replication, or a trial (or any combination of those), pass in a wrapped (in some scheduling logic) version of the corresponding function from this module.
Here, problem_groups and method_groups are dictionaries that have the same keys such that we can run a specific subset of problems with a corresponding subset of methods.
Example:
problem_groups = {
    "single_fidelity": [ackley, branin],
    "multi_fidelity": [augmented_hartmann],
}
method_groups = {
    "single_fidelity": [single_task_GP_and_NEI_strategy],
    "multi_fidelity": [fixed_noise_MFGP_and_MFKG_strategy],
}
Here, ackley and branin will be run against single_task_GP_and_NEI_strategy and augmented_hartmann against fixed_noise_MFGP_and_MFKG_strategy.
- Parameters
problem_groups – Problems to benchmark on, represented as a dictionary from category string to List of BenchmarkProblem-s or string keys (must be in standard BOProblems). More on problem_groups below.
method_groups – Methods to benchmark on, represented as a dictionary from category string to List of generation strategies or string keys (must be in standard BOMethods). More on method_groups below.
num_replications – Number of times to run each test (each problem-method combination), for an aggregated result.
num_trials – Number of trials in each test experiment.
raise_all_exceptions – If set to True, any encountered exception will be raised; otherwise, failure tolerance thresholds apply: up to failed_trials_tolerated trials can fail before a replication is considered failed, and up to failed_replications_tolerated replications can fail before a benchmarking test is considered failed.
benchmark_test – Function that runs a single benchmarking test. Defaults to benchmark_test in this module and must have the same signature.
benchmark_replication – Function that runs a single benchmarking replication. Defaults to benchmark_replication in this module and must have the same signature.
benchmark_trial – Function that runs a single trial. Defaults to benchmark_trial in this module and must have the same signature.
verbose_logging – Whether logging level should be set to INFO.
failed_trials_tolerated – How many trials can fail before a replication is considered failed and aborted. Defaults to 5.
failed_replications_tolerated – How many replications can fail before a test is considered failed and aborted. Defaults to 3.
async_benchmark_options – Options to use for the case of an async, Scheduler-based benchmark. If omitted, a synchronous benchmark (possibly with batch sizes greater than one) is run without using a Scheduler.
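Example (an illustrative sketch): running the grouped problems and methods from the example above and indexing into the nested result dictionary. The exact key layout of the returned dictionary (problem name, then method name) is an assumption.
from ax.benchmark.benchmark import full_benchmark_run

all_runs = full_benchmark_run(
    problem_groups=problem_groups,
    method_groups=method_groups,
    num_trials=20,
    num_replications=10,
)
# Nested dictionary of replication experiments, e.g. all_runs["Ackley"]["single_task_GP_and_NEI_strategy"].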
Benchmark Utilities¶
-
ax.benchmark.utils.
get_corresponding
(value_or_matrix: Union[int, List[List[int]]], row: int, col: int) → int[source]¶ If value_or_matrix is a matrix, extract the value in the cell specified by row and col. If value_or_matrix is a scalar, just return it.
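Example (an illustrative sketch): the matrix form lets num_trials or batch_size vary per problem-method cell, while the scalar form applies to every cell.
from ax.benchmark.utils import get_corresponding

get_corresponding(20, row=1, col=2)            # scalar: 20 for every cell
get_corresponding([[10, 20], [30, 40]], 1, 0)  # matrix: value of cell (1, 0), i.e. 30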
-
ax.benchmark.utils.
get_problems_and_methods
(problems: Optional[Union[List[ax.benchmark.benchmark_problem.BenchmarkProblem], List[str]]] = None, methods: Optional[Union[List[ax.modelbridge.generation_strategy.GenerationStrategy], List[str]]] = None) → Tuple[List[ax.benchmark.benchmark_problem.BenchmarkProblem], List[ax.modelbridge.generation_strategy.GenerationStrategy]][source]¶ Validate problems and methods; find them by string keys if passed as strings.
BoTorch Methods¶
-
ax.benchmark.botorch_methods.
fixed_noise_gp_model_constructor
(Xs: List[torch.Tensor], Ys: List[torch.Tensor], Yvars: List[torch.Tensor], task_features: List[int], fidelity_features: List[int], metric_names: List[str], state_dict: Optional[Dict[str, torch.Tensor]] = None, refit_model: bool = True, **kwargs: Any) → botorch.models.model.Model[source]¶
-
ax.benchmark.botorch_methods.
make_basic_generation_strategy
(name: str, acquisition: str, num_initial_trials: int = 14, surrogate_model_constructor: Callable = <function singletask_gp_model_constructor>) → ax.modelbridge.generation_strategy.GenerationStrategy[source]¶
-
ax.benchmark.botorch_methods.
singletask_gp_model_constructor
(Xs: List[torch.Tensor], Ys: List[torch.Tensor], Yvars: List[torch.Tensor], task_features: List[int], fidelity_features: List[int], metric_names: List[str], state_dict: Optional[Dict[str, torch.Tensor]] = None, refit_model: bool = True, **kwargs: Any) → botorch.models.model.Model[source]¶
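Example (an illustrative sketch): building a generation strategy around the fixed-noise GP constructor. The acquisition string "NEI" is an assumption; the accepted acquisition names are not listed in this reference.
from ax.benchmark.botorch_methods import (
    fixed_noise_gp_model_constructor,
    make_basic_generation_strategy,
)

strategy = make_basic_generation_strategy(
    name="FixedNoiseGP+NEI",
    acquisition="NEI",  # assumed acquisition key
    num_initial_trials=14,
    surrogate_model_constructor=fixed_noise_gp_model_constructor,
)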