Ax integrates easily with different scheduling frameworks and distributed training frameworks. In this example, Ax-driven optimization is executed in a distributed fashion using RayTune.
RayTune is a scalable framework for hyperparameter tuning that provides many state-of-the-art hyperparameter tuning algorithms and seamlessly scales from laptop to distributed cluster with fault tolerance. RayTune leverages Ray's Actor API to provide asynchronous parallel and distributed execution.
Ray 'Actors' are a simple and clean abstraction for replicating your Python classes across multiple workers and nodes. Each hyperparameter evaluation is asynchronously executed on a separate Ray actor and reports intermediate training progress back to RayTune. Upon reporting, RayTune uses this information to perform actions such as early termination, re-prioritization, or checkpointing.
import logging
from ray import tune
from ray.tune import report
from ray.tune.suggest.ax import AxSearch
logger = logging.getLogger(tune.__name__)
logger.setLevel(
    level=logging.CRITICAL
)  # Reduce the number of Ray warnings that are not relevant here.
import numpy as np
import torch
from ax.plot.contour import plot_contour
from ax.plot.trace import optimization_trace_single_method
from ax.service.ax_client import AxClient
from ax.utils.notebook.plotting import init_notebook_plotting, render
from ax.utils.tutorials.cnn_utils import CNN, evaluate, load_mnist, train
init_notebook_plotting()
[INFO 04-25 21:32:14] ax.utils.notebook.plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.
We specify enforce_sequential_optimization as False, because Ray runs many trials in parallel. With sequential optimization enforced, AxClient would expect the first few trials to be completed with data before generating more trials.
When high parallelism is not required, it is best to enforce sequential optimization, as it allows for achieving optimal results in fewer (but sequential) trials. In cases where parallelism is important, such as with distributed training using Ray, we choose to forego minimizing resource utilization and run more trials in parallel.
ax = AxClient(enforce_sequential_optimization=False)
[INFO 04-25 21:32:14] ax.service.ax_client: Starting optimization with verbose logging. To disable logging, set the `verbose_logging` argument to `False`. Note that float values in the logs are rounded to 6 decimal points.
Here we set up the search space and specify the objective; refer to the Ax API tutorials for more detail.
MINIMIZE = False # Whether we should be minimizing or maximizing the objective
ax.create_experiment(
    name="mnist_experiment",
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-6, 0.4], "log_scale": True},
        {"name": "momentum", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objective_name="mean_accuracy",
    minimize=MINIMIZE,
)
[INFO 04-25 21:32:14] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter lr. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict. [INFO 04-25 21:32:14] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter momentum. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict. [INFO 04-25 21:32:14] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='lr', parameter_type=FLOAT, range=[1e-06, 0.4], log_scale=True), RangeParameter(name='momentum', parameter_type=FLOAT, range=[0.0, 1.0])], parameter_constraints=[]). [INFO 04-25 21:32:14] ax.modelbridge.dispatch_utils: Using Bayesian optimization since there are more ordered parameters than there are categories for the unordered categorical parameters. [INFO 04-25 21:32:14] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.
ax.experiment.optimization_config.objective.minimize
False
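The `lr` parameter above sets `log_scale` to True, so its range from 1e-6 to 0.4 is explored on a logarithmic scale rather than a linear one. The following is a minimal sketch of what that means, using only the standard library. It is not Ax's internal implementation, and the helper name `sample_log_uniform` is hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)].

    Hypothetical helper for illustration; Ax handles log-scale
    parameters internally.
    """
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-6, 0.4, rng) for _ in range(10_000)]

# On a log scale, roughly half of the draws fall below 1e-3.
# On a linear scale over the same bounds, only about 0.25% would.
frac_small = sum(s < 1e-3 for s in samples) / len(samples)
print(f"fraction of samples below 1e-3: {frac_small:.3f}")
```

This is why a log scale suits learning rates: each order of magnitude gets a comparable share of the search budget.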
load_mnist(data_path="~/.data") # Pre-load the dataset before the initial evaluations are executed.
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /home/runner/.data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting /home/runner/.data/MNIST/raw/train-images-idx3-ubyte.gz to /home/runner/.data/MNIST/raw Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /home/runner/.data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting /home/runner/.data/MNIST/raw/train-labels-idx1-ubyte.gz to /home/runner/.data/MNIST/raw Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /home/runner/.data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting /home/runner/.data/MNIST/raw/t10k-images-idx3-ubyte.gz to /home/runner/.data/MNIST/raw Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /home/runner/.data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting /home/runner/.data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /home/runner/.data/MNIST/raw
(<torch.utils.data.dataloader.DataLoader at 0x7f4dac30f850>, <torch.utils.data.dataloader.DataLoader at 0x7f4dac39af10>, <torch.utils.data.dataloader.DataLoader at 0x7f4dac32d210>)
Since we use the Ax Service API here, we evaluate the parameterizations that Ax suggests, using RayTune. The evaluation function follows its usual pattern, taking in a parameterization and outputting an objective value. For detail on evaluation functions, see Trial Evaluation.
def train_evaluate(parameterization):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader, valid_loader, test_loader = load_mnist(data_path="~/.data")
    net = train(
        net=CNN(),
        train_loader=train_loader,
        parameters=parameterization,
        dtype=torch.float,
        device=device,
    )
    report(
        mean_accuracy=evaluate(
            net=net,
            data_loader=valid_loader,
            dtype=torch.float,
            device=device,
        )
    )
Execute the Ax optimization and trial evaluation in RayTune using the AxSearch algorithm:
# Set up AxSearch in RayTune
algo = AxSearch(ax_client=ax)
# Wrap AxSearch in a concurrency limiter, to ensure that Bayesian optimization receives the
# data for completed trials before creating more trials
algo = tune.suggest.ConcurrencyLimiter(algo, max_concurrent=3)
tune.run(
    train_evaluate,
    num_samples=30,
    search_alg=algo,
    verbose=0,  # Set this level to 1 to see status updates and to 2 to also see trial results.
    # To use GPU, specify: resources_per_trial={"gpu": 1}.
)
[INFO 04-25 21:32:18] ax.service.ax_client: Generated new trial 0 with parameters {'lr': 0.001032, 'momentum': 0.274014}. [INFO 04-25 21:32:19] ax.service.ax_client: Generated new trial 1 with parameters {'lr': 0.153823, 'momentum': 0.895628}. [INFO 04-25 21:32:19] ax.service.ax_client: Generated new trial 2 with parameters {'lr': 1e-05, 'momentum': 0.446445}. [INFO 04-25 21:32:32] ax.service.ax_client: Completed trial 0 with data: {'mean_accuracy': (0.96, None)}. [INFO 04-25 21:32:32] ax.service.ax_client: Completed trial 1 with data: {'mean_accuracy': (0.096667, None)}. [INFO 04-25 21:32:33] ax.service.ax_client: Generated new trial 3 with parameters {'lr': 0.060196, 'momentum': 0.79894}. [INFO 04-25 21:32:33] ax.service.ax_client: Generated new trial 4 with parameters {'lr': 0.387275, 'momentum': 0.423804}. [INFO 04-25 21:32:46] ax.service.ax_client: Completed trial 3 with data: {'mean_accuracy': (0.100167, None)}. [INFO 04-25 21:32:46] ax.service.ax_client: Completed trial 2 with data: {'mean_accuracy': (0.710167, None)}. [INFO 04-25 21:32:47] ax.service.ax_client: Generated new trial 5 with parameters {'lr': 0.000252, 'momentum': 0.032802}. [INFO 04-25 21:32:48] ax.service.ax_client: Generated new trial 6 with parameters {'lr': 0.036335, 'momentum': 0.030551}. [INFO 04-25 21:32:59] ax.service.ax_client: Completed trial 5 with data: {'mean_accuracy': (0.913667, None)}. [INFO 04-25 21:33:00] ax.service.ax_client: Completed trial 4 with data: {'mean_accuracy': (0.104833, None)}. [INFO 04-25 21:33:00] ax.service.ax_client: Generated new trial 7 with parameters {'lr': 0.000232, 'momentum': 0.271583}. [INFO 04-25 21:33:02] ax.service.ax_client: Generated new trial 8 with parameters {'lr': 0.002014, 'momentum': 0.015456}. [INFO 04-25 21:33:13] ax.service.ax_client: Completed trial 6 with data: {'mean_accuracy': (0.099167, None)}. [INFO 04-25 21:33:14] ax.service.ax_client: Completed trial 7 with data: {'mean_accuracy': (0.925333, None)}. 
[INFO 04-25 21:33:15] ax.service.ax_client: Generated new trial 9 with parameters {'lr': 0.000478, 'momentum': 0.689477}. [INFO 04-25 21:33:18] ax.service.ax_client: Generated new trial 10 with parameters {'lr': 0.00057, 'momentum': 0.193867}. [INFO 04-25 21:33:28] ax.service.ax_client: Completed trial 8 with data: {'mean_accuracy': (0.957833, None)}. [INFO 04-25 21:33:29] ax.service.ax_client: Completed trial 9 with data: {'mean_accuracy': (0.961833, None)}. [INFO 04-25 21:33:31] ax.service.ax_client: Generated new trial 11 with parameters {'lr': 0.00012, 'momentum': 1.0}. [INFO 04-25 21:33:32] ax.service.ax_client: Generated new trial 12 with parameters {'lr': 0.00093, 'momentum': 0.0}. [INFO 04-25 21:33:43] ax.service.ax_client: Completed trial 11 with data: {'mean_accuracy': (0.7575, None)}. [INFO 04-25 21:33:44] ax.service.ax_client: Completed trial 10 with data: {'mean_accuracy': (0.9465, None)}. [INFO 04-25 21:33:45] ax.service.ax_client: Generated new trial 13 with parameters {'lr': 0.001196, 'momentum': 1.0}. [INFO 04-25 21:33:47] ax.service.ax_client: Generated new trial 14 with parameters {'lr': 1e-06, 'momentum': 0.0}. [INFO 04-25 21:33:58] ax.service.ax_client: Completed trial 12 with data: {'mean_accuracy': (0.950333, None)}. [INFO 04-25 21:33:59] ax.service.ax_client: Completed trial 13 with data: {'mean_accuracy': (0.093667, None)}. [INFO 04-25 21:34:00] ax.service.ax_client: Generated new trial 15 with parameters {'lr': 0.000147, 'momentum': 0.627285}. [INFO 04-25 21:34:02] ax.service.ax_client: Generated new trial 16 with parameters {'lr': 6e-06, 'momentum': 1.0}. [INFO 04-25 21:34:13] ax.service.ax_client: Completed trial 14 with data: {'mean_accuracy': (0.165, None)}. [INFO 04-25 21:34:14] ax.service.ax_client: Completed trial 15 with data: {'mean_accuracy': (0.922333, None)}. [INFO 04-25 21:34:16] ax.service.ax_client: Generated new trial 17 with parameters {'lr': 0.000484, 'momentum': 0.522484}. 
[INFO 04-25 21:34:17] ax.service.ax_client: Generated new trial 18 with parameters {'lr': 1e-06, 'momentum': 1.0}. [INFO 04-25 21:34:28] ax.service.ax_client: Completed trial 16 with data: {'mean_accuracy': (0.832, None)}. [INFO 04-25 21:34:30] ax.service.ax_client: Generated new trial 19 with parameters {'lr': 2.5e-05, 'momentum': 0.838966}. [INFO 04-25 21:34:30] ax.service.ax_client: Completed trial 17 with data: {'mean_accuracy': (0.9565, None)}. [INFO 04-25 21:34:33] ax.service.ax_client: Generated new trial 20 with parameters {'lr': 1e-06, 'momentum': 0.734977}. [INFO 04-25 21:34:43] ax.service.ax_client: Completed trial 18 with data: {'mean_accuracy': (0.8105, None)}. [INFO 04-25 21:34:46] ax.service.ax_client: Generated new trial 21 with parameters {'lr': 0.001433, 'momentum': 0.129269}. [INFO 04-25 21:34:46] ax.service.ax_client: Completed trial 19 with data: {'mean_accuracy': (0.894833, None)}. [INFO 04-25 21:34:51] ax.service.ax_client: Generated new trial 22 with parameters {'lr': 0.001367, 'momentum': 0.48733}. [INFO 04-25 21:34:59] ax.service.ax_client: Completed trial 20 with data: {'mean_accuracy': (0.195667, None)}. [INFO 04-25 21:35:02] ax.service.ax_client: Generated new trial 23 with parameters {'lr': 3.9e-05, 'momentum': 0.165983}. [INFO 04-25 21:35:03] ax.service.ax_client: Completed trial 21 with data: {'mean_accuracy': (0.958667, None)}. [INFO 04-25 21:35:07] ax.service.ax_client: Generated new trial 24 with parameters {'lr': 0.000112, 'momentum': 0.787248}. [INFO 04-25 21:35:16] ax.service.ax_client: Completed trial 22 with data: {'mean_accuracy': (0.953833, None)}. [INFO 04-25 21:35:19] ax.service.ax_client: Generated new trial 25 with parameters {'lr': 0.000632, 'momentum': 0.605018}. [INFO 04-25 21:35:19] ax.service.ax_client: Completed trial 23 with data: {'mean_accuracy': (0.833167, None)}. [INFO 04-25 21:35:22] ax.service.ax_client: Generated new trial 26 with parameters {'lr': 0.000846, 'momentum': 0.400577}. 
[INFO 04-25 21:35:32] ax.service.ax_client: Completed trial 24 with data: {'mean_accuracy': (0.934667, None)}. [INFO 04-25 21:35:35] ax.service.ax_client: Generated new trial 27 with parameters {'lr': 0.000352, 'momentum': 0.650033}. [INFO 04-25 21:35:35] ax.service.ax_client: Completed trial 25 with data: {'mean_accuracy': (0.9565, None)}. [INFO 04-25 21:35:38] ax.service.ax_client: Generated new trial 28 with parameters {'lr': 0.00258, 'momentum': 0.331716}. [INFO 04-25 21:35:48] ax.service.ax_client: Completed trial 26 with data: {'mean_accuracy': (0.9605, None)}. [INFO 04-25 21:35:51] ax.service.ax_client: Generated new trial 29 with parameters {'lr': 2.7e-05, 'momentum': 1.0}. [INFO 04-25 21:35:51] ax.service.ax_client: Completed trial 27 with data: {'mean_accuracy': (0.960833, None)}. [INFO 04-25 21:36:02] ax.service.ax_client: Completed trial 28 with data: {'mean_accuracy': (0.959833, None)}. [INFO 04-25 21:36:05] ax.service.ax_client: Completed trial 29 with data: {'mean_accuracy': (0.9135, None)}.
<ray.tune.analysis.experiment_analysis.ExperimentAnalysis at 0x7f4da4e8dd10>
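The log above shows the effect of the concurrency limit: trials 0 to 2 are generated first, and trials 3 and 4 appear only after trials 0 and 1 have completed. The sketch below illustrates that gating idea in plain Python. The classes `ToySearcher` and `ToyConcurrencyLimiter` are hypothetical stand-ins written for this example. They are not Ray's implementation, although the `suggest` and `on_trial_complete` method names are chosen to resemble the searcher interface.

```python
class ToySearcher:
    """Hypothetical search algorithm that hands out numbered configurations."""

    def __init__(self):
        self.counter = 0

    def suggest(self, trial_id):
        self.counter += 1
        return {"trial": trial_id, "x": self.counter}

    def on_trial_complete(self, trial_id, result=None):
        pass  # a real searcher would update its model with `result` here


class ToyConcurrencyLimiter:
    """Hypothetical wrapper that caps the number of trials in flight."""

    def __init__(self, searcher, max_concurrent):
        self.searcher = searcher
        self.max_concurrent = max_concurrent
        self.live = set()  # ids of trials that were suggested but not completed

    def suggest(self, trial_id):
        if len(self.live) >= self.max_concurrent:
            return None  # at capacity: the caller should ask again later
        self.live.add(trial_id)
        return self.searcher.suggest(trial_id)

    def on_trial_complete(self, trial_id, result=None):
        self.live.discard(trial_id)
        self.searcher.on_trial_complete(trial_id, result)


limiter = ToyConcurrencyLimiter(ToySearcher(), max_concurrent=3)
first_three = [limiter.suggest(f"t{i}") for i in range(3)]
blocked = limiter.suggest("t3")  # None, because three trials are in flight
limiter.on_trial_complete("t0", {"mean_accuracy": 0.96})
unblocked = limiter.suggest("t3")  # a slot is free again
print(blocked, unblocked)
```

Holding back new suggestions until earlier results arrive gives the Bayesian optimization model more data per suggestion, at the cost of some idle capacity.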
best_parameters, values = ax.get_best_parameters()
best_parameters
{'lr': 0.00035236033797689286, 'momentum': 0.6500327399496211}
means, covariances = values
means
{'mean_accuracy': 0.9617932183215473}
render(
    plot_contour(
        model=ax.generation_strategy.model,
        param_x="lr",
        param_y="momentum",
        metric_name="mean_accuracy",
    )
)
# `optimization_trace_single_method` expects a 2-d array of means, because it expects to average means from multiple
# optimization runs, so we wrap our best objectives array in another array.
best_objectives = np.array(
    [[trial.objective_mean * 100 for trial in ax.experiment.trials.values()]]
)
best_objective_plot = optimization_trace_single_method(
    y=np.maximum.accumulate(best_objectives, axis=1),
    title="Model performance vs. # of iterations",
    ylabel="Accuracy",
)
render(best_objective_plot)
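The trace plotted above is a running maximum: at each iteration it shows the best accuracy seen so far, which is what `np.maximum.accumulate` computes. As a small illustration, the sketch below applies the same operation with the standard library to the accuracies of trials 0 to 9 as reported in the log above (kept as fractions rather than percentages).

```python
from itertools import accumulate

# mean_accuracy for trials 0..9, copied from the log output above
accuracies = [
    0.96, 0.096667, 0.710167, 0.100167, 0.104833,
    0.913667, 0.099167, 0.925333, 0.957833, 0.961833,
]

# Running maximum: entry i is the best accuracy among trials 0..i
best_so_far = list(accumulate(accuracies, max))
print(best_so_far)
```

The trace stays at 0.96 from trial 0 until trial 9 improves on it, so a flat stretch in the plot means no new best was found, not that the trials in between performed equally well.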
Total runtime of script: 4 minutes, 0.88 seconds.