Multi-Objective Optimization Ax API¶

This Tutorial¶

This tutorial will show two algorithms available in Ax for multi-objective optimization and visualize they compare to eachother and to quasirandom search.

Multi-objective optimization (MOO) covers the case where we care about multiple outcomes in our experiment but we do not know before hand a specific weighting of those objectives (covered by ScalarizedObjective) or a specific constraint on one objective (covered by OutcomeConstraints) that will produce the best result.

The solution in this case is to find a whole Pareto frontier, a surface in outcome-space containing points that can't be improved on in every outcome. This shows us the tradeoffs between objectives that we can choose to make.

Problem Statement¶

Optimize a list of M objective functions $ \bigl(f^{(1)}( x),..., f^{(M)}( x) \bigr)$ over a bounded search space $\mathcal X \subset \mathbb R^d$.

We assume $f^{(i)}$ are expensive-to-evaluate black-box functions with no known analytical expression, and no observed gradients. For instance, a machine learning model where we're interested in maximizing accuracy and minimizing inference time, with $\mathcal X$ the set of possible configuration spaces

Pareto Optimality¶

In a multi-objective optimization problem, there typically is no single best solution. Rather, the goal is to identify the set of Pareto optimal solutions such that any improvement in one objective means deteriorating another. Provided with the Pareto set, decision-makers can select an objective trade-off according to their preferences. In the plot below, the red dots are the Pareto optimal solutions (assuming both objectives are to be minimized). pareto front

Evaluating the Quality of a Pareto Front (Hypervolume)¶

Given a reference point $ r \in \mathbb R^M$, which we represent as a list of M ObjectiveThresholds, one for each coordinate, the hypervolume (HV) of a Pareto set $\mathcal P = \{ y_i\}_{i=1}^{|\mathcal P|}$ is the volume of the space dominated (superior in every one of our M objectives) by $\mathcal P$ and bounded from above by a point $ r$. The reference point should be set to be slightly worse (10% is reasonable) than the worst value of each objective that a decision maker would tolerate. In the figure below, the grey area is the hypervolume in this 2-objective problem. hv_figure

Set Objective Thresholds to focus candidate generation in a region of interest¶

The below plots show three different sets of points generated by the qEVHI algorithm with different objective thresholds (aka reference points). Note that here we use absolute thresholds, but thresholds can also be relative to a status_quo arm.

The first plot shows the points without the ObjectiveThresholds visible (they're set far below the origin of the graph).

The second shows the points generated with (-18, -6) as thresholds. The regions violating the thresholds are greyed out. Only the white region in the upper right exceeds both threshold, points in this region dominate the intersection of these thresholds (this intersection is the reference point). Only points in this region contribute to the hypervolume objective. A few exploration points are not in the valid region, but almost all the rest of the points are.

The third shows points generated with a very strict pair of thresholds, (-18, -2). Only the white region in the upper right exceeds both thresholds. Many points do not lie in the dominating region, but there are still more focused there than in the second examples.

Further Information¶

A deeper explanation of our the qEHVI and qParEGO algorithms this notebook explores can be found at https://arxiv.org/abs/2006.05078, and the underlying BoTorch implementation has a researcher-oriented tutorial at https://botorch.org/tutorials/multi_objective_bo.

Setup¶

In [1]:

import pandas as pd
from ax import *

import torch
import numpy as np

from ax.metrics.noisy_function import NoisyFunctionMetric
from ax.plot.exp_utils import exp_to_df
from ax.runners.synthetic import SyntheticRunner

# Plotting imports and initialization
from ax.utils.notebook.plotting import render, init_notebook_plotting
from ax.plot.contour import plot_contour
from ax.plot.pareto_utils import compute_pareto_frontier
from ax.plot.pareto_frontier import plot_pareto_frontier

init_notebook_plotting()

# Factory methods for creating multi-objective optimization modesl.
from ax.modelbridge.factory import get_MOO_EHVI, get_MOO_PAREGO

[INFO 12-08 18:55:54] ax.utils.notebook.plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.

Load our sample 2-objective problem¶

In [2]:

from botorch.test_functions.multi_objective import BraninCurrin
branin_currin = BraninCurrin(negate=True).to(
    dtype=torch.double, 
    device= torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning:

CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)

Define experiment configurations¶

Search Space¶

In [3]:

x1 = RangeParameter(name="x1", lower=0, upper=1, parameter_type=ParameterType.FLOAT)
x2 = RangeParameter(name="x2", lower=0, upper=1, parameter_type=ParameterType.FLOAT)

search_space = SearchSpace(
    parameters=[x1, x2],
)

MultiObjectiveOptimizationConfig¶

To optimize multiple objective we must create a MultiObjective containing the metrics we'll optimize and MultiObjectiveOptimizationConfig (which contains ObjectiveThresholds) instead of our more typical Objective and OptimizationConfig

We define NoisyFunctionMetrics to wrap our synthetic Branin-Currin problem's outputs. Add noise to see how robust our different optimization algorithms are.

In [4]:

class MetricA(NoisyFunctionMetric):
    def f(self, x: np.ndarray) -> float:
        return float(branin_currin(torch.tensor(x))[0])
    
class MetricB(NoisyFunctionMetric):
    def f(self, x: np.ndarray) -> float:
        return float(branin_currin(torch.tensor(x))[1])

metric_a = MetricA("a", ["x1", "x2"], noise_sd=0.0, lower_is_better=False)
metric_b = MetricB("b", ["x1", "x2"], noise_sd=0.0, lower_is_better=False)

In [5]:

mo = MultiObjective(
    metrics=[metric_a, metric_b],
)

In [6]:

objective_thresholds = [
    ObjectiveThreshold(metric=metric, bound=val, relative=False)
    for metric, val in zip(mo.metrics, branin_currin.ref_point)
]

In [7]:

optimization_config = MultiObjectiveOptimizationConfig(
    objective=mo,
    objective_thresholds=objective_thresholds,
)

Define experiment creation utilities¶

These construct our experiment, then initialize with Sobol points before we fit a Gaussian Process model to those initial points.

In [8]:

# Reasonable defaults for number of quasi-random initialization points and for subsequent model-generated trials.
N_INIT = 6
N_BATCH = 25

In [9]:

def build_experiment():
    experiment = Experiment(
        name="pareto_experiment",
        search_space=search_space,
        optimization_config=optimization_config,
        runner=SyntheticRunner(),
    )
    return experiment

In [10]:

## Initialize with Sobol samples

def initialize_experiment(experiment):
    sobol = Models.SOBOL(search_space=experiment.search_space)

    for _ in range(N_INIT):
        experiment.new_trial(sobol.gen(1)).run()

    return experiment.fetch_data()

Sobol¶

We use quasirandom points as a fast baseline for evaluating the quality of our multi-objective optimization algorithms.

In [11]:

sobol_experiment = build_experiment()
sobol_data = initialize_experiment(sobol_experiment)

In [12]:

sobol_model = Models.SOBOL(
    experiment=sobol_experiment, 
    data=sobol_data,
)
sobol_hv_list = []
for i in range(N_BATCH):
    generator_run = sobol_model.gen(1)
    trial = sobol_experiment.new_trial(generator_run=generator_run)
    trial.run()
    exp_df = exp_to_df(sobol_experiment)
    outcomes = np.array(exp_df[['a', 'b']], dtype=np.double)
    # Fit a GP-based model in order to calculate hypervolume.
    # We will not use this model to generate new points.
    dummy_model = get_MOO_EHVI(
        experiment=sobol_experiment, 
        data=sobol_experiment.fetch_data(),
    )
    try:
        hv = dummy_model.observed_hypervolume()
    except:
        hv = 0
        print("Failed to compute hv")
    sobol_hv_list.append(hv)
    print(f"Iteration: {i}, HV: {hv}")

sobol_outcomes = np.array(exp_to_df(sobol_experiment)[['a', 'b']], dtype=np.double)

Iteration: 0, HV: 0
Iteration: 1, HV: 0
Iteration: 2, HV: 0
Iteration: 3, HV: 0
Iteration: 4, HV: 0
Iteration: 5, HV: 0
Iteration: 6, HV: 0
Iteration: 7, HV: 0
Iteration: 8, HV: 0
Iteration: 9, HV: 0
Iteration: 10, HV: 0
Iteration: 11, HV: 0
Iteration: 12, HV: 0
Iteration: 13, HV: 0
Iteration: 14, HV: 0
Iteration: 15, HV: 0
Iteration: 16, HV: 0
Iteration: 17, HV: 0
Iteration: 18, HV: 0
Iteration: 19, HV: 0
Iteration: 20, HV: 0
Iteration: 21, HV: 0
Iteration: 22, HV: 0
Iteration: 23, HV: 0
Iteration: 24, HV: 0

qEHVI¶

Expected HyperVolume Improvement. This is our current recommended algorithm for multi-objective optimization, when a reference point is already known.

In [13]:

ehvi_experiment = build_experiment()
ehvi_data = initialize_experiment(ehvi_experiment)

In [14]:

ehvi_hv_list = []
ehvi_model = None
for i in range(N_BATCH):   
    ehvi_model = get_MOO_EHVI(
        experiment=ehvi_experiment, 
        data=ehvi_data,
    )
    generator_run = ehvi_model.gen(1)
    trial = ehvi_experiment.new_trial(generator_run=generator_run)
    trial.run()
    ehvi_data = Data.from_multiple_data([ehvi_data, trial.fetch_data()])
    
    exp_df = exp_to_df(ehvi_experiment)
    outcomes = np.array(exp_df[['a', 'b']], dtype=np.double)
    try:
        hv = ehvi_model.observed_hypervolume()
    except:
        hv = 0
        print("Failed to compute hv")
    ehvi_hv_list.append(hv)
    print(f"Iteration: {i}, HV: {hv}")

ehvi_outcomes = np.array(exp_to_df(ehvi_experiment)[['a', 'b']], dtype=np.double)

Iteration: 0, HV: 0
Iteration: 1, HV: 2.369795709893773
Iteration: 2, HV: 2.369795709893773
Iteration: 3, HV: 2.36979570989379
Iteration: 4, HV: 27.808707598061808
Iteration: 5, HV: 42.32910365753088
Iteration: 6, HV: 43.714595952923744
Iteration: 7, HV: 47.22933549546205
Iteration: 8, HV: 49.670985094860654
Iteration: 9, HV: 51.10263502611352
Iteration: 10, HV: 51.10263502611352
Iteration: 11, HV: 52.062469362862885
Iteration: 12, HV: 53.29783482714572
Iteration: 13, HV: 53.82613440623778
Iteration: 14, HV: 54.83016402989673
Iteration: 15, HV: 55.09680288067247
Iteration: 16, HV: 55.4212344658551
Iteration: 17, HV: 55.65524855193837
Iteration: 18, HV: 56.02054328781002
Iteration: 19, HV: 56.21164516729751
Iteration: 20, HV: 56.41195528763242
Iteration: 21, HV: 56.63236145039196
Iteration: 22, HV: 56.72667491435091
Iteration: 23, HV: 56.82492615607465
Iteration: 24, HV: 57.06941439456136

Plot qEHVI Pareto Frontier based on model posterior¶

The plotted points are samples from the fitted model's posterior, not observed samples.

In [15]:

frontier = compute_pareto_frontier(
    experiment=ehvi_experiment,
    data=ehvi_experiment.fetch_data(),
    primary_objective=metric_b,
    secondary_objective=metric_a,
    absolute_metrics=["a", "b"],
    num_points=20,
)

render(plot_pareto_frontier(frontier, CI_level=0.90))

qParEGO¶

This is a good alternative algorithm for multi-objective optimization when qEHVI runs too slowly or produces poor results.

In [16]:

parego_experiment = build_experiment()
parego_data = initialize_experiment(parego_experiment)

In [17]:

parego_hv_list = []
parego_model = None
for i in range(N_BATCH):   
    parego_model = get_MOO_PAREGO(
        experiment=parego_experiment, 
        data=parego_data,
    )
    generator_run = parego_model.gen(1)
    trial = parego_experiment.new_trial(generator_run=generator_run)
    trial.run()
    parego_data = Data.from_multiple_data([parego_data, trial.fetch_data()])
    
    exp_df = exp_to_df(parego_experiment)
    outcomes = np.array(exp_df[['a', 'b']], dtype=np.double)
    try:
        hv = parego_model.observed_hypervolume()
    except:
        hv = 0
        print("Failed to compute hv")
    parego_hv_list.append(hv)
    print(f"Iteration: {i}, HV: {hv}")

parego_outcomes = np.array(exp_to_df(parego_experiment)[['a', 'b']], dtype=np.double)

Iteration: 0, HV: 0
Iteration: 1, HV: 0
Iteration: 2, HV: 0
Iteration: 3, HV: 2.369795709893773
Iteration: 4, HV: 2.369795709893773
Iteration: 5, HV: 2.369795709893773
Iteration: 6, HV: 2.369795709893773
Iteration: 7, HV: 16.09103209482847
Iteration: 8, HV: 40.77163754822243
Iteration: 9, HV: 40.77163754822244
Iteration: 10, HV: 43.93147864174085
Iteration: 11, HV: 43.93147864174084
Iteration: 12, HV: 48.197050447090525
Iteration: 13, HV: 48.197050447090525
Iteration: 14, HV: 48.19705044709052
Iteration: 15, HV: 48.19705044709051
Iteration: 16, HV: 48.197050447090525
Iteration: 17, HV: 48.197050447090525
Iteration: 18, HV: 48.197050447090525
Iteration: 19, HV: 50.67640721019708
Iteration: 20, HV: 50.67640721019708
Iteration: 21, HV: 50.676407210197084
Iteration: 22, HV: 50.67640721019708
Iteration: 23, HV: 50.67640721019708
Iteration: 24, HV: 50.676407210197084

Plot qParEGO Pareto Frontier based on model posterior¶

The plotted points are samples from the fitted model's posterior, not observed samples.

In [18]:

frontier = compute_pareto_frontier(
    experiment=parego_experiment,
    data=parego_experiment.fetch_data(),
    primary_objective=metric_b,
    secondary_objective=metric_a,
    absolute_metrics=["a", "b"],
    num_points=20,
)

render(plot_pareto_frontier(frontier, CI_level=0.90))

Plot empirical data¶

Plot observed hypervolume, with color representing the iteration that a point was generated on.¶

To examine optimization process from another perspective, we plot the collected observations under each algorithm where the color corresponds to the BO iteration at which the point was collected. The plot on the right for $q$EHVI shows that the $q$EHVI quickly identifies the Pareto front and most of its evaluations are very close to the Pareto front. $q$ParEGO also identifies has many observations close to the Pareto front, but relies on optimizing random scalarizations, which is a less principled way of optimizing the Pareto front compared to $q$EHVI, which explicitly attempts focuses on improving the Pareto front. Sobol generates random points and has few points close to the Pareto front.

In [19]:

import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

from matplotlib.cm import ScalarMappable
fig, axes = plt.subplots(1, 3, figsize=(20,6))
algos = ["Sobol", "parEGO", "EHVI"]
outcomes_list = [sobol_outcomes, parego_outcomes, ehvi_outcomes]
cm = plt.cm.get_cmap('viridis')
BATCH_SIZE = 1

n_results = N_BATCH*BATCH_SIZE + N_INIT
batch_number = torch.cat([torch.zeros(N_INIT), torch.arange(1, N_BATCH+1).repeat(BATCH_SIZE, 1).t().reshape(-1)]).numpy()
for i, train_obj in enumerate(outcomes_list):
    x = i
    sc = axes[x].scatter(train_obj[:n_results, 0], train_obj[:n_results,1], c=batch_number[:n_results], alpha=0.8)
    axes[x].set_title(algos[i])
    axes[x].set_xlabel("Objective 1")
    axes[x].set_xlim(-150, 5)
    axes[x].set_ylim(-15, 0)
axes[0].set_ylabel("Objective 2")
norm = plt.Normalize(batch_number.min(), batch_number.max())
sm =  ScalarMappable(norm=norm, cmap=cm)
sm.set_array([])
fig.subplots_adjust(right=0.9)
cbar_ax = fig.add_axes([0.93, 0.15, 0.01, 0.7])
cbar = fig.colorbar(sm, cax=cbar_ax)
cbar.ax.set_title("Iteration")

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-19-47b2f338ce0e> in <module>
      1 import numpy as np
----> 2 from matplotlib import pyplot as plt
      3 get_ipython().run_line_magic('matplotlib', 'inline')
      4 
      5 from matplotlib.cm import ScalarMappable

ModuleNotFoundError: No module named 'matplotlib'

Hypervolume statistics¶

The hypervolume of the space dominated by points that dominate the reference point.

Plot the results¶

The plot below shows a common metric of multi-objective optimization performance when the true Pareto frontier is known: the log difference between the hypervolume of the true Pareto front and the hypervolume of the approximate Pareto front identified by each algorithm. The log hypervolume difference is plotted at each step of the optimization for each of the algorithms.

The plot show that $q$EHVI vastly outperforms $q$ParEGO which outperforms the Sobol baseline.

In [20]:

import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

iters = np.arange(1, N_BATCH + 1)
log_hv_difference_sobol = np.log10(branin_currin.max_hv - np.asarray(sobol_hv_list))[:N_BATCH + 1]
log_hv_difference_parego = np.log10(branin_currin.max_hv - np.asarray(parego_hv_list))[:N_BATCH + 1]
log_hv_difference_ehvi = np.log10(branin_currin.max_hv - np.asarray(ehvi_hv_list))[:N_BATCH + 1]

fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.plot(iters, log_hv_difference_sobol, label="Sobol", linewidth=1.5)
ax.plot(iters, log_hv_difference_parego, label="qParEGO", linewidth=1.5)
ax.plot(iters, log_hv_difference_ehvi, label="qEHVI", linewidth=1.5)
ax.set(xlabel='number of observations (beyond initial points)', ylabel='Log Hypervolume Difference')
ax.legend(loc="lower right")

Out[20]:

<matplotlib.legend.Legend at 0x7ff7a4ba82d0>

Download Tutorial Jupyter Notebook

Download Tutorial Source Code