Cutting edge hyperparameter tuning made simple with Ray Tune

XiaoweiJiang7 301 views 31 slides Dec 18, 2021


Ray Tune

Cutting edge
hyperparameter tuning
made simple

Agenda
1. Hyperparameter tuning (HPO) - whys and challenges
2. HPO methods offered by Ray Tune
3. Distributed HPO made simple by Ray Tune
4. Ray Tune APIs and integration with other ML libraries
5. Demo
6. Q&A

Hyperparameter tuning - what and why

What are hyperparameters?
Hyperparameters (set before training)
●Model type and architecture
●Learning and training related parameters
●Pipeline related parameters
Model parameters (learnt during training)

Why are hyperparameters important?
RoBERTa: A Robustly Optimized BERT
Pretraining Approach

Imputer
●Type: simple or iterative?
●Simple strategy: mean, median or constant?
Categorical encoder
●Type: one-hot encoding or label encoding?
Under/oversampler
●Type: SMOTE or random undersampling?
●Number of neighbors?
XGBoost
●6 - 10 hyperparameters to tune
Total: 15 hyperparameters to tune!

●Covers not only model training but also
data preprocessing and feature
engineering
●Relevant in both classical ML and DL
models
●Have a significant impact on ML
model/pipeline performance

Hyperparameter tuning is expensive
●Hyperparameter tuning is the trial-and-error process of
finding the optimal hyperparameter configuration
for a machine learning task.
●Black-box optimization over a non-convex, nonlinear,
high-dimensional and noisy search space.
●A lot of configurations (trials) to try out.
●The evaluation of each configuration involves model
training.

Ray Tune makes HPO easy
Cutting Edge Optimization
Algorithms
By combining efficient algorithms with effective
distributed execution!
With easy to use APIs!!

Ray Tune offers a wide collection of
HPO algorithms

Exhaustive search
●Cross product of all possible
configurations
●Needs a discrete search space
●Simple and easy to parallelize, but
inefficient
Random search
●Samples configurations randomly
●Generally superior to grid search
●Hard to beat in high-dimensional search spaces
●Easily parallelizable, but still inefficient!
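
The difference between the two strategies can be sketched in a few lines of plain Python. This is only an illustration, not Ray Tune code: the search space, its parameter names and the sample budget of 10 are all assumptions made for the example.

```python
import itertools
import random

# Hypothetical discrete search space (names and values are illustrative only).
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 5, 7],
    "subsample": [0.6, 0.8, 1.0],
}

def grid_configs(space):
    """Yield the cross product of all values: every possible configuration."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_configs(space, num_samples, seed=0):
    """Yield configurations sampled independently and uniformly per parameter."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        yield {k: rng.choice(v) for k, v in space.items()}

grid = list(grid_configs(search_space))
sampled = list(random_configs(search_space, num_samples=10))
print(len(grid))     # 3 * 3 * 3 = 27 configurations
print(len(sampled))  # 10 configurations, a budget chosen by the user
```

The grid grows multiplicatively with every added hyperparameter, while the random-search budget stays whatever the user sets, which is one reason random search tends to scale better to many dimensions.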

Bayesian optimization
●Uses information from previous
configurations to decide which
configuration to try next
●Builds a surrogate model
●Different approaches to build this
surrogate model
●Inherently sequential
https://www.wikiwand.com/en/Hyperparameter_optimization

Early stopping
●Use intermediate results (epochs, trees)
to prune underperforming trials, saving
time and computing resources
●Median stopping, HyperBand, ASHA
●Inherently parallelizable
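
As a rough illustration of pruning on intermediate results, here is a minimal sketch of synchronous successive halving, the building block behind HyperBand (ASHA is its asynchronous variant). It is not Ray Tune's implementation: `partial_loss` is a hypothetical toy objective standing in for real training, and `min_budget` and `eta` are illustrative parameter names.

```python
import random

def partial_loss(config, budget):
    """Toy stand-in for training: loss after `budget` iterations (hypothetical)."""
    # Loss shrinks as the budget grows and is lowest when lr is close to 0.1.
    return abs(config["lr"] - 0.1) + 1.0 / budget

def successive_halving(configs, min_budget=1, eta=2):
    """Keep the best 1/eta of the trials each round and give them eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Evaluate every surviving trial at the current budget and rank them.
        scored = sorted(survivors, key=lambda c: partial_loss(c, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]   # prune the underperforming trials
        budget *= eta               # survivors train for longer next round
    return survivors[0], budget

rng = random.Random(0)
configs = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(16)]
best, final_budget = successive_halving(configs)
print(best, final_budget)
```

Most trials are stopped after only a small budget, so the bulk of the compute goes to the few configurations that still look promising.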

BOHB
●Standard bandit algorithms use
random search = uninformed
decisions. Solution: BOHB: Robust
and Efficient Hyperparameter
Optimization at Scale by Falkner
et al.
●Combines HyperBand with BO -
makes informed decisions
based on partial results
●Parallelizable
BOHB: Robust and Efficient Hyperparameter Optimization at Scale

HyperSched
●Standard bandit algorithms
aren’t deadline aware.
Solution: HyperSched:
Dynamic Resource
Reallocation for Model
Development on a Deadline
by Liaw et al.
●Reallocates resources to
prioritize promising trials
https://arxiv.org/abs/2001.02338

BlendSearch
●Standard HPO methods try to minimize
number of iterations, but not actual
execution time (cost). Solution:
Economic hyperparameter optimization
with blended search strategy by Wang
et al.
●Combines global search with directed
local search
●Aware of hyperparameter cost &
deadlines - tries to first choose
configurations that are cheap to
evaluate
●Can be combined with bandit pruning
https://openreview.net/forum?id=VbLH04pRA3

Ray Tune manages distributed HPO
for you

What is Ray?

Key concepts
Execute functions remotely as tasks, and
instantiate classes remotely as actors
○Support both stateful and stateless computations

Asynchronous execution using futures
○Enable parallelism
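
The idea of futures can be shown without Ray at all. The sketch below uses Python's standard-library `concurrent.futures` as an analogy only; in Ray the corresponding steps are a `.remote()` call, which returns a future, and `ray.get`, which waits for its result. The `evaluate` function and the `alpha` values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(config):
    """Hypothetical stand-in for one trial: returns a fake loss."""
    return (config["alpha"] - 0.01) ** 2

configs = [{"alpha": a} for a in (0.001, 0.01, 0.1)]

with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns immediately with a future; the work runs in the background.
    futures = [pool.submit(evaluate, c) for c in configs]
    # result() blocks until that particular future has finished.
    losses = [f.result() for f in futures]

best = configs[losses.index(min(losses))]
print(best)
```

Because all three calls are submitted before any result is awaited, the evaluations can overlap instead of running one after another.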

class Trainer(object):
    def __init__(self, config):
        self._iter = 0
        self._config = config
        self._setup()

    def step(self):
        # train for one iteration
        self._iter += 1
        return {"iter": self._iter}


t = Trainer(config=config)
train_result = t.step()
Trainer Class

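import ray

# The decorator turns the class into a Ray actor that requests one GPU.
@ray.remote(num_gpus=1)
class Trainer(object):
    def __init__(self, config):
        self._iter = 0
        self._config = config
        self._setup()

    def step(self):
        # train for one iteration
        self._iter += 1
        return {"iter": self._iter}

# Not yet valid: an actor class must be created with Trainer.remote(...)
t = Trainer(config=config)
train_result = t.step()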
Trainer Class → Actor

import ray

@ray.remote(num_gpus=1)
class Trainer(object):
    def __init__(self, config):
        self._iter = 0
        self._config = config
        self._setup()

    def step(self):
        # train for one iteration
        self._iter += 1
        return {"iter": self._iter}


# Create the actor; returns a handle immediately.
t_handle = Trainer.remote(config=config)
# Call the method asynchronously; returns a future.
train_result_future = t_handle.step.remote()
# Block until the result is available.
train_result = ray.get(train_result_future)
Trainer Class → Actor
Runs in worker processes
Runs in driver process

Ray Tune in Ray Architecture
Head Node Worker Node
Ray Tune (Driver process)
Ray Core (takes care of distributed orchestration, scheduling and object store)
Worker
Process
Worker
Process
TrialRunner Searcher/Scheduler
Worker
Process
Worker
Process
Worker
Process
Worker
Process

Woohoo!
Let’s review what we have talked about.

What makes Ray Tune special?
●Wealth of efficient search and scheduling algorithms (everything we
have talked about previously, and more!)
●Leveraging Ray for distributed HPO, which lets you save time by
running trials in parallel
○Same code for HPO on a laptop and a cluster

Tune API example
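from ray import tune

def train_model(config):
    # ConvNet and epochs are assumed to be defined elsewhere
    model = ConvNet(config)
    for i in range(epochs):
        current_loss = model.train()
        tune.report(loss=current_loss)  # report intermediate results to Tune

tune.run(
    train_model,
    config={"alpha": tune.uniform(0.001, 0.1)},
    metric="loss",  # assumed objective: the scheduler needs a metric and a mode
    mode="min",
    num_samples=100,
    scheduler="asha",
    search_alg="optuna")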



●Simple, unified and consistent API and search spaces
○Try out different algorithms by changing one or two lines of code

tune-sklearn
●tune-sklearn provides a drop-in wrapper
compatible with sklearn leveraging Ray Tune
underneath

API example
from ray import tune
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

param_dists = {
    'loss': tune.choice(['squared_hinge', 'hinge']),
    'alpha': tune.loguniform(1e-4, 1e-1),
    'epsilon': tune.uniform(1e-2, 1e-1),
}
optuna_tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    early_stopping=True,  # uses Async HyperBand if set to True
    max_iters=10,
    search_optimization="optuna"
)
optuna_tune_search.fit(X_train, y_train)

Drop-in replacement for sklearn
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon': [0.01, 0.1]
}
search = GridSearchCV(
    SGDClassifier(),
    parameters,
    n_jobs=-1
)

search.fit(X_train, y_train)

Drop-in replacement for sklearn
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon': [0.01, 0.1]
}
search = TuneSearchCV(
    SGDClassifier(),
    parameters,
    n_jobs=-1
)

search.fit(X_train, y_train)

tune-sklearn demo

Thanks for listening!
Let’s keep in touch!
●https://ray.io/
●https://discuss.ray.io/
●Ray slack
●https://github.com/ray-project/tune-sklearn

Q&A