Optimization
D.S. Parker
UCLA
March 9, 2016
D.S. Parker (UCLA) ©2016, March 9, 2016

Least Squares
Given a rectangular (non-square) n×p matrix X and an n×1 vector y, solve:
Xc ≈ y.
X'Xc = X'y:   c = (X'X)^(-1) X'y = X^+ y (pseudoinverse).
c minimizes the squared error ||y - Xc||^2, so it has least squared error.
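As a quick numerical check (an illustration, not from the slides; the sizes and data below are made up), the normal equations, the pseudoinverse, and the backslash operator all give the same coefficients:
n = 50; p = 3;                     % made-up example sizes
X = randn(n, p);  y = randn(n, 1); % made-up data
c1 = (X' * X) \ (X' * y)           % solve the normal equations X'Xc = X'y
c2 = pinv(X) * y                   % pseudoinverse solution c = X^+ y
c3 = X \ y                         % built-in least-squares solve
norm(c1 - c2), norm(c1 - c3)       % all three agree up to rounding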

Minimizing the Sum of Squares | with Derivatives
Minimize ||y - Xc||^2, i.e., find c with the least squared error.
L(c) = ||y - Xc||^2 = (y - Xc)'(y - Xc)
     = y'y - c'X'y - y'Xc + c'X'Xc = y'y - 2c'X'y + c'X'Xc.
∂L(c)/∂c = ( ∂L(c)/∂c1, ..., ∂L(c)/∂cp ).
Each of these derivatives has the same three terms:
- ∂/∂cj y'y = 0.
- ∂/∂cj 2c'X'y = j-th entry of the vector (2X'y).
- ∂/∂cj c'X'Xc = j-th entry of the vector (X'Xc)
  + j-th entry of the vector (c'X'X) (chain rule).
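The three terms above add up to the gradient 2X'(Xc - y); as a sanity check (not on the slide, using made-up data), it can be compared against central differences:
n = 40; p = 4;
X = randn(n, p);  y = randn(n, 1);  c0 = randn(p, 1);  % made-up test data
L = @(c) sum( (y - X*c).^2 );
g_formula = 2 * X' * (X*c0 - y);                       % gradient from the derivation
h = 1e-6;  g_numeric = zeros(p, 1);
for j = 1:p
  e = zeros(p,1);  e(j) = h;
  g_numeric(j) = ( L(c0 + e) - L(c0 - e) ) / (2*h);    % central difference
end
norm( g_formula - g_numeric )                          % should be nearly zero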

Minimizing the Sum of Squares with an Optimizer
minimize: L(c) = ||y - Xc||^2.
L = @(c) sum( (y - X * c).^2 ) % equivalently: norm( y - X * c )^2
c0 = zeros(p,1) % assume p = number of parameters
c = fminsearch( L, c0 ) % find coefficients c from initial c0

Function Functions
feval( 'log', 3.0 )
mylog = @log
feval( mylog, 3.0 )
logbase = @(x,B) log(x) / log(B) % log base B
feval( logbase, 3.0, exp(1) ) % log base e
logbase( 3.0, exp(1) )
log3 = inline("log(x) / log(3)")
log3(10)
quad( @log, 1, 2 ) % integral of log(x) from 1 to 2
derivative = @(f, x, h) (feval(f,x+h) - feval(f,x)) / h
doc feval
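For instance (an illustration, not part of the slide), the derivative handle above approximates f'(x) with a forward difference:
derivative = @(f, x, h) (feval(f,x+h) - feval(f,x)) / h  % as defined on the slide
derivative( @sin, 0.0, 1e-6 )   % approximately cos(0) = 1
derivative( @exp, 1.0, 1e-6 )   % approximately exp(1) = 2.7183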

Matlab's fitdemo | fit data y(t) with Σ_j C_j exp(-λ_j t)
t = (0:.1:2)';
y = [5.8955 3.5639 2.5173 1.9790 1.8990 1.3938 1.1359 1.0096 1.0343 ...
0.8435 0.6856 0.6100 0.5392 0.3946 0.3903 0.5474 0.3459 0.1370 ...
0.2211 0.1704 0.2636]';
% Fit:
% y = C1 * exp(-lambda1 * t) + C2 * exp(-lambda2 * t)

Matlab's fitdemo | optimizes parameters λ1, λ2
function err = fitfun( lambda, t, y )
p = length(lambda); A = zeros(length(t), p);
for j = 1:p
A(:,j) = exp(-lambda(j)*t);
end
C = A \ y; % ??! compute C coefficients using least squares!?
y_estimate = A * C;
err = norm(y_estimate - y); % yield error of input coefficients lambda
% The goal is to fit the following function with two linear and
% two nonlinear parameters:
%
% y = C1 * exp(-lambda1 * t) + C2 * exp(-lambda2 * t)
%
% Given the parameter vector (lambda) and the data (t and y) as input,
% with an initial estimate of lambda, fminsearch obtains lambda.
start = [ 1; 0 ]
options = optimset( 'TolX', 0.1 ); % !? loose (large) tolerance -- for speed?
lambda = fminsearch( @(lambda) fitfun(lambda,t,y), start, options );
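After fminsearch returns lambda, the linear coefficients C can be recovered the same way fitfun does internally; a short follow-up along these lines (not part of the demo as shown) displays the fit:
A = [ exp(-lambda(1)*t), exp(-lambda(2)*t) ];  % basis functions at the fitted lambda
C = A \ y;                                     % linear coefficients by least squares
y_fit = A * C;
plot( t, y, 'o', t, y_fit, '-' )               % data versus fitted curve
norm( y - y_fit )                              % residual error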

Matlab's fitdemo | fixed to optimize all parameters
function err = fixedfitfun( params, t, y )
p = length(params)/2;
lambda = params(2:2:2*p);
C = params(1:2:2*p);
y_estimate = zeros(length(y),1);
for j = 1:p
y_estimate = y_estimate + C(j) * exp(-lambda(j) * t);
end
err = norm( y_estimate - y ); % yield error of coefficients
% The goal is to fit the following function with two linear and
% two nonlinear parameters:
%
% y = C1 * exp(-lambda1 * t) + C2 * exp(-lambda2 * t)
%
% Given the nonlinear parameter (lambda) and the data (t and y) as input,
% with an initial estimate of lambda, fminsearch obtains lambda.
start = [ 2; 2; 1; 1 ];
params = fminsearch( @(x) fixedfitfun(x,t,y), start );
C1 = params(1); lambda1 = params(2); C2 = params(3); lambda2 = params(4);

Installing Optimization Functions in Octave
Octave commands for installing optimization in Octave:
pkg install -forge struct % download and install struct package
pkg install -forge optim % download and install optim package
% if you have a Mac and this dies with error messages, you may need
% to patch optim-1.5.0/src/Makefile.in to include libgfortran.a:
% LFLAGS += -L/opt/local/lib/gcc5/ -lgfortran
Octave commands for using the optim package afterwards:
pkg load optim % load functions from the optim package
L = @(c) sum( (y - X * c).^2 )
c0 = zeros(p,1)
c = fminsearch( L, c0 ) % fminsearch is available...

Key Optimization Functions in Octave
help fzero % find a root of a univariate function
help fminbnd % minimization of a univariate function
help fminsearch % minimization of a multivariate function
help fminunc % minimization of a multivariate function (with gradients)
help lsqlin % linear least squares
help quadprog % quadratic programming
help lsqnonlin % nonlinear least squares
help fsolve % solve set of nonlinear equations (vector-valued function)

fzero | Find a root of a Univariate Function
[x, FVAL, INFO, OUTPUT] = fzero ( F, x0, OPTIONS )
Find a zero of a univariate function.
F is a function handle, inline function, or string containing the
name of the function to evaluate.
x0 should be a 2-element vector specifying points that bracket a zero.
i.e., there must be a change in sign between x0(1) and x0(2):
sign( F( x0(1) ) ) * sign( F( x0(2) ) ) <= 0
If x0 is a single scalar then several nearby and distant values
are probed in an attempt to obtain a valid bracketing.
If this is not successful, the function fails.
On exit, the function returns x, the approximate zero point,
and FVAL, the function value thereof.
INFO is an exit flag that gives termination information.
OUTPUT is a structure containing run time information.
INFO is an exit flag that can have these values:
* 1 The algorithm converged to a solution.
* 0 Maximum number of iterations/function evaluations reached.
* -1 The algorithm has been terminated by user output function.
* -5 The algorithm may have converged to a singular point.
OUTPUT is a structure containing runtime information about the
'fzero' algorithm. Fields in the structure are:
* iterations Number of iterations through loop.
* nfev Number of function evaluations.
* bracketx A two-element vector with the final bracketing of the
zero along the x-axis.
* brackety A two-element vector with the final bracketing of the
zero along the y-axis.
OPTIONS is a structure specifying
additional options. Currently, 'fzero' recognizes these options:
"FunValCheck", "OutputFcn", "TolX", "MaxIter", "MaxFunEvals". For
a description of these options, see *note optimset: XREFoptimset.
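A small usage example (not from the slide), bracketing the root of cos(x) - x on [0, 1]:
f = @(x) cos(x) - x;                  % f(0) > 0 and f(1) < 0, so [0,1] brackets a zero
[x, fval, info] = fzero( f, [0, 1] )  % x is near 0.7391, fval near 0
x2 = fzero( f, 1.0 )                  % scalar start: fzero probes for a bracket itself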

fminbnd | Minimization of a Univariate Function
[x, FVAL, INFO, OUTPUT] = fminbnd ( F, A, B, OPTIONS )
Find a minimum point of a univariate function.
F should be a function handle or name.
The search for a minimum is restricted to be in the interval
bound by A and B, using a Golden Section search strategy.
On exit, the function returns x, the approximate minimum point and
FVAL, the function value thereof.
INFO is an exit flag that can have these values:
* 1 The algorithm converged to a solution.
* 0 Maximum number of iterations/function evaluations reached.
* -1 The algorithm has been terminated from user output function.
If you only have an initial point to begin searching from
you will need to use an unconstrained minimization algorithm
such as 'fminunc' or 'fminsearch'.
'fminbnd' internally uses a Golden Section search strategy.

Matlab's funfuns demo
fplot(@humps,[0,2]);
z = fzero( @humps, 1, optimset('Display','off') );
hold on
plot(z,0,'r*');
%% Minimum of HUMPS
% FMINBND finds the minimum of a function in a given domain. Here, we search
% for a minimum for HUMPS in the domain (0.25,1).
m = fminbnd( @humps, 0.25, 1, optimset('Display','off') );
plot(m,humps(m),'r*');
%% Integral of HUMPS
% QUAD finds the definite integral of HUMPS in a given domain. Here it computes
% the area in the domain [0.5, 1].
q = quad(@humps,0.5,1);
title(['Area = ',num2str(q)]);
hold off

Matlab's funfuns demo
%% Zero of HUMPS
% The FZERO function finds a zero of a function near an initial estimate. Our
% guess here for HUMPS is 1.
z = fzero(@humps,1,optimset('Display','off'));
fplot(@humps,[0,2]);
hold on
plot(z,0,'r*');
%% Minimum of HUMPS
% FMINBND finds the minimum of a function in a given domain.
% Here, we search for a minimum for HUMPS in the domain (0.25,1).
m = fminbnd(@humps,0.25,1,optimset('Display','off'));
plot(m,humps(m),'r*');
%% Integral of HUMPS
% QUAD finds the definite integral of HUMPS in a given domain.
% Here it computes the area in the domain [0.5, 1].
q = quad( @humps, 0.5, 1 );
title(['Area = ',num2str(q)]);
hold off

fminsearch | Minimization of a Multivariate Function
[x, FVAL] = fminsearch ( F, x0, OPTIONS )
Find a value of x which minimizes the function F.
The search begins at the point x0 and iterates using the
Nelder & Mead Simplex algorithm (a derivative-free method).
This algorithm is better-suited to functions which have discontinuities
or for which a gradient-based search such as 'fminunc' fails.
Options for the search are provided in the parameter OPTIONS using
the function 'optimset'. Currently, 'fminsearch' accepts the
options: "TolX", "MaxFunEvals", "MaxIter", "Display".
For a description of these options, see 'optimset'.
On exit, the function returns X, the minimum point, and FVAL,
the function value thereof.
Example usages:
fminsearch( @(x) (x(1)-5).^2+(x(2)-8).^4, [0;0] )
fminsearch( inline("(x(1)-5).^2+(x(2)-8).^4", "x"), [0;0] )
See also: fminbnd, fminunc, optimset.

fminunc | Gradient-Based Multivariate Minimization
[x, FVAL, INFO, OUTPUT, GRAD, HESS] = fminunc (F, x0, OPTIONS )
Solve an unconstrained optimization problem defined by function F.
F should accept a vector (array) defining the unknown variables,
and return the objective function value, optionally with gradient.
'fminunc' attempts to determine a vector x such that 'F(x)' is a
local minimum. x0 determines a starting guess. The shape of x0 is
preserved in all calls to F, but treated as a column vector.
Notes: If you have only a single nonlinear equation of one variable
then using 'fminbnd' is usually a much better idea.
The algorithm used is a gradient search which depends on the
objective function being differentiable. If the function has
discontinuities it may be better to use a derivative-free algorithm
such as 'fminsearch'.
On return, x is the location of the minimum and FVAL contains the value
of the objective function at x. INFO may be one of the following:
1
Converged to a solution point. Relative gradient error is
less than specified by 'TolFun'.
2
Last relative step size was less than 'TolX'.
3
Last relative change in function value was less than 'TolFun'.
0
Iteration limit exceeded--either maximum number of algorithm
iterations 'MaxIter' or maximum number of function evaluations
'MaxFunEvals'.
-1
Algorithm terminated by 'OutputFcn'.
-3
The trust region radius became excessively small.
Optionally, 'fminunc' can return a structure with convergence
statistics (OUTPUT), the output gradient (GRAD) at the solution x,
and approximate Hessian (HESS) at the solution x.
Notes: If you have only a single nonlinear equation of one variable
then using 'fminbnd' is usually a much better idea. The algorithm
used is a gradient search which depends on the objective function
being differentiable. If the function has discontinuities it may
be better to use a derivative-free algorithm such as 'fminsearch'.
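A usage sketch (not from the slide): minimizing the Rosenbrock function, optionally supplying the gradient as a second output. The 'GradObj' option name is the one Octave's fminunc is expected to recognize; plain fminunc(@rosen, x0) works without it.
% file rosen.m (hypothetical name)
function [f, g] = rosen( x )
  f = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;          % objective
  if nargout > 1
    g = [ -400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1));  % analytic gradient
           200*(x(2) - x(1)^2) ];
  end
end
% at the prompt:
x0 = [-1; 2];
[x, fval] = fminunc( @rosen, x0 )                            % gradient estimated numerically
[x, fval] = fminunc( @rosen, x0, optimset('GradObj','on') )  % use the analytic gradient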

lsqlin | Linear Least Squares Minimization
[x, RESNORM, RESIDUAL, EXITFLAG, OUTPUT, LAMBDA]
= lsqlin (C, d, A, B, AEQ, BEQ, LB, UB, x0, OPTIONS )
Solve the linear least squares program
min 0.5 sumsq( C*x - d )
x
subject to
A * x <= B
AEQ * x = BEQ
LB <= x <= UB
The initial guess x0 and the constraint arguments A and B,
AEQ and BEQ, LB and UB can be set to empty ('[]') to omit them.
If the initial guess x0 is feasible the algorithm is faster.
Returned values:
x Position of minimum.
RESNORM
Scalar value of objective as sumsq( C*x - d ).
RESIDUAL
Vector of solution residuals C*x - d.
EXITFLAG
Status of solution / success
'0'
Maximum number of iterations reached.
'-2'
The problem is infeasible.
'1'
Global solution found.
OUTPUT
Structure with additional information, currently the only
field is 'iterations', the number of used iterations.
LAMBDA
Structure containing Lagrange multipliers corresponding to the
constraints.
This function calls the more general function 'quadprog'
internally.
OPTIONS can be set with 'optimset', currently the only option is
'MaxIter', the maximum number of iterations (default: 200).
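A small constrained least-squares example (an illustration only; lsqlin here is the optim-package function described above, and the data are made up):
C = [1 1; 1 2; 1 3; 1 4];  d = [2; 3; 5; 6];  % made-up overdetermined system
A = [0 1];  B = 1.2;                          % constraint: second coefficient <= 1.2
x_free = C \ d                                % unconstrained least squares (slope 1.4)
x_con  = lsqlin( C, d, A, B )                 % constrained fit: A*x <= B binds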

quadprog | Quadratic Optimization
[x, FVAL, EXITFLAG, OUTPUT, LAMBDA]
= quadprog (H, c, A, B, AEQ, BEQ, LB, UB, x0, OPTIONS )
Solve the quadratic program
min 1/2 * x' * H * x + x' * c
x
subject to:
A * x <= B
AEQ * x = BEQ
LB <= x <= UB
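A toy instance (not from the slide): minimize (x1 - 1)^2 + (x2 - 2.5)^2 subject to x1 + x2 <= 2 and x >= 0, written in the 1/2 x'Hx + x'c form above:
H = 2*eye(2);  c = [-2; -5];                 % (x1-1)^2 + (x2-2.5)^2, up to a constant
A = [1 1];  B = 2;                           % x1 + x2 <= 2
LB = [0; 0];  UB = [];                       % x >= 0, no upper bounds
x = quadprog( H, c, A, B, [], [], LB, UB )   % expected solution near [0.25; 1.75]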

lsqnonlin | Nonlinear Least Squares Minimization
[x, RESNORM, RESIDUAL, EXITFLAG, OUTPUT, LAMBDA, JACOBIAN]
= lsqnonlin ( F, x0, LB, UB, OPTIONS )
Solve nonlinear least-squares (nonlinear data-fitting) problems
min norm( F(x) )
x
subject to:
LB <= x <= UB
min ( EuclideanNorm(F(x)) ) .^ 2
x
The initial guess x0 must be provided, while the bounds LB and UB
can be set to the empty matrix ('[]') if not given.
OPTIONS can be set with 'optimset'. The following Matlab-compatible
options are recognized:
'Algorithm' String specifying backend algorithm. Currently
available "lm_svd_feasible" only.
'TolFun' Minimum fractional improvement in objective function in an
iteration (termination criterium). Default: 1e-6.
'TypicalX' Typical values of x. Default: 1.
'MaxIter' Maximum number of iterations allowed. Default: 400.
'Jacobian' If set to "on", the objective function must return a
second output containing a user-specified Jacobian. The Jacobian
is computed using finite differences otherwise. Default: "off"
'FinDiffType' "centered" or "forward" (Default) type finite
differences estimation.
'FinDiffRelStep' Step size factor. The default is sqrt(eps) for
forward finite differences, and eps^(1/3) for central finite
differences
'OutputFcn' One or more user-defined functions, either as a
function handle or as a cell array of function handles that an
optimization function calls at each iteration. The function
definition has the following form:
'stop = outfun(x, optimValues, state)'
'x' is the point computed at the current iteration. 'optimValues'
is a structure containing data from the current iteration in the
following fields: "iteration"- number of current iteration.
"residual"- residuals. 'state' is the state of the algorithm:
"init" at start, "iter" after each iteration and "done" at the end.
'Display' String indicating the degree of verbosity. Default:
"off". Currently only supported values are "off" (no messages) and
"iter" (some messages after each iteration).
Returned values:
x
Position of minimum.

Extending from Linear to Nonlinear Least Squares
minimize: L(c) = ||y - Xc||^2.
L = @(c) sum( (y - X * c).^2 ) % equivalently: norm( y - X * c )^2
c0 = zeros(p,1) % assume p = number of parameters
c = fminsearch( L, c0 ) % find coefficients c from initial c0
Generalize the objective function, and impose bounds on c:
minimize: L(c) = ||y - φ(X,c)||^2
subject to: a <= c <= b.
L = @(c) sum( (y - phi(X,c)).^2 )
c0 = zeros(p,1)
c_lower_bounds = a
c_upper_bounds = b
c = lsqnonlin( L, c0, c_lower_bounds, c_upper_bounds )
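One convention worth noting (an observation, not from the slide): lsqnonlin normally expects the vector of residuals and squares-and-sums it internally, so an equivalent and more idiomatic call passes the residual vector rather than the scalar sum of squares:
R = @(c) y - phi(X, c);                                   % residual vector
c = lsqnonlin( R, c0, c_lower_bounds, c_upper_bounds )    % minimizes sum( R(c).^2 )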

From Linear Regression to Logistic Regression
Linear regression:
y ≈ Xc.
c minimizes squared error ||y - Xc||^2.
In classification problems, the y values are integers like 0 and 1.
Linear regression is not exactly what we want.
Intuitively we want to solve something like
y ≈ round(Xc).
Values near 0 are rounded to 0, and near 1 are rounded to 1.

Logistic Regression
[Figure: scatter plot of data points, red.X[,1] versus red.X[,2], on axes running from 0 to 10, with a contour labeled 0.5.]

The Logistic Function | intuitively like Rounding
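The figure for this slide did not survive extraction; a minimal sketch (my reconstruction) of the curve it presumably shows, the logistic function sigma(z) = 1/(1 + exp(-z)):
sigma = @(z) 1 ./ (1 + exp(-z));       % logistic function: a smooth version of rounding
z = linspace(-6, 6, 200);
plot( z, sigma(z), z, z >= 0, '--' );  % compare with a hard 0/1 threshold at z = 0
legend( 'logistic', 'hard rounding' );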

Logistic Regression
Linear regression:
y ≈ Xc.
c minimizes squared error ||y - Xc||^2.
Logistic regression:
y ≈ σ(Xc).
c minimizes squared error ||y - σ(Xc)||^2.
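A least-squares logistic fit in this spirit can be run directly with fminsearch (a sketch under the slide's squared-error formulation, assuming a feature matrix X and 0/1 labels y are already defined):
sigma = @(z) 1 ./ (1 + exp(-z));
L  = @(c) sum( (y - sigma(X * c)).^2 );  % squared error of the logistic model
c0 = zeros( size(X,2), 1 );
c  = fminsearch( L, c0 )                 % fitted coefficients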

fsolve: solve set of Nonlinear Equations (vector-valued fcn)
[x, FvecVAL, INFO, OUTPUT, FJAC] = fsolve ( Fvec, x0, OPTIONS )
Solve the system of nonlinear equations defined by Fvec,
a vector-valued function.
Fvec should accept a vector (array) defining the unknown variables,
and return a vector of left-hand sides of the equations.
Right-hand sides are defined to be zeros.
In other words, this function attempts to determine a vector x
such that 'F(x)' gives (approximately) all zeros.
x0 determines a starting guess. The shape of x0 is preserved
in calls to Fvec, but otherwise it is treated as a column vector.
Note: If you only have a single nonlinear equation of one variable,
using 'fzero' is usually a much better idea.
If "Jacobian" is "on", it specifies that Fvec, called with 2 output
arguments, also returns the Jacobian matrix of right-hand sides at
the requested point. "TolX" specifies the termination tolerance in
the unknown variables, while "TolFun" is a tolerance for equations.
Default is '1e-7' for both "TolX" and "TolFun".
Note about user-supplied Jacobians: As an inherent property of the
algorithm, Jacobian is always requested for a solution vector whose
residual vector is already known, and it is the last accepted
successful step. Often this will be one of the last two calls, but
not always. If the savings by reusing intermediate results from
residual calculation in Jacobian calculation are significant, the
best strategy is to employ 'OutputFcn': after a vector is evaluated
for residuals, if 'OutputFcn' is called with that vector, then the
intermediate results should be saved for future Jacobian evaluation,
and should be kept until a Jacobian evaluation is requested or until
'OutputFcn' is called with a different vector.
If "AutoScaling" is on, the variables will be automatically scaled
according to the column norms of the (estimated) Jacobian. As a
result, TolF becomes scaling-independent. By default, this option
is off, because it may sometimes deliver unexpected (though
mathematically correct) results.
If "Updating" is "on", the function will attempt to use Broyden
updates to update the Jacobian, in order to reduce the amount of
Jacobian calculations. If your user function always calculates the
Jacobian (regardless of number of output arguments), this option
provides no advantage and should be set to false.
If "ComplexEqn" is "on", 'fsolve' will attempt to solve complex
equations in complex variables, assuming that the equations possess
a complex derivative (i.e., are holomorphic). If this is not what
you want, you should unpack the real and imaginary parts of the
system to get a real system.
For description of the other options, see 'optimset'.
On return, FVAL contains the value of the function F evaluated at
x, and INFO may be one of the following values:
1
Converged to a solution point. Relative residual error is
less than specified by TolFun.
2
Last relative step size was less than TolX.
3
Last relative decrease in residual was less than TolF.
0
Iteration limit exceeded.
-3
The trust region radius became excessively small.
OPTIONS is a structure specifying
additional options. Currently, 'fsolve' recognizes these options:
"FunValCheck", "OutputFcn", "TolX", "TolFun", "MaxIter",
"MaxFunEvals", "Jacobian", "Updating", "ComplexEqn" "TypicalX",
"AutoScaling" and "FinDiffType".
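A small system as an illustration (not from the slide): the intersection of the unit circle with the line x1 = x2.
Fvec = @(v) [ v(1)^2 + v(2)^2 - 1;      % x1^2 + x2^2 = 1
              v(1) - v(2) ];            % x1 = x2
x0 = [0.5; 0.5];
[x, fval, info] = fsolve( Fvec, x0 )    % x approaches [sqrt(2)/2; sqrt(2)/2]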

Nelder-Mead: a popular Gradient-Free Method
To minimize L(c) over a p-dimensional space of vectors c:
evaluate L at all points of a simplex
(a simple polyhedron with p+1 vertices):
- p = 2: the simplex is a triangle.
- p = 3: the simplex is a tetrahedron.
Downhill simplex algorithm (Nelder-Mead algorithm):
repeatedly move one simplex node downhill, improving its value of L:
- find the vertex c having the largest (worst) value of L(c)
- replace it by a new vertex c' having a value L(c') that is lower.

Nelder-Mead: a popular Gradient-Free Method
Nelder-Mead algorithm:
at each iteration, the simplex nodes c1, ..., c(p+1) are ordered by L-values:
L(c1) >= L(c2) >= ... >= L(c(p+1))
i.e., c1 corresponds to the worst L-value.
The average of the others is:
c̄ = (1/p) Σ_{i=2..p+1} ci.
Nelder-Mead replaces point c1 by its reflection c' through this average:
c' = c̄ + (c̄ - c1) = 2c̄ - c1.
This move looks like it ought to take us downhill, but of course it may not.

Nelder-Mead: a popular Gradient-Free Method
Nelder-Mead then proceeds as follows with the new value c':
1. If L(c') < L(c(p+1)), then the move can be accepted,
   but also see if doubling the length of this move is even better:
   c' = c̄ + 2(c̄ - c1) = 3c̄ - 2c1
2. If L(c') > L(c2), try halving the length of this move:
   c' = c̄ + (1/2)(c̄ - c1) = (3/2)c̄ - (1/2)c1.
   2.1 If L(c') < L(c2), then the move is accepted.
   2.2 If L(c') > L(c2), abandon the idea of using a reflection.
       Instead, interpolate towards the average:
       c' = c̄ - (1/2)(c̄ - c1) = (1/2)c̄ + (1/2)c1
       2.2.1 If this interpolated point improves on L(c1), it is accepted.
       2.2.2 Otherwise, shrink the whole simplex toward the best vertex c(p+1):
             c'i = ci - (1/2)(ci - c(p+1)) = (1/2)ci + (1/2)c(p+1),   i = 1, ..., p.
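To watch these simplex moves in practice, fminsearch can report per-iteration progress through the 'Display' option listed on the fminsearch slide (a usage note, not from this slide):
L = @(x) (x(1)-5).^2 + (x(2)-8).^4;                     % objective from the fminsearch slide
x = fminsearch( L, [0;0], optimset('Display','iter') )  % print progress at each step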

Gradient Methods
Suppose:
L(c) = ||y - φ(X,c)||^2
Gradients can be used to obtain the vector of coefficients c.
g(c) = ( ∂L(c)/∂c1, ..., ∂L(c)/∂cp )
is the gradient, or Jacobian, of the error L(c).
Gradient descent iteration (ε is a small step-size constant):
c^(n+1) = c^(n) - ε g(c^(n))
This hill-descending approach moves slowly when the gradient is small.
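A bare-bones gradient-descent loop for the linear case L(c) = ||y - Xc||^2 (an illustration, not from the slide; it assumes X and y are in the workspace, and the step size and iteration count are arbitrary choices):
g = @(c) 2 * X' * (X*c - y);     % gradient of ||y - Xc||^2
c = zeros( size(X,2), 1 );       % starting point
step = 1e-3;                     % small fixed step size (chosen by hand)
for n = 1:5000
  c = c - step * g(c);           % hill-descending update
end
norm( c - X\y )                  % compare with the direct least-squares solution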

Levenberg-Marquardt
Gradient descent is slow and safe; Newton's method is fast and risky:
Gradient descent:   c^(t+1) = c^(t) - ε g(c^(t))
Newton's method:    c^(t+1) = c^(t) - H(c^(t))^(-1) g(c^(t))
The Levenberg-Marquardt method combines them:
c^(t+1) = c^(t) - M(c^(t))^(-1) g(c^(t))
where
M(c) = (1/2) ( H(c) + λ diag(H(c)) )
diag(H(c)) is a diagonal matrix containing the principal diagonal of H(c),
and λ is a constant.
The Levenberg-Marquardt method begins with a large value of λ, and
after every step increases λ if the error increases, and decreases λ if the
error decreases. It finds a local minimum near the starting point relatively
quickly, but because of its `safety' the residual error can still be large.
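A minimal sketch of the λ-adaptation idea for the linear objective L(c) = ||y - Xc||^2, where g(c) = 2X'(Xc - y) and H = 2X'X are exact. For simplicity it uses the common damping M = H + lambda*diag(H), dropping the slide's extra 1/2 factor; X and y are assumed to be in the workspace, and the constants are arbitrary:
L = @(c) sum( (y - X*c).^2 );
g = @(c) 2 * X' * (X*c - y);         % exact gradient
H = 2 * (X' * X);                    % exact Hessian
D = diag(diag(H));                   % principal diagonal of H, as a matrix
c = zeros( size(X,2), 1 );
lambda = 10;                         % begin with a large damping value
for t = 1:50
  c_new = c - ( H + lambda*D ) \ g(c);
  if L(c_new) < L(c)
    c = c_new;  lambda = lambda / 2; % error decreased: behave more like Newton
  else
    lambda = lambda * 2;             % error increased: behave more like gradient descent
  end
end
norm( c - X\y )                      % approaches the least-squares solution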

Problem: Coefficients depend on the Variables used
Using the auto dataset in auto_ridge.m:
Let y be the vector 1 ./ MPG (normalized).
Let X be the matrix with five variables (normalized):
Cylinders, Displacement, Horsepower, Weight, Acceleration.
Table of Least Squares Coefficients:
  Variable          5 variables   4 variables   3 variables
  2 Cylinders          0.1596
  3 Displacement      -0.0253        0.1190        0.0883
  4 Horsepower         0.4352        0.4207        0.3240
  5 Weight             0.4240        0.4440        0.5220
  6 Acceleration       0.0860        0.0826
Smaller coefficients seem more noisy.

Changing the Objective Function with Regularization
Least-squares coefficients c:
minimize L(c) = ||y - Xc||^2
New constraint: require ||c||^2 to be small (minimize noisy coefficients).
Ridge Regression:
minimize ||y - Xc||^2 + λ ||c||^2.
- as λ decreases: emphasize ||y - Xc||^2
- as λ increases: emphasize ||c||^2.
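The ridge objective also has a closed form that is easy to compare against ordinary least squares (a sketch, not from the slides; the lambda value is arbitrary, and X, y are the normalized auto variables above):
p = size(X, 2);
lambda = 0.1;                               % regularization strength (arbitrary choice)
c_ls    = (X'*X) \ (X'*y)                   % ordinary least-squares coefficients
c_ridge = (X'*X + lambda*eye(p)) \ (X'*y)   % minimizes ||y - Xc||^2 + lambda*||c||^2
[ c_ls, c_ridge ]                           % ridge shrinks the noisier coefficients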