Optimization Techniques
Anand J Kulkarni
PhD, MASc, BEng, DME
Professor & Associate Director
Institute of Artificial Intelligence, MIT World Peace University, Pune,
Bharat (India)
Email: [email protected]; [email protected]
What is optimization about?
•Extreme states (i.e. minimum and maximum states out of many or
possibly infinitely many)
Ex. Natural (physical) stable equilibrium state is generally a
‘minimum potential energy’ state.
•Human activities: to do the best in some sense
•set a record in a race (shortest/minimum time, etc.)
•retail business (maximize the profit, etc.)
•construction projects (minimize cost, time, etc.)
•power generator design (maximize efficiency, minimize weight,
etc.)
•Best job out of several choices
What is optimization about?
•The concept of optimization (minimization and
maximization) to achieve the best possible
outcome is a positive and intrinsic part of
human nature.
•Study of optimization:
•Create optimum designs of products,
processes, systems, etc.
What is optimization about?
•Real world issues:
•Requirements and constraints imposed on
products, systems, processes, etc.
•Creating feasible design (solution)
•Creating a best possible design (solution)
•“Design optimization”: highly complex, conflicting
constraints and considerations, etc.
What is an Optimal Design (Tai et al.)
•In the engineering design of any component,
device, process or system, the optimal design
can be defined as the one that is feasible and
the best according to some quantitative
measure of effectiveness.
Importance of Optimization
•Greater concern about the limited energy,
material, economic sources, etc.
•Heightened environmental/ecological
considerations
•Increased technological and market
competition
A Simple Example
•5 × 7 metal sheet; a square of side x is cut from each corner and the flaps are folded up to form an open box
•x can take different values between 0 and 2.5
•Infinite box designs (solutions)
•Aim: biggest box volume (maximization)
A Simple Example

f(x) = (5 − 2x)(7 − 2x)x = 4x³ − 24x² + 35x,  0 ≤ x ≤ 2.5
[Graph of f(x) = 4x³ − 24x² + 35x for 0 ≤ x ≤ 5]
Design of a Can
Sensor Network Coverage Problem
Kulkarni, A.J., Tai, K., Abraham, A.: “Probability Collectives: A Distributed Multi-Agent System
Approach for Optimization”, Intelligent Systems Reference Library, 86 (2015) Springer
[Figure: a square field of interest (FoI) inside an enclosing rectangle A spanning (x_l, y_l) to (x_u, y_u), covered by three sensors of sensing radius r_s located at (x₁, y₁), (x₂, y₂) and (x₃, y₃), with individual coverage areas A_c,1, A_c,2 and A_c,3]
A Simple Example
•5 × 7 metal sheet; a square of side x is cut from each corner and the flaps are folded up to form an open box
•x can take different values between 0 and 2.5
•Infinite box designs (solutions)
•Aim: biggest box volume (maximization)
A Simple Example
•Setting f′(x) = 0 to obtain the stationary points:

f(x) = (5 − 2x)(7 − 2x)x = 4x³ − 24x² + 35x,  0 ≤ x ≤ 2.5
f′(x) = df/dx = 12x² − 48x + 35
f″(x) = d²f/dx² = 24x − 48

f′(x) = 0 gives x = 0.96 or x = 3.04
f(0.96) = 15.02;  f(3.04) = −3.02
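These stationary-point values can be cross-checked numerically; a minimal sketch (not part of the slides), applying the quadratic formula to f′(x) = 12x² − 48x + 35 = 0:

```python
# Sketch: verify the stationary points of f(x) = 4x^3 - 24x^2 + 35x
# by solving f'(x) = 12x^2 - 48x + 35 = 0 with the quadratic formula.
import math

def f(x):
    return 4 * x**3 - 24 * x**2 + 35 * x

def f_prime_roots():
    a, b, c = 12.0, -48.0, 35.0
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

x_low, x_high = f_prime_roots()
print(round(x_low, 2), round(x_high, 2))  # the two stationary points
print(round(f(x_low), 2))                 # local maximum value of f
```

Only x ≈ 0.96 lies inside the feasible range 0 ≤ x ≤ 2.5, so it gives the biggest box volume.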
[Graph of f(x) for 0 ≤ x ≤ 5 with the stationary points marked]
Illustration 1
Find the maximum and minimum of f(x) = 5x³ + 2x² − 3x

First-order derivative: f′(x) = 15x² + 4x − 3
First-order derivative equated to zero, i.e. locate the
variable values at which the slope becomes zero:
x = −3/5 and x = 1/3
Second-order derivative: f″(x) = 30x + 4
Second-order derivative values at these points:
f″(−3/5) = −14 and f″(1/3) = 14
Illustration 1
Maximum at x = −3/5
Minimum at x = 1/3

[Graph of f(x) for −1.5 ≤ x ≤ 1.5 showing the maximum and minimum]
Illustration 2
•A ball is thrown in the air. Its height at any
time t is given by h = 3 + 14t − 5t². What is
the maximum height?
dh/dt = 14 − 10t = 0, i.e. locating the point at which the slope is zero

t = 1.4
height h = 12.8

[Graph of h = 3 + 14t − 5t² for 0 ≤ t ≤ 3.5]

d²h/dt² = −10. What does this mean?
This means the slope is continually getting smaller (−10): traveling from left to right the
slope starts out positive (the function rises), goes through zero (the flat point), and then the
slope becomes negative (the function falls)
•1D Optimization
•3D view of 2D optimization
Basic Definitions
•A design problem is characterized by a set of
design variables (decision variables)
•Single Variable
•Multi-Variable
Single variable:
min f(x) = 2x² + log(x)
Multi-variable:
min f(x₁, x₂) = 2x₁⁵ + log(x₂)
i.e. min f(X) = 2x₁⁵ + log(x₂), where X = (x₁, x₂)
Basic Definitions
•Design variables
•Continuous (any value between a specific interval)
•Discrete (the value from a set of distinct numerical
values)
•Ex. integer values, values from a given set (1, 4.2, 6, 11, 12.9, 25.007), binary (0,
1), etc.
•Combinatorial Optimization Problem
•Mixed (discrete & continuous) variables
Basic Definitions
•Objective Function:
•Criterion by which the performance/effectiveness
of the solution is measured
•Also referred to as ‘Cost Function’.
Multi-objective problem: more than one objective function, e.g.

f₁(x) = (5 − 2x)(7 − 2x)x
f₂(X) = 2x₁⁵ + log(x₂),  X = (x₁, x₂)
Basic Definitions
•Unconstrained Optimization Problems
•No restrictions (Constraints) imposed on
design variables
•Constrained Optimization Problems
•Restrictions (constraints) are imposed on
design variables and the final solution
should satisfy these constraints, i.e. the final
solution should at least be feasible.
•The search for the ‘best’ solution comes afterwards
Basic Definitions
•Depending on physical nature of the problem:
‘Optimal Control Problem’
•Decomposing the complex problem into a
number of simpler sub-problems
•Linear Programming (LP): If the objective and
constraints are linear
•Non-Linear Programming (NLP): if any of them is
non-linear
General Problem Statement
Minimize f(X)
Subject to:
g_j(X) ≤ 0,  j = 1, ..., m
h_k(X) = 0,  k = 1, ..., l
x_i^l ≤ x_i ≤ x_i^u,  i = 1, ..., n  (side constraints)
Active/Inactive/Violated Constraints
•Inequality Constraints
f = f(B, R)
g₁(B, R) = 12B + 5R − 3000 ≤ 0
g₂(B, R) = 10B + 14R − 4000 ≤ 0
g₃(B) = 50 − B ≤ 0  →  B ≥ 50
g₄(R) = 50 − R ≤ 0  →  R ≥ 50
Active/Inactive/Violated Constraints
•The set of points at which an inequality
constraint is active forms a constraint
boundary which separates the feasible region
points from the infeasible region points.
Active/Inactive/Violated Constraints
•An inequality constraint g_j(X) ≤ 0 is said to be
violated at a point X if it is not satisfied there,
i.e. g_j(X) > 0.
•If g_j(X) is strictly satisfied, i.e. g_j(X) < 0, then it is
said to be inactive at the point X.
•If g_j(X) is satisfied at equality, i.e. g_j(X) = 0, then
it is said to be active at the point X.
Active/Inactive/Violated Constraints
•Based on these concepts, equality constraints
can only be either active, i.e. h_k(X) = 0, or
violated, i.e. h_k(X) ≠ 0, at any point X.
Active/Inactive/Violated Constraints
•Equality & inequality constraints
f = f(x₁, x₂)
h₁(x₁, x₂) = 4x₁ + 2x₂ − 12 = 0
h₂(x₁, x₂) = −x₁ + x₂ − 1 = 0
h₃(x₁, x₂) = x₁ + 2x₂ − 4 = 0
g₁(x₁) = −x₁ ≤ 0  →  x₁ ≥ 0
g₂(x₂) = −x₂ ≤ 0  →  x₂ ≥ 0

[Figure: design space showing the feasible region and the infeasible region]
Example
Practical Example
•A company manufactures two machines, A and B. Using available
resources, either 28 A or 14 B can be manufactured daily. The sales
department can sell up to 14 A machines or 24 B machines. The
shipping facility can handle no more than 16 machines per day. The
company makes a profit of $400 on each A machine and $600 on
each B machine. How many A and B machines should the company
manufacture every day to maximize its profit?
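This statement can be posed as a small linear program; a sketch (not from the slides), assuming the usual reading of the capacities: resources give A/28 + B/14 ≤ 1, sales give A/14 + B/24 ≤ 1, shipping gives A + B ≤ 16. With two variables the optimum lies at a vertex of the feasible region, so the vertices can simply be enumerated:

```python
# Sketch: solve the machine-manufacturing LP by enumerating the vertices of
# the feasible region (valid here because there are only two variables).
from itertools import combinations

# Constraints written as a*A + b*B <= c (the bounds A >= 0, B >= 0 included).
cons = [
    (1 / 28, 1 / 14, 1.0),   # manufacturing resources: A/28 + B/14 <= 1
    (1 / 14, 1 / 24, 1.0),   # sales capacity: A/14 + B/24 <= 1
    (1.0, 1.0, 16.0),        # shipping limit: A + B <= 16
    (-1.0, 0.0, 0.0),        # A >= 0
    (0.0, -1.0, 0.0),        # B >= 0
]

def profit(A, B):
    return 400 * A + 600 * B

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue  # parallel constraint lines, no intersection point
    A = (c1 * b2 - c2 * b1) / det
    B = (a1 * c2 - a2 * c1) / det
    if all(a * A + b * B <= c + 1e-9 for a, b, c in cons):  # feasible vertex
        if best is None or profit(A, B) > profit(*best):
            best = (A, B)

A_opt, B_opt = best
print(round(A_opt), round(B_opt), round(profit(A_opt, B_opt)))  # -> 4 12 8800
```

The manufacturing and shipping constraints are both active at the optimum, i.e. the best plan is 4 A machines and 12 B machines per day for a profit of $8800.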
Convexity
•A set of points is a convex set, if for any two
points in the set, the entire straight line
segment joining these two points is also in the
set.
Convexity
•A function f(X) is convex if it is defined over a
convex set and, for any two points of the graph of
f(X), the straight line segment joining these
two points lies entirely above or on the graph.
Local and Global Optimum
•An objective function f(X) is at its local minimum
at the point X* if f(X*) ≤ f(X) for all feasible X
within a small neighborhood of X*.
Local and Global Optimum
•An objective function f(X) is at its global minimum at the
point X* if f(X*) ≤ f(X) for all feasible X.
Gradient Vector and Hessian Matrix
•The techniques of finding the optimum point
largely depend on ‘calculus’.
•The usual assumption of continuity of the
objective and constraint functions is required (at
least to first order).
•The ‘derivatives’ of these functions wrt each
variable are important as they indicate the rate of
change of the function wrt these variables and
also the respective stationary points.
Gradient Vector and Hessian Matrix
•Gradient Vector: first order derivative
Hessian Matrix: second order derivative
∇f(X) = c(X) = [∂f(X)/∂x₁  ∂f(X)/∂x₂  …  ∂f(X)/∂xₙ]ᵀ

H(X) = ∇²f(X) =
[ ∂²f/∂x₁²      ∂²f/∂x₁∂x₂   …  ∂²f/∂x₁∂xₙ ]
[ ∂²f/∂x₂∂x₁   ∂²f/∂x₂²      …  ∂²f/∂x₂∂xₙ ]
[     ⋮              ⋮                  ⋮      ]
[ ∂²f/∂xₙ∂x₁   ∂²f/∂xₙ∂x₂   …  ∂²f/∂xₙ²     ]
Gradient Vector and Hessian Matrix
•The gradient and Hessian of the objective
function and constraint functions provide
sensitivity information, i.e. they indicate how
sensitive the functions are towards changes of
the design variables.
•Example: f(X) = x₁³ + 3x₁x₃² + 5x₁x₂² + 9
General Optimization Unconstrained
•Analytical methods are too cumbersome to be
used for non-linear optimization problems;
hence a numerical approach is required.
•Reason: complexity grows as the number of
design variables and/or constraints grows.
•Implicit constraints
General Optimization Unconstrained
General Optimization Unconstrained
1. Estimate a reasonable starting design X(0), where k is the iteration counter (k = 0).
2. Compute the search direction d(k) in the design space.
3. Check for convergence.
4. Calculate the step size α_k in the direction d(k) (the step to move further).
5. Update the design: X(k+1) = X(k) + α_k d(k).
6. Set k = k + 1 and go to Step 2.
General Optimization Unconstrained
•Desirable search direction = descent direction:
f(X(k+1)) < f(X(k)), i.e. f(X(k) + α_k d(k)) < f(X(k))
•Taylor’s expansion gives
f(X(k)) + α_k c(k)·d(k) < f(X(k))  ⇒  c(k)·d(k) < 0
•where c(k) = ∇f(X(k))

[Figure: successive designs X(k−1), X(k), X(k+1) moving downhill along the directions d(k), d(k+1)]
•Example:
f(X) = x₁² − x₁x₂ + 2x₂² − 2x₁ + e^(x₁+x₂)
•Check whether the direction d = (1, 2) at the point (0, 0) is a
descent direction for f(X).
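The check is just the sign of c·d at the point; a sketch (assuming the example function is f(X) = x₁² − x₁x₂ + 2x₂² − 2x₁ + e^(x₁+x₂)):

```python
# Sketch of the descent-direction check c.d < 0 for
# f(X) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2).
import math

def grad_f(x1, x2):
    e = math.exp(x1 + x2)
    return (2 * x1 - x2 - 2 + e, -x1 + 4 * x2 + e)

c = grad_f(0.0, 0.0)              # gradient at the point (0, 0)
d = (1.0, 2.0)                    # candidate direction
dot = c[0] * d[0] + c[1] * d[1]
print(c, dot, dot < 0)            # -> (-1.0, 1.0) 1.0 False
```

Since c·d = 1 > 0, the direction (1, 2) is not a descent direction at (0, 0).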
Single Variable Optimization
Unconstrained
•Basic approach behind any numerical
technique is to divide the problem into two
phases.
Phase 1: Initial bracketing or bounding of the
minimum point
Phase 2: approximation of the minimum point
Initial bracketing or bounding of the
minimum point
•A small value of δ is specified.
•Function values f(x) are evaluated at a sequence
of points along a uniform grid with a constant
search interval δ, i.e. at the points x = 0, δ, 2δ, 3δ, ...
•The function will in general be decreasing
until it starts increasing, indicating that the
minimum x* has been surpassed, i.e.
f((s − 1)δ) ≥ f(sδ)  and  f(sδ) ≤ f((s + 1)δ),  s an integer
Initial bracketing or bounding of the
minimum point
•Equal interval search method

[Figure: f(x) sampled at (s − 1)δ, sδ, (s + 1)δ, bracketing x*]
Initial bracketing or bounding of the
minimum point
•Whenever this condition arises at an integer s,
the minimum is bracketed in the interval
(s − 1)δ ≤ x* ≤ (s + 1)δ.
Initial bracketing or bounding of the
minimum point
•Issues:
the efficiency depends on the number of
function evaluations, which depends on the
value of δ.
δ too small → a large number of function evaluations
δ too large → the resulting bracket (interval) is
wide and may need more computation in
the second phase of locating the approximate
minimum point
Initial bracketing or bounding of the
minimum point
•Variable interval search method:
to improve upon the previous search method,
an initial δ is specified, and subsequent search
intervals are incremented by a fixed ratio a,
i.e. the function f(x) will be evaluated at the
series of points
x = 0, δ, δ(1 + a), δ(1 + a + a²), δ(1 + a + a² + a³), ...
Initial bracketing or bounding of the
minimum point
•Therefore x(s+1) = x(s) + δaˢ, s = 0, 1, 2, 3, ...,
where s is the iteration number.
•Similar to the equal interval search, the
minimum point is bounded:
if f(x(s−1)) ≥ f(x(s)) and f(x(s)) ≤ f(x(s+1)),
then x(s−1) ≤ x* ≤ x(s+1)
+
Initial bracketing or bounding of the
minimum point
•One good ratio to use is the ‘Golden Ratio’ a = 1.618.
•The ratio is based on the ‘Golden Section Method’, which is
one of the methods for the second phase, i.e. to locate the
approximate minimum.
Approximation of the Minimum
•Reduce the interval to a small enough size so that the
minimum x* will be known with an acceptable
precision.
•Golden Section Method and Polynomial
Approximation (Polynomial Interpolation)
Approximation of the Minimum:
Golden Section Method
Evaluate the two sectioning points x₁ and x₂
within the interval such that x_l ≤ x₁ ≤ x₂ ≤ x_u:

x₁ = (a − 1)x_l + (2 − a)x_u
x₂ = (2 − a)x_l + (a − 1)x_u,  with a = 1.618

The size of the interval is repeatedly reduced
by a fraction (2 − a ≈ 0.38197) each time by
discarding the portion x_l ≤ x ≤ x₁ or x₂ ≤ x ≤ x_u …
Approximation of the Minimum:
Golden Section Method
… based on:

If f(x₁) < f(x₂):  x_l ≤ x* ≤ x₂, so set
x_u = x₂,  x₂ = x₁,  x_l = x_l,  x₁ = ? (recomputed)

If f(x₁) > f(x₂):  x₁ ≤ x* ≤ x_u, so set
x_l = x₁,  x₁ = x₂,  x_u = x_u,  x₂ = ? (recomputed)

[Figure: the two cases sketched on f(x) with the points x_l, x₁, x₂, x_u]
Approximation of the Minimum:
Golden Section Method
•Convergence Criteria
–Precision limit ε: the ‘current interval’ compared to the ‘original
interval’:
(x_u − x_l) / (x_u(0) − x_l(0)) ≤ ε,  ε = 0.00001

Finally, x* = (x_l + x_u)/2
Why Golden Ratio?
•The function values at x₁ and x₂ do not both need to be
recalculated in every iteration; only one of the two does,
while the other is carried over from the previous iteration.
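The whole golden-section loop can be sketched compactly (an illustration, not the slides' exact code; it reuses one interior point per iteration exactly as described above):

```python
# Sketch of the golden-section reduction with a = (1 + sqrt(5))/2 ~ 1.618,
# recomputing only one interior point per iteration.
import math

A = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

def golden_section(f, xl, xu, eps=1e-5):
    span0 = xu - xl
    x1 = (A - 1) * xl + (2 - A) * xu
    x2 = (2 - A) * xl + (A - 1) * xu
    f1, f2 = f(x1), f(x2)
    while (xu - xl) / span0 > eps:
        if f1 < f2:                    # minimum in [xl, x2]
            xu, x2, f2 = x2, x1, f1
            x1 = (A - 1) * xl + (2 - A) * xu
            f1 = f(x1)
        else:                          # minimum in [x1, xu]
            xl, x1, f1 = x1, x2, f2
            x2 = (2 - A) * xl + (A - 1) * xu
            f2 = f(x2)
    return (xl + xu) / 2               # final estimate x*

x_star = golden_section(lambda x: 2 - 4 * x + math.exp(x), 0.5, 2.618)
print(round(x_star, 3))  # minimum of 2 - 4x + e^x is at ln 4 ~ 1.386
```

Run on the example solved next in the slides, it converges to x* ≈ 1.386.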
Example: Golden Section Method
•f(x) = 2 − 4x + eˣ
•Bounded by x_l = 0.5 and x_u = 2.618
Example: Golden Section Method
Iteration 0:
x_l(0) = 0.5  →  f(x_l(0)) = 1.649
x_u(0) = 2.618  →  f(x_u(0)) = 5.237
x₁(0) = (a − 1)x_l(0) + (2 − a)x_u(0) = 1.309  →  f(x₁(0)) = 0.466
x₂(0) = (2 − a)x_l(0) + (a − 1)x_u(0) = 1.809  →  f(x₂(0)) = 0.868

Now f(x₁(0)) < f(x₂(0))

[Figure: f(x) with the points x_l(0), x₁(0), x₂(0), x_u(0)]
Example: Golden Section Method
Iteration 1:
x_l(1) = x_l(0) = 0.5  →  f(x_l(1)) = 1.649
x₂(1) = x₁(0) = 1.309  →  f(x₂(1)) = 0.466
x_u(1) = x₂(0) = 1.809  →  f(x_u(1)) = 0.868
x₁(1) = ?
x₁(1) = (a − 1)x_l(1) + (2 − a)x_u(1) = 1.0  →  f(x₁(1)) = 0.718

Now f(x₁(1)) > f(x₂(1))

[Figure: f(x) with the points x_l(1), x₁(1), x₂(1), x_u(1)]
Example: Golden Section Method
Iteration 2:
x_l(2) = x₁(1) = 1.0  →  f(x_l(2)) = 0.718
x₁(2) = x₂(1) = 1.309  →  f(x₁(2)) = 0.466
x_u(2) = x_u(1) = 1.809  →  f(x_u(2)) = 0.868
x₂(2) = ?
x₂(2) = (2 − a)x_l(2) + (a − 1)x_u(2) = 1.5  →  f(x₂(2)) = 0.482

… and so on.
Interval Halving Method
◼Algorithm
◼Step 1: Let x_m = (x_u + x_l)/2 and find f(x_m).
◼Step 2: Set x₁ = x_l + (x_u − x_l)/4 and x₂ = x_u − (x_u − x_l)/4.
◼Step 3: If f(x₁) < f(x_m), then x_u = x_m, x_m = x₁ and go to Step 1.
◼ Else if f(x₁) > f(x_m), continue to Step 4.
◼Step 4: If f(x₂) < f(x_m), then x_l = x_m, x_m = x₂ and go to Step 1.
 If f(x₂) > f(x_m), then x_l = x₁, x_u = x₂ and go to Step 1.
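A minimal sketch of these steps (each pass keeps the half of the interval known to contain the minimum; convergence test and test function are my additions):

```python
# Sketch of interval halving: use midpoint xm and quarter points x1, x2
# to discard half of [xl, xu] on every pass.
import math

def interval_halving(f, xl, xu, tol=1e-6):
    xm = (xl + xu) / 2
    while (xu - xl) > tol:
        x1 = xl + (xu - xl) / 4
        x2 = xu - (xu - xl) / 4
        if f(x1) < f(xm):
            xu, xm = xm, x1            # minimum in the left half
        elif f(x2) < f(xm):
            xl, xm = xm, x2            # minimum in the right half
        else:
            xl, xu = x1, x2            # minimum in the middle half
    return xm

x_star = interval_halving(lambda x: 2 - 4 * x + math.exp(x), 0.5, 2.618)
print(round(x_star, 3))  # -> 1.386
```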
Approximation of the Minimum:
Polynomial Approximation
•The Golden Section method may require more
iterations and function evaluations than the
polynomial approximation method.
•Pass a polynomial curve f̂(x) through a
certain number of points of the function f(x).
•The minimum of this known curve f̂(x) will be
a good estimate of the minimum point of the
actual function f(x).
Approximation of the Minimum:
Polynomial Approximation
•Generally the curve is a ‘second order’
polynomial:
f̂(x) = a₀ + a₁x + a₂x²
•3 common points are required, as the 3 unknown
coefficients a₀, a₁ and a₂ are to be found.
Approximation of the Minimum:
Polynomial Approximation
[Figure: f(x) with the three points x_l, x_im, x_u]
Approximation of the Minimum:
Polynomial Approximation
x_l, x_im and x_u and f(x_l), f(x_im) and f(x_u) are already known, so:

f̂(x_l) = a₀ + a₁x_l + a₂x_l² = f(x_l)
f̂(x_im) = a₀ + a₁x_im + a₂x_im² = f(x_im)
f̂(x_u) = a₀ + a₁x_u + a₂x_u² = f(x_u)

Solve these simultaneous equations to find a₀, a₁ and a₂.
Approximation of the Minimum:
Polynomial Approximation
•This gives the complete equation of the
polynomial curve f̂(x) = a₀ + a₁x + a₂x².
•Find f̂′(x) and f̂″(x).
•Equate f̂′(x) = 0 and compute x*:
x* = −a₁ / (2a₂)  and  f̂″(x) = 2a₂ > 0
Approximation of the Minimum:
Polynomial Approximation
•More accurate results can be found
by further refining the approximation, i.e. …
•Now x_l, x_im, x_u and x* are available, so by
comparing the respective f(x_l), f(x_im), f(x_u) and f(x*),
discard one of the above points and …
continue the procedure …
Approximation of the Minimum:
Polynomial Approximation
•… till convergence, i.e. until two successive
estimates of the minimum point are
sufficiently close to each other, i.e. |x*(r+1) − x*(r)| ≤ ε.
•A higher-order polynomial curve is also possible.
Example: Polynomial Approximation
•f(x) = 2 − 4x + eˣ
•Bounded by x_l = 0.5, x_im = 1.309 and x_u = 2.618
•x_im can also be found by an average
Example: Polynomial Approximation
Iteration 0:
x_l(0) = 0.5  →  f(x_l(0)) = 1.649
x_im(0) = 1.309  →  f(x_im(0)) = 0.466
x_u(0) = 2.618  →  f(x_u(0)) = 5.237

Find a₀, a₁ and a₂:  a₁ = −5.821, a₂ = 2.410

x*(0) = −a₁ / (2a₂) = 5.821 / (2 × 2.410) = 1.208
f(x*(0)) = f(1.208) = 0.515
Example: Polynomial Approximation
x_l(0) = 0.5  →  f = 1.649
x_im(0) = 1.309  →  f = 0.466
x_u(0) = 2.618  →  f = 5.237
x*(0) = 1.208  →  f = 0.515

Here x*(0) < x_im(0) and f(x*(0)) > f(x_im(0))
Example: Polynomial Approximation
Iteration 1:
x_l(1) = x*(0) = 1.208  →  f(x_l(1)) = 0.515
x_im(1) = x_im(0) = 1.309  →  f(x_im(1)) = 0.466
x_u(1) = x_u(0) = 2.618  →  f(x_u(1)) = 5.237

… and so on.
Bisection Method
Using the derivative information, the minimum is supposed to be bracketed in
the interval (a, b) if the two conditions f′(a) < 0 and f′(b) > 0, along with the
convergence condition, are satisfied.

Step 1: Choose two points a and b such that f′(a) < 0 and f′(b) > 0, and a
convergence parameter ε. Set x₁ = a and x₂ = b.
Step 2: Calculate z = (x₁ + x₂)/2 and evaluate f′(z).
Step 3: If |f′(z)| ≤ ε, then terminate.
Else if f′(z) < 0, set x₁ = z and go to Step 2.
Else if f′(z) > 0, set x₂ = z and go to Step 2.

Solve: minimize f(x) = 2 − 4x + eˣ.
Choose a = x_l and b = x_u, and ε = 0.001.
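A sketch of these steps on f(x) = 2 − 4x + eˣ, whose derivative f′(x) = −4 + eˣ changes sign inside [0.5, 2.618]:

```python
# Sketch of bisection on the derivative: halve [x1, x2] keeping the sign
# change of f'(x), stopping when |f'(z)| <= eps.
import math

def bisection(f_prime, a, b, eps=0.001, max_iter=200):
    x1, x2 = a, b                  # requires f'(a) < 0 and f'(b) > 0
    z = (x1 + x2) / 2
    for _ in range(max_iter):
        z = (x1 + x2) / 2
        fpz = f_prime(z)
        if abs(fpz) <= eps:
            break
        if fpz < 0:
            x1 = z                 # minimum lies to the right of z
        else:
            x2 = z                 # minimum lies to the left of z
    return z

x_star = bisection(lambda x: -4 + math.exp(x), 0.5, 2.618)
print(round(x_star, 3))  # f'(x*) = 0 at x* = ln 4 ~ 1.386
```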
Secant Method
In this method, both the magnitude and sign of the derivatives at two points
under consideration are used. The derivative of the function is assumed to
vary linearly between two chosen boundary points. As the derivative at these
two points changes sign, the point with zero derivative most probably lies
between these two points.
Step 1: Choose two points a and b such that f′(a) < 0 and f′(b) > 0, and a
convergence parameter ε. Set x₁ = a and x₂ = b.
Step 2: Calculate
z = x₂ − f′(x₂) / [ (f′(x₂) − f′(x₁)) / (x₂ − x₁) ]
and evaluate f′(z).
Step 3: If |f′(z)| ≤ ε, then terminate.
Else if f′(z) < 0, set x₁ = z and go to Step 2.
Else if f′(z) > 0, set x₂ = z and go to Step 2.
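The secant steps differ from bisection only in how z is chosen; a sketch on the same test function:

```python
# Sketch of the secant method on f'(x): z is where the straight line through
# (x1, f'(x1)) and (x2, f'(x2)) crosses zero, instead of the midpoint.
import math

def secant(f_prime, a, b, eps=0.001, max_iter=500):
    x1, x2 = a, b                  # requires f'(a) < 0 and f'(b) > 0
    z = x2
    for _ in range(max_iter):
        slope = (f_prime(x2) - f_prime(x1)) / (x2 - x1)
        z = x2 - f_prime(x2) / slope
        fpz = f_prime(z)
        if abs(fpz) <= eps:
            break
        if fpz < 0:
            x1 = z
        else:
            x2 = z
    return z

x_star = secant(lambda x: -4 + math.exp(x), 0.5, 2.618)
print(round(x_star, 3))  # -> 1.386
```

Because z uses the derivative magnitudes, not just their signs, it typically lands closer to the root than the bisection midpoint.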
Multivariable Optimization
•Steepest Descent Method
1. Estimate a starting design X(k) (k = 0) and select a
convergence parameter ε = 0.0001.
2. Calculate the gradient of the function f(X) at the
point X(k), i.e.
c(k) = ∇f(X(k)) = [∂f(X(k))/∂x₁  ∂f(X(k))/∂x₂  …  ∂f(X(k))/∂xₙ]
3. If ‖c(k)‖ ≤ ε, stop, and accept the current
design X(k) and the associated f(X(k)) as the final
solution.
Multivariable Optimization
4. Based on the desirable search direction
condition c(k)·d(k) < 0, choose the search
direction at the current point X(k) to be
d(k) = −c(k).
5. Calculate/pre-specify the step size α_k.
6. Update the new design point X(k+1) = X(k) + α_k d(k)
and calculate the function value at X(k+1), i.e. f(X(k+1)).
7. Go to Step 2.
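These steps can be sketched directly, here on the slides' example f(x₁, x₂) = x₁² + x₂² − 2x₁x₂; the step size is found by a crude grid line search (my addition, the slides leave α_k open):

```python
# Sketch of steepest descent: d(k) = -grad f, with a grid search for alpha.
import math

def f(x):
    return x[0]**2 + x[1]**2 - 2 * x[0] * x[1]

def grad(x):
    return [2 * x[0] - 2 * x[1], 2 * x[1] - 2 * x[0]]

def steepest_descent(x, eps=0.05, max_iter=100):
    for _ in range(max_iter):
        c = grad(x)
        if math.hypot(*c) <= eps:      # convergence check on ||c(k)||
            break
        d = [-ci for ci in c]          # descent direction
        # crude line search over a grid of candidate step sizes
        alpha = min((a / 100 for a in range(1, 101)),
                    key=lambda a: f([x[0] + a * d[0], x[1] + a * d[1]]))
        x = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
    return x

x_final = steepest_descent([1.0, 0.0])
print(x_final, f(x_final))  # -> [0.5, 0.5] 0.0
```

Starting from (1, 0), one step with α = 0.25 reaches (0.5, 0.5) with f = 0, reproducing the worked example that follows.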
Steepest Descent Method
Limitations of Steepest Descent
Method
•Limitations of Steepest Descent Method:
1.Large number of iterations required (zigzag??) and
process becomes quite slow to converge.
2.Every iteration is independent of the previous
iteration(s), so inefficient.
3.Only first order information (gradient) is
used, i.e. no Hessian is used so no information
about the condition number is exploited.
4.Process is faster initially but slows down when
reaches closer to local minimum.
5.Local minimum is guaranteed and not the global
minimum.
Example
Minimize f(x₁, x₂) = x₁² + x₂² − 2x₁x₂, starting point (1, 0) and ε = 0.05.

1. Starting design X(0) = (1, 0)
2. c(0) = (∂f/∂x₁, ∂f/∂x₂) = (2x₁ − 2x₂, 2x₂ − 2x₁) = (2, −2)
3. ‖c(0)‖ = 2√2 > ε
Example
4. Set d(0) = −c(0) = (−2, 2)
5. With α₀ = 0.25, calculate the new design point X(1) = X(0) + α₀ d(0):
x₁(1) = 1 + 0.25(−2) = 0.5;  x₂(1) = 0 + 0.25(2) = 0.5
i.e. X(1) = (0.5, 0.5) and f(X(1)) = 0
c(1) = (2x₁ − 2x₂, 2x₂ − 2x₁) = (0, 0)
‖c(1)‖ = 0 < ε → …
Example
… stop, and accept the current variable values (x₁(1), x₂(1))
and the associated f(X(1)) as the final solution,
i.e. the final solution is (x₁, x₂) = (0.5, 0.5) with f(X) = 0.
Example
Minimize f(x₁, x₂, x₃) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃,
starting point (2, 4, 10) and ε = 0.005.
Modified Steepest Descent Method:
Conjugate Gradient Method
•Every iteration solution is dependent on the
previous iteration(s), so more efficient than the
steepest descent method. Uses the deflected
direction based on the previous iteration
direction.
•So replace d(k) = −c(k) with
d(k) = −c(k) + β_k d(k−1),  where β_k = (‖c(k)‖ / ‖c(k−1)‖)²
which satisfies the desirable descent direction
condition.
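The deflected direction can be sketched on the three-variable quadratic from the previous example; the step size here uses the exact formula for a quadratic, α = −(c·d)/(d·Hd), which is my addition for the illustration:

```python
# Sketch of conjugate gradient (Fletcher-Reeves deflection) on the quadratic
# f = x1^2 + 2*x2^2 + 2*x3^2 + 2*x1*x2 + 2*x2*x3, i.e. grad f = A x.
import math

A = [[2, 2, 0], [2, 4, 2], [0, 2, 4]]   # constant Hessian of this quadratic

def grad(x):
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def cg(x, eps=0.005, max_iter=50):
    d, c_prev = None, None
    for _ in range(max_iter):
        c = grad(x)
        if norm(c) <= eps:
            break
        if d is None:
            d = [-ci for ci in c]                    # iteration 0: steepest descent
        else:
            beta = (norm(c) / norm(c_prev)) ** 2     # Fletcher-Reeves factor
            d = [-ci + beta * di for ci, di in zip(c, d)]
        Ad = [sum(A[i][j] * d[j] for j in range(3)) for i in range(3)]
        alpha = -sum(ci * di for ci, di in zip(c, d)) / \
                sum(di * adi for di, adi in zip(d, Ad))   # exact step for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
        c_prev = c
    return x

x_final = cg([2.0, 4.0, 10.0])
print([round(abs(v), 4) for v in x_final])  # -> [0.0, 0.0, 0.0]
```

On an n-variable convex quadratic with exact steps, conjugate gradient terminates in at most n iterations, which is the efficiency gain over plain steepest descent.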
Newton Method
•With the steepest descent method, only first-
order derivative information is used.
•If second-order derivatives were available, we
could use them to represent the cost surface
more accurately, and a better search direction
could be found, i.e. use the Hessian
•Better convergence (‘quadratic rate of
convergence’) if the iterate is located within a
neighborhood of a certain radius around the minimum.
Newton Method
•The quadratic expression for the change ΔX in the
design at some point X can be represented using a
Taylor series as:
f(X + ΔX) = f(X) + cᵀΔX + 0.5 ΔXᵀH ΔX
•If H is positive definite (eigenvalues and
determinant are positive) then the global
minimum is guaranteed.
Newton Method
•The optimality condition f′(X) = 0
•reduces the Taylor expansion to
c + HΔX = 0
so, assuming H is non-singular/positive definite,
ΔX = −H⁻¹c
X(1) = X(0) + ΔX = X(0) + d(0) …
Newton Method
1. Estimate a starting design point X(k) (k = 0).
2. Select the convergence parameter ε = 0.001.
3. Calculate the gradient of the function f(X)
at the point X(k), i.e.
c(k) = ∇f(X(k)) = [∂f(X(k))/∂x₁  …  ∂f(X(k))/∂xₙ]
4. If ‖c(k)‖ ≤ ε, stop, and accept the current
design X(k) and the associated f(X(k)) as the final
solution.
Newton Method
5. Calculate the Hessian H(k) at the point X(k).
6. Calculate the ‘search direction’: d(k) = −(H(k))⁻¹ c(k)
7. Check whether the search direction is ‘desirable’
by checking the positive definiteness of H(k) or the
following condition:
c(k)·d(k) = −c(k)ᵀ (H(k))⁻¹ c(k) < 0
8. Update the design: X(k+1) = X(k) + d(k)
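One Newton step on the example solved next can be sketched with the explicit 2×2 inverse:

```python
# Sketch of one Newton step d = -H^{-1} c on f = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7,
# starting from X(0) = (5, 10).

def newton_step(x):
    x1, x2 = x
    c = [6 * x1 + 2 * x2, 2 * x1 + 4 * x2]     # gradient
    H = [[6, 2], [2, 4]]                       # Hessian (constant here)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # d = -H^{-1} c using the closed-form 2x2 inverse
    d = [-( H[1][1] * c[0] - H[0][1] * c[1]) / det,
         -(-H[1][0] * c[0] + H[0][0] * c[1]) / det]
    return [x1 + d[0], x2 + d[1]]

x_new = newton_step([5.0, 10.0])
print(x_new)  # a quadratic is minimized in a single Newton step -> [0.0, 0.0]
```

Because f is quadratic and H is positive definite, the very first step lands on the minimum (0, 0), where f = 7.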
Example
f(X) = 3x₁² + 2x₁x₂ + 2x₂² + 7
X(0) = (5, 10),  ε = 0.0001
X
Example
f(X) = 3x₁² + 2x₁x₂ + 2x₂² + 7

1. X(0) = (5, 10)
2. c(0) = (6x₁ + 2x₂, 2x₁ + 4x₂) = (50, 50);  ‖c(0)‖ = 50√2 …
convergence condition not satisfied
3. H(0) = [6 2; 2 4]
4. Check the condition for a desirable search direction
(eigenvalues/positive definiteness, or the c(k)·d(k) < 0 condition) …
Example
f(X) = 10x₁⁴ − 20x₁²x₂ + 10x₂² + x₁² − 2x₁ + 5
X(0) = (−1, 3),  ε = 0.0001
Adv/Disadv
•Comparatively more time-consuming calculations
such as the Hessian (second-order derivatives), which may
not be possible for some problems.
•Positive definiteness of the Hessian is a must to
confirm that the search direction is the desirable one.
•Very quick to converge for convex functions.
•Quadratic rate of convergence and hence
converges very fast.
•For large-sized problems, Hessian inversions are
too expensive.
Quasi-Newton Method
•Use of first-order information to generate the
Hessian and/or its inverse, i.e. incorporating
the advantages of both ‘Steepest Descent’ and
‘Newton Method’. ….Hybrid???
Quasi-Newton Method:
Davidon-Fletcher-Powell (DFP) Method
Approximates the ‘Inverse of Hessian’ using
only first order derivatives
Davidon-Fletcher-Powell (DFP) Method
Iteration 1 (similar as Steepest Descent)
1. Estimate a starting design X(k) (k = 0) and select a
convergence parameter ε = 0.0001.
2. Calculate the gradient of the function f(X) at the
point X(k), i.e. c(k) = ∇f(X(k)).
3. If ‖c(k)‖ ≤ ε, stop, and accept the current
design X(k) and the associated f(X(k)) as the final
solution.
Davidon-Fletcher-Powell (DFP) Method
4. Choose an n × n symmetric positive definite
estimate A(k) of the inverse of the Hessian. Generally, A(k) = I.
5. Calculate the search direction d(k) = −A(k)c(k) and
check whether it is ‘desirable’ using the following
condition: c(k)·d(k) = −c(k)ᵀA(k)c(k) < 0
6. Calculate (pre-specify) the step size α_k.
7. Update the new design point X(k+1) = X(k) + α_k d(k)
and calculate the function value at X(k+1), i.e. f(X(k+1)).
Davidon-Fletcher-Powell (DFP) Method
Iteration 2 and further
8. Update the inverse of the Hessian:
A(k+1) = A(k) + B(k) + C(k)
where
B(k) = s(k)s(k)ᵀ / (s(k)·y(k))
C(k) = −z(k)z(k)ᵀ / (y(k)·z(k))
and …
Davidon-Fletcher-Powell (DFP) Method
9. Go to Step 2, with
s(k) = α_k d(k);  y(k) = c(k+1) − c(k);  z(k) = A(k)y(k)
c(k+1) = ∇f(X(k+1))
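The DFP update can be sketched on the small quadratic f = 3x₁² + 2x₁x₂ + 2x₂² + 7 used earlier; the exact step size for a quadratic, α = −(c·d)/(d·Hd), is my addition for the illustration:

```python
# Sketch of DFP: A(k+1) = A(k) + B(k) + C(k), with B = s s^T/(s.y) and
# C = -z z^T/(y.z), z = A y, on a small quadratic test function.
import math

def grad(x):
    return [6 * x[0] + 2 * x[1], 2 * x[0] + 4 * x[1]]

def outer_div(u, v, denom):
    return [[u[i] * v[j] / denom for j in range(2)] for i in range(2)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dfp(x, eps=1e-4, max_iter=50):
    A = [[1.0, 0.0], [0.0, 1.0]]                # A(0) = I
    for _ in range(max_iter):
        c = grad(x)
        if math.hypot(*c) <= eps:
            break
        d = [-v for v in matvec(A, c)]          # d(k) = -A(k) c(k)
        Hd = [6 * d[0] + 2 * d[1], 2 * d[0] + 4 * d[1]]
        alpha = -dot(c, d) / dot(d, Hd)         # exact step for this quadratic
        x_new = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
        s = [alpha * d[0], alpha * d[1]]
        y = [g1 - g0 for g1, g0 in zip(grad(x_new), c)]
        z = matvec(A, y)
        B = outer_div(s, s, dot(s, y))
        C = outer_div(z, z, -dot(y, z))         # note the minus sign in C(k)
        A = [[A[i][j] + B[i][j] + C[i][j] for j in range(2)] for i in range(2)]
        x = x_new
    return x

x_final = dfp([5.0, 10.0])
print([round(abs(v), 4) for v in x_final])  # -> [0.0, 0.0]
```

Only gradients are evaluated; A(k) gradually approximates H⁻¹, and on this 2-variable quadratic the method terminates in two iterations.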
Quasi-Newton Method: Broyden-
Fletcher-Goldfarb-Shanno(BFGS)
Method
Approximates the ‘Hessian’ using only first
order derivatives. Also referred to as ‘Direct
Hessian Updating’ method.
Broyden-Fletcher-Goldfarb-Shanno
(BFGS)Method
Iteration 1 (same as Steepest Descent)
1. Estimate a starting design X(k) (k = 0) and select a
convergence parameter ε = 0.0001.
2. Calculate the gradient of the function f(X) at the
point X(k), i.e. c(k) = ∇f(X(k)).
3. If ‖c(k)‖ ≤ ε, stop, and accept the current
design X(k) and the associated f(X(k)) as the final
solution.
Broyden-Fletcher-Goldfarb-Shanno
(BFGS)Method
4. Choose an n × n symmetric positive definite
Hessian estimate H(k). Generally, H(k) = I.
5. Calculate the search direction by solving the
system of linear equations H(k)d(k) = −c(k) to
find the search direction d(k).
6. Calculate (pre-specify) the step size α_k.
7. Update the new design point X(k+1) = X(k) + α_k d(k)
and calculate the function value at X(k+1), i.e. f(X(k+1)).
Broyden-Fletcher-Goldfarb-Shanno
(BFGS)Method
Iteration 2 and further
8. Update the Hessian estimate:
H(k+1) = H(k) + D(k) + E(k)
where
D(k) = y(k)y(k)ᵀ / (y(k)·s(k))
E(k) = c(k)c(k)ᵀ / (c(k)·d(k))
and …
Broyden-Fletcher-Goldfarb-Shanno
(BFGS)Method
9. Go to Step 2, with
s(k) = α_k d(k);  y(k) = c(k+1) − c(k)
c(k+1) = ∇f(X(k+1))
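The direct Hessian update can be sketched analogously to DFP, again on the small quadratic test function with an exact step size (my addition for the illustration):

```python
# Sketch of BFGS direct Hessian updating: solve H(k) d = -c(k), then
# H(k+1) = H(k) + y y^T/(y.s) + c c^T/(c.d).
import math

def grad(x):
    return [6 * x[0] + 2 * x[1], 2 * x[0] + 4 * x[1]]

def solve2(H, rhs):
    # direct solution of the 2x2 system H d = rhs
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * rhs[0] - H[0][1] * rhs[1]) / det,
            (H[0][0] * rhs[1] - H[1][0] * rhs[0]) / det]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def bfgs(x, eps=1e-4, max_iter=50):
    H = [[1.0, 0.0], [0.0, 1.0]]                # H(0) = I
    for _ in range(max_iter):
        c = grad(x)
        if math.hypot(*c) <= eps:
            break
        d = solve2(H, [-c[0], -c[1]])           # H(k) d(k) = -c(k)
        Hd_true = [6 * d[0] + 2 * d[1], 2 * d[0] + 4 * d[1]]
        alpha = -dot(c, d) / dot(d, Hd_true)    # exact step for this quadratic
        x_new = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
        s = [alpha * d[0], alpha * d[1]]
        y = [g1 - g0 for g1, g0 in zip(grad(x_new), c)]
        D = [[y[i] * y[j] / dot(y, s) for j in range(2)] for i in range(2)]
        E = [[c[i] * c[j] / dot(c, d) for j in range(2)] for i in range(2)]
        H = [[H[i][j] + D[i][j] + E[i][j] for j in range(2)] for i in range(2)]
        x = x_new
    return x

x_final = bfgs([5.0, 10.0])
print([round(abs(v), 4) for v in x_final])  # -> [0.0, 0.0]
```

Note that because the direction solves H(k)d = −c(k), no explicit matrix inverse is ever formed.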
Multi-Criteria Decision Making
Weighted Scoring Model/Weighted
Sum Model
•Prioritizing Requirements
•Rank
•Systematic Process
It is very important to state here that it is
applicable only when all the data are
expressed in exactly the same unit. If this
is not the case, then the final result is
equivalent to "adding apples and oranges."
•Criteria for Weighing
–Value
–Risk
–Difficulty
–Success
–Compliance
–Relationships
–Stakeholder
–Urgency
Assign % or
Score to each
criterion
Example 1
                      Requirement Scores
Criteria          Weight    A     B     C
X                 50%       70    45    40
Y                 30%       40    85    30
Z                 20%       40    80    50
Weighted Scores   100%      55    64    39
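The table's weighted scores are just the sum of (criterion weight × criterion score) per requirement; a minimal sketch reproducing them:

```python
# Sketch of the weighted-sum scoring model from the table above.
weights = {"X": 0.5, "Y": 0.3, "Z": 0.2}
scores = {
    "A": {"X": 70, "Y": 40, "Z": 40},
    "B": {"X": 45, "Y": 85, "Z": 80},
    "C": {"X": 40, "Y": 30, "Z": 50},
}

weighted = {req: round(sum(weights[c] * s[c] for c in weights), 2)
            for req, s in scores.items()}
print(weighted)  # -> {'A': 55.0, 'B': 64.0, 'C': 39.0}
```

Requirement B gets the highest weighted score (64) and would therefore be ranked first.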
Analytic Hierarchy Process (AHP)
•Information is decomposed into a hierarchy of
alternatives and criteria
•Information is then synthesized to determine
relative ranking of alternatives
•Both qualitative and quantitative information
can be compared using informed judgements
to derive weights and priorities
Example: Car Selection
•Objective
–Selecting a car
•Criteria
–Style, Reliability, Fuel-economy
•Alternatives
–Civic Coupe, Saturn Coupe, Ford Escort, Mazda
Miata
[Blank pairwise comparison matrix template over the criteria: Mat Cost, Mfg Cost, Reparability, Durability, Reliability, Time to Produce]
                 Mat Cost  Mfg Cost  Reparability  Durability  Reliability  Time to Produce
Mat Cost           1        0.33       0.2           0.11        0.14         3
Mfg Cost           3        1          0.33          0.14        0.33         3
Reparability       5        3          1             0.2         0.2          3
Durability         9        7          5             1           3            7
Reliability        7        3          5             0.33        1            9
Time to Produce    0.33     0.33       0.33          0.14        0.11         1
Sum               25.33    14.66      11.86          1.92        4.78        26
Normalized matrix (each entry divided by its column sum) and the resulting criterion weights (row averages):

                 Mat Cost  Mfg Cost  Reparability  Durability  Reliability  Time to Produce  Criterion Weight
Mat Cost          0.0395    0.0225    0.0169        0.0573      0.0293       0.1154           0.0468
Mfg Cost          0.1184    0.0682    0.0278        0.0729      0.0690       0.1154           0.0786
Reparability      0.1974    0.2046    0.0843        0.1042      0.0418       0.1154           0.1246
Durability        0.3553    0.4775    0.4216        0.5208      0.6276       0.2692           0.4453
Reliability       0.2764    0.2046    0.4216        0.1719      0.2092       0.3462           0.2716
Time to Produce   0.0130    0.0225    0.0278        0.0729      0.0230       0.0385           0.0330
Sum               1         1         1             1           1            1
No logical errors: A > B > C, so A > C

{Ws}     Consistency ({Ws}_i / w_i)
0.286    6.093
0.515    6.526
0.839    6.742
3.09     6.95
1.908    7.022
0.21     6.324

Average (λ) = 6.6095;  Number of Criteria (n) = 6
CI = (λ − n)/(n − 1) = 0.1219;  Random Index (RI) = 1.25
CI/RI = 0.09752 < 0.1, so Consistent
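The whole AHP weight-and-consistency computation can be sketched from the pairwise matrix above (RI = 1.25 is the slides' random index for n = 6):

```python
# Sketch of AHP: normalize the pairwise matrix column-wise, average rows to
# get criterion weights, then check consistency via CI/RI < 0.1.
M = [
    [1, 0.33, 0.2, 0.11, 0.14, 3],      # Mat Cost
    [3, 1, 0.33, 0.14, 0.33, 3],        # Mfg Cost
    [5, 3, 1, 0.2, 0.2, 3],             # Reparability
    [9, 7, 5, 1, 3, 7],                 # Durability
    [7, 3, 5, 0.33, 1, 9],              # Reliability
    [0.33, 0.33, 0.33, 0.14, 0.11, 1],  # Time to Produce
]
n = len(M)
col_sum = [sum(M[i][j] for i in range(n)) for j in range(n)]
norm = [[M[i][j] / col_sum[j] for j in range(n)] for i in range(n)]
w = [sum(row) / n for row in norm]       # criterion weights (row averages)

# Consistency: lambda_i = (M w)_i / w_i, CI = (lambda_avg - n)/(n - 1)
Mw = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
lam = sum(Mw[i] / w[i] for i in range(n)) / n
CI = (lam - n) / (n - 1)
RI = 1.25                                # random index for n = 6 (slides' value)
print(round(w[3], 3), round(CI / RI, 3), CI / RI < 0.1)
```

Durability comes out as the dominant criterion (weight ≈ 0.445), and CI/RI ≈ 0.098 < 0.1 confirms the judgements are acceptably consistent.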
Random Search: Random Jumping
Method
Consider: minimize f(X) = f(x₁, x₂, ..., xₙ),
with x_i^l ≤ x_i ≤ x_i^u, i = 1, 2, ..., n.

Generate a random design X₁:
x₁ = x₁^l + r₁(x₁^u − x₁^l)
x₂ = x₂^l + r₂(x₂^u − x₂^l)
⋮
xₙ = xₙ^l + rₙ(xₙ^u − xₙ^l)
and evaluate f(X₁),
where r₁, r₂, ..., rₙ are random numbers from (0, 1),
i.e. r₁, r₂, ..., rₙ ∈ (0, 1).
Random Search: Random Jumping
Method
In the same way generate X₂, X₃, ..., X_m and evaluate
f(X₂), f(X₃), ..., f(X_m); choose the best out of the m
designs, choosing a very large value of m.
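A sketch of random jumping on a toy function (the bounds, sample count and test function are my choices for illustration):

```python
# Sketch of random jumping: sample m designs uniformly inside the variable
# bounds and keep the best one found. Toy function: f = x1^2 + x2^2.
import random

def random_jumping(f, bounds, m=20000, seed=1):
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(m):
        x = [lo + rng.random() * (hi - lo) for lo, hi in bounds]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

x_best, f_best = random_jumping(lambda x: x[0]**2 + x[1]**2,
                                [(-5.0, 5.0), (-5.0, 5.0)])
print(round(f_best, 2))  # close to the true minimum f = 0 at (0, 0)
```

With large m the best sample lands near the true optimum, but only in a probabilistic sense, which is why m must be very large.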
Random Search: Random Walk
Method
1. Estimate a starting design X(k) (k = 0) and select a
convergence parameter ε = 0.0001.
2. Generate a set of n random numbers (r₁, r₂, ..., rₙ),
each in (−1, 1), and formulate the search direction vector
d(k) = (1/R)[r₁, r₂, …, rₙ]ᵀ,  where R = √(r₁² + r₂² + … + rₙ²)
Random Search: Random Walk
Method
3. If r₁² + r₂² + … + rₙ² ≤ 1, i.e. R ≤ 1, accept d(k) and continue;
else if R > 1, discard the set and
find a new set of random numbers (r₁, r₂, ..., rₙ) ∈ (−1, 1),
continuing until a new set of random numbers
satisfying the condition R ≤ 1 is found.
Random Search: Random Walk
Method
4. … once found, update X(k+1) = X(k) + d(k)
and compute f(X(k+1)).
5. Go to Step 2.
… continue till convergence, i.e. f(X(k+1)) < f(X(k));
BUT f(X(k+1)) ≥ f(X(k)) may happen:
go back to Step 2.
Adv.
•This method works for any kind of function
(discontinuous, non-differentiable, etc.)
•Useful when function has several local minima
•Useful especially, to locate the region in the
neighborhood of global minimum.
Random Search: Constrained
Random Search: Constrained
•Penalty Function Method:
Will be discussed very soon
Grid Search Method
•Set up a grid in the design space:
for each variable x_i, choose m points in the
range [x_i^l, x_i^u] inclusive.
•Requires a large number of grid
points to be evaluated:
no. of grid points = mⁿ,  n = no. of variables

[Figure: grid over the design space (x₁, x₂) between the lower and upper bounds]
Solution of Constrained Problems
using Unconstrained Optimization
Methods
•Basic Idea: Form a composite function using
the cost and constraint function
parameterized by a penalty parameter.
•Penalty parameter applies larger penalty for
larger constraint violation and smaller penalty
for smaller violation.
•Penalty parameter is adaptive specific to the
magnitude of the constraint violation
Solution of Constrained Problems
using Unconstrained Optimization
Methods
•Referred to as the Penalty Function method:
Minimize $f(\mathbf{X})$
subject to $h_i(\mathbf{X}) = 0,\; i = 1, \ldots, p$
$g_j(\mathbf{X}) \le 0,\; j = 1, \ldots, m$
$$\Phi(\mathbf{X}, \mathbf{h}, \mathbf{g}, r) = f(\mathbf{X}) + r\left[\sum_{i=1}^{p} h_i(\mathbf{X})^2 + \sum_{j=1}^{m} g_j^{+}(\mathbf{X})^2\right]$$
where $g_j^{+}(\mathbf{X}) = \max\left(0,\, g_j(\mathbf{X})\right)$
163
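The penalty function $\Phi$ can be illustrated on a one-variable problem. This is a sketch under my own simplifications: a single inequality constraint, no equality constraints, and a crude interval-shrinking minimizer in place of a proper unconstrained method.

```python
def phi(x, r, f, hs, gs):
    """Quadratic penalty: f(x) + r * (sum h_i(x)^2 + sum max(0, g_j(x))^2)."""
    pen = sum(h(x) ** 2 for h in hs) + sum(max(0.0, g(x)) ** 2 for g in gs)
    return f(x) + r * pen

def minimize_1d(func, lo, hi, iters=200):
    """Crude interval-shrinking search (assumes a unimodal function)."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

# Example: minimize f(x) = (x - 3)^2  s.t.  g(x) = x - 1 <= 0 (optimum x* = 1)
f = lambda x: (x - 3) ** 2
g = lambda x: x - 1
results = {}
for r in (1.0, 10.0, 100.0, 1000.0):
    results[r] = minimize_1d(lambda x: phi(x, r, f, [], [g]), -5.0, 5.0)
    print(r, round(results[r], 4))   # drifts toward the constrained optimum 1
```

For this problem the penalized minimizer is $(3 + r)/(1 + r)$, so the sequence of unconstrained minima approaches the constrained optimum only as $r$ grows, which is exactly the convergence behavior the slides describe.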
Advantages and Disadvantages
•Equality and inequality constraints can be easily accommodated
•Can be started from any starting point (no need for an educated guess)
•The design space needs to be continuous
•The rate of convergence essentially depends on the rate of change of the penalty parameter.
164
Barrier Function Method
•The initial solution needs to be feasible
•Barriers are constructed around the feasible region
Minimize $f(\mathbf{X})$
subject to $g_j(\mathbf{X}) \le 0,\; j = 1, \ldots, m$
$$\Phi(\mathbf{X}, r) = f(\mathbf{X}) + \frac{1}{r}\sum_{j=1}^{m}\frac{-1}{g_j(\mathbf{X})} \quad \text{(barrier function method)}$$
$$\Phi(\mathbf{X}, r) = f(\mathbf{X}) - \frac{1}{r}\sum_{j=1}^{m}\log\left(-g_j(\mathbf{X})\right) \quad \text{(log barrier function method)}$$
165
Barrier Function Method
•The barriers do not allow the solution to go out of the feasible region; feasibility is maintained by imposing the penalty.
•For both the methods: as $r \to \infty$, $f(\mathbf{X}) \to f(\mathbf{X}^*)$
166
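The log barrier variant can be sketched the same way. The example problem and the interval minimizer below are my own illustrative choices; infeasible points are simply assigned an infinite value, which is how the barrier keeps iterates inside the feasible region.

```python
import math

def log_barrier(x, r, f, gs):
    """Log barrier: f(x) - (1/r) * sum log(-g_j(x)); +inf outside the region."""
    total = f(x)
    for g in gs:
        if g(x) >= 0.0:          # the barrier blocks infeasible points
            return float("inf")
        total -= math.log(-g(x)) / r
    return total

def minimize_1d(func, lo, hi, iters=200):
    """Crude interval-shrinking search (assumes a unimodal function)."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

# Example: minimize f(x) = (x - 3)^2  s.t.  g(x) = x - 1 <= 0 (optimum x* = 1)
f = lambda x: (x - 3) ** 2
g = lambda x: x - 1
results = {}
for r in (1.0, 10.0, 1000.0):
    results[r] = minimize_1d(lambda x: log_barrier(x, r, f, [g]), -5.0, 1.0)
    print(r, round(results[r], 4))  # approaches 1 from inside the feasible set
```

Unlike the penalty sketch, every iterate here is strictly feasible: the optimum is approached from inside the region, never crossed.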
Advantages and Disadvantages of the Barrier Function Method
•Applicable to inequality constraints only
•The starting design point needs to be feasible; however, this also ensures that every intermediate solution is feasible and hence acceptable.
167
Advantages and Disadvantages of the Barrier Function Method
•These methods tend to oscillate at the boundary of the feasible region, where the true optimum may be located, because $1/g_j(\mathbf{X})$ may impose a high penalty there.
•There are chances of the Hessian becoming ill-conditioned (it may become singular) as $r$ becomes very high.
168
Multiplier Method, Augmented Lagrangian Method
•Some of the disadvantages are removed: $r$ need not become very high, so the chances of the Hessian becoming ill-conditioned or singular are reduced.
169
Multiplier Method, Augmented Lagrangian Method
Minimize $f(\mathbf{X})$
subject to $h_i(\mathbf{X}) = 0,\; i = 1, \ldots, p$
$g_j(\mathbf{X}) \le 0,\; j = 1, \ldots, m$
$$\Phi(\mathbf{X}, \mathbf{h}, \mathbf{g}, \mathbf{r}, \boldsymbol{\theta}) = f(\mathbf{X}) + \sum_{i=1}^{p} \frac{1}{2} r_i' \left(h_i(\mathbf{X}) + \theta_i\right)^2 + \sum_{j=1}^{m} \frac{1}{2} r_j'' \left(g_j(\mathbf{X}) + \theta_j\right)^{+\,2}$$
170
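A one-variable sketch of the multiplier idea, restricted for brevity to a single equality constraint (the inequality terms of the slide are omitted), and reusing a crude interval minimizer of my own. The point of the sketch: with a modest *fixed* penalty $r$, updating the shift $\theta$ drives the constraint to zero, so $r$ never has to become very high.

```python
def minimize_1d(func, lo, hi, iters=200):
    """Crude interval-shrinking search (assumes a unimodal function)."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

# Example: minimize f(x) = (x - 3)^2  s.t.  h(x) = x - 1 = 0
# The exact Lagrange multiplier of this problem is 4.
f = lambda x: (x - 3) ** 2
h = lambda x: x - 1
r, theta = 10.0, 0.0          # modest, fixed penalty plus shift theta
for _ in range(15):
    aug = lambda x: f(x) + 0.5 * r * (h(x) + theta) ** 2
    x = minimize_1d(aug, -5.0, 5.0)
    theta += h(x)             # multiplier (shift) update
print(round(x, 4), round(r * theta, 4))   # x -> 1, r*theta -> multiplier 4
```

Here $r\theta$ converges to the true Lagrange multiplier, which is what distinguishes this method from the pure penalty approach.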
Linearization of Constrained Problem
Minimize $f(\mathbf{X})$
subject to $h_j(\mathbf{X}) = 0,\; j = 1, \ldots, p$
$g_j(\mathbf{X}) \le 0,\; j = 1, \ldots, m$
Taylor's expansion as the basis of iterative optimization:
Minimize $f\left(\mathbf{X}^{(k)} + \Delta\mathbf{X}^{(k)}\right) \approx f\left(\mathbf{X}^{(k)}\right) + \nabla f\left(\mathbf{X}^{(k)}\right)^T \Delta\mathbf{X}^{(k)}$
Subject to
$h_j\left(\mathbf{X}^{(k)}\right) + \nabla h_j\left(\mathbf{X}^{(k)}\right)^T \Delta\mathbf{X}^{(k)} = 0,\; j = 1, \ldots, p$
$g_j\left(\mathbf{X}^{(k)}\right) + \nabla g_j\left(\mathbf{X}^{(k)}\right)^T \Delta\mathbf{X}^{(k)} \le 0,\; j = 1, \ldots, m$
171
Linearization of Constrained Problem
•Generalized Representation: define
$\bar{f} = f\left(\mathbf{X}^{(k)}\right)$
$e_j = -h_j\left(\mathbf{X}^{(k)}\right)$
$b_j = -g_j\left(\mathbf{X}^{(k)}\right)$
$c_i = \dfrac{\partial f\left(\mathbf{X}^{(k)}\right)}{\partial x_i}, \quad n_{ij} = \dfrac{\partial h_j\left(\mathbf{X}^{(k)}\right)}{\partial x_i}, \quad a_{ij} = \dfrac{\partial g_j\left(\mathbf{X}^{(k)}\right)}{\partial x_i}$
$d_i = \Delta x_i = x_i - x_i^{(k)}$
172
Linearization of Constrained Problem
•Generalized Representation
Minimize $\bar{f} = \sum_{i=1}^{n} c_i d_i = \mathbf{c}^T\mathbf{d}$
Subject to
$\sum_{i=1}^{n} n_{ij} d_i = e_j,\; j = 1, \ldots, p \quad \left(\mathbf{N}^T\mathbf{d} = \mathbf{e}\right)$
$\sum_{i=1}^{n} a_{ij} d_i \le b_j,\; j = 1, \ldots, m \quad \left(\mathbf{A}^T\mathbf{d} \le \mathbf{b}\right)$
173
Illustrative Example
Minimize $f(\mathbf{X}) = x_1^2 + x_2^2 - 3x_1 x_2$
Subject to
$g_1(\mathbf{X}) = \frac{1}{6}x_1^2 + \frac{1}{6}x_2^2 - 1.0 \le 0$
$g_2(\mathbf{X}) = -x_1 \le 0$
$g_3(\mathbf{X}) = -x_2 \le 0$
Linearize at the point $\mathbf{X}^{(0)} = (1, 1)$.
174
Illustrative Example
175
Illustrative Example
•Gradients of the cost function and constraint functions:
$\nabla f(\mathbf{X}) = \left(2x_1 - 3x_2,\; 2x_2 - 3x_1\right)$
$\nabla g_1(\mathbf{X}) = \left(\dfrac{2x_1}{6},\; \dfrac{2x_2}{6}\right)$
$\nabla g_2(\mathbf{X}) = (-1, 0)$
$\nabla g_3(\mathbf{X}) = (0, -1)$
At $\mathbf{X}^{(0)} = (1, 1)$:
$\nabla f\left(\mathbf{X}^{(0)}\right) = \mathbf{c} = (-1, -1)$
$\nabla g_1\left(\mathbf{X}^{(0)}\right) = \left(\frac{1}{3}, \frac{1}{3}\right)$
176
Illustrative Example
At $\mathbf{X}^{(0)} = (1, 1)$:
$f\left(\mathbf{X}^{(0)}\right) = -1.0$
$b_1 = -g_1\left(\mathbf{X}^{(0)}\right) = \frac{2}{3}, \quad b_2 = -g_2\left(\mathbf{X}^{(0)}\right) = 1, \quad b_3 = -g_3\left(\mathbf{X}^{(0)}\right) = 1$
$\nabla f\left(\mathbf{X}^{(0)}\right) = \mathbf{c} = (-1, -1)$
$\nabla g_1\left(\mathbf{X}^{(0)}\right) = \left(\frac{1}{3}, \frac{1}{3}\right)$
177
Illustrative Example
$b_1 = -g_1\left(\mathbf{X}^{(0)}\right) = \frac{2}{3}, \quad b_2 = -g_2\left(\mathbf{X}^{(0)}\right) = 1, \quad b_3 = -g_3\left(\mathbf{X}^{(0)}\right) = 1$
$$\mathbf{A}^T = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} \\ -1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} \frac{2}{3} \\ 1 \\ 1 \end{bmatrix}$$
using $\nabla g_1\left(\mathbf{X}^{(0)}\right) = \left(\frac{1}{3}, \frac{1}{3}\right)$, $\nabla g_2 = (-1, 0)$, $\nabla g_3 = (0, -1)$.
178
Illustrative Example
Minimize $\bar{f} = \sum_{i=1}^{n} c_i d_i = \mathbf{c}^T\mathbf{d} = \begin{bmatrix} -1 & -1 \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$
Subject to $\mathbf{A}^T\mathbf{d} \le \mathbf{b}$:
$$\begin{bmatrix} \frac{1}{3} & \frac{1}{3} \\ -1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \le \begin{bmatrix} \frac{2}{3} \\ 1 \\ 1 \end{bmatrix}$$
179
Illustrative Example
•Expanded:
Minimize $\bar{f} = -d_1 - d_2$
Subject to
$\frac{1}{3}d_1 + \frac{1}{3}d_2 \le \frac{2}{3}$
$-d_1 \le 1$
$-d_2 \le 1$
•This is the linearized problem in terms of $d_1$ and $d_2$.
180
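The linearized subproblem above is small enough to solve by enumerating the intersections of constraint pairs (the candidate vertices), a stand-in for a proper LP solver. The enumeration code below is my own illustration, not part of the slides.

```python
from itertools import combinations

# Linearized subproblem at X0 = (1, 1):
#   minimize  -d1 - d2
#   subject to  d1/3 + d2/3 <= 2/3,  -d1 <= 1,  -d2 <= 1
A = [(1 / 3, 1 / 3), (-1.0, 0.0), (0.0, -1.0)]
b = [2 / 3, 1.0, 1.0]
obj = lambda d: -d[0] - d[1]

def feasible(d, tol=1e-9):
    return all(a1 * d[0] + a2 * d[1] <= bi + tol
               for (a1, a2), bi in zip(A, b))

# An LP optimum lies at a vertex: intersect constraint pairs and compare.
best_d, best_f = None, float("inf")
for i, j in combinations(range(len(A)), 2):
    (a11, a12), (a21, a22) = A[i], A[j]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        continue                       # parallel constraints: no vertex
    d = ((b[i] * a22 - a12 * b[j]) / det,    # Cramer's rule for the 2x2 system
         (a11 * b[j] - b[i] * a21) / det)
    if feasible(d) and obj(d) < best_f:
        best_d, best_f = d, obj(d)
print(best_d, best_f)
```

The optimal objective is $-2$. Note that every point on the edge $\frac{1}{3}d_1 + \frac{1}{3}d_2 = \frac{2}{3}$ between $(-1, 3)$ and $(3, -1)$ attains it, so the particular vertex reported depends on enumeration order.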
Illustrative Example
•Linearization in terms of $x_1$ and $x_2$.
181
Sequential Linear Programming Algorithm
•Linear format:
Minimize $\bar{f} = \sum_{i=1}^{n} c_i d_i = \mathbf{c}^T\mathbf{d}$
Subject to
$\sum_{i=1}^{n} n_{ij} d_i = e_j,\; j = 1, \ldots, p \quad \left(\mathbf{N}^T\mathbf{d} = \mathbf{e}\right)$
$\sum_{i=1}^{n} a_{ij} d_i \le b_j,\; j = 1, \ldots, m \quad \left(\mathbf{A}^T\mathbf{d} \le \mathbf{b}\right)$
($e_j$ and $b_j$ must be +ve)
182
Sequential Linear Programming Algorithm
•As the linearized problem may not have a bounded solution, limits on the design changes need to be imposed:
$-\Delta_{il}^{(k)} \le d_i \le \Delta_{iu}^{(k)},\; i = 1, \ldots, n$
•The problem is still linear and can be solved using LP methods.
183
Sequential Linear Programming Algorithm
•Selection of the proper limits $\Delta_{il}^{(k)}$ and $\Delta_{iu}^{(k)},\; i = 1, \ldots, n$, requires some problem information/knowledge.
•They should be chosen by trial and error or made adaptive. This may help avoid entering the infeasible region.
184
189
Nature Inspired Algorithms
•Bio-Inspired Algo.
–Evolutionary Algo.: Genetic Algorithm, Evolution Strategies, Genetic Programming, Evolutionary Programming, Differential Evolution
–Cultural / Social Algorithm
–Artificial Immune System
–Bacteria Foraging & many others
•Swarm Intelligence Algo.: Ant Colony, Cat Swarm Opt., Cuckoo Search, Firefly Algo, Bat Algorithm
•Physical, Chemical Systems Based Algo.: Simulated Annealing, Harmony Search
190
Cultural / Social Algorithm (SA)
•SA based on Socio-Political Ideologies: Ideology Algorithm, Election Algorithm, Election Campaign Algorithm
•SA based on Sports Competitive Behavior: Soccer League Competition Algorithm, League Championship Algorithm
•SA based on Social and Cultural Interaction: Cohort Intelligence, Teaching Learning Optimization, Social Group Optimization, Social Learning Optimization, Cultural Evolution Algorithm, Social Emotional Optimization, Socio Evolution & Learning Optimization
•SA based on Colonization: Society and Civilization Optimization Algorithm, Imperialist Competitive Algorithm, Anarchic Society Optimization
Metaheuristics
•Nature Inspired Methods
–Bio-inspired Methods
–Swarm Based Methods
191
Evolution
192
Darwin’s Theories
193
Darwin’s Theories
194
Evolution
195
196
Genetic Algorithm
•John Holland introduced the GA in the 1960s, inspired by Darwin's theory of evolution and survival of the fittest.
•The GA became popular when the first book, 'Adaptation in Natural and Artificial Systems' by John Holland, was published in 1975, and applications came up in the 1980s.
197
Genetic Algorithm
•Exploits evolution through
–Inheritance
–Selection
–Crossover
–Mutation
198
199
GA Operators: Crossover & Mutation
[Figure: example chromosomes "6 5 0 7" and "7 7 5 7" illustrating crossover and mutation]
200
Termination
•The algorithm terminates if the population has converged, i.e. the population does not produce offspring that are significantly different from the previous generation.
201
Illustrative Example
202
Illustrative Example 2
•The OneMax or MaxOne problem (also called Bit Counting) is a simple problem that consists of maximizing the number of ones in a bit string.
203
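A minimal GA for OneMax that exercises all four operators named earlier (inheritance, selection, crossover, mutation). The population size, rates, and binary tournament selection are illustrative choices of mine, not prescribed by the slides.

```python
import random

def onemax_ga(n_bits=20, pop_size=30, generations=60,
              p_crossover=0.9, p_mutation=0.02, seed=1):
    """Simple GA for OneMax: maximize the number of 1s in a bit string."""
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)          # OneMax: count the 1 bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():                       # selection: binary tournament
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_crossover:  # crossover: single point
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:                           # inheritance: copy a parent
                child = p1[:]
            # mutation: flip each bit with a small probability
            child = [bit ^ 1 if rng.random() < p_mutation else bit
                     for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

best, fit = onemax_ga()
print(fit)
```

With these settings the population converges quickly: the best individual typically reaches, or comes very close to, the optimum fitness of `n_bits`.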