UNIT II: OPTIMIZATION
OPTIMIZATION – DERIVATIVE-BASED OPTIMIZATION – DESCENT METHODS – THE METHOD OF STEEPEST DESCENT – CLASSICAL NEWTON’S METHOD – STEP SIZE DETERMINATION – DERIVATIVE-FREE OPTIMIZATION – GENETIC ALGORITHMS – SIMULATED ANNEALING – RANDOM SEARCH – DOWNHILL SIMPLEX SEARCH.
OPTIMIZATION AND SOFT COMPUTING

Optimization is the process of finding the best solution to a problem among a set of possible solutions. Soft computing is a branch of artificial intelligence based on the idea of computing with uncertain, imprecise, or incomplete information. Soft computing techniques include fuzzy logic, neural networks, evolutionary algorithms, and probabilistic reasoning. In optimization, soft computing techniques can be used to search for optimal solutions in complex and uncertain problem domains. For example, evolutionary algorithms such as genetic algorithms can search for optimal solutions in large search spaces where traditional optimization methods may be too slow or impractical. Fuzzy logic can handle imprecise or uncertain data in optimization problems, while neural networks can be used for pattern recognition and prediction within optimization.
Some specific applications of soft computing techniques in optimization include:
Multi-objective optimization: Soft computing techniques can optimize multiple conflicting objectives simultaneously, a common requirement in many real-world applications.
Optimization under uncertainty: Soft computing techniques can handle uncertain and noisy data in optimization problems, which is common in many real-world applications.
Optimization in dynamic environments: Soft computing techniques can adapt to changing problem environments, which is important when the optimization problem involves a dynamic system.
Overall, soft computing techniques offer a powerful set of tools for optimization problems that are complex, uncertain, or dynamic.
INTRODUCTION TO OPTIMIZATION TECHNIQUES

Derivative-based optimization is a type of optimization that relies on derivatives, such as gradients and Hessians, to guide the search for the optimal solution. In soft computing, derivative-based optimization can be combined with techniques such as neural networks, support vector machines, and evolutionary algorithms to optimize various objective functions. Neural networks are one soft computing technique that relies on derivative-based optimization: the backpropagation algorithm, which is widely used for training neural networks, is a derivative-based optimization algorithm that adjusts the weights and biases of the network to minimize the error between the predicted and actual outputs. Other techniques such as conjugate gradient methods and the Levenberg–Marquardt algorithm can also be used for training neural networks with derivative-based optimization.
Support vector machines (SVMs) are another soft computing technique that can be used with derivative-based optimization. SVMs use the concept of margin to classify data points into different classes, and derivative-based optimization can be used to find the optimal hyperplane that maximizes the margin between the classes. Techniques such as gradient descent and Newton's method can be used to optimize the SVM objective function. Evolutionary algorithms such as genetic algorithms and particle swarm optimization can also be combined with derivative-based optimization: the search for the optimal solution is guided by a fitness function that evaluates the quality of each candidate solution, and when that fitness function is differentiable, its derivative can additionally be used to steer the search towards better solutions.
DESCENT METHODS

Descent methods are a family of optimization techniques that rely on iterative updates to search for the optimal solution of an objective function. In soft computing, descent methods are used within techniques such as neural networks, support vector machines, and evolutionary algorithms. Gradient descent is the most widely used descent method in soft computing, appearing in many applications such as training neural networks and optimizing support vector machines. Gradient descent uses the gradient of the objective function to update the parameters of the model in the direction of steepest descent: it calculates the gradient of the objective function with respect to the parameters and then updates the parameters by subtracting a scaled version of the gradient.
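To make the update rule concrete, the following is a minimal Python sketch of fixed-step gradient descent on a simple quadratic objective; the function names and the test problem are illustrative, not taken from any particular library.

import numpy as np

def gradient_descent(grad, x0, lr=0.1, max_iter=1000, tol=1e-8):
    """Minimize a function by repeatedly stepping against its gradient.

    grad : callable returning the gradient at a point
    lr   : fixed learning rate (the scaling factor of the update)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:     # stop when the gradient vanishes
            break
        x = x - lr * g                   # step in the direction of steepest descent
    return x

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is
# (2(x - 3), 2(y + 1)); the minimum is at (3, -1).
grad_f = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))   # approximately [3. -1.]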
Stochastic gradient descent (SGD) is a variation of gradient descent that is widely used in deep learning. SGD updates the parameters of the model using only a subset of the training data at each iteration, which makes it faster and more scalable than regular gradient descent. Because the gradient is estimated from a small subset of the data, some noise is introduced into the update process; this noise can help prevent the algorithm from getting stuck in local minima. Other descent methods used in soft computing include conjugate gradient methods, Newton's method, and quasi-Newton methods such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. These methods use more advanced techniques such as line search and Hessian-matrix approximation to speed up the convergence of the optimization algorithm. Overall, descent methods are a powerful set of optimization techniques that are widely used in soft computing; the choice of descent method depends on the specific problem and the properties of the objective function.
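The sketch below shows minibatch SGD for a simple least-squares model, assuming the data fit in memory; the function name and hyperparameter values are illustrative choices, not canonical ones.

import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, batch_size=32, seed=0):
    """Fit w to minimize ||Xw - y||^2 using minibatch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)               # shuffle the data once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error on this minibatch only
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad                       # noisy but cheap update
    return w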
METHOD OF STEEPEST DESCENT

The Method of Steepest Descent is a classic gradient-based optimization algorithm that iteratively updates the parameters of a model in the direction of the negative gradient of the objective function. In soft computing, it is commonly used for training neural networks and optimizing support vector machines; in these applications the objective function is typically a loss function that measures the discrepancy between the predicted output and the actual output. The method starts with an initial guess for the parameters of the model and then iteratively updates them in the direction of the negative gradient: at each iteration it calculates the gradient of the objective function with respect to the parameters and updates the parameters by subtracting a scaled version of the gradient.
The scaling factor is known as the learning rate and controls the size of the update at each iteration. If the learning rate is too large, the algorithm may overshoot the minimum and fail to converge. If the learning rate is too small, the algorithm may converge too slowly. The method continues to iterate until the convergence criteria are met. The convergence criteria can be based on the change in the objective function between iterations, the norm of the gradient, or other stopping criteria. The Method of Steepest Descent is a simple and efficient optimization method that is widely used in soft computing. However, it can suffer from slow convergence and may get stuck in local minima. To overcome these issues, other optimization methods such as stochastic gradient descent, conjugate gradient methods, and quasi-Newton methods can be used in conjunction with the Method of Steepest Descent.
Suppose that we would like to find the minimum of a function f(x), x ∈ R^n, with f : R^n → R. We denote the gradient of f by g_k = g(x_k) = ∇f(x_k). The general idea behind most minimization methods is to take a step along a given search direction d_k:

x_{k+1} = x_k + α_k d_k,   k = 0, 1, ...,

where the step length α_k is chosen so that

α_k = arg min_α f(x_k + α d_k).

Here arg min refers to the argument that minimizes the given function. For the steepest descent method, the search direction is d_k = −∇f(x_k). The two main computational advantages of the steepest descent algorithm are the ease with which it can be implemented and its low storage requirement, O(n); the main work per iteration is the line search required to compute the step length α_k and the computation of the gradient. The algorithm can now be written as follows:
Algorithm 1: Steepest Descent Method
Given an initial x_0, set d_0 = −g_0 and choose a convergence tolerance tol.
for k = 0 to maxiter do
    Set α_k = arg min_α φ(α), where φ(α) = f(x_k − α g_k)
    x_{k+1} = x_k − α_k g_k
    Compute g_{k+1} = ∇f(x_{k+1})
    if ||g_{k+1}||_2 ≤ tol then converged
end for
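A Python sketch of Algorithm 1 is given below. The one-dimensional line search is delegated to SciPy's minimize_scalar routine; the surrounding function is an illustrative implementation, not library code.

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    """Steepest descent with a one-dimensional line search for alpha_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break                                  # converged
        d = -g                                     # steepest-descent direction
        # alpha_k = argmin_alpha f(x_k + alpha * d_k)
        alpha = minimize_scalar(lambda a: f(x + a * d)).x
        x = x + alpha * d
        g = grad(x)
    return x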
CLASSICAL NEWTON’S METHOD

Classical Newton's Method is a widely used second-order optimization method that uses the Hessian matrix of the objective function to guide the search for the optimal solution. In soft computing, Newton's Method is commonly used to optimize the parameters of neural networks and support vector machines. The method starts with an initial guess for the parameters of the model and then iteratively updates them using the formula

θ_new = θ_old − H^-1 ∇f(θ_old),

where θ_old is the vector of parameters at the previous iteration, ∇f(θ_old) is the gradient of the objective function at θ_old, and H^-1 is the inverse of the Hessian matrix of the objective function at θ_old.
The Hessian matrix of the objective function is the matrix of second-order partial derivatives, and it describes the curvature of the objective function around θ_old; multiplying the gradient by the inverse Hessian rescales the step according to that curvature, providing information about the shape of the function in the vicinity of θ_old. Newton's Method can converge faster than first-order methods such as gradient descent because it takes this curvature into account. However, the Hessian matrix can be difficult to compute for high-dimensional problems, and it may be expensive to invert at each iteration. To overcome these issues, quasi-Newton methods such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method and the limited-memory BFGS (L-BFGS) method can be used in soft computing; these methods approximate the inverse of the Hessian matrix and avoid its costly computation at each iteration.
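A minimal sketch of the Newton update follows, assuming the gradient and Hessian are supplied as callables (all names are illustrative). Note that it solves the linear system H p = g rather than forming H^-1 explicitly, which is cheaper and numerically safer.

import numpy as np

def newton_method(grad, hess, theta0, tol=1e-8, max_iter=50):
    """Classical Newton iteration: theta <- theta - H^{-1} grad f(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) <= tol:
            break
        p = np.linalg.solve(hess(theta), g)   # Newton step, without explicit inversion
        theta = theta - p
    return theta

In practice, quasi-Newton methods are usually invoked through a library. For example, SciPy's minimize routine offers BFGS and L-BFGS-B; here it is applied to SciPy's built-in Rosenbrock test function:

from scipy.optimize import minimize, rosen, rosen_der

# L-BFGS-B builds an implicit approximation of the inverse Hessian
# from recent gradient differences, so no Hessian is ever formed.
result = minimize(rosen, x0=[-1.2, 1.0], jac=rosen_der, method='L-BFGS-B')
print(result.x)   # close to the true minimizer [1. 1.]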
STEP SIZE DETERMINATION

In soft computing, step size determination is a critical aspect of optimization algorithms. The step size, also known as the learning rate, controls the size of the update at each iteration, and choosing it appropriately matters because it affects both the convergence rate and the final solution of the optimization problem. There are different methods for determining the step size, and the choice depends on the specific optimization algorithm and the problem being solved. Some common methods are:
Fixed step size: The step size is fixed and does not change during the optimization process. This method is simple to implement but may not be optimal, because it can lead to slow convergence or overshooting of the optimal solution.
Backtracking line search: This method starts with an initial step size and then iteratively reduces it until a suitable step size is found, using a criterion such as the Armijo–Goldstein rule or the Wolfe conditions (a code sketch is given at the end of this subsection).
Adaptive step size: The step size is adjusted at each iteration based on the gradient of the objective function or the progress of the optimization algorithm. Examples of adaptive step size methods include AdaGrad, RMSProp, and Adam.
Trust region methods: The step is determined by solving a trust-region subproblem that approximates the objective function within a trust region around the current solution; the subproblem is solved using optimization methods such as conjugate gradient or Newton's method.
The choice of step size determination method depends on the specific optimization problem and the algorithm being used. Fixed step sizes may work well for simple problems, while more complex problems may require adaptive or trust-region methods to find an optimal solution efficiently.
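The following is a minimal sketch of backtracking line search under the Armijo sufficient-decrease condition; the function name, the default constants, and the cap on backtracking steps are illustrative choices.

import numpy as np

def backtracking_line_search(f, grad, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo condition holds:
    f(x + alpha d) <= f(x) + c * alpha * grad(x)^T d
    """
    alpha = alpha0
    fx = f(x)
    slope = grad(x) @ d          # directional derivative; negative for a descent direction d
    for _ in range(50):          # cap the number of backtracking steps
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha         # sufficient decrease achieved
        alpha *= rho             # backtrack: shrink the trial step
    return alpha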
DERIVATIVE-FREE OPTIMIZATION

Derivative-free optimization (DFO) is a set of optimization methods that do not require the computation of derivatives of the objective function. DFO methods are commonly used in soft computing when the objective function is not differentiable, noisy, or computationally expensive to evaluate. DFO methods search for the optimal solution by iteratively evaluating the objective function at different points in the search space: the function is evaluated at a set of search points, called a mesh, which is updated at each iteration based on the information gathered in previous iterations. The goal of a DFO algorithm is to find the optimal solution using the minimum number of function evaluations.
There are several DFO methods used in soft computing, including:
Simplex method: The simplex method constructs a simplex, a set of n + 1 points in an n-dimensional space. The objective function is evaluated at the vertices of the simplex, and the simplex is iteratively modified to improve the solution.
Genetic algorithms: Genetic algorithms take an evolutionary approach to optimization by creating a population of candidate solutions and using genetic operators such as mutation and crossover to generate new solutions. The objective function is evaluated for each candidate solution, and the algorithm selects the best solutions for the next generation.
Particle swarm optimization: Particle swarm optimization is a population-based optimization method that simulates the movement of particles in a search space. The algorithm iteratively updates the position and velocity of the particles to explore the search space and find the optimal solution (a sketch is given at the end of this subsection).
Simulated annealing: Simulated annealing is a stochastic optimization method that simulates the annealing process of metals. The algorithm randomly perturbs the current solution, evaluates the objective function at the perturbed solution, and accepts or rejects it based on a probability function; the probability of accepting worse solutions is slowly reduced over time.
DFO methods are useful in soft computing when the objective function is difficult to optimize using traditional gradient-based methods, or when it is noisy or has many local minima. Because they do not rely on the gradient of the objective function, DFO methods are more robust to noise and non-smoothness.
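As an illustration of the particle swarm variant mentioned in the list above, here is a minimal PSO sketch; the search box [-5, 5]^dim and all coefficient values are illustrative assumptions.

import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over the box [-5, 5]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # each particle's best position so far
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()        # swarm's best position so far
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity blends inertia, attraction to pbest, and attraction to gbest
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest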
GENETIC ALGORITHMS

Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and genetics: an intelligent exploitation of random search, provided with historical data, to direct the search into regions of better performance in the solution space. They are commonly used to generate high-quality solutions for optimization and search problems. Genetic algorithms simulate the process of natural selection, in which species that can adapt to changes in their environment survive, reproduce, and pass on to the next generation; in simple words, they simulate "survival of the fittest" among the individuals of consecutive generations. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of characters, integers, floats, or bits; this string is analogous to a chromosome.
Foundation of Genetic Algorithms
Genetic algorithms are based on an analogy with the genetic structure and behaviour of chromosomes in a population. The foundation of GAs rests on this analogy:
Individuals in the population compete for resources and mates.
The successful (fittest) individuals mate and create more offspring than others.
Genes from the fittest parents propagate through successive generations; sometimes parents create offspring that are better than either parent.
Thus each successive generation becomes better suited to its environment.
Search space
The population of individuals is maintained within the search space. Each individual represents one solution in the search space for the given problem and is coded as a finite-length vector of components, analogous to a chromosome; the variable components are analogous to genes. Thus a chromosome (individual) is composed of several genes (variable components).
Fitness score
A fitness score is assigned to each individual and reflects the individual's ability to "compete"; individuals with optimal (or near-optimal) fitness scores are sought. The GA maintains a population of n individuals (chromosomes/solutions) along with their fitness scores, and individuals with better fitness scores are given a greater chance to reproduce than others. The selected individuals mate and produce better offspring by combining the chromosomes of the parents. Since the population size is static, room must be created for new arrivals: some individuals die and are replaced by the new arrivals, creating a new generation once all the mating opportunities of the old population are exhausted. The hope is that over successive generations better solutions arrive while the least fit die out. Each new generation has, on average, more "good genes" than the individuals of previous generations, and thus better partial solutions. Once the offspring produced show no significant difference from the offspring produced by previous populations, the population is said to have converged to a set of solutions for the problem.
Operators of Genetic Algorithms
Once the initial generation is created, the algorithm evolves it using the following operators:
1) Selection operator: Gives preference to individuals with good fitness scores, allowing them to pass their genes to successive generations.
2) Crossover operator: Represents mating between individuals. Two individuals are selected using the selection operator, and crossover sites are chosen randomly; the genes at these sites are then exchanged, creating completely new individuals (offspring). For example, with parents 11|0010 and 01|1101 and a crossover site after the second gene, the offspring are 111101 and 010010.
3) Mutation operator: Inserts random genes into the offspring to maintain diversity in the population and avoid premature convergence; for example, flipping the third bit of 110101 gives 111101.
The whole algorithm can be summarized as:
1) Randomly initialize the population p
2) Determine the fitness of the population
3) Until convergence, repeat:
   a) Select parents from the population
   b) Perform crossover and generate a new population
   c) Perform mutation on the new population
   d) Calculate the fitness of the new population
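A minimal sketch of this loop for bit-string individuals is shown below, using tournament selection, one-point crossover, and bit-flip mutation; all names and parameter values are illustrative choices.

import numpy as np

def genetic_algorithm(fitness, n_genes, pop_size=50, generations=100,
                      p_mut=0.01, seed=0):
    """Minimal binary GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_genes))       # 1) random initial population
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])     # 2) evaluate fitness
        new_pop = []
        for _ in range(pop_size // 2):
            # a) selection: each parent wins a two-way tournament
            i, j, k, l = rng.integers(0, pop_size, 4)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            p2 = pop[k] if scores[k] >= scores[l] else pop[l]
            # b) crossover: exchange genes at a random site
            site = rng.integers(1, n_genes)
            new_pop.append(np.concatenate([p1[:site], p2[site:]]))
            new_pop.append(np.concatenate([p2[:site], p1[site:]]))
        pop = np.array(new_pop)
        # c) mutation: flip each bit with probability p_mut
        mask = rng.random(pop.shape) < p_mut
        pop[mask] = 1 - pop[mask]
    scores = np.array([fitness(ind) for ind in pop])         # d) final fitness
    return pop[np.argmax(scores)]

# Example: maximize the number of 1-bits (the "one-max" toy problem)
best = genetic_algorithm(fitness=lambda ind: ind.sum(), n_genes=20)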
Why use Genetic Algorithms?
They are robust, they provide optimisation over a large search space, and, unlike traditional AI systems, they do not break down on a slight change in the input or in the presence of noise.
Applications of Genetic Algorithms
Genetic algorithms have many applications, including recurrent neural network training, mutation testing, code breaking, filtering and signal processing, and learning fuzzy rule bases.
SIMULATED ANNEALING

Simulated annealing is a popular optimization technique in soft computing, inspired by the process of annealing in metallurgy. It is a stochastic optimization algorithm that can be used to search for the global minimum of a non-convex, multi-modal objective function. The basic idea is to start with an initial solution and then iteratively explore the search space by randomly perturbing the current solution. The algorithm accepts or rejects the new solution based on a probabilistic criterion that depends on the difference in the objective function between the current and new solutions, and on a parameter called the temperature. The temperature controls the acceptance probability of the new solution: at the beginning of the algorithm the temperature is high, and solutions worse than the current one are accepted with high probability, which allows the algorithm to explore the search space and escape from local optima. As the algorithm progresses, the temperature is gradually reduced and the acceptance probability decreases, so the algorithm settles toward the (ideally global) optimum.
Simulated annealing can be used in many soft computing applications, including machine learning, computer vision, and natural language processing. It is particularly useful in problems where the objective function is non-differentiable, noisy, or multi-modal, and where the search space is large. The performance of simulated annealing depends on the choice of the initial solution, the cooling schedule, and the acceptance probability function. The cooling schedule determines the rate at which the temperature is reduced, while the acceptance probability function determines the probability of accepting a new solution. These parameters can be tuned to improve the performance of the algorithm.
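The sketch below implements this scheme with a Gaussian perturbation, Metropolis-style acceptance, and a geometric cooling schedule; the parameter values are illustrative, not canonical.

import numpy as np

def simulated_annealing(f, x0, T0=1.0, cooling=0.995, step=0.5,
                        iters=10000, seed=0):
    """Minimize f using random perturbations and a geometric cooling schedule."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    T = T0
    best, best_val = x.copy(), fx
    for _ in range(iters):
        x_new = x + rng.normal(scale=step, size=x.shape)   # random perturbation
        f_new = f(x_new)
        # Always accept improvements; accept worse moves with probability exp(-delta/T)
        if f_new < fx or rng.random() < np.exp(-(f_new - fx) / T):
            x, fx = x_new, f_new
            if fx < best_val:
                best, best_val = x.copy(), fx
        T *= cooling                                       # gradually lower the temperature
    return best, best_val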
RANDOM SEARCH

Random search is a simple optimization technique used in soft computing that involves randomly sampling points from a search space and evaluating the objective function at those points. The algorithm starts by defining a search space and a number of iterations. At each iteration, it generates a random sample from the search space and evaluates the objective function at that point, keeping track of the best solution found so far and updating it whenever a better solution is found. The algorithm continues for a predetermined number of iterations or until a termination criterion is met.
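A minimal sketch of this procedure over a box-shaped search space follows; the bounds and sample count are illustrative assumptions.

import numpy as np

def random_search(f, lower, upper, n_samples=10000, seed=0):
    """Uniformly sample the box [lower, upper] and keep the best point seen."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    best_x, best_val = None, np.inf
    for _ in range(n_samples):
        x = rng.uniform(lower, upper)          # one random candidate per iteration
        val = f(x)
        if val < best_val:                     # track the best solution found so far
            best_x, best_val = x, val
    return best_x, best_val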
Random search has several advantages over other optimization techniques. It is simple to implement and requires no gradient information, making it particularly useful for non-differentiable or noisy objective functions. It can also be applied to complex objective functions with many local optima, since it does not become trapped in a local minimum the way gradient-based methods can. However, random search can be slow and inefficient for high-dimensional search spaces: as the number of dimensions increases, the number of samples required to approximate the objective function well grows exponentially. Random search is therefore best suited to low-dimensional problems.
THE DOWNHILL SIMPLEX METHOD

The Downhill Simplex method is a multidimensional optimization method that uses geometric relationships to help locate function minima. One distinct advantage of this method is that it does not require taking the derivative of the function being evaluated; instead it creates its own pseudo-derivative by evaluating enough points to define a descent direction for each independent variable of the function. The method revolves around a simplex, a geometric object with n + 1 vertices, where n is the number of independent variables. For instance, in a one-dimensional optimization the simplex contains two points and represents a line segment; in a two-dimensional optimization, a three-pointed triangle is used (Figure 1).
The downhill simplex method is an optimization algorithm that requires only function evaluations, not the calculation of derivatives [1]. In N-dimensional space, a simplex is a polyhedron with N + 1 vertices; we choose N + 1 points to define an initial simplex. The method then iteratively updates the worst point through four operations: reflection, expansion, one-dimensional contraction, and multiple contraction (shrinkage). Figure 1 illustrates these operations in a three-dimensional variable space.
Reflection moves the worst vertex of the simplex (where the value of the objective function is highest) to a point reflected through the remaining N points. If this point is better than the best point, the method attempts to expand the simplex along this line; this operation is called expansion. On the other hand, if the new point is not much better than the previous point, the simplex is contracted along one dimension from the highest point; this procedure is called contraction. Finally, if the new point is worse than the previous points, the simplex is contracted along all dimensions toward the best point, stepping down the valley. By repeating this series of operations, the method finds the optimal solution.
ALGORITHM
The downhill simplex method repeats a series of the following steps:
1. Order and re-label the N + 1 points so that f(x_{N+1}) ≥ ... ≥ f(x_2) ≥ f(x_1).
2. Generate a trial point x_r by reflection, such that x_r = x_g + α(x_g − x_{N+1}), where x_g is the centroid of the N best vertices of the simplex.
3. If f(x_1) ≤ f(x_r) < f(x_N), replace x_{N+1} by x_r.
4. If f(x_r) < f(x_1), generate a new point x_e by expansion, such that x_e = x_r + β(x_r − x_g).
5. If f(x_e) < f(x_r), replace x_{N+1} by x_e; otherwise replace x_{N+1} by x_r.
6. If f(x_r) ≥ f(x_N), generate a new point x_c by contraction, such that x_c = x_g + γ(x_{N+1} − x_g).
7. If f(x_c) < f(x_{N+1}), replace x_{N+1} by x_c.
8. If f(x_c) ≥ f(x_{N+1}), contract along all dimensions toward x_1 and evaluate the objective function at the N new vertices, x_i = x_1 + δ(x_i − x_1).
It has been reported that efficient values of the coefficients are α = 1.0, β = 1.0, γ = 0.5, and δ = 0.5; these parameter settings are used in this report.
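For reference, the following is a compact Python sketch of these steps; the initial-simplex construction and the stopping test are illustrative choices, and α, β, γ, δ default to the values reported above.

import numpy as np

def nelder_mead(f, x0, alpha=1.0, beta=1.0, gamma=0.5, delta=0.5,
                max_iter=500, tol=1e-8):
    """Downhill simplex search following the steps listed above."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # Initial simplex: x0 plus n points perturbed along each coordinate axis
    simplex = np.vstack([x0] + [x0 + 0.1 * np.eye(n)[i] for i in range(n)])
    fvals = np.array([f(p) for p in simplex])
    for _ in range(max_iter):
        order = np.argsort(fvals)                  # step 1: order best ... worst
        simplex, fvals = simplex[order], fvals[order]
        if fvals[-1] - fvals[0] < tol:
            break
        xg = simplex[:-1].mean(axis=0)             # centroid of the N best vertices
        xr = xg + alpha * (xg - simplex[-1])       # step 2: reflection
        fr = f(xr)
        if fvals[0] <= fr < fvals[-2]:             # step 3: accept the reflected point
            simplex[-1], fvals[-1] = xr, fr
        elif fr < fvals[0]:                        # step 4: expansion
            xe = xr + beta * (xr - xg)
            fe = f(xe)
            if fe < fr:                            # step 5: keep the better of xe, xr
                simplex[-1], fvals[-1] = xe, fe
            else:
                simplex[-1], fvals[-1] = xr, fr
        else:                                      # step 6: contraction
            xc = xg + gamma * (simplex[-1] - xg)
            fc = f(xc)
            if fc < fvals[-1]:                     # step 7: accept the contracted point
                simplex[-1], fvals[-1] = xc, fc
            else:                                  # step 8: shrink all vertices toward the best
                simplex = simplex[0] + delta * (simplex - simplex[0])
                fvals = np.array([f(p) for p in simplex])
    return simplex[np.argmin(fvals)]

# Example: minimize f(x, y) = (x - 1)^2 + (y + 2)^2, whose minimum is at (1, -2).
print(nelder_mead(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [0.0, 0.0]))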