Gradient Search Methods: Steepest Descent Part 3

We have looked at the method behind steepest descent and explored two ways of calculating our step size. In this post we will look at the limitations and the best use practices for the method of steepest descent. 

Overall, the method of steepest descent is a reliable, standard method. The sacrifice, however, is speed. When the functions get more complicated and the evaluations more expensive, steepest descent begins to lag behind other methods. This tradeoff is an important one, as the desire to reach a minimum is counteracted by the time spent waiting.

The method of steepest descent is generally used when a minimum must be found reliably. It would not be appropriate for functions that only need to be optimized quickly and roughly. This brings us to our step-finding method.

Our choice of step-finding method can drastically change the behavior of our search. A dynamic step size may also require us to take derivatives of our function (which may or may not exist at our test points). A constant step size might be useful to minimize derivative evaluations, but at the cost of time. Nevertheless, the method of steepest descent is a reliable and consistent method.

It is worth noting that, for any suitably smooth function, the method of steepest descent will find a (local) minimum. This guarantee must be weighed when evaluating search methods, and it makes steepest descent a necessary tool in anyone’s optimization quiver.
-Marcello

Gradient Search Methods: Steepest Descent Part 2

If you followed along last post, we derived (loosely) our iterative step. With that we can carry out our steepest descent method. We start by choosing an initial guess; for this function I have chosen (8,8). This is plotted below on the level curves.

[Figure: level curves of f with the initial guess (8,8) marked]

From this point we must calculate the gradient. I have taken the gradient of our function (f(x,y) = x^2 - x + cos(y+x) + y^2). This gradient must now be evaluated at our point (8,8).

∇f(x,y) = (2x - 1 - sin(x+y), 2y - sin(x+y)), which evaluated at (8,8) gives roughly (15.29, 16.29).
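As a quick sanity check, that gradient evaluation can be reproduced in a few lines. This is a Python sketch of mine; the post’s own code, in MATLAB, appears at the end of this post:

```python
import math

def grad_f(x, y):
    # gradient of f(x, y) = x^2 - x + cos(y + x) + y^2
    dfdx = 2*x - 1 - math.sin(x + y)
    dfdy = 2*y - math.sin(x + y)
    return dfdx, dfdy

dx, dy = grad_f(8.0, 8.0)  # roughly (15.29, 16.29)
```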

As I mentioned earlier, we really need two things for our method: a direction and a step size. With the gradient evaluated we have our direction; now we need our step size. This is where we have options. For the step size we have two general options: we can choose a constant step size (think GSS) or a dynamic step size (think FS and NM). In this post I will show both.

For a constant step size we have a few issues. As you can see in our level plots, we are not exactly close to our minimum, so we could choose a large step size to get close fast. However, once we get close to our minimum, the large step size will cause us to overshoot. Trying to correct the overshoot, the method will then overshoot in the opposite direction. That leaves us with choosing a smaller step size. Well, this brings its own problems: the small step size takes much longer to reach our minimum when our initial guess is far away. I went with the smaller step size for the constant step size version.

What about the dynamic step size? With a dynamic step size we are actually solving an optimization problem: finding the step size that decreases our function the most. We want to minimize f(x - alpha*d), where d is the gradient evaluated at x. We have a few options here. We could use an ODSM from before to calculate an estimate of the best step size, or use one to find the exact best step size, but both become very expensive for little benefit. Instead we use a method called backtracking line search. This method gives us an acceptable step size while not being too computationally heavy. With backtracking line search, we start with an initial step size, alpha, and check whether x - alpha*d is an acceptable point. We do this using the Armijo-Goldstein condition, which ensures that the step we take decreases our function sufficiently. If the condition is not satisfied, we halve our alpha and try again, repeating until it is satisfied. Once it is, we continue with our descent method.
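A minimal sketch of backtracking line search in Python (my own; the initial alpha = 1 and the sufficient-decrease constant c = 1/2 are assumptions, and unlike the MATLAB code at the end of this post, this version takes the x and y steps together as one vector step):

```python
import math

def f(x, y):
    return x**2 - x + math.cos(y + x) + y**2

def grad_f(x, y):
    return (2*x - 1 - math.sin(x + y), 2*y - math.sin(x + y))

def backtrack(x, y, alpha=1.0, c=0.5):
    """Halve alpha until the Armijo-Goldstein sufficient-decrease condition holds."""
    gx, gy = grad_f(x, y)
    fc = f(x, y)
    # accept alpha once f(x - alpha*g) <= f(x) - c*alpha*||g||^2
    while f(x - alpha*gx, y - alpha*gy) > fc - c*alpha*(gx**2 + gy**2):
        alpha *= 0.5
    return alpha
```

Because the condition always holds for small enough alpha, the halving loop is guaranteed to terminate.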

Now that we have our two methods, let’s check how they compare. Here are the first three steps: on the left is our constant step size, on the right is the backtracking line search.

[Figures: first three steps — constant step size (left), backtracking line search (right)]

As you can see, the constant step size is slowly making its way there; the dynamic step size, however, has already overshot the minimum and is backtracking. Let’s try another three steps.

[Figures: steps four through six — constant step size (left), backtracking line search (right)]

More of the same in these graphs. The constant step size is slowly chugging along, while the dynamic step size has arrived at the minimum and is waiting for the other method to catch up. The dynamic step size plot looks almost identical to the one before. This is because the method is overshooting and backtracking over the minimum until the steps get smaller than our tolerance; after that point our method stops and gives us our minimum. If we let both methods terminate, we find that the constant step size takes over 40 iterations while the dynamic takes only 6. Now what about accuracy? The minimum of our function is around (0.99865, 0.49865). Our constant step size gives us a minimum of (0.9990, 0.4991); the dynamic gives us (0.9986, 0.4986). It looks like we have a clear winner. In this case, the dynamically changing step size beats the constant not only in number of iterations, but in accuracy as well.

I hope this clarified steepest descent. Next post we go over the limitations of this method (it will be brief; there aren’t a ton). I have added my code below, separated into constant and dynamic. Enjoy!

% steepest descent
% f(x,y) = x^2 - x + cos(y+x) + y^2

%% constant step size
x = zeros(1,10);% initialize arrays
y = zeros(1,10);

x(1)=8; %initial guess
y(1)=8;

%dx = -1+2*x-sin(x+y)
%dy = 2*y-sin(x+y)

a0=0.1; %initial step size

tol(1)=1;
tol(2)=2;

i=1;

while tol(1) >0.0001 || tol(2) > 0.0001 %stop only when both steps fall below tolerance

    dx = -1+2*x(i)-sin(x(i)+y(i)); %gradient dx
    dy = 2*y(i) - sin(x(i)+y(i)); %gradient dy
 
    i=i+1;
    
    x(i) = x(i-1)-dx*a0; %iterative steps
    y(i) = y(i-1)-dy*a0;
    
    tol(1) = abs(x(i)-x(i-1)); %tolerance calc
    tol(2) = abs(y(i)-y(i-1));
    
    
end

%% Backtracking line search

x = zeros(1,10);
y = zeros(1,10);

x(1)=8; %initial guess (same as the constant step size version)
y(1)=8;

%dx = -1+2*x-sin(x+y)
%dy = 2*y-sin(x+y)

ay=1;
ax=1;

tol(1)=1;
tol(2)=1;

i=1;

while tol(1) >0.0001 || tol(2) > 0.0001

    dx = -1+2*x(i)-sin(x(i)+y(i));
    dy = 2*y(i) - sin(x(i)+y(i));
 

    %evaluate function at different points
    fc =  x(i)^2 -x(i)+cos(y(i)+x(i))+y(i)^2;
    newfx =(x(i)-dx*ax)^2 -(x(i)-dx*ax)+cos(y(i)+(x(i)-dx*ax))+y(i)^2;
    newfy =x(i)^2 -x(i)+cos((y(i)-dy*ay)+x(i))+(y(i)-dy*ay)^2;
    
  
    while newfx > fc -0.5*ax*dx^2 %Armijo-Goldstein condition: require sufficient decrease (squared gradient term)
        ax=.5*ax; %shrink step size
        % function eval at new step
        newfx =(x(i)-dx*ax)^2 -(x(i)-dx*ax)+cos(y(i)+(x(i)-dx*ax))+y(i)^2;
    end
    while newfy > fc -0.5*ay*dy^2
        ay=.5*ay;
        newfy =x(i)^2 -x(i)+cos((y(i)-dy*ay)+x(i))+(y(i)-dy*ay)^2;
    end
    
    i=i+1;
    x(i) = x(i-1)-dx*ax;
    y(i) = y(i-1)-dy*ay;
    
    tol(1) = abs(x(i)-x(i-1));
    tol(2) = abs(y(i)-y(i-1));
    
    
end

Gradient Search Methods: Steepest Descent Part 1

If you have been following along, we have worked through a bunch of one dimensional search methods. However, what happens when we need to perform a search in multiple dimensions? Well, we need a new search method that can be expanded to multiple variables while still finding a minimum (or maximum).

The method should work similarly to our previous methods; however, it should be applicable to n variables. With this in mind we set out to design a search method. As we have seen before, we need two things for a successful search method. First we need to know where we are going (direction), and second we need to know how far to go in that direction (step size). With these two pieces of information we can go about finding our minimum (or maximum).

Let’s focus on the direction first. We want our function to decrease fastest along our chosen direction. If we look at a multivariable function which is both defined and differentiable in a small area around a point (otherwise known as a neighborhood), we find that the function decreases fastest along the negative gradient of the function. Let’s test this out.

Let’s take a simple multivariable function, f(x,y) = x^2 - x + cos(y+x) + y^2. Plotting this produces the following 3D graph and contour plot. Take notice of the contour plot; it will help us determine our direction.

[Figures: 3D surface and contour plot of f(x,y) = x^2 - x + cos(y+x) + y^2]

We have an infinite number of directions to move in, but let’s choose four test cases: the gradient, the negative gradient, and two other directions. For our initial point let’s choose (-6,8). This point lets us evaluate our gradient. Mapping these directions (all with a step of 0.1) produces the following graph.

[Figure: four test directions from (-6,8) drawn on the contour plot]

Now, this isn’t a rigorous proof (or a particularly fantastic one), but as you can see, our function is lowest in the direction of the negative gradient. The negative gradient line (magenta) points toward the center of the contour plot and to the lowest function evaluation. All other directions are off center or point in completely wrong directions. However, this only holds in a small neighborhood around our point. If we take too big a step, we might end up at a wrong point or a higher function evaluation. This brings us to our next requirement: a step size. There are a few ways to determine a step size. The two major ways are a static step size (kinda like how we reduced our interval in GSS) and a dynamically changing step size. Regardless of which we end up choosing, let’s represent the step size by rho, ρ.
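This little experiment is easy to reproduce numerically. Here is a Python sketch of mine (the two extra test directions, east and north, are arbitrary picks for illustration):

```python
import math

def f(x, y):
    return x**2 - x + math.cos(y + x) + y**2

def grad_f(x, y):
    return (2*x - 1 - math.sin(x + y), 2*y - math.sin(x + y))

x0, y0 = -6.0, 8.0
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)
step = 0.1

# unit directions: negative gradient, positive gradient, and two others
directions = {
    "neg_grad": (-gx/norm, -gy/norm),
    "pos_grad": (gx/norm, gy/norm),
    "east":     (1.0, 0.0),
    "north":    (0.0, 1.0),
}
values = {name: f(x0 + step*dx, y0 + step*dy)
          for name, (dx, dy) in directions.items()}
best = min(values, key=values.get)  # the direction with the lowest f
```

Running this confirms that the negative gradient direction gives the lowest function value of the four.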

We now have our step size (ρ) and our direction (negative gradient). With this we can begin our search. With initial point “a” we evaluate the gradient. The gradient is then scaled by our step size and subtracted from our initial point. This gives us our second point “b.” We then repeat this with point “b”: evaluate the gradient, scale by the step size, subtract from point “b” to give us point “c.” This is outlined below.

b = a - ρ∇f(a),   c = b - ρ∇f(b),   and in general x(i+1) = x(i) - ρ∇f(x(i)).

So you probably realize where we are headed. Following the steps above, we have determined our iterative step. Like our previous one-dimensional search methods, our gradient search method iterates the above equation until a tolerance condition is met. The method described above is called the Method of Steepest Descent, or Gradient Descent. Next post we will go over the method with a real example.
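The iteration above can be sketched in just a few lines. Here is a Python version (my own sketch; the fixed step ρ = 0.1, the tolerance, and the starting point (-6, 8) from the direction experiment are illustrative choices):

```python
import math

def grad_f(x, y):
    # gradient of f(x, y) = x^2 - x + cos(y + x) + y^2
    return (2*x - 1 - math.sin(x + y), 2*y - math.sin(x + y))

def steepest_descent(x, y, rho=0.1, tol=1e-4, max_iter=1000):
    """Fixed-step steepest descent: x(i+1) = x(i) - rho * grad f(x(i))."""
    for _ in range(max_iter):
        gx, gy = grad_f(x, y)
        x_new, y_new = x - rho*gx, y - rho*gy
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

xmin, ymin = steepest_descent(-6.0, 8.0)  # converges near (0.99865, 0.49865)
```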

-Marcello

One Dimensional Search Methods: The Recap

In the past couple of posts we have gone over three different search methods. We started with the Golden Section Search (GSS) as the most basic and intuitive way to find a minimum. We then optimized GSS with varying reduction step sizes and came up with the Fibonacci search method. Finally, we looked at the properties of functions and their derivatives to create our last search method, Newton’s Method.  Each method had its own shortcomings and benefits, but the question remains which method should we be focusing on? Is there a best method?

Well, I have some follow-up questions. What makes a method the best? The two obvious criteria are speed and accuracy (and, to an extent, precision). So which of our methods came out on top? If you remember, we used the same example function, a simple quadratic, throughout our journey through ODSM. GSS took 6 iterations, FS took 5, and NM took only 1. To be honest, this was not the best example, as NM will always find the exact answer for a quadratic in one step. So for this recap we will use a new function and see how they all stack up. I have chosen a function that is interesting on paper but rather boring graphically: f(x) = x^4 - x^3 + cos(x) - 12, plotted below (taken from Wolfram Alpha).

[Figure: plot of f(x) = x^4 - x^3 + cos(x) - 12]

Using this new function with our three methods, I capped each at 5 iterations so we could see the graphs and the methods without things getting too cluttered. Running the three methods produces the following plot.

[Figure: the three methods’ first five iterations on the new function]

Each method was able to get decently close to our minimum. However, as you can see, each method goes about it in a different way. Newton’s method quickly converges to our minimum; really only taking 4 iterations, it converges fastest. GSS and FS both come decently close within our 5-iteration limit. But what about accuracy?

The true minimum of this function is at x ≈ 0.969366 with f(x) ≈ -11.4621. Newton’s method after 5 iterations produced a minimum at 0.9694. GSS produced 0.8754, and FS produced 0.7077. As you can see, there is no competition between the three on this function: Newton’s method is both the fastest and the most accurate at 5 iterations. However, this is not to say it is the best method. It is time I introduced the No Free Lunch theorem of optimization.
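Those numbers are easy to double-check numerically. Here is a quick Python sketch (mine, not from the post) that runs Newton’s method on the recap function:

```python
import math

def f(x):    # the recap function
    return x**4 - x**3 + math.cos(x) - 12

def fp(x):   # f'(x)
    return 4*x**3 - 3*x**2 - math.sin(x)

def fpp(x):  # f''(x)
    return 12*x**2 - 6*x - math.cos(x)

x = 1.0  # start near the minimum
for _ in range(50):
    step = fp(x)/fpp(x)  # Newton step on f'
    x -= step
    if abs(step) < 1e-10:
        break
# x converges to about 0.969366, with f(x) about -11.4621
```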

The No Free Lunch theorem, introduced by David Wolpert and William G. Macready, says “that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method” (from Wikipedia). How does that apply here? It says that even though Newton’s method is the clear winner on this function, when we consider all possible functions and limitations, Newton’s method is no better than GSS or FS. This is not to say that we should just choose our method without regard or rationale.

For example, if we have a function with a terrible derivative or an awful second derivative, we might favor FS or GSS. On the other hand, if evaluating the derivatives is easy, we would choose Newton’s method. It all depends on whether or not you believe in such a thing as a free lunch.

Marcello

One Dimensional Search Method: Newton’s Method Part 3

Unlike GSS and FS, Newton’s method has different limitations and restrictions. Before we jump in, though, let’s look at a harder, more elaborate function. Part 2 used a function that was too simple, for the sake of consistency, so let’s take a function like f(x) = x^4 + 6x^3 + x^2 - 2x + cos(x). This function is plotted below.

[Figure: plot of f(x) = x^4 + 6x^3 + x^2 - 2x + cos(x)]

As you can see, we have two minima, so we can see how our method behaves. I am going to skip the step-by-step and just show the results of the method. We will choose our first point at -8. This produces the following graph, with the points in different colors.

[Figure: Newton’s method steps from x = -8, each step a different color]

I had to make this graph a little bigger so you can see what’s going on. Each point represents a step in our method. The method took 5 steps to reach our minimum (not including the starting point). Newton’s method produced a minimum of -4.3462, which is close to the actual minimum, -4.34613. With only 5 iterations, this is an impressive method. Let’s try a new initial point, 5. This produces the following.

[Figure: Newton’s method steps from x = 5, converging to the wrong point]

Uh oh. It looks like our method converged to the wrong point. Like FS and GSS, Newton’s method will find a minimum on the interval you choose; however, this minimum might not be a global minimum. Local vs. global minima are a major issue with all three ODSM. In this case, Newton’s method got caught in a saddle-like point.

Another more obvious limitation stems from the method itself. We need to be able to find the second derivative of our function. If there is no second derivative, there is no method. Well mathematicians don’t exactly take no for an answer. The Secant method was developed to remedy this. The Secant method is basically a modified Newton’s method that does not require a second derivative, only the first.
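The post doesn’t include secant code, but the idea is short enough to sketch. Here is a minimal Python version (my own illustration; the starting pair and tolerance are arbitrary choices) that replaces the second derivative with a finite-difference slope of f' and drives f' to zero, applied to this post’s function from the x = -8 region:

```python
import math

def fp(x):  # f'(x) for f(x) = x^4 + 6x^3 + x^2 - 2x + cos(x)
    return 4*x**3 + 18*x**2 + 2*x - 2 - math.sin(x)

def secant_min(x0, x1, tol=1e-10, max_iter=100):
    """Find a zero of f' using a finite-difference slope in place of f''."""
    for _ in range(max_iter):
        d0, d1 = fp(x0), fp(x1)
        x2 = x1 - d1*(x1 - x0)/(d1 - d0)  # secant update
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

xmin = secant_min(-8.0, -7.9)  # converges near the minimum at -4.34613
```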

The last limitation involves our initial conditions like with GSS and FS. Our initial value for Newton’s method greatly determines the effectiveness. This is evidenced by the two above graphs which show how sensitive the method is.

I hope you understand Newton’s method a little better. You can experiment more if you like; I have copied the code below.

%array to get values of our function
z = -8:.001:5;

% our function x^4+6x^3+x^2-2x+cos(x)
l = z.^4+6*z.^3+z.^2-2*z+cos(z);

tol = 1;
%initialize arrays to store step values
x = zeros(1,10);
y = zeros(1,10);
%initial guess
x(1)=5;
y(1) = x(1).^4+6*x(1).^3+x(1).^2-2*x(1)+cos(x(1));
i=2;
while tol >0.005 %tolerance stop condition
     %first derivative
     dx = 4*x(i-1).^3+18*x(i-1).^2+2*x(i-1)-2-sin(x(i-1));
     %second derivative
     d2x = 12*x(i-1).^2+36*x(i-1)+2-cos(x(i-1));
     %newtons method step
     x(i) = x(i-1)-(dx/d2x);
     y(i)= x(i).^4+6*x(i).^3+x(i).^2-2*x(i)+cos(x(i));

     tol = abs(x(i)-x(i-1));
     i= i +1;
end

x
%plot our results
plot(z,l,x(1),y(1),'g*',x(4),y(4),'g*',x(2),y(2),'c*',x(3),y(3),'r*',x(5),y(5),'c*',x(6),y(6),'r*',x(7),y(7),'g*',x(8),y(8),'g*',x(9),y(9),'c*',x(10),y(10),'r*')

title('Newtons Method example 2 ')
xlabel('x')
ylabel('y')

-Marcello

One Dimensional Search Method: Newton’s Method Part 2

Last post we went over how Newton’s method works and derived the iterative step. This post we will go through our example step by step with the search method. We are going to use the same function from GSS and FS, f(x) = x^2 + 3x + 7, over the interval [-3,1]. However, this time we do not need to concern ourselves with the interval; we simply need an initial guess. Here’s the graph again (from -5 to 5).

[Figure: f(x) = x^2 + 3x + 7 plotted from -5 to 5, with the starting point marked]

Notice the little dot. This will be our starting point; I chose the left side of our old interval, at x = -3. With Newton’s method we have to use this point to evaluate two functions, the first derivative as well as the second derivative. Hopefully this increase in function evaluations will mean a reduction in iterations, or at least a more accurate and precise answer.

Before we go any further we need to find the first and second derivative. Using our function above I am sure you can find the first and second derivative, but I have listed them below for your convenience.

f'(x) = 2x + 3,   f''(x) = 2

Now we need to evaluate the first and second derivatives at our initial point. This gives us -3 and 2. Plugging these values into our iterative equation from last post (x(i+1) = x(i) - f'/f'') produces an x(2) of -1.5. Like GSS, Newton’s method stops on a tolerance condition. Remember, tolerance is defined here as the absolute value of the difference between two consecutive x values. Because this hasn’t been reached, we run our method again.

Using -1.5 as our x, we evaluate f' and f''. This produces 0 and 2, respectively. Plugging these values in gives a new x of -1.5, the same value we started with. Thus the tolerance condition is met and our search stops. The graph is below, but it is rather uninteresting: first point in green, second/third in red.

[Figure: first point in green, second/third point in red]

Now why did Newton’s method work so well? It found the exact minimum in only two iterations. If you remember from Part 1, Newton’s method approximates the function with a quadratic. Well, if our function is already quadratic, Newton’s method approximates it perfectly. Because of this, Newton’s method finds the minimum of quadratic functions very fast.
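One way to see this concretely: a single Newton step on f(x) = x^2 + 3x + 7 lands exactly on -3/2 from any starting point, since x - (2x + 3)/2 = -3/2. A quick Python check (my own illustration, not from the post):

```python
def newton_step(x0):
    # one Newton step on f(x) = x^2 + 3x + 7: f'(x) = 2x + 3, f''(x) = 2
    return x0 - (2*x0 + 3)/2

# from any starting point, one step lands exactly on the minimizer -1.5
for start in (-3.0, 0.0, 10.0, -100.0):
    assert newton_step(start) == -1.5
```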

Well, that admittedly lame example was Newton’s method. Next post we will go over a hopefully more interesting problem, as well as the limitations.

-Marcello

One Dimensional Search Method: Newton’s Method Part 1

We have one more one dimensional search method to go over, and this one is important. I am sure you are familiar with Isaac Newton; well, this method is named after him. Like both GSS and FS, Newton’s method helps us find the minimum of our function, but it varies greatly from those two.

The first and most important thing when thinking about using Newton’s method is to ensure that the function is twice differentiable. Newton’s method uses a truncated Taylor series to estimate where the minimum lies. But we are getting ahead of ourselves; let’s start from the beginning.

Newton’s method uses a single point to begin, unlike GSS and FS. This point is used as the basis for our Taylor expansion. We need to take a brief detour to go over why we want to construct a Taylor expansion. Just a quick reminder: a Taylor series (or expansion) is an approximation of a function as a sum of terms determined by the function’s derivatives at a single point:

f(x) = f(a) + f'(a)(x - a) + (f''(a)/2!)(x - a)^2 + (f'''(a)/3!)(x - a)^3 + ...

As you can see, each term of the expansion has a derivative of the function evaluated at the same point. For our method, we are only going to concern ourselves with the first three terms; this is why we need our function to be twice differentiable. It allows us to create a Taylor expansion based on our initial guess (and our following guesses). But why would we want to do this? What is the benefit? Making a Taylor expansion allows us to create a quadratic function through our initial point which has the same first and second derivatives as our function. Our quadratic function k(x) is listed below:

k(x) = f(x0) + f'(x0)(x - x0) + (1/2) f''(x0)(x - x0)^2

We then try to minimize our new function. Using the first-order necessary condition for a minimizer (the first derivative must equal zero at the minimizer), we take the derivative of the above equation and set it equal to zero:

k'(x) = f'(x0) + f''(x0)(x - x0) = 0

In order to find the minimum we need to solve for x, in the hope that the x we solve for is closer to the minimum of our true function:

x = x0 - f'(x0)/f''(x0)

This looks an awful lot like our iterative steps from GSS and FS. Replace x0 with x(i) and x with x(i+1) and we have a search method. Like GSS and FS, we then repeat our method in hopes of getting closer to our minimum. As we get closer, the first derivative of our function moves toward 0, which makes x(i) ≈ x(i+1). This is when we know our method has found a minimum (hopefully). Like GSS, we use a tolerance termination condition rather than a set number of iterations like FS.
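Wrapped up as a loop, the iteration looks like this in Python (a sketch of mine; the quadratic at the bottom is just a stand-in example, and the tolerance is arbitrary):

```python
def newton_min(fp, fpp, x0, tol=1e-9, max_iter=50):
    """Minimize by driving f'(x) to zero: x(i+1) = x(i) - f'(x(i))/f''(x(i))."""
    x = x0
    for _ in range(max_iter):
        x_new = x - fp(x)/fpp(x)
        if abs(x_new - x) < tol:  # tolerance termination condition
            return x_new
        x = x_new
    return x

# stand-in example: f(x) = x^2 + 3x + 7, so f'(x) = 2x + 3 and f''(x) = 2
xmin = newton_min(lambda x: 2*x + 3, lambda x: 2.0, x0=-3.0)  # -1.5
```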

Hopefully you followed me on this brief overview of our third ODSM. Next post we use a real example.

-Marcello

One Dimensional Search Method: Fibonacci Search Part 3

This is gonna be a brief post, as most of the limitations of Fibonacci search were covered in my post on GSS. This makes sense because Fibonacci search is just GSS with a different reduction at each step. So it follows that the limitations of GSS will also apply to Fibonacci search. If you weren’t here for the GSS Part 3 post, I will go over the basic limitations.

Fibonacci search will find a minimum on the interval you choose; however, this minimum might not be a global minimum. Local vs. global minima are a major issue with Fibonacci. One of the main requirements for Fibonacci, like GSS, is that the function must be unimodal. Now what does unimodal mean? If you remember, it basically means that the function has a single local extremum. This ensures the search method does not get stuck in an “incorrect” local minimum. Success also depends on the interval you choose: if you choose an interval that does not contain a minimum, Fibonacci search won’t (can’t) find it.

One thing unique to Fibonacci search is that it has a set number of iterations, chosen at the beginning of the method. If the user does not choose enough iterations, the error surrounding the minimum can be large. However, if too many iterations are chosen, the method becomes arduous and computationally expensive. Having an idea of the tolerance you need for your minimum, and of its relative location, is invaluable when using the Fibonacci search method.

To review, use GSS or Fibonacci search when: the function has one minimum (unimodal), you have an idea of the interval that contains the minimum, and the function is of one variable.

Here is the code I used to create the graphs from Part 2. It is written in MATLAB. I will post a Python version later.

Code Starts here

%array to get values of our function
z = -5:.001:5;

% our function y=x^2 + 3x + 7
y = z.^2+3*z+7;

%number of iterations, feel free to change
iter = 5;
epsilon=0.05; %epsilon to prevent a repeated value at the last iteration

% Fibonacci calculation sequence
fib = zeros(1,iter+2);
fib(1)=1;
fib(2)=1;
for i = 3:iter+2
    fib(i) = fib(i-1)+fib(i-2);
end
rho = zeros(1,iter+1);

%calculate z (rho) values from the above Fibonacci sequence
for i = 0:iter
    rho(i+1) = 1-(fib(end-i-1)/fib(end-i));
end

%our search interval
x1=-3;
x2=1;

%finding our first two internal points
x3 = x1+rho(1)*(x2-x1);
x4 = x1+(1-rho(1))*(x2-x1);

%save the first two points for plotting
x30 = x3;
x40 = x4;
for i = 1:iter-1
    % evaluate our function at the two internal points
    fx3 = x3.^2+3*x3+7;
    fx4 = x4.^2+3*x4+7;
    %check if we are at the last iteration
    if rho(i+1)==0.5
        rho(i+1) = 0.5-epsilon;
    end
    %determine which interval we want to keep
    if fx3 > fx4
        x1=x3;
        x3=x4;
        x4 = x1+(1-rho(i+1))*(x2-x1); %find new x4 point
    else
        x2=x4;
        x4=x3;
        x3 = x1+rho(i+1)*(x2-x1); %find new x3
    end

    x3a(i) = x3;
    x4a(i) = x4;
end

xmin = (x3+x4)/2 %our "boxed in" min (renamed from "min" to avoid shadowing the built-in)
minf = xmin.^2+3*xmin+7 % our function evaluated at our min

%plot our results
plot(z,y,-3,y,3,y,x30,y,'g',x40,y,'g',x3a(1),y,'r',x4a(1),y,'r',x3a(2),y,'c',x4a(2),y,'c',x3a(3),y,'y',x4a(3),y,'y',x3a(4),y,'m',x4a(4),y,'m')
title('Fibonacci example')
xlabel('x')
ylabel('y')

-Marcello

One Dimensional Search Method: Fibonacci Search Part 2

Last post we went over how Fibonacci search works and how, just like GSS, it got its name. This post we will go through our example step by step with the search method. We are going to use the same function from GSS, f(x) = x^2 + 3x + 7, over the interval [-3,1]. Here’s the graph again (from -5 to 5).

[Figure: f(x) = x^2 + 3x + 7 plotted from -5 to 5]

So before we start anything, we need to choose how many iterations we want and calculate our Fibonacci sequence. To match the GSS example, we will use 5 iterations. This means we need 6 numbers of the Fibonacci sequence (N+1), not including F0 and F-1. Using the recursive formula, this gives us 1, 2, 3, 5, 8, 13 as the numbers we need. From these we can calculate all of our necessary z values. Our next step is to find our first two interior points. From Part 1, x3 = x1 + z*(x2-x1) and x4 = x1 + (1-z)*(x2-x1), where z equals 1 minus the ratio of two sequential Fibonacci numbers, starting at the top of the sequence and working down (check back one post for the formula). Substituting -3 for x1 and 1 for x2 gives an initial x3 = -1.4615 and x4 = -0.5385. I added these values as lines to our graph below.

[Figure: first two interior points x3 and x4 shown as green lines]
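Those numbers are easy to double-check. A quick Python sketch of mine (not part of the post’s MATLAB code) reproducing the sequence and the first two interior points:

```python
# 5 iterations needs the Fibonacci numbers 1, 2, 3, 5, 8, 13;
# the first reduction factor is z1 = 1 - 8/13.
fib = [1, 2]
while len(fib) < 6:
    fib.append(fib[-1] + fib[-2])

z1 = 1 - fib[-2] / fib[-1]        # 1 - 8/13 = 5/13, about 0.3846
x1, x2 = -3.0, 1.0                # search interval
x3 = x1 + z1 * (x2 - x1)          # about -1.4615
x4 = x1 + (1 - z1) * (x2 - x1)    # about -0.5385
```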

Just like with GSS (actually even more so), our new interval has already drastically decreased from our original interval. Again, we need to determine which interval we keep. Graphically we can see that the green line on the left is the one we want to keep; we can also evaluate our function at the two points and see which one gives us a lower y value. Doing this confirms our guess of the left line. We want to keep this line as it contains our minimum. So we create a new interval ensuring the green line is within it. This time we only have to evaluate one additional point. That point will lie between our leftmost boundary (black line) and the left green line. We first re-evaluate our interval: x1 is still x1, and x4 is now our rightmost boundary (x2). Now we need our two interior points. We already have one, the left green line, so we need to calculate the other. Using the formula above for x3, we find x3b = -2.0769, which is plotted below.

[Figure: second iteration, with the new boundaries shown in red]

The two “new” boundaries are shown in red. As you can see, one of the green lines from before is now red. As before, if we had chosen the other line we would need to calculate a new x4, which could be found by the equation for x4 above. But in this case we only need x3. We are narrowing down on the minimum, but we are still not there yet. This means we need to do another iteration.

[Figure: third iteration, with a new line and thus a new interval]

As before, one of our lines changed color and we have a new line and thus a new interval. Notice that our interval is no longer getting smaller proportionally. This is where Fibonacci differs from GSS: we are allowed to change the reduction factor at every iteration. Unlike GSS, Fibonacci does not terminate on a tolerance condition, but after a set number of iterations (we chose 5 at the beginning). Let’s jump ahead to 5 total iterations.
[Figure: all five iterations; the final magenta lines nearly coincide]

I ran the search 2 more times: first the yellow lines, then magenta. Graphically, it looks like the Fibonacci search narrowed down to pretty much exactly the point we are looking for. At this scale the magenta lines seem to be right on top of each other. Remember, though, that this is by design: the z value for the last step is 1/2 - epsilon, so naturally, since epsilon is so small (I chose 0.05), these last two lines should be very close to each other. So what value did the search close in on? Fibonacci search produced a min of -1.4462. The actual minimum of this function is at -1.5, which is reasonably close. With more iterations and a stricter tolerance we could get closer to the actual minimum. Even so, with only 5 iterations we are within 4% of the actual value.

Now you’ve got another search method under your belt. But! There is more we need to go over (just like with GSS). Next post we will write a simple program for it in MATLAB and Python, and we will go over the limitations of Fibonacci.

-Marcello

One Dimensional Search Method: Fibonacci Search Part 1

Our next one dimensional search method, or line search method, is the Fibonacci search method. Like the golden section search method, the Fibonacci search method finds an extremum of a unimodal function. Also like GSS, Fibonacci search’s name directly describes how the method reduces its search space: in GSS we use the golden ratio to reduce our space; in Fibonacci search we use, you guessed it, the Fibonacci sequence.

Just a brief refresher: the Fibonacci sequence is defined by the following recursive equation:

F(k) = F(k-1) + F(k-2)

Each number in the sequence is the sum of the two numbers before it. By convention, F(-1) = 0 and F(0) = 1. This allows us to calculate values for all k > 0.

OK, back to the search method. With GSS we were limited in how we reduced our search interval: each interval shrinks proportionally by 0.618 (remember (1-z) from the previous posts). This limits the speed of convergence and how close our search method can get to our minimum. Fibonacci search remedies this. It allows us to vary our z values so that the space we search decreases much faster and gets closer to our target. The follow-up question is obvious: how do we determine how z changes? Well, that is an optimization problem in itself.

Quickly going back to GSS: we know that with every iteration our uncertainty range reduces by 0.618, or (1-z). So it follows that for N iterations we have a total reduction factor of (0.618)^N, or (1-z)^N. Now, if we are allowed to vary z, we get the following reduction factor:

(1 - z_1)(1 - z_2)···(1 - z_N)

The objective is to minimize the above; minimizing this results in a smaller final uncertainty range. However, we cannot just pick any z values: they are subject to certain requirements or constraints. The first is that any z must be greater than or equal to 0 and less than or equal to 1/2. The upper limit of 1/2 ensures that only one new test point is needed for each iteration; if we have a reduction of exactly 1/2, the two test points coincide in the middle of the interval, which prevents the method from continuing. The z values must also satisfy the condition that the distance between our two test points equals the distance from the selected test point to the nearest boundary point. With these constraints we can form the optimization problem and find our minimum.

How does the Fibonacci sequence fit into all this? It happens that the optimal solution to our minimization problem involves the ratios of sequential numbers in the sequence. The optimal z values are as follows:

z_k = 1 - F(N-k+1)/F(N-k+2),   for k = 1, ..., N

This leads us to our first limitation. Fibonacci search, unlike GSS, requires a finite number of predetermined iterations. As seen from z_N, the end of the search uses the beginning of the Fibonacci sequence. We need to take a quick pause at the last z. When we plug in F1 and F2, we get z_N = 1/2. Earlier I warned of the dangers of a 1/2 reduction. To avoid falling into that pit, we introduce an ε (epsilon) which gently moves z_N away from 1/2, using z_N = 1/2 - ε. In order for the epsilon fix to work, ε must be small. Now that we have our step reduction, we can find our test points.

We find our test points just like in GSS. x1 and x2 are the initial boundaries; x3 and x4 are calculated by the following equations:

x3 = x1 + z*(x2 - x1),   x4 = x1 + (1 - z)*(x2 - x1)

With all this information we can form our search. The method takes the same form as GSS: the range is split into sections, one section is kept and the other rejected; the accepted section is then further split and the method repeats. Now that we have the basics down and a rough outline of how to proceed, we can try our method. That’s for Part 2.

-Marcello