If you followed from the last post, we derived (loosely) our iterative step. With that we can go ahead with our steepest descent method. We start by choosing an initial guess; for this function I have chosen (8,8). This is plotted below on the level curve.
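If you want to reproduce the level-curve plot yourself, something like the following MATLAB snippet does it; the plotting range and the number of contour levels are my own choices.

% level curves of f(x,y) = x^2 - x + cos(y+x) + y^2 with the initial guess marked
[X,Y] = meshgrid(-2:0.1:9, -2:0.1:9);
Z = X.^2 - X + cos(Y+X) + Y.^2;
contour(X,Y,Z,30); hold on
plot(8,8,'r*') % initial guess (8,8)
xlabel('x'); ylabel('y');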

From this point we must calculate the gradient. Taking the gradient of our function f(x,y) = x^2 - x + cos(y+x) + y^2 gives df/dx = 2x - 1 - sin(x+y) and df/dy = 2y - sin(x+y). This gradient must now be evaluated at our point (8,8).
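As a quick check, here is a small MATLAB snippet that evaluates those partial derivatives at (8,8); the handle names gx and gy are just for illustration.

% gradient of f(x,y) = x^2 - x + cos(y+x) + y^2
gx = @(x,y) 2*x - 1 - sin(x+y); % df/dx
gy = @(x,y) 2*y - sin(x+y);     % df/dy
[gx(8,8), gy(8,8)] % roughly [15.29, 16.29], since sin(16) is about -0.29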

As I mentioned earlier, we really need two things for our method: a direction and a step size. With the gradient evaluated we have our direction; now we need our step size. This is where we have options. For the step size we have two general options: we can choose a constant step size (think GSS) or we can have a dynamic step size (think FS and NM). In this post I will show both.

For the constant step size we have a few issues. As you can see in our level plots, we are not exactly close to our minimum, so we could choose a large step size to get close fast. However, once we get close to our minimum, the large step size will cause us to overshoot. Trying to correct the overshoot, the method will then overshoot in the opposite direction. So that leaves us with choosing a smaller step size. Well, this brings its own problems: the small step size takes much longer to reach our minimum when our initial guess is far away. I went with the smaller step size for the constant step size version.

What about the dynamic step size? What we are actually doing with the dynamic step size is solving an optimization problem: we are finding the step size that decreases our function the most. We want to minimize f(x - alpha*d), where d is the gradient evaluated at x. We have a few options for minimizing this. We could use an ODSM from before to calculate an estimate of the best step size, or use one to find the exact best step size. Both of these become very expensive with little benefit. Instead we use a method called backtracking line search, which gives us an acceptable step size while not being too computationally heavy. With backtracking line search, we start with an initial step size, alpha, and check whether x - alpha*d is an acceptable point. We do this using the Armijo-Goldstein condition, which ensures that the step we take decreases our function enough. If the condition is not satisfied, we halve alpha and try again, repeating until it is satisfied. Once it is, we continue with our descent method.
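To make the backtracking step concrete, here is a minimal sketch of a single descent step in MATLAB. This is the textbook vector form of the idea (the full code at the end of the post applies it to x and y separately), and the sufficient-decrease constant c is a value I chose for illustration.

% one steepest descent step with backtracking line search (sketch)
f    = @(p) p(1)^2 - p(1) + cos(p(1)+p(2)) + p(2)^2;
grad = @(p) [2*p(1) - 1 - sin(p(1)+p(2)); 2*p(2) - sin(p(1)+p(2))];
p     = [8; 8];        % current point
d     = grad(p);       % gradient; the descent direction is -d
alpha = 1;             % initial step size
c     = 1e-4;          % sufficient-decrease constant (my choice)
% Armijo-Goldstein: shrink alpha until the step decreases f enough
while f(p - alpha*d) > f(p) - c*alpha*(d.'*d)
    alpha = 0.5*alpha; % halve the step size and try again
end
p = p - alpha*d;       % accept the step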
Now that we have our two methods, let's check how they compare. Here are the first three steps. On the left is our constant step size; on the right is the backtracking line search.

As you can see, the constant step size is slowly making its way there; the dynamic step size, however, has already overshot the minimum and is backtracking. Let's try another three steps.


More of the same in these graphs. The constant step size is slowly chugging along, while the dynamic step size has arrived at the minimum and is waiting for the other method to catch up. The dynamic step size plot looks almost identical to the one before. This is because the method is overshooting and backtracking over the minimum until the steps get smaller than our tolerance; after that point the method stops and gives us our minimum. If we let both methods terminate, we find that the constant step size takes over 40 iterations while the dynamic one only takes 6. Now, what about accuracy? The minimum of our function is around (0.99865, 0.49865). The constant step size gives us a minimum of (0.9990, 0.4991); the dynamic gives us (0.9986, 0.4986). It looks like we have a clear winner. In this case, the dynamically changing step size not only beats the constant step size in number of iterations, but in accuracy as well (a quick fminsearch check of the minimum is included just before my code below).

I hope this clarified steepest descent. Next post we go over the limitations of this method (it will be brief; there aren't a ton). I have added my code below, separated into constant and dynamic step size. Enjoy!
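As promised, here is the quick sanity check of the minimizer using MATLAB's built-in fminsearch; this is independent of the descent code below.

% independent check of the minimizer with Nelder-Mead (fminsearch)
f = @(p) p(1)^2 - p(1) + cos(p(1)+p(2)) + p(2)^2;
fminsearch(f, [8 8]) % returns approximately [0.9987 0.4987]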
% steepest descent
%f(x,y) = x^2 -x+cos(y+x)+y^2.
%% constant step size
x = zeros(1,10);% initialize arrays
y = zeros(1,10);
x(1)=8; %initial guess
y(1)=8;
%dx = -1+2*x-sin(x+y)
%dy = 2*y-sin(x+y)
a0=0.1; %constant step size
tol(1)=1;
tol(2)=2;
i=1;
while tol(1) > 0.0001 || tol(2) > 0.0001 %stop only when both steps are below tolerance
dx = -1+2*x(i)-sin(x(i)+y(i)); %gradient dx
dy = 2*y(i) - sin(x(i)+y(i)); %gradient dy
i=i+1;
x(i) = x(i-1)-dx*a0; %iterative steps
y(i) = y(i-1)-dy*a0;
tol(1) = abs(x(i)-x(i-1)); %tolerance calc
tol(2) = abs(y(i)-y(i-1));
end
%% Backtracking line search
x = zeros(1,10);
y = zeros(1,10);
x(1)=3; %initial guess
y(1)=3;
%dx = -1+2*x-sin(x+y)
%dy = 2*y-sin(x+y)
ay=1; %initial step size for y
ax=1; %initial step size for x
c=1e-4; %sufficient-decrease constant for the Armijo-Goldstein check
tol(1)=1;
tol(2)=1;
i=1;
while tol(1) > 0.0001 || tol(2) > 0.0001 %stop conditions
dx = -1+2*x(i)-sin(x(i)+y(i));
dy = 2*y(i) - sin(x(i)+y(i));
%evaluate function at different points
fc = x(i)^2 -x(i)+cos(y(i)+x(i))+y(i)^2;
newfx =(x(i)-dx*ax)^2 -(x(i)-dx*ax)+cos(y(i)+(x(i)-dx*ax))+y(i)^2;
newfy =x(i)^2 -x(i)+cos((y(i)-dy*ay)+x(i))+(y(i)-dy*ay)^2;
while newfx > fc - c*ax*dx^2 %Armijo-Goldstein condition check
ax=.5*ax; %shrink step size
% function eval at new step
newfx =(x(i)-dx*ax)^2 -(x(i)-dx*ax)+cos(y(i)+(x(i)-dx*ax))+y(i)^2;
end
while newfy > fc - c*ay*dy^2 %Armijo-Goldstein condition check
ay=.5*ay;
newfy =x(i)^2 -x(i)+cos((y(i)-dy*ay)+x(i))+(y(i)-dy*ay)^2;
end
i=i+1;
x(i) = x(i-1)-dx*ax;
y(i) = y(i-1)-dy*ay;
tol(1) = abs(x(i)-x(i-1));
tol(2) = abs(y(i)-y(i-1));
end