If you have been following along, we have worked through a bunch of one-dimensional search methods. However, what happens when we need to perform a search in multiple dimensions? Well, we need a new search method that can be extended to multiple variables while still finding a minimum (or maximum).
The method should work similarly to our previous methods, but it should be applicable to n variables. With this in mind we set out to design a search method. As we have seen before, we need two things for a successful search method. First, we need to know where we are going (direction), and second, we need to know how far to go in that direction (step size). With these two pieces of information we can go about finding our minimum (or maximum).

Let's focus on the direction first. We want our function to decrease fastest along our chosen direction. If we look at a multivariable function which is both defined and differentiable in a small area around a point (otherwise known as a neighborhood), we find that the function decreases fastest along the negative gradient of the function. Let's test this out.

Let's take a simple multivariable function f(x,y) = x^2 - x + cos(y + x) + y^2. Plotting this produces the following 3D graph and contour plot. Take notice of the contour plot; it will help us determine our direction.
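Here is a minimal sketch of how such plots can be generated with NumPy and Matplotlib (the grid bounds and resolution are assumptions; the post's original figures may differ):

```python
import numpy as np
import matplotlib.pyplot as plt

# f(x, y) = x^2 - x + cos(y + x) + y^2
def f(x, y):
    return x**2 - x + np.cos(y + x) + y**2

# Build a grid of points (the [-10, 10] range is an assumption)
x = np.linspace(-10, 10, 200)
y = np.linspace(-10, 10, 200)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure(figsize=(12, 5))

# 3D surface plot
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_surface(X, Y, Z, cmap='viridis')
ax1.set_title('f(x, y)')

# Contour plot
ax2 = fig.add_subplot(1, 2, 2)
ax2.contour(X, Y, Z, levels=30)
ax2.set_title('Contours of f(x, y)')

plt.show()
```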


We have an infinite number of directions to move in, but let's choose four test cases: the gradient, the negative gradient, and two other directions. For our initial point let's choose (-6, 8). At this point we evaluate our gradient. Mapping these directions (all with a step of 0.1) produces the following graph.
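As a quick check, here is a sketch that evaluates f a small step away from (-6, 8) in each of the four directions. The analytic gradient is ∇f = (2x - 1 - sin(y + x), 2y - sin(y + x)); the two "other" directions below are arbitrary picks for illustration, not necessarily the ones in the graph:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 - x + np.cos(y + x) + y**2

def grad_f(p):
    # Analytic gradient: (2x - 1 - sin(y + x), 2y - sin(y + x))
    x, y = p
    return np.array([2*x - 1 - np.sin(y + x), 2*y - np.sin(y + x)])

a = np.array([-6.0, 8.0])   # initial point from the post
g = grad_f(a)
step = 0.1

# Four unit test directions: gradient, negative gradient, and two others
directions = {
    'gradient':          g / np.linalg.norm(g),
    'negative gradient': -g / np.linalg.norm(g),
    'other 1':           np.array([1.0, 0.0]),
    'other 2':           np.array([0.0, 1.0]),
}

print(f'f(a) = {f(a):.4f}')
for name, d in directions.items():
    print(f'{name:>17}: f(a + step*d) = {f(a + step * d):.4f}')
```

Running this shows the negative-gradient step giving the smallest function value of the four, which is the behavior the graph illustrates.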

Now this isn’t a rigorous proof (or a particularly fantastic one), but as you can see, our function is lowest in the direction of the negative gradient. The negative gradient line (magenta) points toward the center of the contour plot and to the lowest function evaluation. All the other directions are off center or point the completely wrong way. However, this only holds for a small neighborhood around our point. If we take too big a step then we might end up at a worse point or a higher function evaluation. This brings us to our next requirement: a step size. There are a few ways to determine a step size. The two major ways are a static step size (kinda like how we reduced our interval in GSS) and a dynamically changing step size. Regardless of which we end up choosing, let's represent the step size by rho, ρ.

We now have our step size (ρ) and our direction (negative gradient). With these we can begin our search. With initial point "a" we evaluate the gradient. The gradient is then scaled by our step size and subtracted from our initial point. This gives us our second point "b." We then repeat this with point "b": evaluate the gradient, scale it by the step size, and subtract it from point "b" to give us point "c." This is outlined below.
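In symbols, the update is x_{k+1} = x_k - ρ∇f(x_k). Here is a minimal sketch of the steps above, plus the loop they lead to (the step size ρ = 0.1 and the tolerance are illustrative assumptions, not values from the post):

```python
import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = x^2 - x + cos(y + x) + y^2
    x, y = p
    return np.array([2*x - 1 - np.sin(y + x), 2*y - np.sin(y + x)])

rho = 0.1                    # step size (a static choice, assumed here)
a = np.array([-6.0, 8.0])    # initial point

b = a - rho * grad_f(a)      # step from a to b
c = b - rho * grad_f(b)      # step from b to c

# Iterating the same update until the gradient is small enough
# (the 1e-6 tolerance is an assumption)
x = a
while np.linalg.norm(grad_f(x)) > 1e-6:
    x = x - rho * grad_f(x)
print('minimum near:', x)
```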

So you have probably realized where we are headed. Following the steps above, we have determined our iterative step. Like our previous one-dimensional search methods, our gradient search method iterates the above equation until a tolerance condition is met. The method described above is called the Method of Steepest Descent, or Gradient Descent. Next post we will go over the method with a real example.

-Marcello