Gradient Search Methods: Newton-Raphson Method Part 1

This method is gonna feel more than a little familiar. But before we get to the nitty-gritty I have to give a disclaimer: this is not purely a gradient method, since it does not rely on the gradient alone to do the optimization. It uses the gradient along with other properties of the function to arrive at our minimum. Now that that has been cleared up, let's move on to the method.

We are going to take a similar approach to the one we took with our one-dimensional search methods. Steepest descent is reminiscent of the golden section search and the Fibonacci search: while they are used differently, there are prominent similarities. When we looked at steepest descent we had two cases, one with a constant step size and another with a variable step size. Eventually we got as much as we could out of those methods and needed a new one, and at that point we moved on to Newton's method. Well, we are going to do that again, only in higher dimensions.

The next part is gonna be verrrrry familiar.

As you know, the first and most important thing when thinking about using the Newton-Raphson method is to ensure that the function is twice differentiable. Like in one dimension, the Newton-Raphson method uses a truncated Taylor series to estimate where the minimum lies.

The Newton-Raphson method begins with a single point, which is used as the basis for our Taylor expansion. We need to take a brief detour to go over why we want to construct a Taylor expansion: by doing this we create a local quadratic approximation of our function, which we can in turn minimize. We repeat this until we hit a stationary point, hopefully our minimum. So let's start with the Taylor series in N dimensions; however, we are only going to focus on the first three terms (this gives us up to the quadratic approximation).

$$f(x) \approx f(x_k) + g_k^{T}(x - x_k) + \tfrac{1}{2}(x - x_k)^{T} F(x_k)(x - x_k)$$

The above equation looks a little different than our previous Taylor expansion. As you can see, there are no squares and no first or second derivatives (well, not completely true). The first derivative here is none other than our gradient, which is denoted by g. However, if you remember from the last time we dealt with the Taylor series, the third term deals with the second derivative. In n dimensions we need an equivalent to the second derivative, and for this we use the Hessian matrix, which I have denoted as F. Now we use our first-order necessary condition for optimality, which sets the gradient of our function equal to zero. From this we can determine our iterative step.
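To spell that step out: take the gradient of the quadratic approximation above and set it equal to zero,

$$g_k + F(x_k)(x - x_k) = 0,$$

then solve for x. The solution is our next iterate: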

$$x_{k+1} = x_k - F(x_k)^{-1} g_k$$

This looks an awful lot like the iterative step from our one-dimensional Newton's method, $x_{k+1} = x_k - f'(x_k)/f''(x_k)$: the gradient g plays the part of the first derivative and the Hessian F plays the part of the second. As we get closer to our minimum, the gradient g of our function moves toward 0, which makes $x_k \approx x_{k+1}$. This is when we know our method has found a minimum (hopefully). Hopefully this all feels familiar. Next post we will see how the Newton-Raphson method works.
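If it helps to see the whole loop in one place, here is a minimal sketch of the update in Python with numpy. The function names, the test problem, and the stopping tolerance are just my illustration, not part of the method itself; it assumes you already have routines that return the gradient and the Hessian of your function.

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-8, max_iter=100):
    """Minimal sketch of the N-dimensional Newton-Raphson update.

    grad(x) returns the gradient g_k, hess(x) returns the Hessian F(x_k).
    We stop once the gradient is numerically zero, i.e. x_{k+1} is
    essentially the same point as x_k.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient ~ 0: stationary point
            break
        F = hess(x)
        # x_{k+1} = x_k - F(x_k)^{-1} g_k, done as a linear solve
        x = x - np.linalg.solve(F, g)
    return x

# Toy example on f(x, y) = x^2 + 2y^2: the function is already quadratic,
# so a single Newton step lands exactly on the minimum at the origin.
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_raphson(grad, hess, x0=[3.0, -2.0]))   # -> [0. 0.]
```

Note that the sketch solves the linear system F(x_k) d = g_k instead of forming the inverse explicitly; it is the same update we wrote above, just the way you would normally compute it.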

-Marcello