GRADIENT SEARCH METHODS: NEWTON-RAPHSON METHOD PART 3

Most of the limitations of NRM in multiple dimensions are the same as those for Newton's method in one dimension. The main issues stem from three major areas (as usual): the initial point, the derivatives, and the function evaluations.

Our initial value for Newton's method greatly determines its effectiveness. The closer our initial guess is to the minimum, the better the method will work. However, this requires some intuition about the function, which can get quite complex. Let's take a look at two different initial points, (20,20) and (2,2). The graph for the (20,20) run is below.

[Figure: path of Newton's method starting from (20,20)]

As you can see, the method is flying back and forth past our minimum. However, it eventually converges to (0.9987, 0.4987), our target minimum. Below I have included a matrix containing our initial step and all of the following steps (taken from the MATLAB output).

[MATLAB output: matrix of iterates for the run starting from (20,20)]

Now let's try it again with our other initial point, (2,2). This is already close to our minimum, so it shouldn't take too many iterations.

[Figure and MATLAB output for the run starting from (2,2)]
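If you want to play with the effect of the starting point yourself, here is a minimal MATLAB sketch of the two-dimensional Newton iteration. The Rosenbrock function is used as a stand-in test function (it is not the function from the screenshots above, so the numbers will not match), with its gradient and Hessian written out by hand.

```matlab
% Two-dimensional Newton's method, sketched on the Rosenbrock function
% f(x,y) = (1-x)^2 + 100*(y - x^2)^2, which has its minimum at (1, 1).
grad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2);
             200*(x(2) - x(1)^2)];
hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1);
             -400*x(1),                  200];

x = [20; 20];                   % try [2; 2] as well
for k = 1:100
    step = hess(x) \ grad(x);   % solve H*step = g rather than inverting H
    x = x - step;
    if norm(step) < 1e-8
        break
    end
end
fprintf('final point (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);
```

Note that the step is computed with a linear solve (the backslash operator) rather than by explicitly inverting the Hessian; the result is the same, but the solve is cheaper and more stable, which matters more and more as the dimension grows.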

The closer initial starting point converged in a fifth of the iterations, which means far fewer function evaluations. This brings us to our next two issues: derivatives and function evaluations. We need to be able to find both the gradient and the Hessian of our function. If we cannot compute these two, we will need to pick a different method or modify our existing one. Most problems stem from the Hessian. Sometimes the Hessian cannot be found at all, other times it can be found but not inverted, and most of the time it is simply too expensive to evaluate or invert. Mathematicians have come up with alternative methods called quasi-Newton methods (these actually stem from the secant methods I mentioned before) which seek to remedy this issue; there is a small sketch of the idea below.

Like the one-dimensional version, there are a lot of function evaluations. Every partial derivative in the gradient must be evaluated, as well as all of the second partial derivatives that make up the Hessian. The Hessian then needs to be inverted (or a linear system solved with it), which is expensive in higher dimensions.
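Since the Hessian is the expensive part, here is a minimal MATLAB sketch of the quasi-Newton idea, using a BFGS-style update of an inverse-Hessian approximation. This is just my own illustration, not the exact methods we will cover later: the Rosenbrock stand-in function, the backtracking line search, and the tolerances are all assumptions made for the sake of the example.

```matlab
% Quasi-Newton (BFGS-style) sketch: the inverse Hessian is approximated
% from successive gradients, so no second derivatives are computed and
% no matrix is inverted. Same Rosenbrock stand-in as above.
f    = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
grad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2);
             200*(x(2) - x(1)^2)];

x = [2; 2];
B = eye(2);                         % inverse-Hessian approximation
g = grad(x);
for k = 1:200
    p = -B*g;                       % quasi-Newton direction
    t = 1;                          % backtracking (Armijo) line search
    while f(x + t*p) > f(x) + 1e-4*t*(g'*p)
        t = t/2;
    end
    s = t*p;                        % step actually taken
    x = x + s;
    g_new = grad(x);
    y = g_new - g;
    if y'*s > 1e-12                 % only update when it stays well defined
        rho = 1/(y'*s);
        I = eye(2);
        B = (I - rho*(s*y'))*B*(I - rho*(y*s')) + rho*(s*s');
    end
    g = g_new;
    if norm(g) < 1e-8
        break
    end
end
fprintf('final point (%.4f, %.4f) after %d iterations\n', x(1), x(2), k);
```

Each step here costs one new gradient evaluation plus a few matrix-vector products, instead of a full Hessian evaluation and a linear solve, which is the whole appeal of the quasi-Newton family.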

Now you have two tools for finding minima in higher dimensions. Up next, we will explore more advanced methods for optimization.

-Marcello