Examine the following:

That’s pretty dull. It’s familiar to any schoolchild, even one not headed for math or science. The horizontal line is the x-axis of a coordinate system, and the vertical line is the y-axis. They are said to be orthogonal, in two senses of the word: they form a right angle, and they are mathematically orthogonal. In the second sense, measurements along the x-axis are independent of measurements along the y-axis. You can move all you want in the x-direction without affecting your position in the y-direction.

There’s more. For example, suppose x and y represent east and north. Then up can be represented by z. Now we have a system of three coordinates, and they are mutually orthogonal. Movement in the up direction does not affect your position in the east and north directions.

You may need to close your eyes to imagine the next part, but you can add a fourth orthogonal coordinate to this representation of space, and you will have a four-dimensional space. I can’t draw it, so I have to deal with it in a purely mathematical manner. The principles of geometry that hold for three-dimensional space can be extended to higher-dimensional spaces. And that’s what I did.
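The idea that orthogonality carries over into any number of dimensions can be checked directly: two directions are mathematically orthogonal when their dot product is zero, so motion along one leaves your coordinate along the other unchanged. Here is a small sketch using the standard basis vectors of a four-dimensional space (the vectors are my own illustration, nothing more):

```python
# The four coordinate axes of a 4-D space, as unit vectors.
axes = [
    [1, 0, 0, 0],  # x (east)
    [0, 1, 0, 0],  # y (north)
    [0, 0, 1, 0],  # z (up)
    [0, 0, 0, 1],  # the fourth coordinate
]

def dot(u, v):
    """Dot product; zero means the two directions are orthogonal."""
    return sum(a * b for a, b in zip(u, v))

# Every distinct pair of axes is mutually orthogonal.
for i in range(4):
    for j in range(i + 1, 4):
        print(i, j, dot(axes[i], axes[j]))  # prints 0 for every pair
```

The same check works unchanged in ten dimensions or a thousand, which is what lets the geometry extend even though we can no longer draw it.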

Next look at this image.

The curve comes down from the left, reaches a low point, and then continues upward to the right. The task is to locate the low point of the curve, and you want to do it by an automated process, for reasons that will be made clear later. The straight line is tangent to the curve as shown, and the short vertical line marks the x-coordinate at the point of tangency. If you have a mathematical function for the curve, you can compute the derivative of the function, and the derivative is the slope of the tangent line. You can see that the slope of the tangent line will be zero at the low point, so if you find the point at which the derivative is zero, then you have determined, mathematically, the x-coordinate of the low point, the point where the function takes its minimum value.

See the diagram.

The problem of finding the minimum point of a function thus reduces to finding the point at which another function, the derivative, is zero. There is a mechanical process for finding the zero of a function, called the Newton-Raphson method after Isaac Newton and Joseph Raphson. The utility of this method is that it can be carried out by a computer.
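The whole scheme fits in a few lines of code. As a sketch, here is Newton-Raphson applied to the derivative of a toy function, f(x) = (x − 2)² + 1, which is my own example and not anything from the original project:

```python
def newton_raphson(g, g_prime, x0, tol=1e-10, max_iter=50):
    """Find a zero of g starting from the guess x0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)   # the classic Newton update
        x -= step
        if abs(step) < tol:
            break
    return x

f_prime = lambda x: 2.0 * (x - 2.0)   # derivative of f; zero at the low point
f_double_prime = lambda x: 2.0        # derivative of f', which Newton needs

x_min = newton_raphson(f_prime, f_double_prime, x0=10.0)
print(x_min)   # 2.0, the x-coordinate of the minimum
```

Note that minimizing f with Newton-Raphson requires the derivative of the derivative, which is why the second function is passed in.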

Now suppose that what you have is not a line function but a surface function. My Microsoft Excel was not able to draw such a surface, so I obtained this one from Wikipedia.

You can find the low point (minimum) of this surface by a method similar to the one above. And that’s what I had to do. There was a project I worked on, no government secrets involved, and we were flying a Convair transport plane carrying multiple imaging systems. We collected visible, infrared, and laser radar images, and when we got them all back to the lab we needed to determine what the system was looking at in each image. To do this we needed to know where the airplane was at the time the images were made. As accurately as possible.

We had multiple sources of information: an inertial navigation system (INS), a transponder locator system, and more. The problem was that the sources disagreed about where the airplane was at any given time. It was going to be necessary to apply corrections to the various inputs to minimize the error.

Without getting into the details of how I spent three months of my life, I developed an error function based on the data inputs just mentioned, and my goal was to apply corrections to the input data in order to minimize the error function. See where this fits in? I needed to compute the minimum of a function.

But this was a function in multiple dimensions. Ultimately it grew to a function in ten dimensions. I needed to minimize a function in ten dimensions.

Inspiration came from the surface function. You can use the Newton method to compute the minimum of a surface if you follow this approach, and if you have a well-behaved function.

Start with one axis, say the x-axis. Hold y constant and compute the minimum of the curve traced out as x varies alone. Once you have done that, switch to minimizing along the y-axis, holding x at the value just computed. Alternate between the axes until you have finally reached the minimum of the surface.

What I figured would work was to just continue the process into the higher dimensions. Take each axis in turn, holding the other coordinates constant, and compute the minimum. Then go to the next dimension and the next and the next until you have gone through all the dimensions of the error function. Repeat until the solution does not get any better. You have computed the minimum of a function in multiple dimensions.
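The axis-by-axis scheme described above can be sketched in a few dozen lines. This is a minimal illustration, assuming a smooth, well-behaved function: each axis gets a few 1-D Newton steps (with derivatives estimated by central differences, since in the story there were no closed-form functions), and the sweeps repeat until the value stops improving. The ten-dimensional quadratic at the bottom is my own stand-in for the error function, not the real thing:

```python
def partials(f, x, i, h=1e-4):
    """Central-difference first and second derivatives of f along axis i."""
    xp, xm = x[:], x[:]
    xp[i] += h
    xm[i] -= h
    f0, fp, fm = f(x), f(xp), f(xm)
    return (fp - fm) / (2 * h), (fp - 2 * f0 + fm) / (h * h)

def coordinate_descent(f, x0, tol=1e-8, max_sweeps=100):
    """Minimize f one axis at a time, holding the others constant,
    and sweep through all the axes until no further improvement."""
    x = list(x0)
    prev = f(x)
    for _ in range(max_sweeps):
        for i in range(len(x)):
            for _ in range(25):                 # 1-D Newton along axis i
                d1, d2 = partials(f, x, i)
                if abs(d1) < tol or abs(d2) < 1e-12:
                    break
                x[i] -= d1 / d2                 # Newton step toward f' = 0
        cur = f(x)
        if prev - cur < tol:                    # solution stopped improving
            break
        prev = cur
    return x

# A ten-dimensional bowl with its minimum at (1, 2, ..., 10).
f = lambda x: sum((xi - (i + 1)) ** 2 for i, xi in enumerate(x))
solution = coordinate_descent(f, [0.0] * 10)
print(solution)   # close to [1.0, 2.0, ..., 10.0]
```

On a function with strongly interacting coordinates the sweeps converge more slowly than this separable example suggests, which is part of why the simplex method mentioned below was attractive.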

There was one small hitch. I did not have math functions for the data, just values from a table. To make this computable I needed to supply math functions, and to get math functions I needed to fit curves to the data points. And that’s another story, which I will delve into in another post.

We ran this problem on a VAX computer, which was no slouch of a system in those days, but which would be in the shade of the laptop computer I’m now using to describe all this. And the VAX chugged through solution after solution and computed the position on the ground the cameras were looking at, and when we pulled up the images associated with well-surveyed points, there were the objects that were supposed to be there. To a certain degree of accuracy. It was never perfect. But that’s the way it is in engineering.

Wait, there’s more. After I did all this I learned of an even better approach. It was in a book I already had on my shelf but had not read from cover to cover. It’s a method by Nelder and Mead, and code is available from the book:

We give in §10.4 a somewhat neglected downhill simplex method due to Nelder and Mead. (This use of the word “simplex” is not to be confused with the simplex method of linear programming.) This method just crawls downhill in a straightforward fashion that makes almost no special assumptions about your function. This can be extremely slow, but it can also, in some cases, be extremely robust. Not to be overlooked is the fact that the code is concise and completely self-contained: a general N-dimensional minimization program in under 100 program lines! This method is most useful when the minimization calculation is only an incidental part of your overall problem. The storage requirement is of order N², and derivative calculations are not required.

[Press, William H., et al.: Numerical Recipes in C, Cambridge University Press, 1988, p. 292]