Fit plane to a set of points in 3D: scipy.optimize.minimize vs scipy.linalg.lstsq


Solution 1

Least squares (scipy.linalg.lstsq) is guaranteed to converge. In fact, there is a closed-form analytical solution, given by (A^T A)^-1 A^T b (where ^T is the matrix transpose and ^-1 is the matrix inverse).

The standard optimization problem, however, is not generally solvable -- we are not guaranteed to find a minimizing value. However, for the given equation (find some a, b, c such that z = a*x + b*y + c), the residuals are linear in the variables we are trying to optimize for, so we have a linear least-squares problem with a convex objective. Such problems are reliably solvable, so scipy.optimize.minimize should also converge to the optimal value.
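
As a minimal sketch of both routes (the sample plane z = 2*x - 3*y + 5 and the noise level below are made up for illustration):

import numpy as np
from scipy.linalg import lstsq
from scipy.optimize import minimize

# made-up sample: noisy points near the plane z = 2*x - 3*y + 5
rng = np.random.default_rng(0)
xy = rng.uniform(-10, 10, size=(100, 2))
z = 2 * xy[:, 0] - 3 * xy[:, 1] + 5 + rng.normal(scale=0.1, size=100)

# least squares: each row of A is [x, y, 1], solve A @ C ~= z
A = np.c_[xy, np.ones(len(xy))]
C_lstsq, *_ = lstsq(A, z)

# the closed form (A^T A)^-1 A^T z gives the same coefficients
C_closed = np.linalg.solve(A.T @ A, A.T @ z)

# general-purpose minimizer on the summed squared error, started from a guess of zeros
C_min = minimize(lambda C: np.sum((A @ C - z) ** 2), x0=np.zeros(3)).x

print(C_lstsq, C_closed, C_min)  # all close to [2, -3, 5]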

Note: this is still linear in the unknown coefficients even if we do z = a*x + b*y + d*x^2 + e*y^2 + f -- we don't have to restrict ourselves to a linear function of (x, y), since we can compute the values (x, y, x^2, y^2) for each point up front. To our algorithm, these just look like extra columns in the matrix A. So we can actually fit a higher-order polynomial using least squares!
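
As a sketch of that idea (the quadratic surface z = x^2 - y^2 + x + 2 is invented for illustration), the only change is adding columns to A:

import numpy as np
from scipy.linalg import lstsq

# made-up noisy samples of z = x**2 - y**2 + x + 2
rng = np.random.default_rng(1)
x, y = rng.uniform(-5, 5, size=(2, 200))
z = x**2 - y**2 + x + 2 + rng.normal(scale=0.1, size=200)

# same linear solver, but each row of A is now [x, y, x^2, y^2, 1]
A = np.c_[x, y, x**2, y**2, np.ones_like(x)]
coeffs, *_ = lstsq(A, z)
print(coeffs)  # close to [1, 0, 1, -1, 2] for a, b, d, e, f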

A brief aside: in reality, all of the solvers which don't use an exact analytical solution generally stop within some acceptable range of the actual answer, so it is rarely the case that we get the exact solution, but it tends to be so close that we accept it as exact in practice. Additionally, even least-squares solvers rarely form the analytical solution explicitly (inverting A^T A is numerically risky) and instead use factorizations such as QR or the singular value decomposition.

If you were to change the optimization problem, this would not necessarily be true. There are certain classes of problems for which we can generally find an optimal value (the largest such class is that of convex optimization problems -- although there are many non-convex problems for which we can find an optimal value under certain conditions).

If you're interested in learning more, take a look at Convex Optimization by Boyd and Vandenberghe. The first chapter doesn't require much mathematical background, and it gives an overview of the general optimization problem and how it relates to solvable classes such as linear and convex programming.

Solution 2

I would like to complete the answer with an alternative method for finding the best plane that fits a set of points in R^3. The lstsq approach works pretty well except in specific cases where, for example, the x coordinate of all points is 0 (or the same). In such a case, the columns of the A matrix used in lstsq are not linearly independent. For example:

A = [[ 0   y_0    1]
     [ 0   y_1    1]
     ...
     [ 0   y_k    1] 
     ...
     [ 0   y_N    1]]

To circumvent this problem, you can apply SVD directly to the centered coordinates of the set of points. (lstsq uses SVD internally as well, but on a different matrix.)

Here is a Python example, given the coordinates of the points in the coords array:

import numpy as np

# barycenter (centroid) of the points
G = coords.sum(axis=0) / coords.shape[0]

# run SVD on the centered coordinates
u, s, vh = np.linalg.svd(coords - G)

# unit normal vector of the best-fitting plane
u_norm = vh[2, :]

Using this approach, vh is a 3x3 matrix whose rows are orthonormal vectors. The first two rows form an orthonormal basis of the plane, and the third is a unit vector normal to the plane.

If you really need the a, b, c parameters, you can get them from the normal vector because the coordinates of the normal vector are (a, b, c), assuming the equation of the plane is ax + by + cz + d = 0.
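
For example, continuing the snippet above (reusing its G and u_norm), a short sketch of recovering the parameters:

# the normal vector components give a, b, c; the plane passes through the
# barycenter G, so d follows from a*Gx + b*Gy + c*Gz + d = 0
a, b, c = u_norm
d = -np.dot(u_norm, G)

# if c is nonzero, this converts to the z = a'*x + b'*y + c' form of the question
ap, bp, cp = -a / c, -b / c, -d / c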


Comments

  • Gabriel almost 2 years

    Given a set of points in 3D, the general problem is to find the a, b, c coefficients of a plane equation in the form:

    z = a*x + b*y + c
    

    such that the resulting plane is the best fit possible to that set of points.

    1. In this SO answer, the function scipy.optimize.minimize is used to solve this problem.

      It relies on initial guesses for the coefficients, and minimizes an error function that sums the distances of each point to the surface of the plane.

    2. In this code (based on this other SO answer) the scipy.linalg.lstsq function is used to tackle the same problem (when restricted to a 1st order polynomial).

      It solves for C in the equation z = A*C, where the rows of A hold the x, y coordinates of each point (plus a column of ones for the constant term), z is the vector of z coordinates of the set, and C holds the a, b, c coefficients.

      Unlike the code in the method above, this one does not seem to require initial guesses for the plane coefficients.

    Since the minimize function requires an initial guess, it may or may not converge to the optimal solution (depending on how good the guess is). Does the second method have a similar caveat, or will it always return the exact solution?

  • Alex Alifimoff over 8 years
    Can someone verify that my link to the Boyd book is publicly accessible?
  • Gabriel over 8 years
    Great answer Alex! Just to clarify: scipy.linalg.lstsq is giving me the analytical solution of the problem, while scipy.optimize.minimize is simply finding the minimum of the error function (the results will be equal in this case)? I can confirm that the link to the book is accessible.
  • Alex Alifimoff over 8 years
    I took a look at the source code really quickly. It seems like, yes, lstsq is giving you the analytical solution by way of singular value decomposition (see this paper). However, note that this isn't necessarily the case for all least squares solvers. There are solvers that are guaranteed to solve least squares, but solve it using iterative methods instead of simply calculating the matrix products and inverses ...
  • Alex Alifimoff over 8 years
    minimize on the other hand, is solving a more general problem with an iterative solver by specifically minimizing the least squares error -- your interpretation in this case is correct.
  • Gabriel over 8 years
    Thanks again Alex, for the confirmation and for the links!
  • Gabriel almost 6 years
    Thank you Ger. I think it is important to note that this method (singular value decomposition, SVD), if I recall correctly, minimizes the squared distance from the points to the plane rather than the absolute distance. It's a small side note, but one that confused me at first.