Integer step size in scipy optimize minimize


Solution 1

Assuming that the function to minimize is arbitrarily complex (nonlinear), this is a very hard problem in general. It cannot be guaranteed to be solved optimally unless you try every possible option. I do not know whether any integer-constrained nonlinear optimizers exist (I somewhat doubt it), and I will assume you know that Nelder-Mead should work fine if it were a continuous function.

Edit: Considering the comment from @Dougal I will just add here: set up a coarse+fine grid search first; if you then feel like trying whether your Nelder-Mead works (and converges faster), the points below may help...

But here are some points that may help:

  1. Considering how difficult the integer constraint makes the whole problem, maybe it would be an option to do some simple interpolation to help the optimizer. It should still converge to an integer solution. Of course this requires calculating extra points, but it might solve many other problems. (Even in linear integer programming it is common to solve the unconstrained system first, AFAIK.)
  2. Nelder-Mead starts with N+1 points; these are hard-wired in scipy (at least older versions) to (1+0.05) * x0[j] (for j in all dimensions, unless x0[j] is 0), which you will see in your first evaluation steps. Maybe these can be supplied in newer versions (see Solution 4 on initial_simplex); otherwise you could just change/copy the scipy code (it is pure Python) and set it to something more reasonable. Or, if you feel that is simpler, scale all input variables down so that (1+0.05)*x0 is of a sensible size.
  3. Maybe you should cache all function evaluations, since with Nelder-Mead I would guess you will always run into duplicate evaluations (at least at the end); see the sketch after this list.
  4. You have to check how likely it is that Nelder-Mead will just shrink to a single value and give up because it keeps finding the same result.
  5. You generally must check whether your function is well behaved at all... This optimization is doomed if the function does not change smoothly over the parameter space, and even then it can easily run into local minima if you have any. (Since you cached all evaluations - see 3. - you could at least plot those and have a look at the error landscape without needing to do any extra evaluations.)
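
For points 3 and 5, a minimal sketch of such a cache; the quadratic run_vision_algorithm below is only a hypothetical stand-in for your actual vision code:

import numpy as np
from scipy.optimize import minimize

def run_vision_algorithm(block_size, degree):
    # hypothetical stand-in for the real computer-vision objective
    return (block_size - 15) ** 2 + (degree - 21) ** 2

cache = {}

def cached_objective(x):
    # round to the integer grid, then memoize, so duplicate requests
    # inside the same grid cell cost nothing extra
    key = (int(round(x[0])), int(round(x[1])))
    if key not in cache:
        cache[key] = run_vision_algorithm(*key)
    return cache[key]

res = minimize(cached_objective, np.array([11.0, 10.0]), method='Nelder-Mead')
print(res.x, len(cache))   # the cache doubles as data for plotting the error landscape (point 5)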

Solution 2

Unfortunately, Scipy's built-in optimization tools don't easily allow for this. But never fear; it sounds like you have a convex problem, and so you should be able to find a unique optimum, even if it won't be mathematically pretty.

Two options that I've implemented for different problems are creating a custom gradient descent algorithm, and using bisection on a series of univariate problems. If you're doing cross-validation in your tuning, your loss function unfortunately won't be smooth (because of noise from cross-validation on different datasets), but will be generally convex.

To implement gradient descent numerically (without having an analytical method for evaluating the gradient), choose a test point and a second point that is delta away from your test point in all dimensions. Evaluating your loss function at these two points can allow you to numerically compute a local subgradient. It is important that delta be large enough that it steps outside of local minima created by cross-validation noise.
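
A rough sketch of that idea, assuming the hypothetical cv_loss below is a noisy but roughly convex cross-validation loss and that delta is larger than the noise scale:

import numpy as np

def cv_loss(params):
    # hypothetical noisy cross-validation loss; replace with your own
    return (params[0] - 15) ** 2 + (params[1] - 21) ** 2 + np.random.normal(scale=0.5)

def noisy_gradient_descent(x0, delta=5.0, lr=0.5, n_iter=30):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # central differences with a step wide enough to jump over
        # the local bumps introduced by cross-validation noise
        grad = np.array([(cv_loss(x + delta * e) - cv_loss(x - delta * e)) / (2 * delta)
                         for e in np.eye(len(x))])
        x = x - lr * grad
    return np.round(x).astype(int)

print(noisy_gradient_descent([50.0, 50.0]))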

A slower but potentially more robust alternative is to implement bisection for each parameter you're testing. If you know that the problem is jointly convex in your two parameters (or n parameters), you can separate this into n univariate optimization problems, and write a bisection algorithm which recursively homes in on the optimal parameters. This can help handle some types of quasiconvexity (e.g. if your loss function takes a background noise value for part of its domain, and is convex in another region), but requires a good guess as to the bounds for the initial iteration.
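
Here is a minimal sketch of that per-parameter idea on an integer grid, assuming the loss is roughly unimodal in each coordinate; the names cv_loss, int_ternary_search and coordinate_search are my own, and it uses ternary-style bracketing rather than true bisection:

def int_ternary_search(f, lo, hi):
    # shrink [lo, hi] around the minimizer of a unimodal 1-D integer function
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return min(range(lo, hi + 1), key=f)

def coordinate_search(loss, x0, bounds, sweeps=3):
    # optimize one integer parameter at a time, holding the others fixed
    x = list(x0)
    for _ in range(sweeps):
        for i, (lo, hi) in enumerate(bounds):
            x[i] = int_ternary_search(lambda v: loss(x[:i] + [v] + x[i + 1:]), lo, hi)
    return x

def cv_loss(p):
    # hypothetical placeholder (same toy loss idea as above)
    block_size, threshold = p
    return (block_size - 15) ** 2 + (threshold - 96) ** 2

print(coordinate_search(cv_loss, x0=[11, 10], bounds=[(3, 51), (0, 255)]))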

If you simply snap the requested x values to an integer grid without adjusting xtol to match that grid size, you risk having the solver request two points within the same grid cell, receive the same output value, and conclude that it is at a minimum.

No easy answer, unfortunately.

Solution 3

Snap your floats x, y (a.k.a. winsize, threshold) to an integer grid inside your function, like this:

def func( x, y ):
    x = round( x )
    y = round( (y - 1) / 2 ) * 2 + 1  # 1 3 5 ...
    ...

Then Nelder-Mead will see function values only on the grid, and should give you near-integer x, y.
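
For example (a sketch only: the quadratic body of func stands in for the real objective, and per Solution 2 you may still need to choose xatol or the initial simplex so the solver does not stall inside a single grid cell):

import numpy as np
from scipy.optimize import minimize

def func(x, y):
    x = round(x)
    y = round((y - 1) / 2) * 2 + 1      # 1 3 5 ...
    # toy quadratic in place of the real objective
    return (x - 15) ** 2 + (y - 21) ** 2

res = minimize(lambda p: func(*p), x0=np.array([11.0, 10.0]), method='Nelder-Mead')
print(np.round(res.x))   # snap the reported near-integer optimum back to the grid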

(If you'd care to post your code someplace, I'm looking for test cases for a Nelder-Mead with restarts.)

Solution 4

The Nelder-Mead minimize method now lets you specify the initial simplex vertex points, so you should be able to set the simplex points far apart, and the simplex will then flop around and find the minimum and converge when the simplex size drops below 1.

https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
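
For the two-parameter problem in the question that might look something like this (a sketch: the vertex spacing and tolerances below are guesses, not recommendations):

import numpy as np
from scipy.optimize import minimize

# 3 vertices for the 2-parameter problem, spaced well over one integer step apart
init_simplex = np.array([[11.0,  10.0],
                         [31.0,  10.0],
                         [11.0, 110.0]])

res = minimize(doSingleIteration,            # the objective from the question
               x0=np.array([11.0, 10.0]),    # x0 is not used to build the simplex here
               method='Nelder-Mead',
               options={'initial_simplex': init_simplex, 'xatol': 0.5, 'fatol': 0.5})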

Solution 5

The problem is that the algorithm gets stuck trying to shrink its (N+1)-point simplex. I'd highly recommend that anyone new to the concept learn more about the geometrical shape of a simplex and figure out how the input parameters relate to the points on the simplex. Once you have a grasp of that, then, as I.P. Freeley suggested, the problem can be solved by defining strong initial points for your simplex. Note that this is different from defining your x0 and goes into Nelder-Mead's dedicated options. Here is an example of a higher-dimensional (4-parameter) problem. Also note that the initial simplex has to have N+1 points, in this case 5 and in your case 3.

import numpy as np
from scipy.optimize import minimize

# 5 vertices for a 4-parameter problem (N+1 points)
init_simplex = np.array([[1, .1, .3, .3], [.1, 1, .3, .3], [.1, .1, 5, .3],
                         [.1, .1, .3, 5], [1, 1, 5, 5]])
minimum = minimize(Optimize.simplex_objective,   # this answer's own objective function
                   x0=np.array([.01, .01, .01, .01]),
                   method='Nelder-Mead',
                   options={'adaptive': True, 'xatol': 0.1, 'fatol': .00001,
                            'initial_simplex': init_simplex})

In this example the x0 gets ignored by the definition of the initial_simplex. Another useful option in high-dimensional problems is 'adaptive', which takes the number of parameters into account when setting the algorithm's operational coefficients (i.e. α, γ, ρ and σ for reflection, expansion, contraction and shrink, respectively). And if you haven't already, I'd also recommend familiarizing yourself with the steps of the algorithm.

Now, as for the reason this problem is happening: it is because the method gets no good results from an expansion, so it keeps shrinking the simplex smaller and smaller, trying to find a better solution that may or may not exist.

Author by

kscottz

Director of R&D at Ingenuitas. Specializes in computer vision, graphics, and augmented reality. Former Defense Contractor / Boffin. Columbia University CS graduate student and University of Michigan engineering alum.

Updated on June 12, 2022

Comments

  • kscottz
    kscottz almost 2 years

    I have a computer vision algorithm I want to tune up using scipy.optimize.minimize. Right now I only want to tune up two parameters but the number of parameters might eventually grow so I would like to use a technique that can do high-dimensional gradient searches. The Nelder-Mead implementation in SciPy seemed like a good fit.

    I got the code all set up, but it seems that the minimize function really wants to use floating-point values with a step size that is less than one. The current set of parameters are both integers; one has a step size of one and the other has a step size of two (i.e. the value must be odd; if it isn't, the thing I am trying to optimize will convert it to an odd number). Roughly, one parameter is a window size in pixels and the other parameter is a threshold (a value from 0-255).

    For what it is worth I am using a fresh build of scipy from the git repo. Does anyone know how to tell scipy to use a specific step size for each parameter? Is there some way I can roll my own gradient function? Is there a scipy flag that could help me out? I am aware that this could be done with a simple parameter sweep, but I would eventually like to apply this code to much larger sets of parameters.

    The code itself is dead simple:

    import numpy as np
    from scipy.optimize import minimize
    from ScannerUtil import straightenImg 
    import bson
    
    def doSingleIteration(parameters):
        # do some machine vision magic
        # return the difference between my value and the truth value
    
    parameters = np.array([11,10])
    res = minimize( doSingleIteration, parameters, method='Nelder-Mead',options={'xtol': 1e-2, 'disp': True,'ftol':1.0,}) #not sure if these params do anything
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print res
    

    This is what my output looks like. As you can see we are repeating a lot of runs and not getting anywhere in the minimization.

    +++++++++++++++++++++++++++++++++++++++++
    [ 11.  10.]  <-- Output from scipy minimize
    {'block_size': 11, 'degree': 10} <-- input to my algorithm rounded and made int
    +++++++++++++++++++++++++++++++++++++++++
    120  <-- output of the function I am trying to minimize
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.55  10.  ]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.   10.5]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.55   9.5 ]
    {'block_size': 11, 'degree': 9}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.1375  10.25  ]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.275  10.   ]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.    10.25]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.275   9.75 ]
    {'block_size': 11, 'degree': 9}
    +++++++++++++++++++++++++++++++++++++++++
    120
    +++++++++++++++++++++++++++++++++++++++++
    ~~~
    SNIP 
    ~~~
    +++++++++++++++++++++++++++++++++++++++++
    [ 11.         10.0078125]
    {'block_size': 11, 'degree': 10}
    +++++++++++++++++++++++++++++++++++++++++
    120
    Optimization terminated successfully.
             Current function value: 120.000000
             Iterations: 7
             Function evaluations: 27
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      status: 0
        nfev: 27
     success: True
         fun: 120.0
           x: array([ 11.,  10.])
     message: 'Optimization terminated successfully.'
         nit: 7
    
    • Eric G
      Eric G over 11 years
      According to the docs, SciPy's Nelder-Mead method uses the Simplex linear programming algorithm. It relies on using non-integral points/step sizes to optimize the function. I'm not familiar with SciPy in general, so there may be a configuration option to get it do what you want. You might also want to look into integer programming (en.wikipedia.org/wiki/Integer_programming) as that sounds like what you're trying to accomplish.
    • seberg
      seberg over 11 years
      @EricG actually I think that's just a name mix-up; the Nelder-Mead "Simplex" works with the geometrical structure of a simplex. It has nothing to do with the Simplex algorithm from linear programming, and this is nonlinear optimization anyway.
    • Danica
      Danica over 11 years
      Because of issues like this, parameter tuning for ML algorithms is typically done just with a grid search (often on a logarithmic grid, but for your parameters that doesn't seem necessary). You might do a coarse grid search to find a good region first and then a finer-grained grid search in said region.