Best way to interpolate a numpy.ndarray along an axis

python arrays optimization numpy interpolation

18,127

Solution 1

scipy's interp1d can help:

import numpy as np
from scipy.interpolate import interp1d

ntime, nheight_in, nlat, nlon = (10, 20, 30, 40)

heights = np.linspace(0, 1, nheight_in)

t_in = np.random.normal(size=(ntime, nheight_in, nlat, nlon))
f_out = interp1d(heights, t_in, axis=1)

nheight_out = 50
new_heights = np.linspace(0, 1, nheight_out)
t_out = f_out(new_heights)

Solution 2

I was looking for a similar function that works with irregularly spaced coordinates, and ended up writing my own function. As far as I see, the interpolation is handled nicely and the performance in terms of memory and speed is also quite good. I thought I'd share it here in case anyone else comes across this question looking for a similar function:

import numpy as np
import warnings

def interp_along_axis(y, x, newx, axis, inverse=False, method='linear'):
    """ Interpolate vertical profiles, e.g. of atmospheric variables
    using vectorized numpy operations

    This function assumes that the x-xoordinate increases monotonically

    ps:
    * Updated to work with irregularly spaced x-coordinate.
    * Updated to work with irregularly spaced newx-coordinate
    * Updated to easily inverse the direction of the x-coordinate
    * Updated to fill with nans outside extrapolation range
    * Updated to include a linear interpolation method as well
        (it was initially written for a cubic function)

    Peter Kalverla
    March 2018

    --------------------
    More info:
    Algorithm from: http://www.paulinternet.nl/?page=bicubic
    It approximates y = f(x) = ax^3 + bx^2 + cx + d
    where y may be an ndarray input vector
    Returns f(newx)

    The algorithm uses the derivative f'(x) = 3ax^2 + 2bx + c
    and uses the fact that:
    f(0) = d
    f(1) = a + b + c + d
    f'(0) = c
    f'(1) = 3a + 2b + c

    Rewriting this yields expressions for a, b, c, d:
    a = 2f(0) - 2f(1) + f'(0) + f'(1)
    b = -3f(0) + 3f(1) - 2f'(0) - f'(1)
    c = f'(0)
    d = f(0)

    These can be evaluated at two neighbouring points in x and
    as such constitute the piecewise cubic interpolator.
    """

    # View of x and y with axis as first dimension
    if inverse:
        _x = np.moveaxis(x, axis, 0)[::-1, ...]
        _y = np.moveaxis(y, axis, 0)[::-1, ...]
        _newx = np.moveaxis(newx, axis, 0)[::-1, ...]
    else:
        _y = np.moveaxis(y, axis, 0)
        _x = np.moveaxis(x, axis, 0)
        _newx = np.moveaxis(newx, axis, 0)

    # Sanity checks
    if np.any(_newx[0] < _x[0]) or np.any(_newx[-1] > _x[-1]):
        # raise ValueError('This function cannot extrapolate')
        warnings.warn("Some values are outside the interpolation range. "
                      "These will be filled with NaN")
    if np.any(np.diff(_x, axis=0) < 0):
        raise ValueError('x should increase monotonically')
    if np.any(np.diff(_newx, axis=0) < 0):
        raise ValueError('newx should increase monotonically')

    # Cubic interpolation needs the gradient of y in addition to its values
    if method == 'cubic':
        # For now, simply use a numpy function to get the derivatives
        # This produces the largest memory overhead of the function and
        # could alternatively be done in passing.
        ydx = np.gradient(_y, axis=0, edge_order=2)

    # This will later be concatenated with a dynamic '0th' index
    ind = [i for i in np.indices(_y.shape[1:])]

    # Allocate the output array
    original_dims = _y.shape
    newdims = list(original_dims)
    newdims[0] = len(_newx)
    newy = np.zeros(newdims)

    # set initial bounds
    i_lower = np.zeros(_x.shape[1:], dtype=int)
    i_upper = np.ones(_x.shape[1:], dtype=int)
    x_lower = _x[0, ...]
    x_upper = _x[1, ...]

    for i, xi in enumerate(_newx):
        # Start at the 'bottom' of the array and work upwards
        # This only works if x and newx increase monotonically

        # Update bounds where necessary and possible
        needs_update = (xi > x_upper) & (i_upper+1<len(_x))
        # print x_upper.max(), np.any(needs_update)
        while np.any(needs_update):
            i_lower = np.where(needs_update, i_lower+1, i_lower)
            i_upper = i_lower + 1
            x_lower = _x[[i_lower]+ind]
            x_upper = _x[[i_upper]+ind]

            # Check again
            needs_update = (xi > x_upper) & (i_upper+1<len(_x))

        # Express the position of xi relative to its neighbours
        xj = (xi-x_lower)/(x_upper - x_lower)

        # Determine where there is a valid interpolation range
        within_bounds = (_x[0, ...] < xi) & (xi < _x[-1, ...])

        if method == 'linear':
            f0, f1 = _y[[i_lower]+ind], _y[[i_upper]+ind]
            a = f1 - f0
            b = f0

            newy[i, ...] = np.where(within_bounds, a*xj+b, np.nan)

        elif method=='cubic':
            f0, f1 = _y[[i_lower]+ind], _y[[i_upper]+ind]
            df0, df1 = ydx[[i_lower]+ind], ydx[[i_upper]+ind]

            a = 2*f0 - 2*f1 + df0 + df1
            b = -3*f0 + 3*f1 - 2*df0 - df1
            c = df0
            d = f0

            newy[i, ...] = np.where(within_bounds, a*xj**3 + b*xj**2 + c*xj + d, np.nan)

        else:
            raise ValueError("invalid interpolation method"
                             "(choose 'linear' or 'cubic')")

    if inverse:
        newy = newy[::-1, ...]

    return np.moveaxis(newy, 0, axis)

And this is a small example to test it:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d as scipy1d

# toy coordinates and data
nx, ny, nz = 25, 30, 10
x = np.arange(nx)
y = np.arange(ny)
z = np.tile(np.arange(nz), (nx,ny,1)) + np.random.randn(nx, ny, nz)*.1
testdata = np.random.randn(nx,ny,nz) # x,y,z

# Desired z-coordinates (must be between bounds of z)
znew = np.tile(np.linspace(2,nz-2,50), (nx,ny,1)) + np.random.randn(nx, ny, 50)*0.01

# Inverse the coordinates for testing
z = z[..., ::-1]
znew = znew[..., ::-1]

# Now use own routine
ynew = interp_along_axis(testdata, z, znew, axis=2, inverse=True)

# Check some random profiles
for i in range(5):
    randx = np.random.randint(nx)
    randy = np.random.randint(ny)

    checkfunc = scipy1d(z[randx, randy], testdata[randx,randy], kind='cubic')
    checkdata = checkfunc(znew)

    fig, ax = plt.subplots()
    ax.plot(testdata[randx, randy], z[randx, randy], 'x', label='original data')
    ax.plot(checkdata[randx, randy], znew[randx, randy], label='scipy')
    ax.plot(ynew[randx, randy], znew[randx, randy], '--', label='Peter')
    ax.legend()
    plt.show()

18,127

chw21

Updated on October 06, 2022

Comments

chw21 over 1 year
I have 4-dimensional data, say for the temperature, in an numpy.ndarray. The shape of the array is (ntime, nheight_in, nlat, nlon).

I have corresponding 1D arrays for each of the dimensions that tell me which time, height, latitude, and longitude a certain value corresponds to, for this example I need height_in giving the height in metres.

Now I need to bring it onto a different height dimension, height_out, with a different length.

The following seems to do what I want:
```
ntime, nheight_in, nlat, nlon = t_in.shape

nheight_out = len(height_out)
t_out = np.empty((ntime, nheight_out, nlat, nlon))

for time in range(ntime):
    for lat in range(nlat):
        for lon in range(nlon):
            t_out[time, :, lat, lon] = np.interp(
                height_out, height_in, t[time, :, lat, lon]
            )
```
But with 3 nested loops, and lots of switching between python and numpy, I don't think this is the best way to do it.

Any suggestions on how to improve this? Thanks
chw21 about 9 years

Thanks a lot, but is it possible that your method uses more memory than the one I showed above. While testing it, I noticed that your method was considerably faster than mine until the array sizes exceeded a certain level (90x50x181x360 in my case), where suddenly yours became much slower than mine.
fjarri about 9 years

It is possible that interp1d interpolates a linear sequence of "data points", each of which is a whole 3D array. So, if these arrays become big, you start to hit swap file. You can avoid this by running it for chunks of the full array, or maybe find some high-level libraries that will do that for you.
dashesy almost 9 years

interp1d requires a lot of memory, and does not extrapolate
fjarri almost 9 years

@dashesy: "requires a lot of memory" - OP was concerned about speed. "does not extrapolate" - that's why it's called "interp". That said, if you have a better solution with numpy/scipy functions, by all means post it, and I'll be the first to upvote it.
dashesy almost 9 years

@fjarri no there is no better solution, just wanted to let future readers know about the caveats; specially memory consumption which is very high and can be replaced with InterpolatedUnivariateSpline on 1d slices.
Fei Yao almost 5 years

Just a supplementary. Readers can check this link if they are interested in linear interpolation along an axis with a lot of missing data in the data array.
xer-t over 4 years

Nice!! Very useful! Thaanks!