How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?

16,262

Solution 1

My favorite resource for audio interpolating (especially in resampling applications) is Olli Niemitalo's "Elephant" paper.

I've used a couple of these and they sound terrific (much better than a straight cubic solution, which is relatively noisy). There are spline forms, Hermite forms, Watte, parabolic, etc. And they are discussed from an audio point-of-view. This is not just your typical naive polynomial fitting.

And code is included!

To decide which to use, you probably want to start with the table on page 60 which groups the algorithms into operator complexity (how many multiplies, and how many adds). Then choose among the best signal-to-noise solutions--use your ears as a guide to make the final choice. Note: Generally, the higher SNR, the better.

Solution 2

double InterpCubic(double x0, double x1, double x2, double x3, double t)
{
   double a0, a1, a2, a3;

   a0 = x3 - x2 - x0 + x1;
   a1 = x0 - x1 - a0;
   a2 = x2 - x0;
   a3 = x1;

   return a0*(t^3) + a1*(t^2) + a2*t + a3;
}

where x1 and x2 are the samples being interpolated between, x0 is x1's left neighbor, and x3 is x2's right neighbor. t is [0, 1], denoting the interpolation position between x1 and x2.

Solution 3

Honestly, cubic interpolation isn't generally much better for audio than linear. A simple suggestion for improving your linear interpolation would be to use an antialiasing filter (before or after the interpolation, depending on whether you are shortening the signal or lengthening it). Another option (though more computationally expensive) is sinc-interpolation, which can be done with very high quality.

We have released some simple, LGPL resampling code that can do both of these as part of WDL (see resample.h).

Solution 4

You're looking for polynomial interpolation. The idea is that you pick a number of known data points around the point you want to interpolate, compute an interpolated polynomial using the data points, and then find out the value of the polynomial and the interpolation point.

There are other methods. If you can stomach the math, look at signal reconstruction, or google for "signal interpolation".

Solution 5

I don't have enough reputation to comment on Donnie's answer so I hope it's okay if I at least partially reference it here in addition to answering the question. I co-maintain the Godot engine's audio system which used the same coefficients as the polynomial provided in that answer for a while and just wanted to throw it out there that the coefficients are wrong in that code snippet. The code given has some pretty severe artifacts that show up especially with low-frequency audio. And for that matter, at least one of the algorithms in the paper Nosredna's answer gives have some pretty severe low-pass. Godot has switched back to a simple cubic resampling scheme and it seems to be working well for most users.

I highly recommend using cubic resampling with the following polynomial:

a0 = 3 * y1 - 3 * y2 + y3 - y0;
a1 = 2 * y0 - 5 * y1 + 4 * y2 - y3;
a2 = y2 - y0;
a3 = 2 * y1;

out = (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) / 2;

Where y0, y1, y2, and y3 are successive samples in the original audio, from earliest to latest, mu is the fractional component of the time in samples, and mu2 is the square of mu (which exists entirely to save a multiplication if the compiler fails to optimize correctly).

The math is beyond me but these coefficients have been working well in Godot for some time now with no user complaints.

Share:
16,262
MusiGenesis
Author by

MusiGenesis

Former anthropologist, amateur programmer since 1979, professional programmer since 1995. In my last job, despite my vigorous protests, I was given the actual title of "visionary" - I will be trying to live this down for the rest of my life. I do C#, software synthesis, animation, windows mobile, iPhone/iPad (Objective-C), Android, BlackBerry and Windows Phone 7. Also java server stuff. If you live in Philly and play disc golf or Ultimate (frisbee), we may know each other already.

Updated on June 05, 2022

Comments

  • MusiGenesis
    MusiGenesis almost 2 years

    I'm writing some code that plays back WAV files at different speeds, so that the wave is either slower and lower-pitched, or faster and higher-pitched. I'm currently using simple linear interpolation, like so:

                int newlength = (int)Math.Round(rawdata.Length * lengthMultiplier);
                float[] output = new float[newlength];
    
                for (int i = 0; i < newlength; i++)
                {
                    float realPos = i / lengthMultiplier;
                    int iLow = (int)realPos;
                    int iHigh = iLow + 1;
                    float remainder = realPos - (float)iLow;
    
                    float lowval = 0;
                    float highval = 0;
                    if ((iLow >= 0) && (iLow < rawdata.Length))
                    {
                        lowval = rawdata[iLow];
                    }
                    if ((iHigh >= 0) && (iHigh < rawdata.Length))
                    {
                        highval = rawdata[iHigh];
                    }
    
                    output[i] = (highval * remainder) + (lowval * (1 - remainder));
                }
    

    This works fine, but it tends to sound OK only when I lower the frequency of the playback (i.e. slow it down). If I raise the pitch on playback, this method tends to produce high-frequency artifacts, presumably because of the loss of sample information.

    I know that bicubic and other interpolation methods resample using more than just the two nearest sample values as in my code example, but I can't find any good code samples (C# preferably) that I could plug in to replace my linear interpolation method here.

    Does anyone know of any good examples, or can anyone write a simple bicubic interpolation method? I'll bounty this if I have to. :)

    Update: here are a couple of C# implementations of interpolation methods (thanks to Donnie DeBoer for the first one and nosredna for the second):

        public static float InterpolateCubic(float x0, float x1, float x2, float x3, float t)
        {
            float a0, a1, a2, a3;
            a0 = x3 - x2 - x0 + x1;
            a1 = x0 - x1 - a0;
            a2 = x2 - x0;
            a3 = x1;
            return (a0 * (t * t * t)) + (a1 * (t * t)) + (a2 * t) + (a3);
        }
    
        public static float InterpolateHermite4pt3oX(float x0, float x1, float x2, float x3, float t)
        {
            float c0 = x1;
            float c1 = .5F * (x2 - x0);
            float c2 = x0 - (2.5F * x1) + (2 * x2) - (.5F * x3);
            float c3 = (.5F * (x3 - x0)) + (1.5F * (x1 - x2));
            return (((((c3 * t) + c2) * t) + c1) * t) + c0;
        }
    

    In these functions, x1 is the sample value ahead of the point you're trying to estimate and x2 is the sample value after your point. x0 is left of x1, and x3 is right of x2. t goes from 0 to 1 and is the distance between the point you're estimating and the x1 point.

    The Hermite method seems to work pretty well, and appears to reduce the noise somewhat. More importantly it seems to sound better when the wave is sped up.

    • Goz
      Goz almost 15 years
      Isn't bicubic what you do to a 2D signal (ie image)? Surely you mean cubic for a 1D (ie audio) signal? I may be wrong though ...
    • MusiGenesis
      MusiGenesis almost 15 years
      @Goz: it might just be "cubic" for 1D audio (I dunno). Part of my problem is that all the code samples I've seen are for 2D graphics, and I just need the one D.
    • Nosredna
      Nosredna almost 15 years
      @MusiGenesis, yeah you DON'T want to go with a solution for graphics. The ear is very picky. You want to use a solution that gives you a large signal-to-noise ratio. See my answer below.
    • Nosredna
      Nosredna almost 15 years
      I just reread and noticed that you were going to offer a bounty. Man, why did I answer so quickly?
    • MusiGenesis
      MusiGenesis almost 15 years
      I haven't picked an answer yet, so the bounty may still come up. I may put my entire rep up.
    • Nosredna
      Nosredna almost 15 years
      I don't think you should put your entire rep up. The best you can do to take an existing sample and create an oversampled version is to use a curve that adds as little noise to your original sample as possible. You can't bring back any missing information. Therefore, you're unlikely to do any better than the wonderful Elephant formulae. However, you could ask the question, "Why does my audio sound 'cheap' when I raise the frequency of it?"
    • MusiGenesis
      MusiGenesis almost 15 years
      I've been asking that question for years. :)
    • Nosredna
      Nosredna almost 15 years
      Many instruments have characteristic sounds that are screwed up when you raise their pitch. The human voice is one. Male and female voices have formants. If you just raise the pitch, it can sound cheap or goofy. But if you raise the pitch and keep the formants, it can sound better. Some synths have formant filter sections.
  • MusiGenesis
    MusiGenesis almost 15 years
    I can stomach the math, but mainly I'm looking for code samples (preferably C# or C).
  • MusiGenesis
    MusiGenesis almost 15 years
    Me likey. I'm going to try this out.
  • Donnie DeBoer
    Donnie DeBoer almost 15 years
    One note: since cubic interpolation uses 4 samples (the 2 being interpolated between and their 2 closest neighbors), you must figure out how to handle the very first interpolation interval and the very last interpolation interval of the waveform data. Often, people just invent phantom samples to the left and right.
  • MusiGenesis
    MusiGenesis almost 15 years
    Well, it works, but unfortunately it sounds even worse than linear interpolation. Linear interpolation seems to produce more of a general low-level hiss, whereas this cubic formula tends to produce a high-pitched ringing.
  • MusiGenesis
    MusiGenesis almost 15 years
    Great link, again. I'm digging into it now. I just implemented the cubic interpolation from another answer, and it sounded terrible.
  • Nosredna
    Nosredna almost 15 years
    If you look at your processed sample with a spectrum analyzer, you'll see the hiss as a noise floor. The ringing is probably the result of a few added frequencies from the cubic that modulate in amplitude.
  • Nosredna
    Nosredna almost 15 years
    If you use any of these with a SNR over, say, 60, you should not be able to hear artifacts. If you do, you've probably made a mistake implementing. And don't underestimate the ease with which you can screw up which point is which and what your "between" value is. I screwed up my first try. You may have even messed up the cubic. It helps to graph sections of your input and output. :-)
  • MusiGenesis
    MusiGenesis almost 15 years
    Huh, like I evver make mistaks!
  • MusiGenesis
    MusiGenesis almost 15 years
    I think I implemented the cubic right - it was only a slight modification of my original code.
  • Nosredna
    Nosredna almost 15 years
    A bad result from the cubic would not surprise me. But it's incredibly easy to be off by one or get your "between value" at 1-x when it should be x, etc.
  • MusiGenesis
    MusiGenesis almost 15 years
    I might have missed this in the text, but what's the difference between an x-form implementation and a z-form implementation?
  • Nosredna
    Nosredna almost 15 years
    That's discussed on 35 and 36. If I recall correctly, they are equivalent in result, but one can be cheaper to calculate than the other depending on the specific interpolator. To you it should not make a difference. He does both for each equation and presents the cheaper solution.
  • MusiGenesis
    MusiGenesis almost 15 years
    Also, pardon my density here, but the linked article recommends these interpolators for oversampled data, but I'm working on pre-recorded WAV files which can't be oversampled (unless I'm misunderstanding the term?). The author says oversampling is left to the reader, so maybe I need to ask another question here.
  • MusiGenesis
    MusiGenesis almost 15 years
    The author mentions high-N oversampling with simple linear interpolation as a high-quality approach to this problem. This may be a better approach for my purposes, since I could do the oversampling once on the original audio, and then produce different playback speeds on the fly quickly with linear interpolation.
  • MusiGenesis
    MusiGenesis almost 15 years
    I've saved a copy on my laptop - that one's a keeper.
  • Nosredna
    Nosredna almost 15 years
    It's ideal to have oversampled data. If you don't, there's nothing you can do about it. I've used these in real-time ring modulators, oscillators, etc.
  • MusiGenesis
    MusiGenesis almost 15 years
    Is it possible to produce oversampled data from a pre-recorded WAV file? I'll ask this again as a real question - I just don't want to look especially stupid.
  • Nosredna
    Nosredna almost 15 years
    @MusiGnesis, your situation is different from mine. Whatever you end up doing, if it gets good results I want to know what you did. What I like about these equations is you won't get noise or weird overtones.
  • Nosredna
    Nosredna almost 15 years
    No, you can't get oversampled data from a prerecorded WAV file. Nyquist, essentially, says you don't have any real info for any frequencies above 1/2 the sampling rate.
  • Nosredna
    Nosredna almost 15 years
    Since you have existing data (and it's the same problem as real-time generated data, except that we can choose to be running our oscillators at 2x, 4x, 8x, or whatever the processor will handle), all you can do is draw a curve through your points that adds the least amount of noise possible. You can't get any extra data from your sample--that data no longer exists--it may as well be in a black hole.
  • MusiGenesis
    MusiGenesis almost 15 years
    I'm going to implement the 4-point Hermite, which he seems to say is the best for un-oversampled data. The funny thing is, I've never really found linear interpolation to be all that awful, but I'm using WAV file data for pitched synthesizer sounds and I can usually get away with only lowering the frequency. I'd like to be able to raise the pitch without sounding like a 1993 soundcard FM synth.
  • Nosredna
    Nosredna almost 15 years
    Is it possible that what's bothering you about raising the pitch is the Alvin and the Chipmunks effect? Do you know anything about formant filters? (But the linear interpolation is absolutely adding noise 10dB under your signal.)
  • MusiGenesis
    MusiGenesis almost 15 years
    I'm doing a version of wavetable synthesis, but my code forms the amplitude envelopes for all the notes, so the overall shape is the same regardless of pitch. The particular problem I'm having is with the sustain part of a note sounding kind of "scratchy" (if that makes sense) when I create a note at a higher frequency. The Hermite interpolation seems to make this part sound much better and clearer, but it might just be that my audio neurons are dying at a faster rate today.
  • Nosredna
    Nosredna almost 15 years
    How are you doing sustain? Looping the same section over and over? Where does that section come from? You make be lifting some low frequency noise into audible range.
  • MusiGenesis
    MusiGenesis almost 15 years
    It's a loop approach. I don't think the problem is low frequency noise lifted into the audible. The "scratchiness" is very high frequency - I can get rid of it with a lowpass filter, but I don't like the overall dulling effect and I'm trying to avoid having to do an FFT.
  • Nosredna
    Nosredna almost 15 years
    Any frequencies near Nyquist could do that to you when you speed up. You have to do the lowpass before you speed up, or else you'll be cutting out sounds that belong there, not just the ones that folded over (that's why you're getting a lack of brightness, probably). Suppose you have a sample you want to play at 44100. You have a sample you want to play at double speed. Put the lowpass cutoff at 11025, do the filter, then speed up. You don't want any frequencies going over Nyquist.
  • MusiGenesis
    MusiGenesis almost 13 years
    yeah, I agree. I ultimately went back to straight interpolation, as the improvement over linear interpolation was essentially imperceptible.
  • Tobia
    Tobia over 3 years
    Thank you. This is the updated address, which contains a revised paper and accompanying source code zip file: yehar.com/blog/?p=197
  • Diljeet
    Diljeet over 3 years
    Hi i checked all algorithms, and i settled on B-spline for simple and good quality, but it has an error, because when i calculate sample between x0,x1,x2,x3, all the values between x1 to x2 are having a little offset, coz if i try to draw audio wave curves from this algo, it shows that the curve never reaches the values x1 & x2, but is a little lower than the original waveform. Currently using Hermite with no such error.Can anybody guide me how & when to use the Optimal algo, it it useful? it has many variation 2x,4x,8x how & when to use it?
  • Diljeet
    Diljeet over 3 years
    @Nosredna And please can anybody suggest, which algorithm is best for downsampling if i use Optimal 2x for all resampling from any sample rate to any sample rate will it be ok, or should i use all the algorithms and switch to them 2x,4x,8x..whenever needed? and is Optimal 2x algo good for downsampling?
  • Karsten Becker
    Karsten Becker almost 3 years
    Be careful when copying this code, the ^operator is often used outside of C-languages to indicate "to the power of" but in most languages it is the XOR operator. So replace t^3 with ttt and t^2 with t*t.
  • Kevin Yin
    Kevin Yin almost 2 years
    These coefficients correspond to cubic Hermite interpolation.