How to get the FFT of a numpy array to work?

13,638

Solution 1

Presumably there are some missing values in your csv file. By default, np.genfromtxt will replace the missing values with NaN.

If there are any NaNs or Infs in an array, the fft will be all NaNs or Infs.

For example:

import numpy as np

x = [0.1, 0.2, np.nan, 0.4, 0.5]
print np.fft.fft(x)

And we'll get:

array([ nan +0.j,  nan+nanj,  nan+nanj,  nan+nanj,  nan+nanj])

However, because an FFT operates on a regularly-spaced series of values, removing the non-finite values from an array is a bit more complex than just dropping them.

pandas has several specialized operations to do this, if you're open to using it (e.g. fillna). However, it's not too difficult to do with "pure" numpy.

First, I'm going to assume that you're working with a continuous series of data because you're taking the FFT of the values. In that case, we'd want to interpolate the NaN values based on the values around them. Linear interpolation (np.interp) may not be ideal in all situations, but it's not a bad default choice:

For example:

import numpy as np

x = np.array([0.1, 0.2, np.nan, 0.4, 0.5])
xi = np.arange(len(x))

mask = np.isfinite(x)
xfiltered = np.interp(xi, xi[mask], x[mask])

And we'll get:

In [18]: xfiltered
Out[18]: array([ 0.1,  0.2,  0.3,  0.4,  0.5])

We can then calculate the FFT normally:

In [19]: np.fft.fft(xfiltered)
Out[19]: 
array([ 1.50+0.j        , -0.25+0.34409548j, -0.25+0.08122992j,
       -0.25-0.08122992j, -0.25-0.34409548j])

...and get a valid result.

Solution 2

If your data contains NaN values, you need to interpolate them. Alternatively, you can calculate the spectrum using the Fourier equation where np.sum is replaced with np.nansum. With this approach you don't need to interpolate NaN values, although the amount of missing data will effect the spectrum. More missing data will result in a noisy spectrum and hence inaccurate spectral values.

Below is a MWE to illustrate the concept, with a graph showing the result. The MWE illustrates how to calculate the single-sided amplitude spectrum of a simple reference signal containing a number of missing values.

#!/usr/bin/python
# Python code to plot amplitude spectrum of signal containing NaN values
# Python version 2.7.13

from __future__ import division
import numpy as np
import pylab as pl
import random

LW = 2 #line width
AC = 0.5 #alpha channel
pi = np.pi

def periodogramSS(inputsignal,fsamp):
 N = len(inputsignal)
 N_notnan = np.count_nonzero(~np.isnan(inputsignal))
 hr = fsamp/N #frequency resolution
 t = np.arange(0,N*Ts,Ts)
 #flow,fhih = -fsamp/2,(fsamp/2)+hr #Double-sided spectrum
 flow,fhih = 0,fsamp/2+hr #Single-sided spectrum
 #flow,fhih = hr,fsamp/2
 frange = np.arange(flow,fhih,hr)
 fN = len(frange)
 Aspec = np.zeros(fN)
 n = 0
 for f in frange:
  Aspec[n] = np.abs(np.nansum(inputsignal*np.exp(-2j*pi*f*t)))/N_notnan
  n+=1
 Aspec *= 2 #single-sided spectrum
 Aspec[0] /= 2 #DC component restored (i.e. halved)
 return (frange,Aspec)

#construct reference signal:
f1 = 10 #Hz
T = 1/f1
fs = 10*f1
Ts = 1/fs
t = np.arange(0,20*T,Ts)
DC = 3.0
x = DC + 1.5*np.cos(2*pi*f1*t)

#randomly delete values from signal x:
ndel = 10 #number of samples to replace with NaN
random.seed(0)
L = len(x)
randidx = random.sample(range(0,L),ndel)
for idx in randidx:
 x[idx] = np.nan

(fax,Aspectrum) = periodogramSS(x,fs)

fig1 = pl.figure(1,figsize=(6*3.13,4*3.13)) #full screen
pl.ion()

pl.subplot(211)
pl.plot(t, x, 'b.-', lw=LW, ms=2, label='ref', alpha=AC)

#mark NaN values:
for (t_,x_) in zip(t,x):
 if np.isnan(x_):
  pl.axvline(x=t_,color='g',alpha=AC,ls='-',lw=2)

pl.grid()
pl.xlabel('Time [s]')
pl.ylabel('Reference signal')

pl.subplot(212)
pl.stem(fax, Aspectrum, basefmt=' ', markerfmt='r.', linefmt='r-')
pl.grid()
pl.xlabel('Frequency [Hz]')
pl.ylabel('Amplitude spectrum')

fig1name = './signal.png'
print 'Saving Fig. 1 to:', fig1name
fig1.savefig(fig1name)

Figure displayed temporal signal with amplitude spectrum The reference signal (real) is shown in blue with missing values marked with green. The single-sided amplitude spectrum is shown in red. The DC component and amplitude value at 10 Hz are clearly visible. The other values are caused by the reference signal being broken up by the missing data.

Share:
13,638

Related videos on Youtube

Jessica Flores
Author by

Jessica Flores

Updated on June 30, 2022

Comments

  • Jessica Flores
    Jessica Flores almost 2 years

    I'm reading a specific column of a csv file as a numpy array. When I try to do the fft of this array I get an array of NaNs. How do I get the fft to work? Here's what I have so far:

    #!/usr/bin/env python
    from __future__ import division
    
    import numpy as np
    from numpy import fft
    
    import matplotlib.pyplot as plt
    
    
    fileName = '/Users/Name/Documents/file.csv'
    
    #read csv file
    df = np.genfromtxt(fileName, dtype = float, delimiter = ',', names = True)
    
    X = df['X'] #get X from file
    
    rate = 1000. #rate of data collection in points per second
    Hx = abs(fft.fft(X))
    freqX = fft.fftfreq(len(Hx), 1/rate)
    
    plt.plot(freqX,Hx) #plot freqX vs Hx
    
    • cel
      cel over 8 years
      have you checked if there are nans in your vector?
  • Jessica Flores
    Jessica Flores over 8 years
    Yes, I do have a continuous series of data. The data series that I have was collected at 1000 times per second over a time of 20 seconds. How would handle these missing values purely with numpy?
  • Joe Kington
    Joe Kington over 8 years
    You should be able to use a combination of np.isfinite and np.interp as the second example shows. (I edited it a few minutes ago, so you might need to refresh to see the update.) Hope that helps!