Loop through netcdf files and run calculations - Python or R

11,464

Solution 1

You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So you code would look like this:

from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()

atemp = f.variables['air']
print atemp

ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny,nx),dtype=int)

for i in xrange(ntimes):
    cold_days += atemp[i,:,:].data-273.15 < 0

pcolormesh(cold_days)
colorbar()

generated image of cold days

And here's one way to write the file (there might be easier ways):

# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

cold_days_v = nco.createVariable('cold_days', 'i4',  ( 'y', 'x'))
cold_days_v.units='days'
cold_days_v.long_name='total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon','f4',('y','x'))
lato = nco.createVariable('lat','f4',('y','x'))
xo = nco.createVariable('x','f4',('x'))
yo = nco.createVariable('y','f4',('y'))
lco = nco.createVariable('Lambert_Conformal','i4')

# copy all the variable attributes from original file
for var in ['lon','lat','x','y','Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono[:]=f.variables['lon'][:]
lato[:]=f.variables['lat'][:]
xo[:]=f.variables['x'][:]
yo[:]=f.variables['y'][:]

#  write the cold_days data
cold_days_v[:,:]=cold_days

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6'
nco.close()

If I try looking at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay: enter image description here Also note that here I just downloaded two of the datasets for testing, so I used

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')

as an example. For all the data, you could use

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')

or

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')

Solution 2

Here is an R solution.

infiles <- list.files("data", pattern = "nc", full.names = TRUE, include.dirs = TRUE)

outfile <- "data/air.colddays.nc"     

library(raster)

r <- raster::stack(infiles) 
r <- sum((r - 273.15) < 0)

plot(r)

enter image description here

Solution 3

I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't provide the solution to the exact question posed. The question seems to want the length of each continuous period of temperatures below zero (note in the question the counter resets if the temperature exceeds zero), which can be important for climate applications (e.g. for farming) whereas the accepted solution only gives the total number of days in a year that the temperature is below zero. If this is really what mkmitchell wants (it has been accepted as the answer) then it can be done in from the command line in cdo without having to worry about NETCDF input/output:

 cdo timsum -lec,273.15 in.nc out.nc

so a looped script would be:

files=`ls *.nc` # pick up all the netcdf files in a directory
for file in $files ; do
    # I use 273.15 as from the question seems T is in Kelvin 
    cdo timsum -lec,273.15 $file ${file%???}_numdays.nc
done 

If you then want the total number over the whole period you can then cat the _numdays files instead which are much smaller:

cdo cat *_numdays.nc total.nc 
cdo timsum total.nc total_below_zero.nc 

But again, the question seems to want accumulated days per event, which is different, but not provided by the accepted answer.

Share:
11,464
mkmitchell
Author by

mkmitchell

I'm a GIS/Remote Sensing Analyst for Ducks Unlimited

Updated on June 25, 2022

Comments

  • mkmitchell
    mkmitchell about 2 years

    This is my first time using netCDF and I'm trying to wrap my head around working with it.

    I have multiple version 3 netcdf files (NOAA NARR air.2m daily averages for an entire year). Each file spans a year between 1979 - 2012. They are 349 x 277 grids with approximately a 32km resolution. Data was downloaded from here.

    The dimension is time (hours since 1/1/1800) and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example

        Day 1 = +4 degrees, accumulated days = 0
        Day 2 = -1 degrees, accumulated days = 1
        Day 3 = -2 degrees, accumulated days = 2
        Day 4 = -4 degrees, accumulated days = 3
        Day 5 = +2 degrees, accumulated days = 0
        Day 6 = -3 degrees, accumulated days = 1
    

    I need to store this data in a new netcdf file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous days value, and based on that, output a value to a new netcdf file with the exact same dimension and variable.... or perhaps just add another variable to the original netcdf file with the output I'm looking for.

    Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the file is 2.3gb.

    Thanks for the input.

    My current progress in python:

    import numpy
    import netCDF4
    #Change my working DIR
    f = netCDF4.Dataset('air7912.nc', 'r')
    for a in f.variables:
      print(a)
    
    #output =
         lat
         long
         x
         y
         Lambert_Conformal
         time
         time_bnds
         air
    
    f.variables['air'][1, 1, 1]
    #Output
         298.37473
    

    To help me understand this better what type of data structure am I working with? Is ['air'] the key in the above example and [1,1,1] are also keys? to get the value of 298.37473. How can I then loop through [1,1,1]?

  • mkmitchell
    mkmitchell almost 11 years
    Thank you sir! That's exactly what I was looking for and much more in-depth than I was expecting. You have saved me a great deal of time. I'm always impressed by the community.
  • Eli S
    Eli S about 8 years
    I've mentioned this on another thread, but it is a bummer that MFDataset won't work for NetCDF4 in python, even with some restrictions. There are lots of good examples of MFDataset usage, and these are good for the many legacy file but not for the latest standard.
  • Adrian Tompkins
    Adrian Tompkins about 7 years
    I put a comment above to state that this solution (while elegant and detailed) does not answer the question posed, as it provides the total days in a year below zero, and not the length of each continuous period below freezing, which can be important for agriculture for example.