Combine multiple NetCDF files into timeseries multidimensional array python

23,345

Solution 1

Nice start, I would recommend the following to help solve your issues.

First, check out ncrcat to quickly concatenate your individual netCDF files into a single file. I highly recommend downloading NCO for netCDF manipulations, especially in this instance where it will ease your Python coding later on.

Let's say the files are named precip_1.nc, precip_2.nc, precip_3.nc, and precip_4.nc. You could concatenate them along the record dimension to form a new precip_all.nc with a record dimension of length 58400 with

ncrcat precip_1.nc precip_2.nc precip_3.nc precip_4.nc -O precip_all.nc

In Python we now just need to read in that new single file and then extract and store the time series for the desired grid cells. Something like this:

import netCDF4
import numpy as np

yindexlist = [1,2,3]
xindexlist = [4,5,6]
ngridcell = len(xidx)
ntimestep = 58400

# Define an empty 2D array to store time series of precip for a set of grid cells
timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

ncfile = netCDF4.Dataset('path/to/file/precip_all.nc', 'r')

# Note that precip is 3D, so need to read in all dimensions
precip = ncfile.variables['precip'][:,:,:]

for i in range(ngridcell):
     timeseries_per_grid_cell[i,:] = precip[:, yindexlist[i], xindexlist[i]]

ncfile.close()

If you have to use Python only, you'll need to keep track of the chunks of time indices that the individual files form to make the full time series. 58400/4 = 14600 time steps per file. So you'll have another loop to read in each individual file and store the corresponding slice of times, i.e. the first file will populate 0-14599, the second 14600-29199, etc.

Solution 2

You can easily merge multiple netCDF files into one using netCDF4 package in Python. See example below:

I have four netCDF files like 1.nc, 2.nc, 3.nc, 4.nc. Using command below all four files will be merge into one dataset.

import netCDF4
from netCDF4 import Dataset

dataset = netCDF4.MFDataset(['1.nc','2.nc','3.nc','4.nc'])

Solution 3

In parallel to the answer of N1B4, you can also concatenate 4 files along their time dimension using CDO from the command line

cdo mergetime precip1.nc precip2.nc precip3.nc precip4.nc merged_file.nc 

or with wildcards

cdo mergetime precip?.nc merged_file.nc 

and then proceed to read it in as per that answer.

You can add another step from the command line to extract the location of choice by using

cdo remapnn,lon=X/lat=Y merged_file.nc my_location.nc

this picks out the gridcell nearest to your specified lon/lat (X,Y) coordinate, or you can use bilinear interpolation if you prefer:

cdo remapbil,lon=X/lat=Y merged_file.nc my_location.nc 
Share:
23,345
LCook
Author by

LCook

Ph.D. student in Civil & Environmental Engineering working with climate data, netCDF files, python, and MATLAB.

Updated on December 19, 2020

Comments

  • LCook
    LCook over 3 years

    I am using data from multiple netcdf files (in a folder on my computer). Each file holds data for the entire USA, for a time period of 5 years. Locations are referenced based on the index of an x and y coordinate. I am trying to create a time series for multiple locations(grid cells), compiling the 5 year periods into a 20 year period (this would be combining 4 files). Right now I am able to extract the data from all files for one location and compile this into an array using numpy append. However, I would like to extract the data for multiple locations, placing this into a matrix where the rows are the locations and the columns contain the time series precipitation data. I think I have to create a list or dictionary, but I am not really sure how to allocate the data to the list/dictionary within a loop.

    I am new to python and netCDF, so forgive me if this is an easy solution. I have been using this code as a guide, but haven't figured out how to format it for what I'd like to do: Python Reading Multiple NetCDF Rainfall files of variable size

    Here is my code:

    import glob
    from netCDF4 import Dataset
    import numpy as np
    
    # Define x & y index for grid cell of interest 
        # Pittsburgh is 37,89
    yindex = 37  #first number
    xindex = 89  #second number
    
    # Path
    path = '/Users/LMC/Research Data/NARCCAP/'  
    folder = 'MM5I_ccsm/'
    
    ## load data file names    
    all_files = glob.glob(path + folder+'*.nc')
    all_files.sort()
    
    ## initialize np arrays of timeperiods and locations
    yindexlist = [yindex,'38','39'] # y indices for all grid cells of interest
    xindexlist = [xindex,xindex,xindex] # x indices for all grid cells of interest
    ngridcell = len(yindexlist)
    ntimestep = 58400  # This is for 4 files of 14600 timesteps
    
    ## Initialize np array
    timeseries_per_gridcell = np.empty(0)
    
    ## START LOOP FOR FILE IMPORT
    for timestep, datafile in enumerate(all_files):    
        fh = Dataset(datafile,mode='r')  
        days = fh.variables['time'][:]
        lons = fh.variables['lon'][:]
        lats = fh.variables['lat'][:]
        precip = fh.variables['pr'][:]
    
        for i in range(1):
            timeseries_per_gridcell = np.append(timeseries_per_gridcell,precip[:,yindexlist[i],xindexlist[i]]*10800)
    
        fh.close()
    
    print timeseries_per_gridcell     
    

    I put 3 files on dropbox so you could access them, but I am only allowed to post 2 links. Here are they are:

    https://www.dropbox.com/s/rso0hce8bq7yi2h/pr_MM5I_ccsm_2041010103.nc?dl=0 https://www.dropbox.com/s/j56undjvv7iph0f/pr_MM5I_ccsm_2046010103.nc?dl=0