Opinions on NetCDF vs HDF5 for storing scientific data?


Solution 1

I strongly suggest you use HDF5 instead of NetCDF. NetCDF is flat, and it gets very messy after a while if you are not able to organize things into groups. Of course, how to classify things is also a matter of debate, but at least HDF5 gives you that flexibility.

We performed a careful evaluation of HDF5 vs. NetCDF when I wrote Q5Cost, and HDF5 won hands down.

Solution 2

I have to admit that HDF5 is much easier to use in the long run. It's not hard to get simple data structures into NetCDF format, but manipulating them down the road is kind of a pain.

The "H" in HDF5 stands for "hierarchical", which translated (for me, anyway) into a REALLY easy way to manipulate data: just move nodes around and reference nodes from other places.

Can I ask what kind of project this is? I use these both for a lot of HPC scientific modeling tasks. Can I assume you're doing the same? If so, the trend I'm seeing is people moving to HDF5, but that might be different in your particular domain.

However you end up going, best of luck!

Solution 3

NetCDF, starting with version 4.0 (2008), can read and write most HDF5 files, and provides access to the hierarchical features of HDF5 via the enhanced data model.

HDF5 is extremely feature-rich, and has some great performance features.

NetCDF has a simpler API and a much wider tool base; many tools handle netCDF data.

Solution 4

I know this is an older post, and the original poster has indicated they've moved on, but for anyone who ends up here: the netCDF-Java library (as of 4.3.13) has netCDF-4 write support via the netCDF C library. It's still in beta, but it does work, and feedback is certainly appreciated!

Please see the netCDF-Java reference docs for more details.

Solution 5

1) The netCDF-4 C library is a layer on top of the HDF5 C library. Its API is considered simpler than HDF5's, but in the end you get pretty much the same functionality. NetCDF does not support graphs, but HDF5 does. In fact, I think HDF5 does not even prevent cycles in your graph.

2) The HDF Group has a Java API on top of the HDF5 C library.

3) Unidata has the netCDF-Java library, which is pure Java but can only read HDF5.
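Item 1's point about graphs (and the question's mention of a directed-acyclic-graph hierarchy) can be pictured with a toy in-memory model. This is plain Java, not the HDF5 or netCDF API; the `GroupGraph` and `Group` names are hypothetical. It assumes HDF5-style hard links: a group can link to any object, so one group can be reachable from two parents (a DAG rather than a tree), and a link back to an ancestor creates a cycle. `hasCycle` is an ordinary depth-first search for back edges.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupGraph {
    // Hypothetical stand-in for an HDF5 group: a named node holding hard links to other groups.
    static final class Group {
        final String name;
        final List<Group> links = new ArrayList<>();
        Group(String name) { this.name = name; }
        void link(Group child) { links.add(child); }
    }

    // Depth-first search: true if some link leads back to a group
    // that is still on the current path, i.e. the link graph has a cycle.
    static boolean hasCycle(Group root) {
        return visit(root, new HashSet<>(), new HashSet<>());
    }

    private static boolean visit(Group g, Set<Group> onPath, Set<Group> done) {
        if (onPath.contains(g)) return true;   // back edge: cycle found
        if (done.contains(g)) return false;    // shared node already fully explored
        onPath.add(g);
        for (Group child : g.links) {
            if (visit(child, onPath, done)) return true;
        }
        onPath.remove(g);
        done.add(g);
        return false;
    }

    public static void main(String[] args) {
        Group root = new Group("/");
        Group runs = new Group("runs");
        Group latest = new Group("latest");
        Group calib = new Group("calibration");
        root.link(runs);
        root.link(latest);
        // The same calibration group is reachable from two parents: a DAG, not a tree.
        runs.link(calib);
        latest.link(calib);
        System.out.println("DAG has cycle? " + hasCycle(root));   // false
        // Linking a child back to the root turns the DAG into a cyclic graph.
        calib.link(root);
        System.out.println("After back-link: " + hasCycle(root)); // true
    }
}
```

This matters for interoperability: per the comments below, netCDF-4 cannot read HDF5 files that have circular group structure.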

Author: Jason S

Updated on July 05, 2022

Comments

  • Jason S
    Jason S almost 2 years

    Anyone out there have enough experience with NetCDF and HDF5 to give some pluses/minuses about them as a way of storing scientific data?

    I've used HDF5 and would like to read/write it via Java, but the Java interface is essentially a wrapper around the C libraries, which I have found confusing. NetCDF seems intriguing, but I know almost nothing about it.

    edit: my application is "only" for datalogging, so that I get a file in a self-describing format. Important features for me are being able to add arbitrary metadata, having fast write access for appending to byte arrays, and having single-writer/multiple-reader (SWMR) concurrency. (SWMR is strongly preferred but not a must-have. The NetCDF docs say they have SWMR but don't say whether there is any mechanism for ensuring that two writers can't open the same file at once, with disastrous results.) I like the hierarchical aspect of HDF5 (in particular I love the directed-acyclic-graph hierarchy, which is much more flexible than a "regular" filesystem-like hierarchy). I am reading the NetCDF docs now... if it only allows one dataset per file, then it probably won't work for me. :(

    update: looks like NetCDF-Java reads netCDF-4 files but only writes netCDF-3 files, which don't support hierarchical groups. Darn.

    update 2009-Jul-14: I am starting to get really upset with HDF5 in Java. The available library isn't that great, and it has some major stumbling blocks that have to do with Java's abstraction layers (compound data types). A great file format for C, but it looks like I just lose. >:(

  • mdsumner
    mdsumner over 13 years
    afaik, NetCDF-4 is a kind of dumbed-down HDF5, made so that it is familiar to those used to previous versions of NetCDF. unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2010/…
  • Jason S
    Jason S almost 13 years
    Last I checked, the Java library didn't allow for writing HDF5 files. Anyway, it's a moot point as I've moved on to other things. :-/
  • Abe
    Abe over 10 years
    The answer is outdated: netCDF is now built on HDF5.
  • badgley
    badgley over 10 years
    @abe not necessarily. netCDF-4 still has some backward compatibility with netCDF-3, which means some compression options still aren't available to .nc files.
  • naught101
    naught101 over 10 years
    Thanks for the concise answer, that's very useful info, although it'd be even better if it had some references :)
  • naught101
    naught101 over 10 years
    Um... half your answer says that NetCDF doesn't support unsigned values, and the other half suggests it doesn't support signed values. Which is it gonna be? The first link only says that NetCDF 3 doesn't have unsigned integers, not values generally. Also, the second link indicates the problem is with java, not netCDF4. And really, what does it matter anyway? It means you have half as many integers for indexing, but you still have 2^31 (= 2 billion) or 2^63 (9 * 10^18), depending on your system.
  • Sean A.
    Sean A. about 9 years
    @badgley - what compression options are missing from netCDF when using it to write netCDF-4 files?
  • John Caron
    John Caron over 8 years
    Because HDF5 does not implement shared dimensions, there is an argument (disclaimer: by me) that you should write netCDF-4, not directly HDF5, details here: unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales.
  • spinkus
    spinkus about 8 years
    It is, but it's more that they've tried to impose structure than to dumb it down: unidata.ucar.edu/software/netcdf/docs/….
  • spinkus
    spinkus about 8 years
    "can read and write most HDF5 files". No, it can't. NetCDF-4 uses HDF5 the way an application uses a filesystem: it reads and writes a specific structure imposed on HDF5 1.8.
  • spinkus
    spinkus about 8 years
    @StefanoBorini Would be great if you could clarify whether your evaluation still applies to NetCDF-4/HDF5 or only earlier versions.
  • Edward Hartnett
    Edward Hartnett about 8 years
    To clarify, the netCDF-4 C library supports unsigned integers (8, 16, 32, and 64 bit). The netCDF Java library cannot create unsigned types, but can read unsigned types of size 8, 16, and 32 bits by promoting them to signed types of the next larger size. (That is, a 16-bit unsigned integer field in the netCDF file will look like a 32-bit signed field in java.) This is all due to the fact that Java does not support unsigned types.
  • Edward Hartnett
    Edward Hartnett about 8 years
    NetCDF-4 exposes almost all the features of HDF5, including compression. H5utils will work on netCDF-4 files, which are also perfectly valid HDF5 files.
  • Edward Hartnett
    Edward Hartnett about 8 years
    Parallel IO is also supported directly by Unidata's netCDF library, which uses either HDF5 or parallel-netcdf under the covers to provide parallel IO.
  • Edward Hartnett
    Edward Hartnett about 8 years
    NetCDF-4 exposes almost all HDF5 features, except for some pretty obscure exceptions.
  • Edward Hartnett
    Edward Hartnett about 8 years
    NetCDF-4 can read all HDF5 files that don't use references or have circular group structure. For a full list of restrictions on HDF5 files that can be read by netCDF-4, see the FAQ: unidata.ucar.edu/software/netcdf/docs/…
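The unsigned-to-signed promotion Edward Hartnett describes can be sketched in plain Java. This is not netCDF-Java's actual code, only the standard widening-and-masking trick the comment implies: the raw bits of an unsigned field are placed in the next larger signed type and masked so the sign bit is not extended. The class and method names are hypothetical; `Short.toUnsignedInt` and `Integer.toUnsignedLong` (Java 8+) do the same job for the 16- and 32-bit cases.

```java
public class UnsignedPromotion {
    // Raw 8-bit unsigned field held in a signed byte -> 16-bit signed short.
    static short ubyteToShort(byte raw) {
        return (short) (raw & 0xFF);
    }

    // Raw 16-bit unsigned field held in a signed short -> 32-bit signed int.
    static int ushortToInt(short raw) {
        return raw & 0xFFFF;              // same result as Short.toUnsignedInt(raw)
    }

    // Raw 32-bit unsigned field held in a signed int -> 64-bit signed long.
    static long uintToLong(int raw) {
        return raw & 0xFFFFFFFFL;         // same result as Integer.toUnsignedLong(raw)
    }

    public static void main(String[] args) {
        short raw16 = (short) 0xFFFF;     // bit pattern of unsigned 65535
        System.out.println(raw16);                 // -1 when read as a signed short
        System.out.println(ushortToInt(raw16));    // 65535
        int raw32 = 0xFFFFFFFF;           // bit pattern of unsigned 4294967295
        System.out.println(raw32);                 // -1
        System.out.println(uintToLong(raw32));     // 4294967295
    }
}
```

The cost is memory: each promoted value occupies twice the bytes it did in the file.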
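On the question's worry about two writers opening the same file at once: nothing in this thread says whether NetCDF or HDF5 guards against that, so one application-level option, offered here as an assumption rather than a feature of either library, is an exclusive OS lock on a sidecar file via `java.nio.channels.FileChannel.tryLock`. The `WriterLock` class and the `<data file>.lock` naming convention are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriterLock {
    // Try to become the single writer for dataFile by locking a sidecar "<name>.lock" file.
    // Returns the held lock, or null if another writer already holds it.
    static FileLock tryAcquireWriter(Path dataFile) throws IOException {
        Path lockFile = dataFile.resolveSibling(dataFile.getFileName() + ".lock");
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = ch.tryLock();      // null if another process holds the lock
            if (lock == null) ch.close();
            return lock;
        } catch (OverlappingFileLockException e) {
            ch.close();                        // already held by this JVM
            return null;
        } catch (IOException e) {
            ch.close();                        // don't leak the channel on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempDirectory("swmr").resolve("data.nc");
        FileLock first = tryAcquireWriter(data);
        FileLock second = tryAcquireWriter(data);
        System.out.println("first writer:  " + (first != null));   // true
        System.out.println("second writer: " + (second != null));  // false
        first.release();
        first.channel().close();
    }
}
```

Such locks are held per JVM process and are only advisory on some platforms, so they keep out only writers that follow the same convention; readers are unaffected because they never take the lock.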