How to fix "symbol lookup error: undefined symbol" errors in a cluster environment


After two dozen comments spent understanding the situation, it turned out that libhdf5.so.7 was a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive ones. So even though the symlink itself lived on a shared filesystem, the file it ultimately pointed to did not, and as a result the two kinds of process were loading different versions of the library.
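A minimal Python sketch of walking such a chain one hop at a time, so you can see which file the final link really points to. `symlink_chain` is a hypothetical helper, not part of any library. It only follows the last path component at each step; `os.path.realpath()` gives the fully resolved path, including symlinked parent directories.

```python
import os


def symlink_chain(path):
    """Return every hop from `path` to the first entry that is not a symlink.

    Like running `ls -l` on each link in turn. Only the final path component
    is checked at each step; use os.path.realpath() for the fully resolved
    path (it also resolves symlinked parent directories).
    """
    hops = [path]
    seen = set()
    current = path
    while os.path.islink(current):
        if current in seen:
            raise RuntimeError("symlink loop at %s" % current)
        seen.add(current)
        target = os.readlink(current)
        # Relative targets are interpreted relative to the link's directory.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        # normpath only tidies the string; it does not touch the filesystem.
        current = os.path.normpath(target)
        hops.append(current)
    return hops
```

For example, running `symlink_chain("/mnt/aeropix/prgs/.local/lib/libhdf5.so.7")` on the login node and again inside a queued job should print the same hops. If the last hop differs, the two environments are not loading the same file. Checking where the final file lives (for example with `df`) shows whether it is on shared storage.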

For future reference: besides checking LD_LIBRARY_PATH, it's always a good idea to check a library with nm -D to see whether the symbols actually exist. In this case they did exist when checked interactively but not when checked from a queued job. A quick md5sum then confirmed that the two files really were different.
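Both checks can also be scripted, which is handy when they have to run from inside a batch job. This is a hedged Python sketch with hypothetical helper names: `exports_symbol` approximates `nm -D | grep` using `ctypes`, and `md5_of` reproduces `md5sum` using `hashlib`. Unlike `nm -D`, `ctypes` actually loads the library, and a lookup on the returned handle also searches the library's dependencies, so `nm -D` remains the more precise answer to "is the symbol defined in this exact file".

```python
import ctypes
import hashlib


def exports_symbol(lib_path, symbol):
    """Rough stand-in for `nm -D lib_path | grep symbol`.

    Caveats: this really loads the library (running its initialisers), and
    dlsym() on the returned handle also searches the library's dependencies,
    so True does not prove the symbol is defined in this exact file.
    Raises OSError if the library itself cannot be loaded.
    """
    lib = ctypes.CDLL(lib_path)
    try:
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True


def md5_of(path, chunk_size=1 << 20):
    """Same digest as `md5sum path`, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Printing `exports_symbol(path, "H5Eset_auto2")` and `md5_of(path)` for the same `path` once interactively and once from a qsub job makes a mismatch like the one above visible immediately.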

Author: agnussmcferguss

Updated on July 12, 2022

Comments

  • agnussmcferguss
    agnussmcferguss almost 2 years

    I'm working on some Python code that extracts image data from an ECW file using GDAL (http://www.gdal.org/) and its Python bindings. GDAL was built from source with ECW support.

    The program runs on a cluster server that I access over ssh. When I run it from the ssh terminal it works fine. However, when I submit it as a job to the cluster with qsub, it fails with the following:

    Traceback (most recent call last):
      File "./gdal-test.py", line 5, in <module>
        from osgeo import gdal
      File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 21, in <module>
        _gdal = swig_import_helper()
      File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 17, in swig_import_helper
        _mod = imp.load_module('_gdal', fp, pathname, description)
    ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
    

    I dug a bit further and used LD_DEBUG=symbols to try to work out where the difference lies, but that is about as far as my knowledge has got me.
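For reference, here is a small Python wrapper that does the same as `LD_DEBUG=symbols <command> 2>&1 | grep <symbol>`. It assumes a glibc-based system, because LD_DEBUG is a feature of the glibc dynamic loader. `ld_debug_lookups` is a hypothetical name. Saving the filtered lines from an interactive run and from a qsub job makes the two easier to compare.

```python
import os
import subprocess


def ld_debug_lookups(argv, symbol):
    """Run `argv` with LD_DEBUG=symbols; return output lines mentioning `symbol`.

    Equivalent to `LD_DEBUG=symbols cmd 2>&1 | grep symbol`. The loader writes
    its debug output to stderr, so stderr is merged into stdout here.
    """
    env = dict(os.environ, LD_DEBUG="symbols")
    proc = subprocess.Popen(argv, env=env, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    text = output.decode("utf-8", "replace")
    return [line for line in text.splitlines() if symbol in line]
```

For example, `ld_debug_lookups(["python26", "./gdal-test.py"], "H5Eset_auto2")` returns the same lines as the grep pipeline used for the output below.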

    For reference, here's what happens with LD_DEBUG=symbols and running the code in the ssh terminal (piping through grep H5Eset_auto2 to reduce some of the output):

    Symbol debug output for code running in the ssh terminal:

     11359: symbol=H5Eset_auto2;  lookup in file=/usr/bin/python26 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libgcc_s.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/bin/python26 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libgcc_s.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     11359: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
    

    Symbol debug output for code submitted using qsub:

     16915: symbol=H5Eset_auto2;  lookup in file=/usr/bin/python26 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libm.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libgcc_s.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libpthread.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libc.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libdl.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libutil.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libjpeg.so.62 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpng12.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libpq.so.4 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libcurl.so.3 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libgssapi_krb5.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libkrb5.so.3 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libk5crypto.so.3 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libcom_err.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libidn.so.11 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libssl.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libcrypto.so.6 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcw.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcwC.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSCnet.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSUtil.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/librt.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libxml2.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/mnt/aeropix/prgs/.local/lib/libz.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libcrypt.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libresolv.so.2 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libnsl.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/usr/lib64/libkrb5support.so.0 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libkeyutils.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libselinux.so.1 [0]
     16915: symbol=H5Eset_auto2;  lookup in file=/lib64/libsepol.so.1 [0]
     16915: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: error: symbol lookup error: undefined symbol: H5Eset_auto2 (fatal)
    ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
    

    I'm not sure why, when the job is submitted with qsub, the search for H5Eset_auto2 carries on past libhdf5.so.7 and ends in an error, whereas in the terminal each lookup ends at libhdf5.so.7. I also note that the qsub job does locate libhdf5.so.7 (which is where H5Eset_auto2 should be defined), since it resolves a different symbol, H5Eprint, there:

     16915: symbol=H5Eprint;  lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
     16915: symbol=H5Eprint;  lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
     16915: symbol=H5Eprint;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libm.so.6 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libgcc_s.so.1 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libpthread.so.0 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libc.so.6 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libdl.so.2 [0]
     16915: symbol=H5Eprint;  lookup in file=/lib64/libutil.so.1 [0]
     16915: symbol=H5Eprint;  lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
    

    Any pointers would be incredibly useful at this stage. I hope that's enough information; I'm more than happy to provide more, I'm just not sure what else might be useful.
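One way to compare two such logs programmatically is sketched below in Python (all helper names are hypothetical). It extracts the search order for one symbol from LD_DEBUG=symbols output and reports where two runs first diverge. It relies on the loader printing a "lookup in file=" line for each object it searches and stopping at the one that defines the symbol. In the logs above, both runs search the same files up to libhdf5.so.7; after that, only the qsub run keeps going and ends in "undefined symbol". That points at the contents of libhdf5.so.7 rather than at the search path.

```python
import re

# Matches LD_DEBUG=symbols lines such as
#   16915: symbol=H5Eprint;  lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
LOOKUP_RE = re.compile(r"symbol=(?P<sym>[^;]+);\s+lookup in file=(?P<file>\S+)")


def search_order(log_lines, symbol):
    """Files the loader searched for `symbol`, in the order they were searched."""
    files = []
    for line in log_lines:
        match = LOOKUP_RE.search(line)
        if match and match.group("sym") == symbol:
            files.append(match.group("file"))
    return files


def first_divergence(run_a, run_b):
    """Index where two search orders first differ, or None if identical."""
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return index
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


def lookup_failed(log_lines, symbol):
    """True if the loader reported `undefined symbol: <symbol>` (exact name)."""
    pattern = re.compile(r"undefined symbol: %s(?:\s|$)" % re.escape(symbol))
    return any(pattern.search(line) for line in log_lines)
```

The file just before the divergence index (here libhdf5.so.7) is the last one both runs agree on, which makes it the first place to check with nm -D and md5sum.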

    EDIT:

    It seems that the contents of /usr/bin differ for jobs submitted with qsub (specifically, libtool is missing). This is being investigated.
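Since the job environment apparently differs from the interactive one, it can help to record a snapshot of the relevant directories in each context and diff them. A hedged Python sketch, with all names hypothetical: call `save_snapshot` once from the ssh session and once inside a qsub job, writing to different JSON files, then compare the loaded results with `diff_snapshots`. Which directories to include is up to you, for example /usr/bin and /mnt/aeropix/prgs/.local/lib.

```python
import json
import os


def snapshot(directories):
    """Map each directory to a sorted list of its entries ([] if unreadable)."""
    result = {}
    for directory in directories:
        try:
            result[directory] = sorted(os.listdir(directory))
        except OSError:
            result[directory] = []
    return result


def save_snapshot(path, directories):
    """Write snapshot(directories) to `path` as JSON."""
    data = snapshot(directories)  # taken before `path` itself is created
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)


def load_snapshot(path):
    """Read a snapshot previously written by save_snapshot()."""
    with open(path) as handle:
        return json.load(handle)


def diff_snapshots(interactive, queued):
    """Per directory, the names present in only one of the two snapshots."""
    report = {}
    for directory in sorted(set(interactive) | set(queued)):
        a = set(interactive.get(directory, []))
        b = set(queued.get(directory, []))
        if a != b:
            report[directory] = {
                "only_interactive": sorted(a - b),
                "only_queued": sorted(b - a),
            }
    return report
```

Listing names alone will not catch a file that exists in both places with different contents, as happened with libhdf5.so.7; for the libraries themselves, comparing checksums is still needed.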

  • agnussmcferguss
    agnussmcferguss over 9 years
    Also note: when following symlinks, keep following them until you reach the actual file at the end of the chain.
  • mbroshi
    mbroshi about 6 years
    The "For future reference" bit was great advice that helped me solve my problem. Amazing how specific this question is, yet given all the upvotes, how ubiquitously it applies.
  • daemondave
    daemondave over 5 years
    e.g. nm -D /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 | grep _glapi_tls_Dispatch