What does Module Load do?


Solution 1

I assume you are logging in to a university computing cluster. At Stanford, we have a system that uses the module command to load different programs, as you describe.

Basically, the module command modifies your environment, setting PATH and other variables so that you can use a program such as gcc, matlab, or mathematica. To see the changes, run env to list your environment variables, then run module load matlab (or some other available package), and then run env again to see the updated variables.
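If you save the two listings to files (for example with env > before.txt before the module load and env > after.txt after it), a short script can show exactly what changed. The Python below is a minimal sketch under that assumption: the file names are made up, and the parser assumes one NAME=value pair per line, so values that contain newlines are not handled correctly.

```python
import sys


def parse_env(text):
    """Parse the output of `env` (one NAME=value per line) into a dict.

    Simplification: multi-line values are not handled correctly;
    lines without "=" are simply skipped.
    """
    env = {}
    for line in text.splitlines():
        # Split on the first "=" only; values may themselves contain "=".
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def diff_env(before, after):
    """Return (added, removed, changed) between two environment snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = sorted(before.keys() - after.keys())
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python envdiff.py before.txt after.txt
    with open(sys.argv[1]) as f:
        before = parse_env(f.read())
    with open(sys.argv[2]) as f:
        after = parse_env(f.read())
    added, removed, changed = diff_env(before, after)
    for name in sorted(added):
        print(f"+ {name}={added[name]}")
    for name in removed:
        print(f"- {name}")
    for name in sorted(changed):
        old, new = changed[name]
        print(f"~ {name}: {old} -> {new}")
```

Lines marked + are variables the module introduced and lines marked ~ are variables it modified, which is where you would expect to see PATH.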

I'm not sure of the details, but you can try

module help

or visit http://www.tacc.utexas.edu/tacc-projects/lmod for more information.

Solution 2

The terribly generically named project that provides module load is the Tcl-based "Environment Modules" project: http://modules.sourceforge.net/. At least the source has moved to GitHub, https://github.com/cea-hpc/modules, under an official account of CEA, a respected French atomic energy research organization, so that's a good sign.

That project basically exists to allow users of cluster systems to "install" specific versions of software without sudo.

Its packages (modulefiles) work by modifying environment variables such as PATH and LD_LIBRARY_PATH so that they point to pre-compiled versions of the software you want to use; those builds are made available to all nodes through a shared mount such as NFS.
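To make the mechanism concrete, here is a toy Python model of what loading and unloading a module does to the environment. It is a sketch, not how Environment Modules is actually implemented: the module name gcc/12.2.0 and its install directories under /opt/apps are hypothetical, and the MODULEFILES table merely stands in for the prepend-path lines a real Tcl modulefile would contain.

```python
import os

# Hypothetical modulefile contents: for each variable, the directories the
# module puts in front of the existing value (roughly what a Tcl
# `prepend-path VAR dir` line does).
MODULEFILES = {
    "gcc/12.2.0": {
        "PATH": ["/opt/apps/gcc/12.2.0/bin"],
        "LD_LIBRARY_PATH": ["/opt/apps/gcc/12.2.0/lib64"],
    },
}


def _split(value):
    """Split a PATH-style string into its non-empty components."""
    return [p for p in value.split(os.pathsep) if p]


def module_load(env, name):
    """Return a copy of env with the module's directories prepended."""
    new_env = dict(env)
    for var, dirs in MODULEFILES[name].items():
        # Drop existing copies so loading twice does not duplicate entries.
        rest = [p for p in _split(new_env.get(var, "")) if p not in dirs]
        new_env[var] = os.pathsep.join(dirs + rest)
    return new_env


def module_unload(env, name):
    """Return a copy of env with the module's directories removed again."""
    new_env = dict(env)
    for var, dirs in MODULEFILES[name].items():
        rest = [p for p in _split(new_env.get(var, "")) if p not in dirs]
        if rest:
            new_env[var] = os.pathsep.join(rest)
        else:
            new_env.pop(var, None)  # the variable only existed because of the module
    return new_env


if __name__ == "__main__":
    base = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
    loaded = module_load(base, "gcc/12.2.0")
    print(loaded["PATH"])  # the module's bin directory now comes first
    print(module_unload(loaded, "gcc/12.2.0") == base)
```

In this model, because the module's bin directory comes first in PATH, a shell (or subprocess.run(..., env=loaded)) would find that gcc before any system-wide one, which is why loading a different gcc/versionNumber simply changes which build a bare gcc resolves to.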

The Environment Modules system is useful in clusters that deny users sudo in order to control their resource usage and enforce security.

This is the case, for example, with Platform LSF, which relies on Linux's sched_setaffinity to control CPU usage: https://stackoverflow.com/questions/1006289/how-to-find-out-the-number-of-cpus-using-python/55423170#55423170
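From inside a job you can observe the effect of such a restriction. On Linux, Python's os.sched_getaffinity(0) returns the set of CPUs the current process is allowed to run on, which reflects any sched_setaffinity call made on its behalf, while os.cpu_count() reports every CPU in the machine. A minimal sketch (the function name usable_cpus is my own; the fallback covers platforms without sched_getaffinity, such as macOS):

```python
import os


def usable_cpus():
    """Number of CPUs the current process may actually run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask honours restrictions set with
        # sched_setaffinity (it is inherited from the parent process).
        return len(os.sched_getaffinity(0))
    # Elsewhere, fall back to the machine-wide count (which may be None).
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs in machine:", os.cpu_count())
    print("CPUs usable by this job:", usable_cpus())
```

If a scheduler has pinned a job to a few cores of a large node, the second number is normally the one to size thread pools by.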

A more modern alternative to such cluster systems is containers such as Docker, which let users have sudo inside their own, somewhat isolated, virtualized systems. This means users can more readily install what they want from the web, whereas with a module system you generally have to ask your organization's internal admins to package things for you one by one, which is more costly.

Note, however, that Docker is generally not considered a suitable mechanism for running fully untrusted code, because it does not reliably prevent container breakout: https://security.stackexchange.com/questions/107850/docker-as-a-sandbox-for-untrusted-code. So you have to weigh the sensitivity of your information against how much you trust the software providers. But at the very least, admins will only have to whitelist sources/projects rather than port the software themselves.

Since it involves less virtualization, the module setup is potentially faster, but I don't have benchmarks. As usual, unless you are highly experienced and know there will be a bottleneck, I'd start with the solution that is easiest to implement (Docker), and then do a comparison run if it seems slower than it should be.


Author: Kraken

Updated on September 18, 2022

Comments

  • Kraken
    Kraken almost 2 years

    What exactly does module load do? Is it basically that instead of going through the directory her

    I have seen people run module load gcc/versionNumber, etc.

  • elessartelkontar
    elessartelkontar over 2 years
    It is not precisely installing... In HPC applications, you need to set environment variables pointing to the compilers and libraries available on the cluster. The cluster's management team installs these libraries. You may ask them to install other libraries, and then corresponding environment modules are created that set up the environment when loaded. Therefore, when you compile a large climate model, you don't need to set up the whole environment by hand; you only follow a recipe that tells you which modules to load, then compile and run.
  • elessartelkontar
    elessartelkontar over 2 years
    At least for complex climate models, I would also find that difficult to do with the container approach.