What does Module Load do?
Solution 1
I am assuming you are logging into a university computing cluster. At Stanford, we have a system that uses the module command to load different programs, as you are describing.
Basically, the module command modifies your environment so that PATH and other variables are set up for you to use a program such as gcc, matlab, or mathematica. To see some of the changes, run env to see your environment variables, then run module load matlab or some other available package, and then run env again to see the updated variables.
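The before-and-after comparison can be made explicit by saving both snapshots and diffing them. This is only a sketch: so that it runs on a machine without the module command, the PATH line stands in for module load matlab, and /opt/apps/matlab/bin is a made-up directory, not a real install location.

```shell
# Snapshot the environment before the change.
before=$(mktemp)
after=$(mktemp)
env | sort > "$before"

# On a cluster you would run `module load matlab` here instead.
# Stand-in (assumption): prepend a hypothetical install directory to PATH,
# which is the kind of change a module typically makes.
export PATH="/opt/apps/matlab/bin:$PATH"

# Snapshot again and show only the lines that differ.
env | sort > "$after"
changes=$(diff "$before" "$after" || true)
echo "$changes"

rm -f "$before" "$after"
```

Lines starting with < show the old value and lines starting with > show the new one.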
I'm not sure of the details, but you can try module help or visit http://www.tacc.utexas.edu/tacc-projects/lmod for more information.
Solution 2
The terribly generically named project that provides module load is the Tcl-based "Environment Modules" project: http://modules.sourceforge.net/ At least the source has moved to GitHub https://github.com/cea-hpc/modules under an official account of CEA, a respected French atomic energy research organization, which is a good sign.
That project basically exists to allow users in cluster systems to "install" specific versions of software without sudo.
Its packages work by modifying environment variables such as PATH and LD_LIBRARY_PATH, which are made to point to pre-compiled versions of the software you want to use. Those pre-compiled versions are made available to all nodes through some shared mount such as NFS.
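As a rough sketch of that mechanism, the snippet below does by hand what loading a module amounts to. Everything in it is hypothetical: a temporary directory stands in for the shared NFS mount, gcc/12.2.0 is an invented version, and the "compiler" is a fake script so that the lookup has something to find.

```shell
# Hypothetical shared install prefix; a temp dir stands in for an NFS mount.
prefix=$(mktemp -d)/gcc/12.2.0
mkdir -p "$prefix/bin" "$prefix/lib64"

# Fake compiler so the PATH lookup below has something to resolve.
printf '#!/bin/sh\necho "fake gcc 12.2.0"\n' > "$prefix/bin/gcc"
chmod +x "$prefix/bin/gcc"

# Roughly what loading the module does: put the prefix first in the search paths.
export PATH="$prefix/bin:$PATH"
export LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Drop the shell's cached command locations, then check what gcc resolves to.
hash -r
resolved=$(command -v gcc)
echo "gcc now resolves to: $resolved"
gcc
```

Unloading a module reverses the same edits, which is why no root access is needed at any point: only the user's own shell variables change.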
The Environment Modules system is useful in cluster systems that deny users sudo in order to control resource usage and security.
This is the case, for example, with Platform LSF, which relies on Linux's sched_setaffinity to control CPU usage: https://stackoverflow.com/questions/1006289/how-to-find-out-the-number-of-cpus-using-python/55423170#55423170
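One way to see the effect of such an affinity mask from inside a job is to compare the two counts that nproc from GNU coreutils reports. This is a sketch that assumes a Linux machine with coreutils installed; it says nothing about how LSF itself is configured.

```shell
# CPUs this process is allowed to run on (honours the affinity mask on Linux).
usable=$(nproc)

# CPUs installed in the machine, regardless of the affinity mask.
total=$(nproc --all)

echo "usable: $usable, installed: $total"
```

If the scheduler has pinned the job to a subset of cores, the first number should be smaller than the second. Where taskset from util-linux is available, running taskset -c 0 nproc is a quick way to reproduce the effect by hand.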
A more modern alternative to such cluster systems is containers such as Docker, which allow users to have sudo inside their own somewhat isolated virtualized systems. This means that users can more readily install what they want from the web, whereas in a module system you generally have to ask the organization's internal admins to package things for you one by one, which is more costly.
Note, however, that Docker is generally not considered a suitable protection against breakout when running fully untrusted code: https://security.stackexchange.com/questions/107850/docker-as-a-sandbox-for-untrusted-code so you have to weigh the sensitivity of your information and how much you trust the software providers. But at the very least, admins who whitelist those things will only have to whitelist sources/projects rather than port software themselves.
Since it is less virtualized, the module setup is potentially faster, but I don't have benchmarks. As usual, unless you are very experienced and know there will be a bottleneck, I'd start with the easiest solution to implement (Docker), and then do a comparison run if it seems slower than it should be.
Kraken
Updated on September 18, 2022
Comments
-
Kraken almost 2 years
What exactly does module load do? Is it basically that instead of going through the directory her
I have seen people do module load gcc/versionNumber etc.
-
elessartelkontar over 2 years
It is not precisely to install... In HPC applications, you need to set the environment variables pointing to compilers and libraries available in the cluster. The managing team installs these libraries. You may ask them to install other libraries, and then the corresponding environment modules are created to set the environment when loaded. Therefore, when one compiles a large climate model, one doesn't need to set all the environment by hand but only follow a recipe that directs you to load the proper modules, compile and run.
-
elessartelkontar over 2 years
At least for complex climate models, I would also find it difficult to do with the container approach.