Sun Grid Engine: set memory requirements per job


On our cluster we use h_vmem to enforce job memory allocation.

What you appear to be missing is declaring the available amount as a consumable resource. In qconf -mc, or in the qmon complexes dialog, you need to mark the resource as both requestable and consumable.
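As a rough sketch (the exact columns and default values may differ on your installation), the h_vmem row in the qconf -mc table then looks something like this, with the requestable and consumable columns set to YES:

#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     YES          YES         0        0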

Then, on each host with qconf -me you need to set the amount of available memory in complex_values.

For example we have host definitions that look like:

hostname              node004
load_scaling          NONE
complex_values        h_vmem=30.9G,exclusive=1,version=3.3.0
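
With that in place, jobs request memory through the same complex name. For a hypothetical job script job.sh that peaks at 4 GB:

qsub -l h_vmem=4G job.sh

The scheduler then only dispatches jobs to node004 while the sum of the requested amounts stays within the 30.9G declared in complex_values; additional jobs wait in the queue until enough of the consumable is free again.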

Comments

  • grs
    grs almost 2 years

    I want to be able to set up memory requirements per job.

    For instance: I want to run 5 jobs, each of which I know will need 4 GB of memory. I have 16 GB of RAM on the Ubuntu server and 16 GB of swap, and I want to avoid using the swap. Can I do something like:

    qsub -l mem_required_for_my_job=4G job1
    qsub -l mem_required_for_my_job=4G job2
    qsub -l mem_required_for_my_job=4G job3
    qsub -l mem_required_for_my_job=4G job4
    qsub -l mem_required_for_my_job=4G job5
    

    ? Each job will need its 4 GB at some point, but not right at the start.

    How do I tell SGE what my requirements are? How do I avoid it scheduling 5 x 4 GB when only 16 GB is available?

    I read the user guide and tried s_vmem, h_vmem, mem_free, mem_used. None of them is what I want. I do not want my jobs to get killed in the middle of processing; I want them not to be scheduled unless the maximum resources they need are available.

    Can I do this?

    Thank you all!

  • grs
    grs over 13 years
    Please check the comment on the other answer.
  • grs
    grs over 13 years
    I tried h_vmem, virtual_free and quotas. None of them handles the task the way I want. h_vmem protects the system by killing jobs that exceed their requirements. Neither quotas nor virtual_free did anything to prevent memory exhaustion; the OS killed the job in that case. I want to avoid killing the jobs. I would like them to stay in the queue, waiting for their requested resources to become available. Is this possible?
  • grs
    grs over 13 years
    Exactly. I want to be able to allocate memory requirements per job, but I do not want the job killed if it exceeds these requirements (for now). What I can't understand is how GE allocates memory. If I have 5 jobs of 4 GB each at their peak, how many would run simultaneously on 16 GB without swapping? If 4 jobs are running and take only 10 GB in total at the start, would the 5th one go in? How will GE know the expected peak memory usage per job?
  • Rahim
    Rahim over 13 years
    GridEngine will run exactly as many jobs as match the available complex resources. So if you define a node to have 16GB of h_vmem available, and you submit 5 jobs requesting 4GB, GridEngine will only place 4 of them on the node at once. Your job should always request the peak amount it expects to use. If you have swap space on your nodes, you can include that amount in the h_vmem complex value.
  • grs
    grs over 13 years
    Great! What arguments should I use to specify my job requirements? I believe it should be something like qsub -l mem_required=4G job1. What do I have to use after the -l part?
  • Rahim
    Rahim over 13 years
    The arguments are the same as the name of the complex. So if you have h_vmem configured as a consumable resource and listed in a host's complex_values, your command would look like: qsub -l h_vmem=4G job1.
  • Rahim
    Rahim over 13 years
    Also here's a quick blog post from gridengine.info regarding memory limits: gridengine.info/2009/12/01/…
  • grs
    grs over 13 years
    Just to complete the discussion: if I set up s_vmem and h_vmem in my complex resources via qconf -me hostname, then I must pass both of them: qsub -l s_vmem=2G,h_vmem=3G job1. If I pass just one or neither, the job simply quits without any indication why. So -l <arg1>,<arg2> becomes mandatory for everyone. Thanks!