How to restrict the number of concurrently running map tasks?

16,109

Solution 1

You can set the value of mapred.jobtracker.maxtasks.per.job to something other than -1 (the default). This limits the number of simultaneous map or reduce tasks a job can employ.
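As a sketch, this property would go in mapred-site.xml (the value 10 below is just an illustrative limit, not a recommendation):

```xml
<!-- mapred-site.xml: cap the number of tasks per job (example value) -->
<property>
  <name>mapred.jobtracker.maxtasks.per.job</name>
  <value>10</value>
</property>
```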

This variable is described as:

The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.

I think there were plans to add mapred.max.maps.per.node and mapred.max.reduces.per.node to job configs, but they never made it to release.

Solution 2

If you are using Hadoop 2.7 or newer, you can use mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit to restrict map and reduce tasks at each job level.
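Since these are per-job properties, a minimal sketch is to set them in the job's configuration (the values below are illustrative):

```xml
<!-- Per-job configuration, Hadoop 2.7+ (example values) -->
<property>
  <name>mapreduce.job.running.map.limit</name>
  <value>10</value>
</property>
<property>
  <name>mapreduce.job.running.reduce.limit</name>
  <value>5</value>
</property>
```

The same properties can also be passed on the command line with `-D mapreduce.job.running.map.limit=10` when submitting the job.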

See the JIRA ticket that introduced these properties for details.

Solution 3

mapred.tasktracker.map.tasks.maximum is the property that restricts the number of map tasks that can run at a time on a single TaskTracker. Configure it in your mapred-site.xml.
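Note that this is a per-node slot count, so the cluster-wide cap is this value multiplied by the number of TaskTrackers. A sketch of the setting (the value 2 is an example):

```xml
<!-- mapred-site.xml on each TaskTracker node (example value) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```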

Refer to question 2.7 in the Hadoop FAQ: http://wiki.apache.org/hadoop/FAQ

Author: HaiWang

Updated on June 04, 2022

Comments

  • HaiWang
    HaiWang almost 2 years

    My Hadoop version is 1.0.2. I want at most 10 map tasks running at the same time. I have found two variables related to this question.

    a) mapred.job.map.capacity

    but in my Hadoop version, this parameter appears to have been abandoned.

    b) mapred.jobtracker.taskScheduler.maxRunningTasksPerJob (http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.0.2/mapred-default.xml)

    I set this variable like below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    conf.set("date", date);
    conf.set("mapred.job.queue.name", "hadoop");
    conf.set("mapred.jobtracker.taskScheduler.maxRunningTasksPerJob", "10");
    
    DistributedCache.createSymlink(conf);
    Job job = new Job(conf, "ConstructApkDownload_" + date);
    ...
    

    The problem is that it doesn't work: there are still more than 50 map tasks running when the job starts.

    After looking through the Hadoop documentation, I can't find another way to limit the number of concurrently running map tasks. I hope someone can help. Thanks.

    =====================

    I have found the answer to this question and am sharing it here for others who may be interested.

    Use the fair scheduler, with the configuration parameter maxMaps to set a pool's maximum number of concurrent task slots, in the allocation file (fair-scheduler.xml). Then, when you submit jobs, set the job's queue to the corresponding pool.

  • HaiWang
    HaiWang over 11 years
    I think this variable controls the number of map tasks that run "on one tasktracker", not "in one job".
  • HaiWang
    HaiWang over 11 years
    This parameter's description: "The maximum number of map tasks that will be run simultaneously by a task tracker."
  • Jim Pivarski
    Jim Pivarski almost 11 years
    @HaiWang: From my reading of your original question, mapred.tasktracker.map.tasks.maximum solves the problem: it doesn't control the total number of mappers but the number of mappers that are run concurrently. Thus, it doesn't affect the logic or granularity of the job, but the rate at which resources are used. I had the same problem, and this parameter worked for me (easier than setting up a fair scheduler).
  • Asaf Mesika
    Asaf Mesika about 10 years
    He is talking about limiting the number of concurrently running map tasks, not the total number of map tasks.
  • Todd Owen
    Todd Owen over 7 years
    Deprecated in Hadoop 2.7.2, replaced with mapreduce.jobtracker.maxtasks.perjob (ref)
  • Todd Owen
    Todd Owen over 7 years
    Also, reading the description carefully, I'm not sure this is the limit on simultaneous tasks. It may actually be a limit on the total tasks. There is another property mapreduce.jobtracker.taskscheduler.maxrunningtasks.perjob described as "The maximum number of running tasks for a job before it gets preempted."