How does sched_setaffinity() work?

c linux kernel multicore system-calls

Solution 1

sched_setaffinity() simply tells the scheduler which CPUs that process/thread is allowed to run on, then calls for a reschedule.

The scheduler actually runs on each one of the CPUs, so it gets a chance to decide what task to execute next on that particular CPU.
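From user space, that amounts to handing the kernel a CPU mask. Below is a minimal sketch, assuming Linux with glibc; the helper names (first_allowed_cpu, pin_to_cpu, allowed_cpu_count) are made up for illustration and are not part of any API. Only sched_getaffinity(), sched_setaffinity() and the CPU_* macros are real.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: lowest-numbered CPU the calling thread may run on,
 * or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread (pid 0 means "myself")
 * to a single CPU. Returns 0 on success, -1 on failure with errno set. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: number of CPUs in the current affinity mask, or -1. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu(first_allowed_cpu()) from main() and then reading the mask back should show exactly one CPU left. Note that nothing here names an instruction pointer or a core-specific jump: the program only updates a mask and the kernel does the rest.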

If you're interested in how you can actually call some code on other CPUs, I suggest you take a look at smp_call_function_single(). When the function has to run on a CPU other than the caller's, this calls generic_exec_single(). The latter simply adds the function to the target CPU's call queue and, if the queue was empty, kicks the target CPU with an inter-processor interrupt (IPI) so that it processes the queue.
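To make the queue-plus-IPI idea concrete, here is a toy user-space model of it. This is not kernel code: every name (toy_cpu, toy_call_function_single, toy_handle_ipi, toy_bump) is invented for the sketch, and the "IPI" is just a flag. It only mirrors the shape described above: the sender queues the call and raises the interrupt if the queue was empty, and the target drains the queue itself.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

typedef void (*toy_call_fn)(void *arg);

struct toy_call {
    toy_call_fn fn;
    void *arg;
};

/* Toy stand-in for one CPU: pending calls plus an "IPI pending" flag. */
struct toy_cpu {
    struct toy_call queue[TOY_QUEUE_LEN];
    size_t count;
    bool ipi_pending;
};

/* Sender side: queue the call. Raise the (simulated) IPI only if the
 * queue was empty; otherwise one has already been raised. */
int toy_call_function_single(struct toy_cpu *target, toy_call_fn fn, void *arg)
{
    if (target->count == TOY_QUEUE_LEN)
        return -1;                      /* queue full */
    bool was_empty = (target->count == 0);
    target->queue[target->count].fn = fn;
    target->queue[target->count].arg = arg;
    target->count++;
    if (was_empty)
        target->ipi_pending = true;
    return 0;
}

/* Receiver side: what the target CPU does when it takes the interrupt.
 * The target runs the queued functions itself; nobody "jumps" it. */
void toy_handle_ipi(struct toy_cpu *self)
{
    self->ipi_pending = false;
    for (size_t i = 0; i < self->count; i++)
        self->queue[i].fn(self->queue[i].arg);
    self->count = 0;
}

/* Example callback: increment an int counter. */
void toy_bump(void *arg)
{
    ++*(int *)arg;
}
```

The point of the model is that the queued function only runs once the target handles its interrupt, which is the cooperation the real mechanism relies on.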

Bottom line: there is no actual SMP variant of the jmp instruction. Instead, code running on the other CPUs cooperates in order to accomplish the task.

Solution 2

I think the thing you are not understanding is that the kernel is running on all the CPU cores. At every timer interrupt (~1000 per second), the scheduler runs on each CPU and chooses a process to run. There is no one CPU that somehow tells the others to start running a process. sched_setaffinity() works by just setting flags on the process (the cpus_allowed mask in its task_struct). The scheduler on each CPU consults this mask and will not run the process on a CPU that the mask excludes.
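As a rough illustration of "the scheduler reads these flags", here is a toy model in plain C. The names (toy_task, toy_pick_next) are invented, and the real kernel is more involved: it keeps per-CPU run queues and, as far as I know, mostly applies the mask when placing or migrating tasks rather than on every pick. Treat this as a sketch of the idea only.

```c
#include <stddef.h>

/* Toy task: a name, an affinity bitmask, and a runnable flag. */
struct toy_task {
    const char *name;
    unsigned long cpus_allowed;   /* bit n set means "may run on CPU n" */
    int runnable;
};

/* Run "on" CPU this_cpu at each tick: return the first runnable task
 * whose mask permits this CPU, or NULL if this CPU should idle. */
const struct toy_task *toy_pick_next(const struct toy_task *tasks,
                                     size_t ntasks, int this_cpu)
{
    for (size_t i = 0; i < ntasks; i++) {
        if (!tasks[i].runnable)
            continue;
        if (tasks[i].cpus_allowed & (1UL << this_cpu))
            return &tasks[i];
    }
    return NULL;
}
```

Each CPU calls the same function with its own this_cpu value, so a task pinned to CPU 1 is simply never chosen when CPU 0 runs the loop.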

Solution 3

"Where, in the assembly code, are we specifying which core performs that operation?"

There is no assembly involved here. Every task (thread) is assigned to a single CPU (or core, in your terms) at a time. To stop running on a given CPU and resume on another, the task has to "migrate". When a task migrates from one CPU to another, the scheduler picks the most idle CPU among those allowed by sched_setaffinity().

There are no magic assembly instructions issued. The kernel has a lower-level view of the hardware: each CPU is a separate object, which is very different from how things look to user-space processes (in user space, CPUs are almost invisible).
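The migration choice described above ("the most idle CPU among those allowed") can be sketched in a few lines of C. This is a toy model: toy_pick_idlest_cpu and its load[] array are hypothetical, and the kernel's actual load balancing takes much more into account.

```c
/* Return the allowed CPU with the lowest load value, or -1 if the mask
 * allows none of the first ncpus CPUs. Bit n of cpus_allowed set means
 * "may run on CPU n". */
int toy_pick_idlest_cpu(const unsigned load[], int ncpus,
                        unsigned long cpus_allowed)
{
    int best = -1;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!(cpus_allowed & (1UL << cpu)))
            continue;                   /* excluded by the affinity mask */
        if (best < 0 || load[cpu] < load[best])
            best = cpu;
    }
    return best;
}
```

With loads {5, 0, 3, 1} and a mask allowing only CPUs 0 and 2, the function returns 2: CPU 1 is idle but the mask rules it out.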

Author by

poundifdef

Updated on June 08, 2022

Comments

poundifdef about 2 years

I am trying to understand how the Linux syscall sched_setaffinity() works. This is a follow-on from an earlier question of mine.

I have a guide which explains how to use the syscall and has a pretty neat (working!) example.

So I downloaded the Linux 2.6.27.19 kernel sources.

I did a 'grep' for lines containing that syscall, and I got 91 results. Not promising.

Ultimately, I'm trying to understand how the kernel is able to set the instruction pointer for a specific core (or processor).

I am familiar with how single-core-single-thread programs work. One might issue a 'jmp foo' instruction, and this basically sets the IP to the memory address of the 'foo' label. But when one has multiple cores, one has to say "fetch the next instruction at memory address foo, and set the instruction pointer for core number 2 to begin execution there."

Where, in the assembly code, are we specifying which core performs that operation?

Back to the kernel code: what is important here? The file 'kernel/sched.c' has a function called sched_setaffinity(), but it returns type "long", which is inconsistent with its manual page. Which of these modules shows the assembly instructions issued? What module is reading the 'task_struct', looking at the 'cpus_allowed' member, and then translating that into an instruction? (I've also thumbed through the glibc source, but I think it just makes a call to the kernel code to accomplish this task.)

kumar about 13 years

"At every timer interrupt (~1000 per second), the scheduler runs on each CPU and chooses a process to run." I have a question here: the scheduler is a piece of code executing in the kernel, and a piece of code always executes on a single CPU at any instant, correct? So does that mean the scheduler (invoked by the timer interrupt handler) switches between the CPUs? And does that mean that with more CPUs there is more overhead from running the scheduler on all of them? I'm a bit confused here.

osgx almost 13 years

The different CPUs run different copies of the scheduler. The timer interrupt is raised on every CPU individually. So, if you have 4 CPUs, you have 4 schedulers. Actually, the scheduler code is the same, but the data is different.

fisakhan almost 4 years

This is not an answer. It's like "search for it on the internet".