How does the CPU know which physical address is mapped to which virtual address?

linux linux-kernel cpu virtual-memory


Solution 1

In Linux, the kernel maintains a five-level page table (regardless of the CPU’s capabilities; levels the hardware doesn’t have are folded away, normally at compile time). The top level is the page global directory, and each process has its own directory, pointed to by the pgd field of its mm_struct. Thus each process can have its own mappings, so address 12345 in different processes can point to different physical addresses.
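To make the levels concrete, here is a small Python sketch (not kernel code) of how an x86-64 virtual address is cut into one 9-bit index per table level plus a 12-bit page offset, assuming 4 KiB pages. The level names follow Linux’s pgd/p4d/pud/pmd/pte; with 4-level paging the p4d level is folded away, so the pgd index comes from bits 39 to 47 instead of 48 to 56. The function name split_vaddr is just for illustration.

```python
# Sketch: carving an x86-64 virtual address into page-table indices.
# Assumption: 4 KiB pages, i.e. a 12-bit offset and 9 index bits per level.
PAGE_SHIFT = 12
INDEX_BITS = 9

# Linux level names, top level first. With 4-level paging p4d is folded
# away, so the pgd index is taken from bits 39-47 instead of 48-56.
LEVELS_5 = ["pgd", "p4d", "pud", "pmd", "pte"]
LEVELS_4 = ["pgd", "pud", "pmd", "pte"]


def split_vaddr(vaddr: int, levels: int = 4) -> dict:
    """Return the page offset and the table index used at each level."""
    names = LEVELS_5 if levels == 5 else LEVELS_4
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    mask = (1 << INDEX_BITS) - 1
    # Walk from the lowest level (pte) upwards, 9 bits at a time.
    for i, name in enumerate(reversed(names)):
        shift = PAGE_SHIFT + i * INDEX_BITS
        parts[name] = (vaddr >> shift) & mask
    return parts


if __name__ == "__main__":
    v = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_vaddr(v))            # {'offset': 5, 'pte': 4, 'pmd': 3, 'pud': 2, 'pgd': 1}
    print(split_vaddr(v, levels=5))  # {'offset': 5, 'pte': 4, 'pmd': 3, 'pud': 2, 'p4d': 1, 'pgd': 0}
```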

CPUs aren’t really aware of processes, but they do tend to have features to support them. x86 CPUs, for instance, have hardware task-switching features, but these tend to be ignored (Linux doesn’t use them). Since process scheduling is managed by the kernel, it can keep track of page table changes itself and update whatever CPU state is required to switch to a new process’s page table when it switches tasks. On x86 PCs, that involves updating the CR3 control register, which holds the physical address of the top-level page table (the page directory).
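A toy Python model of this, purely illustrative: the “CPU” has a single CR3-like register and knows nothing about processes, while the “kernel” keeps one table per process and reloads the register on a switch. ToyCPU, ToyKernel and the flat {virtual page: frame} tables are hypothetical simplifications; a real x86 page table is multi-level and CR3 holds a physical address, not a Python object.

```python
# Toy model, not real kernel code. Simplification: each page table is a
# flat dict {virtual page number: physical frame number}.
PAGE_SIZE = 4096


class ToyCPU:
    def __init__(self):
        self.cr3 = None  # stands in for CR3: the current top-level table

    def translate(self, vaddr: int) -> int:
        # The CPU only ever consults whatever table cr3 points at.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.cr3[vpn]  # a KeyError stands in for a page fault
        return frame * PAGE_SIZE + offset


class ToyKernel:
    def __init__(self, cpu: ToyCPU):
        self.cpu = cpu
        self.page_tables = {}  # pid -> that process's table (like mm->pgd)

    def context_switch(self, pid: int) -> None:
        # The only MMU state changed here is the root pointer.
        self.cpu.cr3 = self.page_tables[pid]


if __name__ == "__main__":
    cpu = ToyCPU()
    kernel = ToyKernel(cpu)
    kernel.page_tables[1] = {3: 10}  # process 1: virtual page 3 -> frame 10
    kernel.page_tables[2] = {3: 42}  # process 2: same virtual page, other frame
    for pid in (1, 2):
        kernel.context_switch(pid)
        print(pid, cpu.translate(12345))  # prints "1 41017", then "2 172089"
```

With 4 KiB pages, 12345 is virtual page 3 at offset 57, so the same virtual address lands in frame 10 for process 1 and in frame 42 for process 2.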

The Page Table Management chapter in Mel Gorman’s Understanding the Linux Virtual Memory Manager book gives a good overview.

Solution 2

The MMU accesses a table that describes how to translate virtual addresses to physical addresses. (It doesn't need to translate physical addresses to virtual addresses, and this would be impossible in general, since the same physical address can be accessed via multiple virtual addresses or can be unmapped.) The layout of this table depends on the CPU architecture, but the general principle is always the same: there's a CPU register which contains the physical address of a table, which contains the physical addresses of further tables, and so on (typically 2 to 5 levels in total, depending on the architecture), until the last level, whose entries contain the physical address of the page where the data is located. At each level, a group of bits from the virtual address selects which table entry to use; the lowest bits are the offset within the page.
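A minimal Python sketch of such a walk, assuming a classic two-level layout like 32-bit x86 without PAE (10-bit directory index, 10-bit table index, 12-bit offset). Physical memory is modelled as a dict from a table’s physical address to its list of entries, and each entry is simply the physical address of the next table or page (real entries also carry present and permission bits); walk and phys_mem are hypothetical names.

```python
# Classic 32-bit x86 (non-PAE) split: 10 + 10 + 12 bits.
OFFSET_BITS = 12
INDEX_BITS = 10
ENTRIES = 1 << INDEX_BITS  # 1024 entries per table


def walk(phys_mem: dict, root: int, vaddr: int) -> int:
    """Translate vaddr, starting from the top-level table at physical
    address root (the value a CR3-like register would hold)."""
    dir_index = (vaddr >> (OFFSET_BITS + INDEX_BITS)) & (ENTRIES - 1)
    tbl_index = (vaddr >> OFFSET_BITS) & (ENTRIES - 1)
    offset = vaddr & ((1 << OFFSET_BITS) - 1)

    table_addr = phys_mem[root][dir_index]       # level 1: page directory entry
    if table_addr is None:
        raise LookupError("page fault: no page table for this range")
    page_addr = phys_mem[table_addr][tbl_index]  # level 2: page table entry
    if page_addr is None:
        raise LookupError("page fault: page not present")
    return page_addr + offset                    # physical address of the byte


if __name__ == "__main__":
    mem = {0x1000: [None] * ENTRIES, 0x2000: [None] * ENTRIES}
    mem[0x1000][1] = 0x2000  # directory entry 1 -> page table at 0x2000
    mem[0x2000][3] = 0x7000  # table entry 3 -> physical page at 0x7000
    print(hex(walk(mem, 0x1000, 0x00403ABC)))  # 0x7abc
```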

The MMU doesn't know about operating system processes as such. When the CPU switches to executing a different process, i.e. when a context switch occurs, it is the job of the operating system's context-switching code to update the MMU state as necessary. In practice, I think all Unix systems keep a separate set of tables in memory for each process and just update the MMU register to point to the top-level table of the new process.

There is actually a part of the MMU that cares about operating system processes, or at least about address spaces: the TLB. Looking up an entry in the MMU tables is rather costly since it involves multiple memory accesses, so the TLB caches the results of recent lookups. Without extra hardware support, the operating system must invalidate the TLB (i.e. remove all cached entries) on a context switch, since the mappings are different for the new process. Many architectures therefore let the OS tag each TLB entry with an identifier meaning “this entry belongs to address space N” (ARM calls it an ASID, x86 a PCID). A CPU register contains the current identifier and the context-switch code updates it; a TLB entry whose tag doesn't match the current identifier is ignored. This means the TLB can contain translations for several processes at once, which improves performance when switching back and forth between them. Because the tag usually has fewer bits than an OS process ID, N is not the process ID but a number the OS allocates for this purpose and may reassign over time (assuming the OS uses the feature at all).
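The effect of tagging can be shown with a toy Python model (hypothetical, not how any real TLB is built): an untagged TLB is flushed on every switch, while a tagged one keeps entries keyed by (identifier, virtual page). Real TLBs are small set-associative hardware caches with replacement policies that this model ignores; ToyTLB and ping_pong are illustrative names.

```python
class ToyTLB:
    """Fully associative toy TLB keyed by (asid, virtual page number)."""

    def __init__(self, tagged: bool):
        self.tagged = tagged   # True: entries carry an address-space ID
        self.entries = {}      # (asid, vpn) -> frame
        self.current_asid = 0  # plays the role of the "current N" register
        self.misses = 0

    def switch_to(self, asid: int) -> None:
        if not self.tagged:
            self.entries.clear()  # untagged TLB: must flush on every switch
        self.current_asid = asid

    def lookup(self, vpn: int, page_table: dict) -> int:
        key = (self.current_asid if self.tagged else 0, vpn)
        if key not in self.entries:  # miss: do the costly table walk
            self.misses += 1
            self.entries[key] = page_table[vpn]
        return self.entries[key]


def ping_pong(tlb: ToyTLB, rounds: int = 4) -> int:
    """Alternate between two processes that both touch virtual page 3."""
    tables = {1: {3: 10}, 2: {3: 42}}
    for _ in range(rounds):
        for asid in (1, 2):
            tlb.switch_to(asid)
            tlb.lookup(3, tables[asid])
    return tlb.misses


if __name__ == "__main__":
    print(ping_pong(ToyTLB(tagged=False)))  # 8: every access misses
    print(ping_pong(ToyTLB(tagged=True)))   # 2: only the first access per process
```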

Solution 3

Each process in an OS has a data structure called a PCB (process control block): https://en.wikipedia.org/wiki/Process_control_block

Among other things, the PCB contains memory-management information: the page table, memory limits, or segment table, depending on the memory scheme used by the operating system. Note that the PCB is a per-process data structure; every process has one.

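A virtual memory address such as 12345 is split into a page number and an offset; say, 123 is the page number and 45 is the offset. (Real hardware splits the binary address instead; with 4 KiB pages, the low 12 bits are the offset.) For each process, its own page table is consulted to find the corresponding page (called a frame) in physical memory.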
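A small Python sketch of the same idea, using the decimal split from above (a “page size” of 100, so 12345 is page 123, offset 45). The PCB class and translate function are hypothetical stand-ins: a real PCB (Linux’s task_struct, for instance) holds far more, and real hardware uses a power-of-two page size, so the split is a bit shift rather than a division.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical, memory-only stand-in for a process control block."""
    pid: int
    page_table: dict = field(default_factory=dict)  # page number -> frame number


def translate(pcb: PCB, vaddr: int, page_size: int = 100) -> int:
    # Split the virtual address into page number and offset within the page.
    page, offset = divmod(vaddr, page_size)
    # Look the page up in *this* process's table to find its frame.
    frame = pcb.page_table[page]
    return frame * page_size + offset


if __name__ == "__main__":
    a = PCB(pid=1, page_table={123: 7})
    b = PCB(pid=2, page_table={123: 9})
    print(translate(a, 12345), translate(b, 12345))  # 745 945
```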

With this in mind, even if two processes use the same virtual address, the physical addresses it maps to can be different.

Because walking the page table for every translation would be slow, the MMU keeps a cache of recently translated pages, called the Translation Lookaside Buffer (TLB): https://en.wikipedia.org/wiki/Translation_lookaside_buffer


user230989

Updated on September 18, 2022

Comments

user230989 almost 2 years

Based on my understanding, each process accesses memory through virtual addresses rather than physical addresses, and it is the CPU's responsibility to translate these virtual addresses to physical addresses using the MMU. Two or more processes can also use the same virtual address.

So let's say process A is trying to access virtual address 12345, and process B is also trying to access virtual address 12345.

How will the MMU translate each process's virtual address into a physical address? Does it have a mapping table for each process that maps virtual addresses to physical addresses? I thought the CPU doesn't even know what a "process" is: its only responsibility is to execute instructions blindly, without caring which instruction belongs to which process, and a "process" is purely an OS concept.
user230989 about 7 years

Let me see if I get this: the CPU only knows about the mapping table (which maps virtual addresses to physical addresses) for the currently running process, not for all processes, and the address of this mapping table is held in the CR3 register. When the scheduler runs, it changes the value of CR3 to point to the mapping table of the process it chooses to run next. Am I correct?
Stephen Kitt about 7 years

That’s right. On x86 CPUs, loading CR3 also flushes the TLB (apart from entries marked global, and subject to PCID handling on newer CPUs) so that addresses are translated correctly for the new process.