Understanding Page Tables in xv6 on RISC-V
Introduction to Virtual Memory
Virtual memory is a memory management technique that provides an abstraction layer between memory addresses used by a program (virtual addresses) and the physical memory addresses in RAM (physical addresses). In modern operating systems like xv6, virtual memory serves several critical purposes:
- Process isolation: Each process has its own address space, isolated from other processes
- Memory protection: The OS can control which memory regions are accessible to a process
- Overcommitment: The system can allocate more virtual memory than physically available
- Simplified memory model: Programs can use a contiguous address space regardless of physical memory fragmentation
How Page Tables Work: A General Overview
The Concept of Pages and Frames
In virtual memory systems, both virtual and physical memory are divided into fixed-size blocks:
- Pages: Fixed-size blocks of virtual memory (typically 4KB)
- Frames: Fixed-size blocks of physical memory (same size as pages)
Page tables perform the crucial task of mapping virtual pages to physical frames.
Basic Page Table Structure
At its most basic level, a page table is simply an array that serves as a lookup table:
- The index into the array is derived from the virtual page number
- The value stored at that index contains the physical frame number
For example, in a system with 4KB pages, a 32-bit virtual address could be divided into:
- Upper 20 bits: Virtual Page Number (VPN)
- Lower 12 bits: Offset within the page (2^12 = 4096 bytes)
Similarly, a physical address would consist of:
- Upper bits: Physical Frame Number (PFN)
- Lower 12 bits: Same offset within the frame
Address Translation Process
When a program accesses memory using a virtual address, the following happens:
- The MMU extracts the virtual page number from the virtual address
- It looks up the VPN in the page table to find the corresponding physical frame
- It combines the physical frame number with the original offset to form the physical address
- The physical address is used to access the actual memory location
Virtual Address: [Virtual Page Number | Offset]
|
v
Page Table
|
v
Physical Address: [Physical Frame Number | Offset]
Multi-level Page Tables
Single-level page tables become impractical for large address spaces because:
- They require a very large contiguous area of memory
- Most of the table entries might never be used
Multi-level page tables solve this problem by dividing the page table into a hierarchy:
- The top-level page table contains pointers to second-level page tables
- The second-level tables contain pointers to third-level tables, and so on
- The lowest-level tables contain the actual physical frame numbers
This hierarchical approach allows the OS to allocate page tables on demand, saving memory for unused portions of the address space.
Two-Level Page Table (32-bit Virtual Address)
In a 32-bit architecture with 4KB pages, a typical two-level page table might divide the address as follows:
- 10 bits for Level 1 index (addressing 1024 entries in the top page table)
- 10 bits for Level 0 index (addressing 1024 entries in the second-level page table)
- 12 bits for page offset (4KB pages)
flowchart LR
VA["Virtual Address<br>[31:22 | 21:12 | 11:0]"] --> L1["L1 Index<br>[31:22]"]
VA --> L0["L0 Index<br>[21:12]"]
VA --> OFF["Page Offset<br>[11:0]"]
L1 -->|"Index into"| PT1["Level 1 Page Table<br>(Root Page Table)"]
PT1 -->|"Entry points to"| PT0["Level 0 Page Table"]
L0 -->|"Index into"| PT0
PT0 -->|"Entry contains"| PFN["Physical Frame Number"]
PFN -->|"Combined with"| OFF
OFF -->|"Forms"| PA["Physical Address<br>[PFN | Page Offset]"]
class VA,PA highlight
Three-Level Page Table (RISC-V Sv39 - 39-bit Virtual Address)
In the RISC-V Sv39 scheme (as used in xv6), the 39-bit virtual address is divided as:
- 9 bits for Level 2 index (bits 30-38)
- 9 bits for Level 1 index (bits 21-29)
- 9 bits for Level 0 index (bits 12-20)
- 12 bits for page offset (bits 0-11)
flowchart LR
VA["Virtual Address<br>[38:30 | 29:21 | 20:12 | 11:0]"] --> L2["L2 Index<br>[38:30]"]
VA --> L1["L1 Index<br>[29:21]"]
VA --> L0["L0 Index<br>[20:12]"]
VA --> OFF["Page Offset<br>[11:0]"]
L2 -->|"Index into"| PT2["Level 2 Page Table<br>(Root Page Table)"]
PT2 -->|"Entry points to"| PT1["Level 1 Page Table"]
L1 -->|"Index into"| PT1
PT1 -->|"Entry points to"| PT0["Level 0 Page Table"]
L0 -->|"Index into"| PT0
PT0 -->|"Entry contains"| PFN["Physical Frame Number"]
PFN -->|"Combined with"| OFF
OFF -->|"Forms"| PA["Physical Address<br>[PFN | Page Offset]"]
class VA,PA highlight
Each page table in this hierarchy is itself a page (4KB) containing 512 entries (each entry is 64 bits or 8 bytes, and 512 * 8 = 4096 bytes). The physical address stored in a PTE is the address of the next-level page table (for non-leaf entries) or the actual physical frame (for leaf entries).
Page Table Entries (PTEs)
Each entry in a page table typically contains:
- The physical frame number
- Permission bits (read, write, execute)
- Status bits (valid, dirty, accessed)
- Other control information
The valid bit is particularly important - it indicates whether the mapping is valid. If this bit is not set, the MMU generates a page fault, which the OS can handle (e.g., by loading data from disk).
Translation Lookaside Buffer (TLB)
To speed up the translation process, modern CPUs include a Translation Lookaside Buffer (TLB), which caches recent virtual-to-physical address translations:
- Before checking the page table, the MMU first looks in the TLB
- If the translation is found (TLB hit), it’s used directly, avoiding the page table lookup
- If not found (TLB miss), the full page table walk is performed, and the result is cached in the TLB
When the OS modifies the page tables, it must invalidate the affected TLB entries to prevent incorrect translations.
Page Faults and Memory Management
When a program tries to access an invalid page (one not currently mapped to physical memory), a page fault occurs:
- The MMU raises a page fault exception
- The OS takes control and determines the cause of the fault
- The OS might load data from disk, allocate a new frame, or terminate the process (segmentation fault)
- If the memory access can be satisfied, the OS updates the page table and returns control to the program
This mechanism enables virtual memory’s powerful features like demand paging, memory-mapped files, and copy-on-write.
The Relationship Between Virtual and Physical Address Spaces
Below are diagrams illustrating how virtual and physical address spaces relate in a system like xv6.
User Virtual Address Space to Physical Memory
flowchart TD
subgraph "User Virtual Address Space"
VA_Code["Code (Text)<br>0x0000 - 0x1FFF"]
VA_Data["Data<br>0x2000 - 0x3FFF"]
VA_Heap["Heap<br>0x4000 - 0x5FFF"]
VA_Stack["Stack<br>0x6000 - 0x7FFF"]
VA_High["Trapframe & Trampoline<br>0xFFFF..."]
end
subgraph "Page Tables"
PT["Multi-level Page Table<br>(maps VPN to PFN)"]
end
subgraph "Physical Memory"
PM_Frame1["Frame 1<br>0x80002000 - 0x80002FFF"]
PM_Frame2["Frame 2<br>0x80005000 - 0x80005FFF"]
PM_Frame3["Frame 3<br>0x80008000 - 0x80008FFF"]
PM_Frame4["Frame 4<br>0x8000A000 - 0x8000AFFF"]
PM_Frame5["Frame 5<br>0x8000F000 - 0x8000FFFF"]
PM_High["Kernel-reserved Frames<br>..."]
end
VA_Code --> PT
VA_Data --> PT
VA_Heap --> PT
VA_Stack --> PT
VA_High --> PT
PT --> PM_Frame4
PT --> PM_Frame1
PT --> PM_Frame3
PT --> PM_Frame5
PT --> PM_High
User and Kernel Virtual Address Spaces to Physical Memory
flowchart TD
subgraph "User Virtual Address Space"
UVA_Low["User Code/Data/Heap<br>0x0000 - 0x7FFF"]
UVA_High["Trap Mechanisms<br>TRAMPOLINE"]
end
subgraph "Kernel Virtual Address Space"
KVA_DevMem["Device Memory<br>UART, VIRTIO, PLIC"]
KVA_KernelCode["Kernel Code<br>KERNBASE - etext"]
KVA_KernelData["Kernel Data<br>etext - PHYSTOP"]
KVA_Trampoline["Trampoline<br>TRAMPOLINE"]
end
subgraph "Physical Memory"
Phys_DevMem["Device Memory<br>0x10000000 - ..."]
Phys_KernelCode["Kernel Code<br>0x80000000 - ..."]
Phys_KernelData["Kernel Data<br>..."]
Phys_UserPages["User Pages<br>Scattered throughout RAM"]
Phys_Trampoline["Trampoline Code<br>..."]
end
UVA_Low --> UserPT["User Page Table"]
UVA_High --> UserPT
KVA_DevMem --> KernelPT["Kernel Page Table"]
KVA_KernelCode --> KernelPT
KVA_KernelData --> KernelPT
KVA_Trampoline --> KernelPT
UserPT --> Phys_UserPages
UserPT --> Phys_Trampoline
KernelPT --> Phys_DevMem
KernelPT --> Phys_KernelCode
KernelPT --> Phys_KernelData
KernelPT --> Phys_UserPages
KernelPT --> Phys_Trampoline
%% Note that kernel mappings are largely identity mappings
classDef kernelMappings fill:#f9f,stroke:#333,stroke-width:2px
class KVA_DevMem,KVA_KernelCode,KVA_KernelData kernelMappings
The RISC-V architecture, on which xv6 runs, implements virtual memory through a hardware mechanism called the Memory Management Unit (MMU). The MMU translates virtual addresses to physical addresses using data structures called page tables.
RISC-V Sv39 Page Table Architecture
xv6 uses RISC-V’s Sv39 virtual memory scheme, which provides a 39-bit virtual address space. This means it can address up to 512GB of virtual memory (2^39 bytes).
Virtual Address Structure
In the Sv39 scheme, a 64-bit virtual address is structured as follows:
63 39 38 30 29 21 20 12 11 0
+-------+---------------+---------------+---------------+---------------+
| Unused| L2 Index | L1 Index | L0 Index | Offset |
| (must | (9 bits) | (9 bits) | (9 bits) | (12 bits) |
| be 0) | | | | |
+-------+---------------+---------------+---------------+---------------+
- Bits 0-11: 12-bit page offset (4KB pages)
- Bits 12-20: 9-bit Level-0 page table index
- Bits 21-29: 9-bit Level-1 page table index
- Bits 30-38: 9-bit Level-2 page table index
- Bits 39-63: Must be all zeros (or all ones, sign-extended from bit 38)
xv6 defines this structure in the vm.c
file:
// The risc-v Sv39 scheme has three levels of page-table
// pages. A page-table page contains 512 64-bit PTEs.
// A 64-bit virtual address is split into five fields:
// 39..63 -- must be zero.
// 30..38 -- 9 bits of level-2 index.
// 21..29 -- 9 bits of level-1 index.
// 12..20 -- 9 bits of level-0 index.
// 0..11 -- 12 bits of byte offset within the page.
Page Table Structure
RISC-V Sv39 uses a three-level page table structure. Each page table is exactly one page (4KB) in size and contains 512 page table entries (PTEs):
- The L2 (top level) page table contains 512 PTEs, each pointing to an L1 page table
- Each L1 page table contains 512 PTEs, each pointing to an L0 page table
- Each L0 page table contains 512 PTEs, each pointing to a physical page
This hierarchical structure allows for efficient memory usage since page tables can be allocated only when needed.
Page Table Entries (PTEs)
Each page table entry in RISC-V is a 64-bit value that either points to the next level page table or to a physical page (leaf entry). The format of a PTE is:
63 54 53 28 27 19 18 10 9 8 7 6 5 4 3 2 1 0
+---------------+------------+------------+------------+-----+-+-+-+-+-+-+-+-+
| Reserved | PPN[2] | PPN[1] | PPN[0] | RSW |D|A|G|U|X|W|R|V|
+---------------+------------+------------+------------+-----+-+-+-+-+-+-+-+-+
- Bits 0-9: Flags
- V (0): Valid bit. If 0, the PTE is invalid
- R (1): Read permission
- W (2): Write permission
- X (3): Execute permission
- U (4): User mode access
- G (5): Global mapping (ignored in xv6)
- A (6): Accessed (ignored in xv6)
- D (7): Dirty (ignored in xv6)
- RSW (8-9): Reserved for supervisor software (unused in xv6)
- Bits 10-53: Physical Page Number (PPN) - the physical address divided by 4096
- PPN[0] (bits 10-18): L0 physical page number
- PPN[1] (bits 19-27): L1 physical page number
- PPN[2] (bits 28-53): L2 physical page number
In xv6, the PTE flags are defined in riscv.h
:
#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1) // readable
#define PTE_W (1L << 2) // writable
#define PTE_X (1L << 3) // executable
#define PTE_U (1L << 4) // user can access
xv6 also provides macros to convert between physical addresses and PTEs:
#define PA2PTE(pa) ((((uint64)pa) >> 12) << 10)
#define PTE2PA(pte) (((pte) >> 10) << 12)
#define PTE_FLAGS(pte) ((pte) & 0x3FF)
Page Table Operations in xv6
Walking Page Tables
The walk
function in vm.c
is central to page table operations. It navigates the page table hierarchy to find the PTE for a given virtual address:
pte_t *
walk(pagetable_t pagetable, uint64 va, int alloc)
{
// Check if virtual address is valid
if(va >= MAXVA)
panic("walk");
// Walk through page table levels
for(int level = 2; level > 0; level--) {
// Get PTE index for this level
pte_t *pte = &pagetable[PX(level, va)];
// If PTE is valid, get next level page table
if(*pte & PTE_V) {
pagetable = (pagetable_t)PTE2PA(*pte);
} else {
// If PTE is invalid and we're not allocating, return null
if(!alloc || (pagetable = (pde_t*)kalloc()) == 0)
return 0;
// Initialize new page table page
memset(pagetable, 0, PGSIZE);
*pte = PA2PTE(pagetable) | PTE_V;
}
}
// Return pointer to final level PTE
return &pagetable[PX(0, va)];
}
This function is used for:
- Finding an existing mapping
- Creating new page table levels when needed (if
alloc
is true) - Returning the address of the PTE so it can be modified
Creating Mappings
The mappages
function creates mappings from virtual to physical addresses:
int
mappages(pagetable_t pagetable, uint64 va, uint64 size, uint64 pa, int perm)
{
uint64 a, last;
pte_t *pte;
// Ensure addresses are page-aligned
if((va % PGSIZE) != 0)
panic("mappages: va not aligned");
if((size % PGSIZE) != 0)
panic("mappages: size not aligned");
a = va;
last = va + size - PGSIZE;
for(;;){
// Find the PTE for virtual address 'a'
if((pte = walk(pagetable, a, 1)) == 0)
return -1;
if(*pte & PTE_V)
panic("mappages: remap");
// Create the mapping
*pte = PA2PTE(pa) | perm | PTE_V;
if(a == last)
break;
a += PGSIZE;
pa += PGSIZE;
}
return 0;
}
This function maps a range of virtual addresses to a range of physical addresses, setting the specified permissions.
Freeing Page Tables
The freewalk
function recursively frees a page table:
void
freewalk(pagetable_t pagetable)
{
// There are 512 PTEs in a page table
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
// This PTE points to a lower-level page table
uint64 child = PTE2PA(pte);
freewalk((pagetable_t)child);
pagetable[i] = 0;
} else if(pte & PTE_V){
panic("freewalk: leaf");
}
}
kfree((void*)pagetable);
}
This function first frees all child page tables (recursively) and then frees the page table itself. Before freeing a user page table, xv6 uses uvmunmap
to free the mapped physical pages.
Kernel vs. User Page Tables
xv6 maintains separate page tables for the kernel and each user process.
Kernel Page Table
The kernel page table is set up during boot by kvminit()
which calls kvmmake()
:
pagetable_t
kvmmake(void)
{
pagetable_t kpgtbl;
kpgtbl = (pagetable_t) kalloc();
memset(kpgtbl, 0, PGSIZE);
// Map hardware devices
kvmmap(kpgtbl, UART0, UART0, PGSIZE, PTE_R | PTE_W);
kvmmap(kpgtbl, VIRTIO0, VIRTIO0, PGSIZE, PTE_R | PTE_W);
kvmmap(kpgtbl, PLIC, PLIC, 0x4000000, PTE_R | PTE_W);
// Map kernel text as read-only and executable
kvmmap(kpgtbl, KERNBASE, KERNBASE, (uint64)etext-KERNBASE, PTE_R | PTE_X);
// Map kernel data and physical RAM
kvmmap(kpgtbl, (uint64)etext, (uint64)etext, PHYSTOP-(uint64)etext, PTE_R | PTE_W);
// Map trampoline for trap entry/exit
kvmmap(kpgtbl, TRAMPOLINE, (uint64)trampoline, PGSIZE, PTE_R | PTE_X);
// Map kernel stacks
proc_mapstacks(kpgtbl);
return kpgtbl;
}
The kernel page table uses direct mapping (also called identity mapping) for most addresses, meaning virtual addresses map to identical physical addresses. This simplifies kernel memory management.
User Page Tables
Each process has its own page table, created by uvmcreate()
:
pagetable_t
uvmcreate()
{
pagetable_t pagetable;
pagetable = (pagetable_t) kalloc();
if(pagetable == 0)
return 0;
memset(pagetable, 0, PGSIZE);
return pagetable;
}
User page tables map the process’s virtual address space to physical memory, with the typical layout:
- Starting at virtual address 0: text (code)
- Followed by data and bss segments
- Fixed-size stack
- Expandable heap
- Special mappings at high addresses:
- TRAPFRAME: Stores process context during traps
- TRAMPOLINE: Code that transitions between user and kernel mode
Hardware Page Table Support
The RISC-V hardware supports page tables through several mechanisms:
SATP Register
The Supervisor Address Translation and Protection (satp) register controls the MMU. It contains:
- Mode (SV39)
- ASID (Address Space ID, unused in xv6)
- Physical address of the root page table
xv6 sets up this register using:
#define SATP_SV39 (8L << 60)
#define MAKE_SATP(pagetable) (SATP_SV39 | (((uint64)pagetable) >> 12))
static inline void
w_satp(uint64 x)
{
asm volatile("csrw satp, %0" : : "r" (x));
}
void
kvminithart()
{
// Wait for any previous writes to page table memory to finish
sfence_vma();
// Load kernel page table
w_satp(MAKE_SATP(kernel_pagetable));
// Flush stale TLB entries
sfence_vma();
}
TLB Management
The Translation Lookaside Buffer (TLB) caches virtual-to-physical address translations. When the page tables change, xv6 must invalidate the TLB using the sfence_vma
instruction:
static inline void
sfence_vma()
{
// The zero, zero means flush all TLB entries
asm volatile("sfence.vma zero, zero");
}
Context Switching and Page Tables
When the OS switches between processes, it must also switch the active page table by updating the satp
register. This is done in the swtch
function and involves:
- Saving the current process state
- Updating the
satp
register to use the new process’s page table - Flushing the TLB to remove stale entries
Conclusion
xv6’s implementation of page tables on RISC-V is a clean, educational example of modern virtual memory systems. The three-level page table structure enables efficient memory management through:
- On-demand allocation of page tables
- Fine-grained memory protection
- Isolation between processes
- Efficient translation from virtual to physical addresses
The design choices in xv6 prioritize simplicity and clarity over optimization, making it an excellent reference for understanding virtual memory concepts that apply to many modern operating systems.