Enrico Fraccaroli (Galfurian) edited this page Jan 29, 2026 · 7 revisions

This page teaches you how the MentOS kernel works - the brain of the operating system.

Important: All code examples and implementation details are from the actual MentOS kernel source code. You can explore these files in the kernel/ directory of the MentOS repository.

What is a Kernel?

The kernel is the core program that:

  1. Controls the hardware - CPU, RAM, disk, keyboard, screen
  2. Manages resources - Decides which program gets CPU time, memory, disk space
  3. Provides services - File reading, process creation, network communication
  4. Enforces security - Prevents programs from interfering with each other

Think of it as the traffic controller of your computer:

  • Multiple programs want CPU time → kernel schedules them
  • Multiple programs need memory → kernel allocates RAM
  • Programs want to read files → kernel accesses the disk

The kernel runs in ring 0 (privileged mode) while your programs run in ring 3 (restricted mode). Programs must ASK the kernel for help through system calls.

The Big Picture

┌──────────────────────────────────────────────────────┐
│               User Programs (ring 3)                  │
│  shell, ls, cat, editor, games, etc.                 │
└─────────────────┬────────────────────────────────────┘
                  │ System Calls (INT 0x80)
┌─────────────────┴────────────────────────────────────┐
│                   Kernel (ring 0)                     │
│                                                       │
│  ┌───────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │   Process     │  │    Memory    │  │   File    │ │
│  │  Management   │  │  Management  │  │  System   │ │
│  │               │  │              │  │           │ │
│  │ • Scheduler   │  │ • Paging     │  │ • VFS     │ │
│  │ • fork/exec   │  │ • Allocators │  │ • EXT2    │ │
│  │ • Signals     │  │ • Heap mgmt  │  │ • ProcFS  │ │
│  └───────────────┘  └──────────────┘  └───────────┘ │
│                                                       │
│  ┌───────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │   Drivers     │  │  Interrupts  │  │    IPC    │ │
│  │               │  │              │  │           │ │
│  │ • Keyboard    │  │ • Timer      │  │ • Pipes   │ │
│  │ • Disk (ATA)  │  │ • Syscalls   │  │ • Signals │ │
│  │ • RTC/PS2     │  │ • Exceptions │  │ • SysV IPC│ │
│  └───────────────┘  └──────────────┘  └───────────┘ │
└─────────────────┬────────────────────────────────────┘
                  │
┌─────────────────┴────────────────────────────────────┐
│              Hardware (CPU, RAM, Disk, etc.)          │
└───────────────────────────────────────────────────────┘

Each box solves a specific problem. Let's understand them one by one.

1. Process Management - "Running Multiple Programs"

The Problem: You have 1 CPU but want to run 10 programs simultaneously.

The Solution: The kernel creates the illusion of multiple CPUs by rapidly switching between programs.

How It Works

Timeline (every 10ms, timer interrupt fires):
───────────────────────────────────────────────────>
 Program A  │ Program B  │ Program A  │ Program C
 running    │ running    │ running    │ running

The scheduler decides who runs next:

  1. Timer interrupt fires (10ms elapsed)
  2. Save current program's state (registers, stack pointer)
  3. Pick next program to run (based on priority/fairness)
  4. Load new program's state
  5. Resume execution

This happens 100 times per second, so it feels seamless!

Key Data Structure: task_struct

Every running program is represented by a task_struct (see kernel/inc/process/process.h):

typedef struct task_struct {
    pid_t pid;                  // Process ID
    __volatile__ long state;    // TASK_RUNNING, TASK_STOPPED, etc.

    // Scheduling
    list_head_t run_list;
    sched_entity_t se;

    // CPU/FPU state
    thread_struct_t thread;

    // Memory
    mm_struct_t *mm;

    // Files
    vfs_file_descriptor_t *fd_list;
    int max_fd;

    // Signals
    sighand_t sighand;
    sigset_t blocked;
    sigpending_t pending;

    // Misc
    char name[TASK_NAME_MAX_LENGTH];
    char cwd[PATH_MAX];
} task_struct;

MentOS Implementation: See kernel/inc/process/process.h for full definition and kernel/src/process/ for task management code.

Real-world analogy: Think of task_struct as a "snapshot" of a program. The kernel can freeze any program, save its snapshot, load another program's snapshot, and resume it - like saving your game progress!

Scheduling Algorithms

MentOS supports multiple scheduling algorithms (configurable at build time):

1. Round-Robin (RR) - Default, simple fairness

Queue: [A, B, C]
→ Run A for 10ms
→ Run B for 10ms
→ Run C for 10ms
→ Repeat

Implementation: kernel/src/process/scheduler.c
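The rotation above can be sketched with a cursor that walks a fixed array and wraps around. This is only an illustration of the idea, not the MentOS scheduler: `rr_queue_t` and `rr_pick_next` are hypothetical names, and a real run queue would be a linked list of task_struct entries.

```c
#include <stddef.h>

#define RR_MAX_TASKS 8

/* Hypothetical run queue: a fixed array of task IDs visited in order. */
typedef struct {
    int tasks[RR_MAX_TASKS]; /* IDs of runnable tasks */
    size_t count;            /* number of valid entries in tasks[] */
    size_t next;             /* index of the task to run on the next tick */
} rr_queue_t;

/* Return the task that gets the next time slice and advance the cursor.
 * Returns -1 when the queue is empty. */
static int rr_pick_next(rr_queue_t *q)
{
    if (q->count == 0) {
        return -1;
    }
    int task = q->tasks[q->next];
    q->next = (q->next + 1) % q->count; /* wrap around: A, B, C, A, ... */
    return task;
}
```

Calling `rr_pick_next()` once per timer tick hands out slices in queue order, which is all that round-robin promises.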

2. Completely Fair Scheduler (CFS) - Linux-inspired

Track "virtual runtime" for each process:
  A: 100ms
  B: 150ms  ← has used the most CPU time
  C: 80ms   ← has used the least CPU time

→ Pick process with lowest vruntime (C)

Implementation: kernel/src/process/scheduler.c (CFS mode)
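A minimal sketch of the "lowest vruntime wins" rule, assuming an unweighted vruntime and a plain array. The type and function names (`cfs_entity_t`, `cfs_pick_next`, `cfs_account`) are hypothetical; Linux's CFS keeps entities in a red-black tree and scales vruntime by priority, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scheduling entity. */
typedef struct {
    int pid;
    uint64_t vruntime_ms; /* CPU time consumed so far (unweighted here) */
} cfs_entity_t;

/* Linear scan for the entity with the smallest vruntime.
 * Returns NULL if the array is empty. Ties go to the earliest entry. */
static cfs_entity_t *cfs_pick_next(cfs_entity_t *ents, size_t n)
{
    cfs_entity_t *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (best == NULL || ents[i].vruntime_ms < best->vruntime_ms) {
            best = &ents[i];
        }
    }
    return best;
}

/* Charge the time a task just ran, so it moves back in the ordering. */
static void cfs_account(cfs_entity_t *e, uint64_t ran_ms)
{
    e->vruntime_ms += ran_ms;
}
```

After each slice the scheduler would call `cfs_account()` on the task that ran, so a task that keeps getting picked eventually stops being the minimum.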

3. Priority-based - Higher priority = more CPU

Priority queue:
  High priority (20): [Process A]
  Medium priority (10): [Process B, Process C]
  Low priority (0): [Process D]

→ Always run highest priority first

See Scheduling for details on each algorithm.

Creating Processes: fork()

The fork() syscall creates a new process:

// User program:
pid_t pid = fork();
if (pid == 0) {
    // Child process
    printf("I'm the child!\n");
} else {
    // Parent process
    printf("I created child PID %d\n", pid);
}

What happens in the kernel:

// kernel/src/process/process.c
pid_t sys_fork(pt_regs_t *f)
{
    task_struct *current = scheduler_get_current_process();
    scheduler_store_context(f, current);

    task_struct *proc = __alloc_task(current, current, current->name);
    proc->mm = mm_clone(current->mm);

    // Child returns 0
    proc->thread.regs.eax = 0;
    proc->thread.regs.eflags |= EFLAG_IF;

    // Inherit ids
    proc->sid  = current->sid;
    proc->pgid = current->pgid;
    proc->uid  = current->uid;
    proc->ruid = current->ruid;
    proc->gid  = current->gid;
    proc->rgid = current->rgid;

    scheduler_enqueue_task(proc);
    return proc->pid;
}

Key trick: Copy-on-Write (COW)

  • Child shares parent's memory pages
  • Pages marked "read-only"
  • If either writes, kernel copies the page first
  • This makes fork() fast!
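The copy-on-write idea can be modelled in user space with a reference-counted page. This is a sketch under stated assumptions, not the MentOS implementation: `cow_page_t`, `cow_mapping_t`, `cow_share` and `cow_write` are hypothetical, and `malloc` stands in for the physical page allocator.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical physical page with a reference count. */
typedef struct {
    unsigned char data[PAGE_SIZE];
    int refcount; /* how many mappings point at this page */
} cow_page_t;

/* Hypothetical per-process mapping of a single page. */
typedef struct {
    cow_page_t *page;
    int writable; /* 0 = read-only (possibly shared), 1 = private and writable */
} cow_mapping_t;

/* fork(): the child maps the same page; both sides become read-only. */
static cow_mapping_t cow_share(cow_mapping_t *parent)
{
    parent->writable = 0;
    parent->page->refcount++;
    cow_mapping_t child = { parent->page, 0 };
    return child;
}

/* Write path: what a page-fault handler would do on a read-only mapping.
 * Returns 0 on success, -1 on a bad offset or failed allocation. */
static int cow_write(cow_mapping_t *m, size_t offset, unsigned char value)
{
    if (offset >= PAGE_SIZE) {
        return -1;
    }
    if (!m->writable) {
        if (m->page->refcount > 1) {
            /* Still shared: copy the page before modifying it. */
            cow_page_t *copy = malloc(sizeof(*copy));
            if (copy == NULL) {
                return -1;
            }
            memcpy(copy->data, m->page->data, PAGE_SIZE);
            copy->refcount = 1;
            m->page->refcount--;
            m->page = copy;
        }
        /* Sole owner now: safe to write in place. */
        m->writable = 1;
    }
    m->page->data[offset] = value;
    return 0;
}
```

Note that the last remaining owner never copies: once the reference count drops back to 1, the page is simply made writable again.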

2. Memory Management - "Giving Each Program Its Own Space"

The Problem: Multiple programs running, but they all need memory. How do we prevent them from overwriting each other?

The Solution: Virtual memory - each program thinks it has the entire address space (0x00000000 - 0xBFFFFFFF) to itself!

MentOS Implementation: kernel/src/mem/ - Contains paging, page tables, memory allocators

How Paging Works

Program A thinks:                Program B thinks:
0x00000000: My code             0x00000000: My code
0x10000000: My heap             0x10000000: My heap
0xBFFFFFFF: My stack            0xBFFFFFFF: My stack

But in PHYSICAL RAM:
0x00100000: Actually Program A's code
0x00200000: Actually Program B's code
0x00300000: Actually Program A's heap
0x00400000: Actually Program B's heap

The CPU's Memory Management Unit (MMU) translates virtual addresses to physical addresses using page tables.

Page Tables

Virtual Address: 0x12345678
     │
     ├─ Top 10 bits (0x048): Index into Page Directory
     │       │
     │       └──> Page Directory[0x048] → Points to Page Table
     │                  │
     ├─ Next 10 bits (0x345): Index into Page Table
     │       │
     │       └──> Page Table[0x345] → Physical Page Frame (0x00400)
     │
     └─ Bottom 12 bits (0x678): Offset within page
             │
             └──> Physical Address: (0x00400 × 4096) + 0x678 = 0x00400678
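The 10/10/12 split in the diagram is plain shifting and masking. The helpers below are a hypothetical sketch for 32-bit x86 two-level paging with 4 KB pages; the function names are not taken from MentOS.

```c
#include <stdint.h>

/* Index into the page directory: bits 31..22 of the virtual address. */
static uint32_t pd_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Index into the page table: bits 21..12. */
static uint32_t pt_index(uint32_t vaddr)
{
    return (vaddr >> 12) & 0x3FFu;
}

/* Offset inside the 4 KB page: bits 11..0. */
static uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & 0xFFFu;
}

/* Combine a physical frame number with the page offset. */
static uint32_t phys_addr(uint32_t frame, uint32_t vaddr)
{
    return (frame << 12) | page_offset(vaddr);
}
```

For example, `phys_addr(0x00400, 0x12345678)` reproduces the physical address 0x00400678 from the diagram.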

Data structures:

// Page directory (one per process)
struct page_directory {
    page_dir_entry_t entries[1024];  // Each points to a page table
};

// Page table (many per process)
struct page_table {
    page_table_entry_t pages[1024];  // Each points to a 4KB physical page
};

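// Page table entry (32 bits; the frame number occupies bits 12-31)
struct page_table_entry {
    unsigned int present  : 1;   // Is page in RAM?
    unsigned int rw       : 1;   // Read/write or read-only?
    unsigned int user     : 1;   // User-accessible or kernel-only?
    unsigned int other    : 9;   // Bits 3-11: caching, accessed, dirty, global, OS-available (collapsed here)
    unsigned int frame    : 20;  // Physical page frame number
};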

Memory Allocators

The kernel has multiple memory allocators for different needs:

1. Buddy Allocator - Allocates physical pages (4KB chunks)

Free memory split into powers of 2:
  Order 0: 4KB pages
  Order 1: 8KB chunks (2 pages)
  Order 2: 16KB chunks (4 pages)
  ...
  Order 10: 4MB chunks (1024 pages)

Request 12KB?
→ Round up to the next power of two: 16KB (order 2)
→ If no 16KB chunk is free, split a 32KB chunk into 16KB + 16KB
→ Give one 16KB chunk; its buddy stays on the order-2 free list
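A buddy allocator hands out blocks whose size is a power-of-two number of pages, so the first step of any request is mapping a byte count to an order. The sketch below shows that mapping and how a block's buddy is located; `buddy_order_for` and `buddy_of` are hypothetical helpers, with the 4 KB page size and maximum order 10 taken from the list above.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u
#define MAX_ORDER 10u

/* Smallest order whose block (PAGE_SIZE << order) can hold `bytes`.
 * Returns -1 if the request is zero or larger than the biggest block. */
static int buddy_order_for(size_t bytes)
{
    if (bytes == 0) {
        return -1;
    }
    size_t block = PAGE_SIZE;
    for (unsigned order = 0; order <= MAX_ORDER; ++order) {
        if (bytes <= block) {
            return (int)order;
        }
        block <<= 1;
    }
    return -1;
}

/* Offset of a block's buddy: flip the bit that corresponds to the block size.
 * `offset` is the block's byte offset from the start of the managed region. */
static size_t buddy_of(size_t offset, unsigned order)
{
    return offset ^ ((size_t)PAGE_SIZE << order);
}
```

The XOR trick is what makes merging cheap: when a block is freed, the allocator checks whether the block at `buddy_of(offset, order)` is also free and, if so, joins the two into one block of the next order.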

2. Slab Allocator - Caches common object sizes

Frequently allocated objects (task_struct, file, inode):
→ Pre-allocate a "slab" of these objects
→ Allocation is just taking from cache (fast!)
→ Deallocation returns to cache (no fragmentation)
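The "take from cache, return to cache" behaviour boils down to a free list threaded through pre-allocated objects. Here is a minimal sketch with a single fixed-size slab; all names (`slab_cache_t`, `slab_alloc`, `slab_free`) and the 64-byte object size are assumptions made for the example, not the MentOS slab API.

```c
#include <stddef.h>

#define SLAB_OBJECTS 4

/* Hypothetical fixed-size object; while free, it stores the next-free link. */
typedef union slab_obj {
    union slab_obj *next_free;
    unsigned char payload[64];
} slab_obj_t;

typedef struct {
    slab_obj_t objects[SLAB_OBJECTS]; /* pre-allocated storage ("the slab") */
    slab_obj_t *free_list;            /* head of the list of unused objects */
} slab_cache_t;

/* Thread every object onto the free list. */
static void slab_cache_init(slab_cache_t *c)
{
    c->free_list = NULL;
    for (size_t i = 0; i < SLAB_OBJECTS; ++i) {
        c->objects[i].next_free = c->free_list;
        c->free_list = &c->objects[i];
    }
}

/* O(1): pop the first free object, or NULL when the slab is exhausted. */
static void *slab_alloc(slab_cache_t *c)
{
    slab_obj_t *obj = c->free_list;
    if (obj != NULL) {
        c->free_list = obj->next_free;
    }
    return obj;
}

/* O(1): push the object back so the next allocation can reuse it. */
static void slab_free(slab_cache_t *c, void *ptr)
{
    slab_obj_t *obj = ptr;
    obj->next_free = c->free_list;
    c->free_list = obj;
}
```

Because every object has the same size, a freed slot always fits the next request exactly, which is why this scheme avoids fragmentation inside the slab.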

3. kmalloc/kfree - Kernel's malloc

kmalloc(1024) → Uses slab allocator for common sizes
kmalloc(100000) → Uses buddy allocator for large chunks

3. File System - "Organizing Data on Disk"

The Problem: The disk is just a giant array of bytes. How do we organize it into files and folders?

The Solution: The Virtual File System (VFS) provides an abstraction layer. Programs use open/read/write, and VFS translates to the actual filesystem (EXT2, ProcFS, etc.).

VFS Architecture

User Program:
   fd = open("/home/user/file.txt", O_RDONLY);
   read(fd, buffer, 100);
        ↓
   System Call (INT 0x80)
        ↓
VFS Layer:
   vfs_open("/home/user/file.txt")
   → Parse path: / → home → user → file.txt
   → Resolve to a filesystem object
   → Create vfs_file_t structure
   → Return file descriptor (integer)
        ↓
EXT2 Filesystem Driver:
   ext2_read(inode, buffer, offset, count)
   → Read inode's block list
   → Find physical disk blocks
   → Call disk driver
        ↓
ATA Disk Driver:
   ata_read_sectors(block_number, buffer)
   → Send commands to disk controller
   → Wait for disk to read data
   → Copy data to buffer

Key Data Structures

MentOS exposes VFS objects via vfs_file_t and related types (see kernel/inc/fs/vfs_types.h):

typedef struct vfs_file {
    char name[NAME_MAX];
    void *device;
    uint32_t mask;
    uint32_t uid;
    uint32_t gid;
    uint32_t flags;
    uint32_t ino;
    uint32_t length;
    uint32_t open_flags;
    size_t f_pos;
    vfs_file_operations_t *fs_operations;
} vfs_file_t;

typedef struct super_block {
    char name[NAME_MAX];
    char path[PATH_MAX];
    struct vfs_file *root;
    file_system_type_t *type;
} super_block_t;

typedef struct vfs_file_descriptor {
    struct vfs_file *file_struct;
    int flags_mask;
} vfs_file_descriptor_t;
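The fs_operations pointer is what lets one read() call reach different filesystems. The sketch below shows that dispatch pattern with a toy in-memory "filesystem"; every name in it (`demo_file_t`, `demo_file_ops_t`, `memfs_read`, `demo_vfs_read`) is hypothetical and much simpler than the real vfs_file_t.

```c
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Hypothetical operations table: each filesystem supplies its own functions. */
typedef struct {
    long (*read)(struct demo_file *f, char *buf, size_t count);
} demo_file_ops_t;

/* Hypothetical open file: position plus a pointer to the filesystem's ops. */
typedef struct demo_file {
    const char *data; /* backing bytes (stands in for disk blocks) */
    size_t length;
    size_t pos;
    const demo_file_ops_t *ops;
} demo_file_t;

/* "Filesystem driver": copy from the backing bytes and advance the position. */
static long memfs_read(demo_file_t *f, char *buf, size_t count)
{
    size_t left = f->length - f->pos;
    if (count > left) {
        count = left;
    }
    memcpy(buf, f->data + f->pos, count);
    f->pos += count;
    return (long)count;
}

static const demo_file_ops_t memfs_ops = { memfs_read };

/* "VFS layer": validate, then dispatch through the ops table. */
static long demo_vfs_read(demo_file_t *f, char *buf, size_t count)
{
    if (f == NULL || f->ops == NULL || f->ops->read == NULL) {
        return -1;
    }
    return f->ops->read(f, buf, count);
}
```

Supporting another filesystem then means providing another ops table; `demo_vfs_read()` itself does not change.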

See File Systems for complete details.

4. Interrupt Handling - "Responding to Events"

The Problem: Hardware needs to notify the CPU (keyboard pressed, disk finished reading, timer tick).

The Solution: Interrupts - hardware signals that pause the CPU and run a handler function.

MentOS Implementation: kernel/src/descriptor_tables/ (GDT/IDT setup) and kernel/src/hardware/ (timer/IRQ handling)

Types of Interrupts

Hardware Interrupts (IRQs):
  IRQ 0: Timer (fires every 10ms)
  IRQ 1: Keyboard
  IRQ 14/15: Disk (ATA)

Software Interrupts:
  INT 0x80: System calls (see kernel/inc/system/syscall.h)

CPU Exceptions:
  0: Divide by zero
  6: Invalid opcode
  13: General protection fault
  14: Page fault

How Interrupts Work

1. CPU executing normal code:
   mov eax, [ebx]
   add eax, 5
   ← Timer interrupt fires (IRQ 0)

2. CPU automatically:
   • Pushes current state (EFLAGS, CS, EIP) onto the stack, plus SS and ESP when the interrupt arrives in ring 3
   • Looks up handler in IDT (Interrupt Descriptor Table)
   • Jumps to handler

3. Handler runs:
   void timer_handler(pt_regs_t *regs) {
       tick_count++;
       scheduler_tick();  // Maybe switch processes
   }

4. Handler returns (IRET instruction):
   • Pops state from stack
   • Resumes interrupted code

Interrupt Descriptor Table (IDT)

struct idt_entry {
    uint16_t offset_low;    // Handler address (low 16 bits)
    uint16_t selector;      // Code segment
    uint8_t  zero;          // Reserved
    uint8_t  type_attr;     // Gate type, DPL, present
    uint16_t offset_high;   // Handler address (high 16 bits)
};

// 256 entries (0-255)
idt_entry_t idt[256];

Setting up an interrupt handler:

// kernel/src/descriptor_tables/idt.c
void idt_install_handler(uint8_t num, void (*handler)(pt_regs_t *)) {
    idt[num].offset_low  = (uint32_t)handler & 0xFFFF;
    idt[num].offset_high = ((uint32_t)handler >> 16) & 0xFFFF;
    idt[num].selector    = KERNEL_CODE_SEGMENT;
    idt[num].type_attr   = 0x8E;  // Present, ring 0, interrupt gate
}
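To see how the handler address and the 0x8E attribute byte are packed, here is a standalone round-trip sketch. `demo_gate_t` mirrors the field layout of the entry shown above, but the type and helper names are hypothetical, and the selector value used in the usage note is an assumption.

```c
#include <stdint.h>

/* Same field layout as the IDT entry shown above; hypothetical type name. */
typedef struct {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  zero;
    uint8_t  type_attr;
    uint16_t offset_high;
} demo_gate_t;

/* Build a gate from a 32-bit handler address. */
static demo_gate_t gate_encode(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
    demo_gate_t g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = type_attr;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Recover the handler address the CPU would jump to. */
static uint32_t gate_handler(const demo_gate_t *g)
{
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}

/* type_attr layout: bit 7 = present, bits 6..5 = DPL, bits 3..0 = gate type. */
static int gate_present(const demo_gate_t *g) { return (g->type_attr >> 7) & 1; }
static int gate_dpl(const demo_gate_t *g)     { return (g->type_attr >> 5) & 3; }
static int gate_type(const demo_gate_t *g)    { return g->type_attr & 0x0F; }
```

With `gate_encode(0x00101234, 0x08, 0x8E)` the helpers report present = 1, DPL = 0 and type 0xE (32-bit interrupt gate). A gate that user code may invoke with INT, such as a syscall vector, would use DPL 3 instead, i.e. 0xEE.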

5. Device Drivers - "Talking to Hardware"

The Problem: Each hardware device has its own interface (registers, commands, protocols).

The Solution: Device drivers - kernel modules that know how to talk to specific hardware.

Example: Keyboard Driver

// kernel/src/drivers/keyboard.c

void keyboard_init(void) {
    // Install interrupt handler for IRQ 1
    install_interrupt_handler(IRQ_KEYBOARD, keyboard_handler);
    enable_irq(IRQ_KEYBOARD);
}

void keyboard_handler(pt_regs_t *regs) {
    // Read scan code from keyboard controller
    uint8_t scancode = inb(KEYBOARD_DATA_PORT);  // I/O port 0x60
    
    // Translate scancode to ASCII
    char key = scancode_to_ascii(scancode);
    
    // Add to keyboard buffer
    keyboard_buffer_push(key);
}
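The keyboard buffer that the handler pushes into is typically a ring buffer: the interrupt handler writes at one end and a reader drains the other. This is a minimal sketch of such a buffer; `kbd_ring_t`, `kbd_ring_push` and `kbd_ring_pop` are hypothetical names, and the actual keyboard_buffer_push() may be implemented differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define KBD_BUF_SIZE 8 /* deliberately tiny for the example */

/* Hypothetical ring buffer: the IRQ handler pushes, a reader pops. */
typedef struct {
    char data[KBD_BUF_SIZE];
    size_t head;  /* next slot to write */
    size_t tail;  /* next slot to read */
    size_t count; /* number of stored characters */
} kbd_ring_t;

/* Store one character; drop it and return false if the buffer is full. */
static bool kbd_ring_push(kbd_ring_t *r, char c)
{
    if (r->count == KBD_BUF_SIZE) {
        return false;
    }
    r->data[r->head] = c;
    r->head = (r->head + 1) % KBD_BUF_SIZE;
    r->count++;
    return true;
}

/* Fetch the oldest character; return false if nothing is waiting. */
static bool kbd_ring_pop(kbd_ring_t *r, char *out)
{
    if (r->count == 0) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1) % KBD_BUF_SIZE;
    r->count--;
    return true;
}
```

Dropping keystrokes when the buffer is full is one possible policy; overwriting the oldest entry is another. In a kernel, pop would also need protection against the interrupt handler running in the middle of it.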

Example: Disk Driver (ATA)

// kernel/src/drivers/ata.c

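// Simplified: a complete driver also selects the drive, writes the sector
// count and LBA bits 24-27, and waits for BSY to clear before polling DRQ.
void ata_read_sector(uint32_t lba, void *buffer) {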
    // Send LBA (Logical Block Address) to disk
    outb(ATA_PORT_LBA_LOW, lba & 0xFF);
    outb(ATA_PORT_LBA_MID, (lba >> 8) & 0xFF);
    outb(ATA_PORT_LBA_HIGH, (lba >> 16) & 0xFF);
    
    // Send READ command
    outb(ATA_PORT_COMMAND, ATA_CMD_READ);
    
    // Wait for disk ready
    while (!(inb(ATA_PORT_STATUS) & ATA_STATUS_DRQ));
    
    // Read 512 bytes from data port
    insw(ATA_PORT_DATA, buffer, 256);  // Read 256 words (512 bytes)
}
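The three outb() calls carry the low 24 bits of the sector address. In 28-bit LBA mode the remaining four bits travel in the drive/head register together with the drive selection. The helper below sketches that split; `lba28_regs_t` and `lba28_split` are hypothetical names, not MentOS functions.

```c
#include <stdint.h>

/* Hypothetical holder for the four register values of a 28-bit LBA. */
typedef struct {
    uint8_t low;        /* LBA bits 0..7   */
    uint8_t mid;        /* LBA bits 8..15  */
    uint8_t high;       /* LBA bits 16..23 */
    uint8_t drive_head; /* 0xE0 (master) or 0xF0 (slave), plus LBA bits 24..27 */
} lba28_regs_t;

/* Split a 28-bit LBA into the byte values written to the ATA registers. */
static lba28_regs_t lba28_split(uint32_t lba, int slave)
{
    lba28_regs_t r;
    r.low        = (uint8_t)(lba & 0xFF);
    r.mid        = (uint8_t)((lba >> 8) & 0xFF);
    r.high       = (uint8_t)((lba >> 16) & 0xFF);
    r.drive_head = (uint8_t)(0xE0 | (slave ? 0x10 : 0x00) | ((lba >> 24) & 0x0F));
    return r;
}
```

Keeping the arithmetic in a pure function like this makes it easy to check on a host machine before any port I/O is involved.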

Kernel Initialization Sequence

When the bootloader jumps to the kernel, here's what happens:

// kernel/src/kernel.c
void kmain(multiboot_info_t *mboot_info)
{
    // 1. Video
    video_init();
    printf("MentOS starting...\n");
    
    // 2. Descriptor tables
    gdt_init();    // Global Descriptor Table (memory segments)
    idt_init();    // Interrupt Descriptor Table
    
    // 3. Memory
    paging_init(mboot_info);       // Enable virtual memory
    buddy_init();                  // Physical page allocator
    slab_init();                   // Object allocator
    
    // 4. Interrupts
    pic_init();                    // Programmable Interrupt Controller
    timer_init(100);               // Timer at 100 Hz
    keyboard_init();               // Keyboard driver
    
    // 5. File system
    vfs_init();                    // Virtual File System
    ext2_init();                   // EXT2 driver
    procfs_init();                 // /proc filesystem
    
    // 6. Mount root filesystem
    vfs_mount("/", rootfs_image);
    
    // 7. Scheduler
    scheduler_init();
    
    // 8. Create init process (PID 1)
    task_struct *init = create_process("/bin/init");
    scheduler_enqueue_task(init);
    
    // 9. Enable interrupts
    sti();
    
    // 10. Idle loop
    while (1) {
        hlt();  // Wait for interrupt
    }
}

Key Kernel Concepts

1. Kernel vs User Mode

The x86 CPU has privilege levels (rings 0-3). MentOS uses:

  • Ring 0 - Kernel mode (full access)
  • Ring 3 - User mode (restricted)

Switching happens on syscalls, interrupts, and exceptions.

2. Preemptive Multitasking

The timer interrupt preempts (forcibly pauses) the running program:

Program A running
    ↓
Timer interrupt (10ms elapsed)
    ↓
Kernel scheduler picks Program B
    ↓
Program A is paused (saved state)
Program B resumes

This prevents any program from hogging the CPU.

3. Synchronization

Multiple parts of the kernel might access shared data:

// BAD: Race condition!
if (free_list_head != NULL) {
    // ← Interrupt here could corrupt free_list!
    node = free_list_head;
    free_list_head = node->next;
}

// GOOD: Use spinlock
spinlock_lock(&free_list_lock);
if (free_list_head != NULL) {
    node = free_list_head;
    free_list_head = node->next;
}
spinlock_unlock(&free_list_lock);
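For readers curious what sits behind a lock/unlock pair, here is a minimal spinlock sketch using C11 atomics. It is not the MentOS spinlock: `demo_spinlock_t` and the `demo_spin_*` functions are hypothetical names chosen for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on a C11 atomic flag. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(demo_spinlock_t *l)
{
    /* test_and_set returns the previous value: false means we acquired it. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static void demo_spin_lock(demo_spinlock_t *l)
{
    while (!demo_spin_trylock(l)) {
        /* spin until the holder releases the lock */
    }
}

/* Release the lock so another CPU can take it. */
static void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

One caveat: on a single CPU, data that is also touched by an interrupt handler needs interrupts disabled while the lock is held. Otherwise a handler that interrupts the lock holder would spin forever waiting for a lock that cannot be released.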

4. System Call Mechanism

// User calls read():
read(fd, buf, 100);

// Libc wrapper (pseudocode: the register assignments stand for inline assembly):
_syscall3(read, int, fd, void *, buf, size_t, count) {
    eax = SYSCALL_NUMBER_READ;  // 3
    ebx = fd;
    ecx = buf;
    edx = count;
    INT 0x80;  // ← Switch to kernel mode
    return eax;
}

// Kernel:
void syscall_handler(pt_regs_t *regs) {
    uint32_t syscall_num = regs->eax;
    switch (syscall_num) {
    case 3:  // read
        regs->eax = sys_read((int)regs->ebx, (void *)regs->ecx, (size_t)regs->edx);
        break;
    default: // Unknown syscall number
        regs->eax = -ENOSYS;
        break;
    }
}
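With dozens of system calls, a switch statement is usually replaced by a table of function pointers indexed by the syscall number. The sketch below shows the pattern with two toy handlers; `demo_regs_t`, the `demo_sys_*` functions and the numbering are all hypothetical and unrelated to the real MentOS syscall table.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot: only the fields the dispatcher needs. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} demo_regs_t;

typedef int32_t (*demo_syscall_fn)(uint32_t a, uint32_t b, uint32_t c);

/* Two toy handlers standing in for real system calls. */
static int32_t demo_sys_add(uint32_t a, uint32_t b, uint32_t c)
{
    (void)c;
    return (int32_t)(a + b);
}

static int32_t demo_sys_max(uint32_t a, uint32_t b, uint32_t c)
{
    (void)c;
    return (int32_t)(a > b ? a : b);
}

/* The syscall number indexes this table; unused slots stay NULL. */
static const demo_syscall_fn demo_syscall_table[] = {
    [1] = demo_sys_add,
    [2] = demo_sys_max,
};

#define DEMO_NR_SYSCALLS (sizeof(demo_syscall_table) / sizeof(demo_syscall_table[0]))

/* Read the number from eax, call the handler, store the result back in eax. */
static void demo_syscall_dispatch(demo_regs_t *regs)
{
    uint32_t nr = regs->eax;
    if (nr >= DEMO_NR_SYSCALLS || demo_syscall_table[nr] == NULL) {
        regs->eax = (uint32_t)-ENOSYS; /* unknown syscall */
        return;
    }
    regs->eax = (uint32_t)demo_syscall_table[nr](regs->ebx, regs->ecx, regs->edx);
}
```

The bounds check matters: the number comes straight from user space, so an unchecked index would let a program make the kernel jump through arbitrary memory.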

Exploring the Kernel Code

Start here:

  1. kernel/src/kernel.c - kmain() function (initialization)
  2. kernel/src/process/scheduler.c - How processes are scheduled
  3. kernel/src/mem/paging.c - Virtual memory implementation
  4. kernel/src/system/syscall.c - System call dispatcher
  5. kernel/src/drivers/ - Device drivers (keyboard, disk, etc.)

Key directories:

kernel/
├── src/
│   ├── kernel.c           # Main kernel initialization
│   ├── process/           # Process management and scheduling
│   │   ├── scheduler.c    # CPU scheduling algorithms
│   │   ├── process.c      # Process creation (sys_fork)
│   │   └── wait.c         # Process synchronization
│   ├── mem/               # Memory management
│   │   ├── paging.c       # Virtual memory (page tables)
│   │   ├── zone.c         # Physical memory zones
│   │   └── slab.c         # Object allocator
│   ├── fs/                # File systems
│   │   ├── vfs.c          # Virtual File System
│   │   ├── ext2.c         # EXT2 filesystem
│   │   └── namei.c        # Path resolution
│   ├── drivers/           # Device drivers
│   │   ├── keyboard.c     # Keyboard driver
│   │   ├── ata.c          # Disk driver
│   │   └── video.c        # VGA text mode
│   ├── system/            # System calls
│   │   ├── syscall.c      # Syscall dispatcher
│   │   └── signal.c       # Signal handling
│   └── descriptor_tables/ # CPU tables
│       ├── gdt.c          # Global Descriptor Table
│       ├── idt.c          # Interrupt Descriptor Table
│       └── isr.c          # Interrupt Service Routines

Further Reading


Key takeaway: The kernel is not magic - it's code that manages hardware and provides services. Start with one subsystem (e.g., scheduler) and trace through how it works!

.extern isr_handler
.extern irq_handler

// Register handler
interrupt_handler_install(vector, handler_func, flags)


### Device Drivers

Located in `kernel/src/drivers/`

**Implemented Drivers:**

- **Keyboard** - PS/2 keyboard input handling
- **ATA** - IDE/ATA disk driver for reading sectors
- **RTC** (Real-Time Clock) - System time and calendar
- **Video** - VGA text mode display (80x25 console)

**Driver Interface:**

```c
typedef struct device_driver {
    const char *name;
    int (*probe)(void);         // Detect hardware
    int (*init)(void);          // Initialize
    void (*interrupt)(void);    // Handle interrupt
} device_driver_t;
```

I/O and Debugging

Located in kernel/inc/io/ and kernel/src/io/

Kernel Logging:

  • 8 log levels (0=EMERGENCY to 7=DEBUG)
  • Per-file log level control
  • Debug macros: pr_err(), pr_warn(), pr_notice(), pr_debug()

Available Log Levels:

#define LOGLEVEL_EMERGENCY  0  // System unusable
#define LOGLEVEL_ALERT      1  // Immediate action required
#define LOGLEVEL_CRITICAL   2  // Critical conditions
#define LOGLEVEL_ERROR      3  // Error conditions
#define LOGLEVEL_WARNING    4  // Warning conditions
#define LOGLEVEL_NOTICE     5  // Normal but significant
#define LOGLEVEL_INFO       6  // Informational
#define LOGLEVEL_DEBUG      7  // Debug information
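Since lower numbers are more severe, filtering reduces to one comparison against a threshold. Here is a sketch of that check, assuming a per-file threshold is simply an integer; the `DEMO_LOG_*` constants and both functions are hypothetical and only mirror the values listed above.

```c
#include <stdio.h>

/* Values mirror the LOGLEVEL_* list above; the names here are hypothetical. */
enum {
    DEMO_LOG_ERROR   = 3,
    DEMO_LOG_WARNING = 4,
    DEMO_LOG_INFO    = 6,
    DEMO_LOG_DEBUG   = 7
};

/* A message is emitted when its level is at or below the active threshold
 * (lower number = more severe). */
static int demo_log_enabled(int threshold, int level)
{
    return level <= threshold;
}

/* Format into buf only if the level passes the filter.
 * Returns the number of characters written, or 0 if the message was dropped. */
static int demo_log(char *buf, size_t size, int threshold, int level, const char *msg)
{
    if (!demo_log_enabled(threshold, level)) {
        return 0;
    }
    return snprintf(buf, size, "<%d> %s", level, msg);
}
```

With the threshold set to WARNING, an ERROR message is written while a DEBUG message is silently dropped.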

Debug Functions:

  • dbg_print_*() - Print various data structures
  • panic() - Kernel panic with message

System Calls

Located in kernel/inc/system/ and kernel/src/system/

Over 60 system calls implemented across categories:

  • Process Management - fork, exec, exit, wait, getpid, setpgid, signals
  • File Operations - open, close, read, write, lseek, stat, chmod
  • Memory - brk, mmap, munmap
  • IPC - semget, semop, msgget, msgsnd, msgrcv, shmget, shmat
  • Filesystem - mkdir, rmdir, unlink, symlink, readlink
  • Timing - time, sleep, timer operations
  • User/Group - getuid, setuid, getgid, setgid
  • Misc - ioctl, fcntl, uname, reboot

See System Calls page for complete reference.

Initialization Sequence

1. Boot Assembly (stack.S)
   ├─ Set up minimal stack
   └─ Jump to kmain()

2. kmain() (kernel.c)
   ├─ Parse multiboot info
   ├─ Initialize GDT
   ├─ Initialize IDT
   ├─ Enable paging
   ├─ Initialize memory allocators
   ├─ Initialize VFS
   ├─ Initialize scheduler
   ├─ Initialize devices
   ├─ Load initial programs
   └─ Enable interrupts

3. Scheduler takes over
   └─ Execute init process

Context Switching

The scheduler performs context switching through:

  1. Store Context - Save current process registers to task_struct
  2. Select Next - Choose next process based on scheduling algorithm
  3. Restore Context - Load new process registers from its task_struct
  4. Switch CR3 - Load new page directory for memory mapping

void scheduler_store_context(pt_regs_t *f, task_struct *process) {
    // Save CPU state from interrupt frame to task
    process->thread.esp0 = f->esp;
    process->thread.ss0 = f->ss;
    // ... save other registers
}

void scheduler_run(pt_regs_t *f) {
    // Store current process context
    scheduler_store_context(f, current);
    
    // Select next process
    current = select_next_process();
    
    // Switch memory context
    paging_switch_page_directory(current->mm->pgdir);
    
    // Restore registers from interrupt frame
    f->esp = current->thread.esp0;
    f->eip = current->thread.eip;
    // ... restore other registers
}

Exception Handling

The kernel handles CPU exceptions like page faults, general protection faults, etc.

Example: Page Fault Handler

void page_fault_handler(pt_regs_t *regs) {
    unsigned long addr;
    asm("mov %%cr2, %0" : "=r" (addr));
    
    // Check if fault address is in a valid vm_area
    vm_area_t *vma = find_vma(current, addr);
    if (!vma) {
        // Invalid access - send SIGSEGV signal
        send_sig(SIGSEGV, current);
    } else {
        // Allocate page on demand
        allocate_page_on_fault(vma, addr);
    }
}
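Besides the faulting address in CR2, the CPU pushes an error code whose low three bits say what kind of access failed. The bit meanings below are defined by the x86 architecture; the classification into three cases (`pf_kind_t`, `pf_classify`) is a hypothetical policy sketched for illustration, not what MentOS does.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (pushed by the CPU for exception 14). */
#define PF_PRESENT 0x1u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x2u /* 0 = read access, 1 = write access */
#define PF_USER    0x4u /* 0 = fault in kernel mode, 1 = fault in user mode */

/* Hypothetical classification a handler might start from. */
typedef enum {
    PF_DEMAND_PAGE,   /* page missing: allocate or load it */
    PF_COW_CANDIDATE, /* write to a present page: may be copy-on-write */
    PF_PROTECTION     /* read of a present page denied: likely a real violation */
} pf_kind_t;

static pf_kind_t pf_classify(uint32_t err)
{
    if (!(err & PF_PRESENT)) {
        return PF_DEMAND_PAGE;
    }
    if (err & PF_WRITE) {
        return PF_COW_CANDIDATE;
    }
    return PF_PROTECTION;
}

/* True when the faulting access came from user mode (ring 3). */
static bool pf_from_user(uint32_t err)
{
    return (err & PF_USER) != 0;
}
```

A real handler would combine this with the vm_area lookup: a write fault on a present page is only a copy-on-write case if the surrounding area is actually writable.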

Process Lifecycle

1. Process Creation (fork)
   ├─ Allocate new task_struct
   ├─ Copy parent's memory (with COW)
   ├─ Copy file descriptors
   ├─ Copy signal handlers
   └─ Add to scheduler runqueue

2. Process Execution (exec)
   ├─ Load ELF binary
   ├─ Clear old memory maps
   ├─ Set up new address space
   └─ Jump to entry point

3. Process Termination (exit)
   ├─ Close file descriptors
   ├─ Free memory
   ├─ Notify parent (SIGCHLD)
   ├─ Reparent children to init
   └─ Remove from scheduler

Key Design Patterns

