Double Faults #1005

utterances-bot · 2018-07-03T06:42:55Z

utterances-bot
Jul 3, 2018

This is a general purpose comment thread for the “Double Faults” post.

pbn4 · 2018-07-03T06:42:55Z

pbn4
Jul 3, 2018

Awesome stuff, waiting for more :)

0 replies

phil-opp · 2018-07-05T11:08:24Z

phil-opp
Jul 5, 2018
Maintainer

@pbn4 Thank you :)

0 replies

mtn · 2018-07-08T01:50:12Z

mtn
Jul 8, 2018

This is awesome -- thank you for the amazing work @phil-opp!

I had a newbie question: I understand why these handlers are useful in preventing everything from ending up in a restart loop, but what differences would an actual implementation have? For example, if a user is running a program in their shell that results in a page fault, the handler must report that back to the application so it can surface it to the user, right?

0 replies

phil-opp · 2018-07-09T12:34:52Z

phil-opp
Jul 9, 2018
Maintainer

@mtn Thank you!

I understand why these handlers are useful in preventing everything from ending up in a restart loop, but what differences would an actual implementation have?

Depends on the OS implementation and the fault type. For example, if the exception is caused because a userspace process tried to execute a privileged instruction, the kernel would simply kill the process (and the shell would report to the user that the process was killed).

For a page fault, the kernel can react in multiple ways. If it's just an out-of-bounds access to unmapped memory (like we do in the blog post), the kernel would kill the user program with a segmentation fault. However, most operating systems have a mechanism called swapping, where parts of the memory are moved to disk when the main memory becomes too full. Then a legitimate memory access could cause a page fault because the accessed data is no longer in memory. The OS can handle this page fault by loading the contents of the memory page from disk and continuing the interrupted process. This technique is called demand paging and makes it possible to run programs that wouldn't fit completely into memory.
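
The two reactions described above can be sketched as a small decision function. This is only an illustrative model, not how any particular kernel implements it: `PageFaultAction`, `handle_page_fault`, and the map from page numbers to disk slots are hypothetical names introduced for this example, and a 4 KiB page size is assumed.

```rust
use std::collections::HashMap;

/// Assumed page size for this sketch (4 KiB).
const PAGE_SIZE: u64 = 4096;

/// Hypothetical outcomes of a page fault raised by a user process.
#[derive(Debug, PartialEq)]
enum PageFaultAction {
    /// The page belongs to the process but was swapped out:
    /// load it back from disk and resume the process.
    LoadFromDisk { disk_slot: u64 },
    /// The address is not backed by anything: kill the process
    /// (reported to the user as a segmentation fault).
    KillProcess,
}

/// Decides how to react to a fault at `fault_addr`.
/// `swapped_out` maps page numbers to the (hypothetical) disk slot
/// that currently holds the contents of that page.
fn handle_page_fault(swapped_out: &HashMap<u64, u64>, fault_addr: u64) -> PageFaultAction {
    let page_number = fault_addr / PAGE_SIZE;
    match swapped_out.get(&page_number) {
        Some(&disk_slot) => PageFaultAction::LoadFromDisk { disk_slot },
        None => PageFaultAction::KillProcess,
    }
}
```

With a map that records page 2 as swapped out, a fault at any address inside page 2 yields `LoadFromDisk`, while a fault at an unrelated address such as `0xdeadbeef` yields `KillProcess`. A real kernel would also have to consider permission violations and other fault causes, which this sketch ignores.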

0 replies

yjhmelody · 2018-07-20T04:31:45Z

yjhmelody
Jul 20, 2018

Yeah! What you showed us is what we learned in our OS course.

0 replies

montao · 2018-07-21T07:56:39Z

montao
Jul 21, 2018

Thanks for the post, Phil. I am going to catch up on your posts now that I have completed a B.Sc. in technology. I am going to continue with a master's in computer science, and your material is very helpful for understanding operating systems.

Small typo maybe: You spell it 0xdeadbeaf but the common spelling is usually 0xdeadbeef isn't it?

0 replies

phil-opp · 2018-07-23T09:10:13Z

phil-opp
Jul 23, 2018
Maintainer

@montao Congratulations on your degree!

Small typo maybe: You spell it 0xdeadbeaf but the common spelling is usually 0xdeadbeef isn't it?

Thanks! Fixed in f551116.

0 replies

ghost · 2018-08-13T19:19:47Z

ghost
Aug 13, 2018

Where should I go next after completing your tutorial? I am neither a beginner nor an expert in Rust, but I am interested in OS development and have followed your tutorial thoroughly. I would like to proceed further.

Thanks for making this series :)

0 replies

robert-w-gries · 2018-08-13T23:16:16Z

robert-w-gries
Aug 13, 2018

@siddharthsymphony The OSDev Wiki is one of the best online resources for OS development. If you want more theoretical knowledge, take a look at Modern Operating Systems by Andrew Tanenbaum.

0 replies

bacharSalleh · 2018-08-14T10:41:32Z

bacharSalleh
Aug 14, 2018

@siddharthsymphony I'm very interested in systems development. Are there any resources you recommend besides the OSDev Wiki and Modern Operating Systems by Andrew Tanenbaum?

0 replies

tnargy · 2018-08-21T02:17:53Z

tnargy
Aug 21, 2018

@phil-opp I’ve really enjoyed your blog! The way you express the concepts makes it easy for me to follow. Are you still planning on handling interrupts from external devices in the next post?

0 replies

phil-opp · 2018-08-21T12:06:26Z

phil-opp
Aug 21, 2018
Maintainer

@tnargy Thank you! Yes, the next post will be about hardware interrupts. It will explore the programmable interrupt controller, timer interrupts, and keyboard interrupts. I already created a first draft that you can preview here (it's still a work in progress).

0 replies

Ben-PH · 2018-09-20T15:51:39Z

Ben-PH
Sep 20, 2018

Awesome stuff.

What was your source for learning this? If I wanted to contribute to the next posts while I'm having a go at them, do you have any recommended reading for that?

0 replies

phil-opp · 2018-10-07T17:45:01Z

phil-opp
Oct 7, 2018
Maintainer

@Ben-PH Thanks!

I don't have a single source. It's a mix of what I learned at university, the OSDev wiki, Wikipedia, the Intel/AMD manuals, and various other resources. If you're looking for a book about the fundamentals of operating systems, I can recommend the free Operating Systems: Three Easy Pieces.

0 replies

lilyball · 2018-11-12T06:05:00Z

lilyball
Nov 12, 2018

Typo: becaues

0 replies

ferbass · 2020-09-28T15:07:37Z

ferbass
Sep 28, 2020

Hey @phil-opp thanks again for the great tutorial.

About the tests: I understand that because stack overflows are undefined behavior, we should run them in release mode, but I think we can add an optimization level to the test profile and run the tests without using release mode.

I did some tests on my end and realized that if we add opt-level = 1 to the profile.test section of Cargo.toml for basic optimizations, as described here https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level, we are able to run the tests without forcing release mode.
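
For reference, the setting described above would look roughly like this in Cargo.toml. This is a minimal fragment showing only the profile section; the rest of the manifest is assumed to stay unchanged.

```toml
# Cargo.toml (fragment)
[profile.test]
opt-level = 1  # basic optimizations for test builds
```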

I added this change to my repo; you can check it out if you want: https://github.com/ferbass/gat_os/pull/1/files#diff-80398c5faae3c069e4e6aa2ed11b28c0R27

Do you think this is a valid setup to use or should we avoid opt-level for profile.test?

Thank you in advance

--
ferbass

0 replies

Qubasa · 2020-10-12T23:29:29Z

Qubasa
Oct 12, 2020

Hi @phil-opp,

I have trouble understanding why a kernel code segment is needed.
I read that some instructions check for the permission level in the cs register.
That's why you need a kernel cs and a user cs.
What I do not understand is whether it is used for anything else besides that.
The cs register can only map to 4 GB of RAM, so what if I address the 5th GB?
Is this not important because the cs register is only used for CPL validation?
Also I guess it's needed to transition from real mode to long mode in the bootloader?

Adding a little footnote in the blog would be nice I think.

Thanks in advance, and keep up the great work! :-)

0 replies

GuillaumeDIDIER · 2020-10-13T13:26:41Z

GuillaumeDIDIER
Oct 13, 2020

In 64-bit mode, segmentation is mostly deactivated, apart from the privilege level bits. In 16-bit and 32-bit mode, however, segmentation is mandatory, and correct cs, ss, and ds segments are usually needed, with correct bits (but often a 0 base address anyway).

0 replies

phil-opp · 2020-10-21T20:23:03Z

phil-opp
Oct 21, 2020
Maintainer

@ferbass

About the tests: I understand that because stack overflows are undefined behavior, we should run them in release mode, but I think we can add an optimization level to the test profile and run the tests without using release mode.

Let me clarify my above comments a bit: Stack overflows on the main kernel stack are not undefined behavior because the bootloader creates a special unmapped page called a guard page at the bottom of this stack. Thus, a stack overflow results in a page fault and no memory is corrupted.

The problem is/was that the double fault stack that we create in this post doesn't have such a guard page yet (we will improve this in a future post). Thus, a stack overflow is undefined behavior as it overwrites other data that might still be needed. While compiling with optimizations reduces stack size and can thus avoid these stack overflows in some cases, this is merely a workaround and not a valid solution to the problem. Instead, the double fault stack should still be large enough to work in debug mode too. For this reason I increased the stack size for the double fault stack, so that stack overflows should no longer occur even in debug mode, provided that you keep the double fault handler minimal.

It's important to note that this problem is not exclusive to tests. It can also occur during normal execution, e.g. if we accidentally write a function with endless recursion. Since we don't want any undefined behavior in this case, even when running in debug mode, the double fault stack should be large enough for this. So changing the optimization level for tests is not a good solution for this problem because if a test fails in debug mode, a normal cargo run in debug mode might fail in the same way.

Do you think this is a valid setup to use or should we avoid opt-level for profile.test?

In general, I don't think that changing the test optimization level is problematic. For example, it might be a valid way to speed up a test suite in some cases. However, the program/kernel/etc should still work in debug mode, so optimizing the tests only to avoid some runtime problems is not a good idea.

0 replies

phil-opp · 2020-10-21T20:29:42Z

phil-opp
Oct 21, 2020
Maintainer

@luis-hebendanz As @GuillaumeDIDIER said, segmentation is mostly deactivated in 64-bit mode. The x86_64 architecture still requires a code segment for historical purposes, even though most of its content is ignored. It is still used for specifying the privilege level and for putting the CPU in 64-bit mode.
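
To illustrate where the privilege level lives, here is a small sketch that splits a 16-bit segment selector into its fields. For the selector loaded into cs, the two lowest bits give the current privilege level. The names `SelectorFields` and `decode_selector` are made up for this example and are not part of the x86_64 crate.

```rust
/// Fields of a 16-bit x86 segment selector.
#[derive(Debug, PartialEq)]
struct SelectorFields {
    /// Index of the descriptor within the table (bits 3..=15).
    index: u16,
    /// Table indicator (bit 2): false = GDT, true = LDT.
    uses_ldt: bool,
    /// Requested privilege level (bits 0..=1), 0 = kernel, 3 = user.
    rpl: u8,
}

/// Splits a raw selector value into its three fields.
fn decode_selector(raw: u16) -> SelectorFields {
    SelectorFields {
        index: raw >> 3,
        uses_ldt: raw & 0b100 != 0,
        rpl: (raw & 0b11) as u8,
    }
}
```

For example, the selector value `0x08` decodes to GDT index 1 with privilege level 0, a typical value for a kernel code segment placed right after the null descriptor, while `0x1b` decodes to GDT index 3 with privilege level 3.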

0 replies

slinkydeveloper · 2021-01-03T11:33:39Z

slinkydeveloper
Jan 3, 2021

Hi @phil-opp, thanks for this amazing guide!

I was wondering why stack size is fixed to 4096 * 5 and then I just stumbled upon this:

The fix is simple: Increase the stack size of the double fault stack by adjusting the STACK_SIZE constant in the gdt.rs. For example, set it to const STACK_SIZE: usize = 4096 * 5. It's also worth noting that running in --release mode also works with the smaller stack size because it is more optimized then. (You need to ensure that stack_overflow method is not optimized to a loop in --release mode, e.g. by doing a volatile read in it.)
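
As a rough illustration of the volatile-read trick mentioned in the quote, here is a bounded variant of such a recursive function. It uses `core::ptr::read_volatile` from the standard library rather than the `volatile` crate, and it takes a `depth` parameter (an assumption added for this sketch) so that it terminates instead of overflowing the stack. The volatile read after the recursive call is intended to keep the compiler from turning the recursion into a loop.

```rust
use core::ptr::read_volatile;

/// Recurses `depth` times and returns the number of stack frames used.
/// The unbounded version of this function would overflow the stack.
#[inline(never)]
fn recurse(depth: u64) -> u64 {
    if depth == 0 {
        return 0;
    }
    let frames = recurse(depth - 1) + 1;
    // Volatile read of a local after the recursive call. The compiler
    // must not drop this read, so each call keeps work to do after the
    // recursion returns.
    let marker = 0u8;
    let _ = unsafe { read_volatile(&marker) };
    frames
}
```

Calling `recurse(100)` returns 100. Removing the depth check would give the endless recursion that triggers the stack overflow.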

I think it would be nice if you could add a comment about that stack size in the post 😄

0 replies

ghost · 2021-05-03T11:39:45Z

ghost
May 3, 2021

I think crate::interrupts::init_idt should be unsafe: it now depends on an entry in the IST that's not there if crate::gdt::init is not called, allowing access to uninitialized memory in safe code.

0 replies

d0ntrash · 2021-05-30T15:43:36Z

d0ntrash
May 30, 2021

I followed the post up to the point where the basic double fault handler is implemented. Adding the double_fault_handler function does not change the behavior when triggering the page fault. The kernel still ends up in a boot loop.

I saw that you pushed some changes a few days ago. Might this be related? I am using version 0.14.2 of x86_64.

PS. Thanks for this great blog. I learned a lot so far!

0 replies

laokz · 2021-08-15T12:51:20Z

laokz
Aug 15, 2021 — with giscus

Hi @phil-opp, thank you for the great blogs!

I hit a compiler error. In src/interrupts.rs, the line .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); failed with "error[E0433]: failed to resolve: use of undeclared crate or module gdt". Prefixing it with crate:: fixed it.

Best regards!

0 replies

nohenry · 2021-11-23T22:37:48Z

nohenry
Nov 23, 2021

When running this on my machine using QEMU, I kept getting double faults after the breakpoint handler ran. I fixed it by setting the stack segment register (SS) to zero: x86_64::instructions::segmentation::SS::set_reg(gdt::SegmentSelector{0: 0});

7 replies

Defmc Oct 10, 2023

Is SegmentSelector::NULL safe to use? I was having the same problem, but I don't know if this solution will be a problem in the future.

nohenry Oct 10, 2023

I don't exactly remember the context of the blog post. I'm not sure why only a kernel code segment and TSS segment are being set in the GDT. You would normally also need a kernel data segment and then set the data segment selector registers to the index of this data segment.

I'm frankly surprised my original solution worked for this (but I also don't even remember the problem nor posting this). So to answer your question, using SegmentSelector::NULL is probably not exactly correct here.

bjorn3 Oct 10, 2023

x86_64 doesn't really support segmentation anymore. You can still have fs and gs relative pointers, but all other segments are hard coded to start at 0 and cover the entire memory.

Defmc Oct 10, 2023

Thanks for your (fast) replies. Currently, I'm setting it up this way (yeah, I took a look at your RustKernel, @Ocrap7):

lazy_static::lazy_static! {
    static ref GDT: (GlobalDescriptorTable, SegSelectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let kcode_seg = gdt.add_entry(Descriptor::kernel_code_segment());
        let kdata_seg = gdt.add_entry(Descriptor::kernel_data_segment());
        gdt.add_entry(Descriptor::UserSegment(0));
        gdt.add_entry(Descriptor::user_code_segment());
        gdt.add_entry(Descriptor::user_data_segment());

        let tss_seg = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, SegSelectors::new(kcode_seg, kdata_seg, tss_seg))
    };
}
    
pub fn init() {
    info!("loading gdt");
    GDT.0.load();
    info!("loaded gdt");

    info!("\tsetting registers for gdt");
    unsafe {
        // GDT.1.set_segmentations() =
        use instructions::{segmentation, tables};
        segmentation::CS::set_reg(GDT.1.kcode);
        segmentation::DS::set_reg(GDT.1.kdata);
        segmentation::ES::set_reg(GDT.1.kdata);
        segmentation::FS::set_reg(GDT.1.kdata); // FIXME: unnecessary
        segmentation::GS::set_reg(GDT.1.kdata); // FIXME: unnecessary
        segmentation::SS::set_reg(GDT.1.kdata);
        tables::load_tss(GDT.1.tss);
    }
    okay!("\tset registers for gdt");
    okay!("finished gdt initialization");
}

I don't exactly remember the context of the blog post. I'm not sure why only a kernel code segment and TSS segment are being set in the GDT. [...]

Looks like a breaking change: https://github.com/rust-osdev/bootloader/blob/7c8e2ca63449f92cd1f7494b9cdc9fcf58b7375d/docs/migration/v0.9.md?plain=1#L41

Anyway. Now I can continue after three days of racking my brain over a single line. 😆

nohenry Oct 10, 2023

Awesome, glad you were able to figure it out.

begugla1 · 2023-11-11T16:14:26Z

begugla1
Nov 11, 2023 — with giscus

Hello, Phil. Thank you for the great tutorial! This is really cool stuff; it's much more interesting than writing high-level code. Thanks to you, I have enough knowledge to write a school project about OSs. All the best!

0 replies

0m-a-D · 2024-02-17T06:36:49Z

0m-a-D
Feb 17, 2024 — with giscus

Why does the TSS struct have Privilege Stack Table with three entries:
Privilege Stack Table: [u64; 3]

Doesn't it have 4 privilege levels? (0-kernel, {1,2}-device drivers, 3-user)
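
For context, the privilege stack table is only consulted when the CPU switches to a more privileged ring (a lower number), so the possible targets are rings 0, 1, and 2; a switch of that kind never lands in ring 3. The sketch below models the 64-bit TSS layout. The field names follow the style of the x86_64 crate, but this struct and the `stack_for_target_ring` helper are a standalone illustration, not the crate's actual definitions.

```rust
/// Illustrative model of the 64-bit Task State Segment (104 bytes).
#[repr(C, packed(4))]
struct TaskStateSegment {
    reserved_1: u32,
    /// Stack pointers loaded when switching *to* ring 0, 1, or 2.
    privilege_stack_table: [u64; 3],
    reserved_2: u64,
    /// Stacks selectable per IDT entry (IST1 to IST7).
    interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    iomap_base: u16,
}

/// Returns the stack pointer used when entering `target_ring`,
/// or `None` for ring 3, which has no entry in the table.
fn stack_for_target_ring(tss: &TaskStateSegment, target_ring: usize) -> Option<u64> {
    // Copy the array out first; references into packed fields are not allowed.
    let table = tss.privilege_stack_table;
    table.get(target_ring).copied()
}
```

Here `core::mem::size_of::<TaskStateSegment>()` evaluates to 104, and asking for the stack of ring 3 returns `None` because there is no fourth entry.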

0 replies

DanielCoder834 · 2024-09-08T01:22:27Z

DanielCoder834
Sep 8, 2024

Hi, thank you so much for making this tutorial with so much information as it has helped me learn about OS logic for the first time.

When I ran cargo run, the compiler recommended changing this:
let stack_start = VirtAddr::from_ptr(unsafe { &STACK });
to this:
let stack_start = VirtAddr::from_ptr(unsafe { addr_of!(STACK) });.
Would you recommend making this change? Also why is the compiler recommending to use this macro versus just using the reference to the STACK?
Thank you again.

1 reply

bjorn3 Sep 8, 2024

Also why is the compiler recommending to use this macro versus just using the reference to the STACK?

Creating a shared reference asserts that the STACK will not be modified through a pointer derived from this shared reference, and additionally will invalidate the shared reference when something else modifies it. addr_of!() directly creates a raw pointer and thus avoids this assertion.
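
A minimal standalone sketch of the raw-pointer approach, assuming a `STACK_SIZE` of 4096 * 5 as discussed earlier in this thread. The `stack_bounds` helper is a hypothetical name for this example; it works with plain `usize` addresses instead of the x86_64 crate's `VirtAddr` so that it compiles on its own.

```rust
use core::ptr::addr_of;

const STACK_SIZE: usize = 4096 * 5;
static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

/// Returns the (start, end) addresses of the static stack without
/// ever creating a reference to the `static mut`.
#[allow(unused_unsafe)] // newer compilers accept `addr_of!` on a `static mut` without `unsafe`
fn stack_bounds() -> (usize, usize) {
    // `addr_of!` yields a raw pointer directly; no `&STACK` is formed.
    let ptr = unsafe { addr_of!(STACK) };
    let start = ptr as usize;
    // The stack grows downward, so `end` is the address that would be
    // used as the initial stack pointer.
    (start, start + STACK_SIZE)
}
```

The difference between the two returned addresses is exactly `STACK_SIZE`.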

Would you recommend making this change?

Yes!

Sidray-Infinity · 2024-11-13T03:17:28Z

Sidray-Infinity
Nov 13, 2024 — with giscus

I tried the following scenario:

Throw a breakpoint exception
Enable only page fault exception

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        // idt.breakpoint.set_handler_fn(breakpoint_handler);
        // idt.double_fault.set_handler_fn(double_fault_handler);
        idt.page_fault.set_handler_fn(page_fault_handler);
        idt
    };
}

According to this

If a breakpoint exception occurs and the corresponding handler function is swapped out, a page fault occurs and the page fault handler is invoked.
If a page fault occurs and the page fault handler is swapped out, a double fault occurs and the double fault handler is invoked.

The page fault handler should be called, but instead the kernel goes into a restart loop. Also, if I enable the double fault handler, I am able to catch it. Am I missing something here?

3 replies

Sidray-Infinity Nov 13, 2024 — with giscus

am able to catch it
Catch the double fault

Sidray-Infinity Nov 13, 2024 — with giscus

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    _error_code: PageFaultErrorCode,
) {
    println!("EXCEPTION: PAGE FAULT\n{:#?} err_code:{:#?}", stack_frame, _error_code);
}

nohenry Nov 13, 2024

Careful, the text says "swapped out" meaning that the handler is present in the IDT, but the corresponding handler code is not available (swapped out to disk, or simply unmapped).

In your case, when a breakpoint exception is raised, the CPU will try to enter the breakpoint handler, but since it doesn't exist, the CPU will try to raise a "Segment not present" exception (caused by the missing IDT gate). Since you also don't have that exception handler present, a double fault is raised.

You could make this work by manually setting the breakpoint gate descriptor address in the IDT to an arbitrary value that is not mapped to anything, then raising a breakpoint exception.
Lmk if this works (or if any of this makes sense); I'm just going off what I remember :)
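
The rule behind this can be sketched as a small classification function. It is a simplified model of the exception classes in the Intel/AMD manuals, covering only a handful of exceptions; the `Exception` enum and both functions are hypothetical names for this illustration.

```rust
/// A small subset of x86 exceptions, enough for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Exception {
    DivideError,
    Breakpoint,
    InvalidTss,
    SegmentNotPresent,
    StackSegmentFault,
    GeneralProtection,
    PageFault,
}

/// "Contributory" exceptions in the simplified classification used here.
fn is_contributory(e: Exception) -> bool {
    matches!(
        e,
        Exception::DivideError
            | Exception::InvalidTss
            | Exception::SegmentNotPresent
            | Exception::StackSegmentFault
            | Exception::GeneralProtection
    )
}

/// Returns true if `second`, raised while the CPU is trying to deliver
/// `first`, is escalated to a double fault instead of being handled normally.
fn causes_double_fault(first: Exception, second: Exception) -> bool {
    match first {
        Exception::PageFault => second == Exception::PageFault || is_contributory(second),
        f if is_contributory(f) => is_contributory(second),
        // Benign exceptions such as breakpoints never escalate directly.
        _ => false,
    }
}
```

Under this model, a breakpoint followed by "Segment not present" is not a double fault by itself, but "Segment not present" followed by another "Segment not present" (because that gate is missing too) is, which matches the sequence described above.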

Mrgoblings · 2024-11-25T17:21:57Z

Mrgoblings
Nov 25, 2024 — with giscus

In version 0.15.x of the x86_64 package, I think the method add_entry was renamed to append. Should I continue with version 0.14.x, or can I follow along with 0.15.x with only this change in mind?

1 reply

tsatke Nov 25, 2024

I get about 25 compile errors when I try to go from 0.14 to 0.15, and for some reason my preemptive multitasking breaks (haven't looked into it that much yet). If you want to follow the tutorial, go with 0.14, no harm in that and it'll make your life easier. You can always upgrade to 0.15 later.


Replies: 84 comments · 12 replies
