Redoing the scheduler
So now that I have a dynamic allocator (or so I thought, it was actually broken, as you will see later LOL), I am going to redo the whole scheduler, clean up the code, and create semaphores so I can have synchronization.
This part was ezpz: all I did was keep a ready queue that I cycle over, and when a thread hits a semaphore whose count has dropped below 0, it leaves the ready queue and goes into that semaphore's queue. I will say, the timer ISR got a good refactor and now it is SO NICE to look at (it started looking shitty the more I kept adding to it, as it usually goes):
void _mk_timer_int_handler(uint64_t* stack) {
    mk_working_thread = mk_get_working_thread();
    if (mk_working_thread->time_slice > 0) {
        mk_working_thread->time_slice--;
        return;
    }
    if (!mk_working_thread->started) {
        t_init(stack);
        return;
    }
    if (!t_dis_by_state(stack)) {
        mk_thread_ctx_switch();
        mk_working_thread = mk_get_working_thread();
        t_res_state(stack);
    }
}
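To make the ready queue / semaphore queue idea concrete, here is a minimal sketch of a counting semaphore. It is not the actual kernel code: every name in it (`struct thread`, `struct queue`, `sem_wait`, `sem_signal`) is made up for illustration, and it assumes the running thread sits at the head of the ready queue. A real kernel would also need interrupts disabled (or a lock held) around these operations.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical, not the kernel's real API. */
struct thread {
    int id;
    struct thread *next;   /* link used by whichever queue holds the thread */
};

struct queue {
    struct thread *head;
    struct thread *tail;
};

struct semaphore {
    int count;             /* a negative count means threads are waiting */
    struct queue waiters;
};

static void queue_push(struct queue *q, struct thread *t) {
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
}

static struct thread *queue_pop(struct queue *q) {
    struct thread *t = q->head;
    if (!t) return NULL;
    q->head = t->next;
    if (!q->head) q->tail = NULL;
    t->next = NULL;
    return t;
}

/* Assumption: the running thread is the head of the ready queue.
 * Returns 1 if that thread blocked (the caller must switch away), else 0. */
static int sem_wait(struct semaphore *s, struct queue *ready) {
    s->count--;
    if (s->count < 0) {
        /* Leave the ready queue and park on the semaphore's own queue. */
        queue_push(&s->waiters, queue_pop(ready));
        return 1;
    }
    return 0;
}

/* Wakes the oldest waiter, if any, by putting it back on the ready queue. */
static void sem_signal(struct semaphore *s, struct queue *ready) {
    s->count++;
    if (s->count <= 0) {
        struct thread *t = queue_pop(&s->waiters);
        if (t) queue_push(ready, t);
    }
}
```

The nice property of letting the count go negative is that its absolute value tells you how many threads are parked on the semaphore, which is handy when staring at it in GDB.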
Outside of that, the issues came later, which I will talk about next. I spent a decent amount of time staring at GDB lol. Luckily pwndbg makes it a lot nicer.
I don't usually talk about bugs I come across because there are always bugs, but I got some good things out of these that I want to note.
Bug one
So to recap, I am running this on x86_64. With that in mind, I want to give you a snippet of code and see if you can spot the bug. Keep in mind that ctx is an object used for saving and restoring context, and stack is a pointer to the stack as it was right before calling the handler function:
void mk_thread_ctx_save_from_stack(struct regs_context* ctx, uint64_t* stack) {
    ctx->rax = stack[0];
    ctx->rbx = stack[1];
    ctx->rcx = stack[2];
    ctx->rdx = stack[3];
    ctx->rbp = stack[4];
    ctx->rdi = stack[5];
    ctx->rsi = stack[6];
    ctx->r8 = stack[7];
    ctx->r9 = stack[8];
    ctx->r10 = stack[9];
    ctx->r11 = stack[10];
    ctx->r12 = stack[11];
    ctx->r13 = stack[12];
    ctx->r14 = stack[13];
    ctx->r15 = stack[14];
    // Interrupt frame
    ctx->rip = stack[15];
    ctx->cs = stack[16];
    ctx->rflags = stack[17];
    // CS & 3 gives us the RPL (requested privilege level)
    if ((ctx->cs & 0x3) != 0) {
        ctx->rsp = stack[18];
        ctx->ss = stack[19];
    } else {
        ctx->rsp = (uint64_t)(&stack[15]);
        uint64_t current_ss;
        asm volatile("mov %%ss, %0" : "=r"(current_ss));
        ctx->ss = current_ss;
    }
}
Got it? The problem was that after every timer interrupt the stack did not go back to where it was supposed to be (it was off by -0x30 bytes). I have not implemented user space at all on this and the CPL is always 0, or so I thought. The context save handler checks the lower two bits of the CS value that the timer interrupt pushed onto the stack, and only reads RSP and SS from the stack if one of those two bits is set.
The problem is that this never happens, and to my understanding it shouldn't. But from debugging I saw that the interrupt and IRETQ were pushing and popping all 5 interrupt registers on the stack as if it were a user interrupt. Was there something I didn't understand about the PIC or PIT? Because when I made it handle RSP and SS no matter what, my kernel stopped having issues.
This took quite a bit of GDBing to figure out, and after doing some research, the thing is that this code was designed for 32-bit systems!!! In 64-bit mode the CPU pushes all 5 (RIP, CS, RFLAGS, RSP, SS) on every interrupt, no matter whether it comes from privileged space or not, while on 32-bit systems it only pushes the first three if it comes from kernel space and all 5 if it comes from user space. Here is the fixed code:
void mk_thread_ctx_save_from_stack(struct regs_context* ctx, uint64_t* stack) {
    ctx->rax = stack[0];
    ctx->rbx = stack[1];
    ctx->rcx = stack[2];
    ctx->rdx = stack[3];
    ctx->rbp = stack[4];
    ctx->rdi = stack[5];
    ctx->rsi = stack[6];
    ctx->r8 = stack[7];
    ctx->r9 = stack[8];
    ctx->r10 = stack[9];
    ctx->r11 = stack[10];
    ctx->r12 = stack[11];
    ctx->r13 = stack[12];
    ctx->r14 = stack[13];
    ctx->r15 = stack[14];
    ctx->rip = stack[15];
    ctx->cs = stack[16];
    ctx->rflags = stack[17];
    ctx->rsp = stack[18];
    ctx->ss = stack[19];
}
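One way to make the layout harder to get wrong is to describe it as a struct and let the compiler check the offsets. This is only a sketch: `struct int_stack_frame` and `frame_from_stack` are hypothetical names, and the register order simply mirrors the indices used in mk_thread_ctx_save_from_stack above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the handler sees on the stack, lowest address first.
 * Hypothetical struct; the order matches stack[0]..stack[19] above. */
struct int_stack_frame {
    /* Saved by the interrupt stub before calling the handler. */
    uint64_t rax, rbx, rcx, rdx, rbp, rdi, rsi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    /* Pushed by the CPU. In 64-bit mode all five are always there,
     * whether or not the privilege level changed. */
    uint64_t rip, cs, rflags, rsp, ss;
};

/* Compile-time checks that the struct matches the array indices. */
static_assert(offsetof(struct int_stack_frame, rip) == 15 * sizeof(uint64_t),
              "rip must be stack[15]");
static_assert(offsetof(struct int_stack_frame, rsp) == 18 * sizeof(uint64_t),
              "rsp must be stack[18]");
static_assert(sizeof(struct int_stack_frame) == 20 * sizeof(uint64_t),
              "frame must be exactly 20 quadwords");

/* Copies the 20 saved quadwords into a frame struct. */
static struct int_stack_frame frame_from_stack(const uint64_t *stack) {
    struct int_stack_frame f;
    memcpy(&f, stack, sizeof f);
    return f;
}
```

With something like this, reading `f.rsp` replaces the magic `stack[18]`, and a reordered push in the assembly stub shows up as a failed static_assert instead of a GDB session.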
Bug two
This one is not so much about the bug itself. It was in my slab allocator, and it motivated me to make a GDB plugin, which came out really nice!!! It has colors and everything. Take a peek!
pwndbg> slabs
================================================================================
SLAB ALLOCATOR STATE
================================================================================
[Bucket 2] Size: 32 bytes
--------------------------------------------------------------------------------
Slab #0 @ 0xffffffff80208000
ID: 2
Size: 32 bytes
Usage: 1/127 (0.8%)
Free: 126
Freelist: 0xffffffff80208040
Total slabs in bucket: 1
[Bucket 3] Size: 64 bytes
--------------------------------------------------------------------------------
Slab #0 @ 0xffffffff80200000
ID: 3
Size: 64 bytes
Usage: 2/63 (3.2%)
Free: 61
Freelist: 0xffffffff802000a0
Slab #1 @ 0x2001003
ID: 0
Size: 0 bytes
Usage: 0/0 (0.0%)
Free: 0
Freelist: 0x0
Total slabs in bucket: 2
[Bucket 5] Size: 256 bytes
--------------------------------------------------------------------------------
Slab #0 @ 0xffffffff80202000
ID: 5
Size: 256 bytes
Usage: 1/15 (6.7%)
Free: 14
Freelist: 0xffffffff80202120
Total slabs in bucket: 1
================================================================================
Total slabs: 4
================================================================================
pwndbg> vslabs
Undefined command: "vslabs". Try "help".
pwndbg> vslab 0xffffffff80200000
================================================================================
SLAB INSPECTION @ 0xffffffff80200000
================================================================================
ID: 3
Object Size: 64 bytes
Max Objects: 63
Used: 2 (3.2%)
Free: 61
Next Slab: 0x0000000002001003
Freelist: 0xffffffff802000a0
================================================================================
MEMORY DUMP (Page @ 0xffffffff80200000)
================================================================================
0xffffffff80200000 0000004002000003 0000003f0000003d ....@...=...?...
0xffffffff80200010 0000000002001003 ffffffff802000a0 .......... .....
0xffffffff80200020 00000000010093f8 00000000010093ea ................
0xffffffff80200030 ffffffff8100ab50 0000000002002003 P........ ......
0xffffffff80200040 0000000002003003 0000000000000000 .0..............
0xffffffff80200050 0000000000000000 0000000000000000 ................
0xffffffff80200060 000000000100940a 00000000010093fd ................
0xffffffff80200070 ffffffff8100a750 0000000000000001 P...............
0xffffffff80200080 ffffffff80200020 0000000000000000 . .............
0xffffffff80200090 0000000000000000 0000000000000000 ................
0xffffffff802000a0 ffffffff802000e0 0000000000000000 .. ............. <=== isfree
0xffffffff802000b0 0000000000000000 0000000000000000 ................
0xffffffff802000c0 0000000000000000 0000000000000000 ................
0xffffffff802000d0 0000000000000000 0000000000000000 ................
0xffffffff802000e0 ffffffff80200120 0000000000000000 . ............. <=== isfree
0xffffffff802000f0 0000000000000000 0000000000000000 ................
0xffffffff80200100 0000000000000000 0000000000000000 ................
0xffffffff80200110 0000000000000000 0000000000000000 ................
0xffffffff80200120 ffffffff80200160 0000000000000000 `. ............. <=== isfree
0xffffffff80200130 0000000000000000 0000000000000000 ................
0xffffffff80200140 0000000000000000 0000000000000000 ................
0xffffffff80200150 0000000000000000 0000000000000000 ................
... more ...
pwndbg> pagewalk 0xffffffff80200000
================================================================================
PAGE TABLE WALK FOR VIRTUAL ADDRESS: 0xffffffff80200000
================================================================================
CR3 Register: 0x0000000001001000
PML4 Base: 0x0000000001001000
================================================================================
Virtual Address Breakdown:
PML4 Index: 511 (0x1ff)
PDPT Index: 510 (0x1fe)
PD Index: 1 (0x001)
PT Index: 0 (0x000)
Offset: 0 (0x000)
================================================================================
[1] PML4 Entry
Address: 0x0000000001001ff8
Value: 0x0000000001002023
Flags: P | RW | A
PDPT Base: 0x0000000001002000
[2] PDPT Entry
Address: 0x0000000001002ff0
Value: 0x0000000001003023
Flags: P | RW | A
PD Base: 0x0000000001003000
[3] PD Entry
Address: 0x0000000001003008
Value: 0x00000000002000e3
Flags: P | RW | A | D | PS
[2MB HUGE PAGE]
Final Physical Address: 0x0000000000200000
================================================================================
The colors probably won't show in md but whatever!!!
Anyways, if you read the debug logs, I actually left the bug in them to see if you can find it. Do you notice anything odd?
If you didn't, that is fine. Look at Bucket 3 in the output of the slabs command: Slab #1 @ 0x2001003 does not look like a valid address, because it isn't. The problem is that my virtual memory mapper was not taking the HUGE page flag into account when determining whether a page table entry existed… that is it. So it would treat valid memory that was in use as if it were a page table hehe.
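The check that was missing can be sketched like this. It is not the mapper's real code: the macro and function names are made up, and the entry values in the comments are the ones from the pagewalk output above. The PS bit (bit 7) only means "huge page" in PDPT and PD entries, so the helper is only meant for those two levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions are the standard x86_64 ones. */
#define PTE_PRESENT       (1ULL << 0)
#define PTE_PS            (1ULL << 7)             /* huge page (PDPT/PD only) */
#define HUGE_2M_ADDR_MASK 0x000FFFFFFFE00000ULL   /* physical base of a 2MB page */
#define HUGE_2M_OFF_MASK  0x1FFFFFULL             /* offset inside a 2MB page */

/* Index into the table at a given level: 0 = PT, 1 = PD, 2 = PDPT, 3 = PML4. */
static unsigned table_index(uint64_t vaddr, unsigned level) {
    return (unsigned)((vaddr >> (12 + 9 * level)) & 0x1FF);
}

/* True only if a PDPT/PD entry points at a lower-level table.
 * A present entry with PS set maps memory directly and must NOT be
 * followed as if it were a table. */
static bool entry_points_to_table(uint64_t entry) {
    return (entry & PTE_PRESENT) && !(entry & PTE_PS);
}

/* Physical address for a virtual address mapped by a 2MB PD entry. */
static uint64_t huge_2m_phys(uint64_t pd_entry, uint64_t vaddr) {
    return (pd_entry & HUGE_2M_ADDR_MASK) | (vaddr & HUGE_2M_OFF_MASK);
}
```

For the walk above, the PDPT entry 0x1003023 passes `entry_points_to_table`, while the PD entry 0x2000e3 does not, and `huge_2m_phys(0x2000e3, 0xffffffff80200000)` gives back 0x200000, the same final physical address the plugin printed.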