Implement Your Own Operating System (week 02)
This article will show you how to use C instead of assembly code as the programming language for the OS.
“What? Why do we need C code? 😳😳 We already have assembly code.”
The UNIX operating system’s development started in 1969, and its code was rewritten in C in 1972. The C language was actually created to move the UNIX kernel code from assembly to a higher level language, which would do the same tasks with fewer lines of code.
As you remember, we created a simple bootable OS in the week 01 article. Today's article continues from where that one ended. If you missed the week 01 article, please study it before reading this one.
The main topics we discuss today:
- Setting up a stack
- Calling C code from assembly
- Packing Structs
- Compiling C code
- Build Tools
Setting up a stack
What is the stack?
The stack is a dynamic structure. You do not know ahead of time how much stack space will be required by any given program as it executes. It is impossible to know how much space to allocate for the stack. So you would like to allocate as much space as possible, while preventing it from colliding with program instructions. The solution is to start the stack at the highest address and have it grow toward lower addresses.
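To make the "start at the highest address and grow downward" idea concrete, here is a minimal sketch in C that models such a stack with an array. The names demo_stack, demo_sp, demo_push, and demo_pop are hypothetical and exist only for this illustration; the real stack is managed by the CPU through the esp register.

```c
#include <stdint.h>

#define DEMO_STACK_WORDS 8   /* hypothetical size, for illustration only */

static uint32_t demo_stack[DEMO_STACK_WORDS];

/* Like esp, the pointer starts just past the highest slot of the area. */
static uint32_t *demo_sp = demo_stack + DEMO_STACK_WORDS;

/* Push: move the pointer down first, then store the value. */
int demo_push(uint32_t value)
{
    if (demo_sp == demo_stack) {
        return -1;           /* full: no lower address left to grow into */
    }
    demo_sp--;
    *demo_sp = value;
    return 0;
}

/* Pop: read the value first, then move the pointer back up. */
int demo_pop(uint32_t *out)
{
    if (demo_sp == demo_stack + DEMO_STACK_WORDS) {
        return -1;           /* empty: pointer is back at the top */
    }
    *out = *demo_sp;
    demo_sp++;
    return 0;
}
```

Every push lowers demo_sp and every pop raises it, so the most recently pushed value always sits at the lowest used address.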
Why do we need a stack?
All non-trivial C programs use a stack, so before calling any C code the esp register has to point to the end of an area of free, correctly aligned memory (remember that the stack grows toward lower addresses on x86). We could point esp to a random area in memory since, so far, the only things in memory are GRUB, the BIOS, the OS kernel, and some memory-mapped I/O. This is not a good idea: we don't know how much memory is available or whether the area esp points to is used by something else. A better idea is to reserve a piece of uninitialized memory in the bss section of the kernel's ELF file. Using the bss section instead of the data section keeps the OS executable small, because the bss section only records how much memory to reserve and stores no bytes in the file. Since GRUB understands ELF, GRUB will allocate any memory reserved in the bss section when loading the OS.
The NASM pseudo-instruction resb can be used to declare uninitialized data:
KERNEL_STACK_SIZE equ 4096 ; size of stack in bytes
section .bss
align 4 ; align at 4 bytes
kernel_stack: ; label points to beginning of memory
resb KERNEL_STACK_SIZE
The stack pointer is then set up by pointing esp to the end of the kernel_stack memory:
mov esp, kernel_stack + KERNEL_STACK_SIZE ; point esp to the start of the
; stack (end of memory area)
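For readers who think more easily in C, the same reservation can be sketched as an uninitialized array. This is only an illustration: in the real kernel the stack has to be set up in assembly, because C code cannot run before esp is valid. The function kernel_stack_top is a hypothetical helper, and the placement in .bss is what a typical GCC/ELF toolchain does with an uninitialized static array, not something the C standard guarantees.

```c
#include <stdint.h>

#define KERNEL_STACK_SIZE 4096   /* same value as the NASM constant */

/* No initializer: a typical ELF toolchain places this array in .bss,
   so it takes up no space in the executable file. */
static _Alignas(4) unsigned char kernel_stack[KERNEL_STACK_SIZE];

/* Address just past the end of the reserved area. This is the value
   that `mov esp, kernel_stack + KERNEL_STACK_SIZE` loads into esp. */
uintptr_t kernel_stack_top(void)
{
    return (uintptr_t)kernel_stack + KERNEL_STACK_SIZE;
}
```

The label marks the lowest address of the area, which is why the size has to be added to get the starting value of the stack pointer.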
Calling C code from assembly
The next step is to call a C function from assembly code. There are many different conventions for how to call C code from assembly code. This article uses the cdecl calling convention, since that is the one used by GCC. The cdecl calling convention states that arguments to a function should be passed via the stack (on x86). The arguments of the function should be pushed on the stack in right-to-left order, that is, you push the rightmost argument first. The return value of the function is placed in the eax register.
The following code shows an example:
/* The C function */
int sum_of_three(int arg1, int arg2, int arg3)
{
    return arg1 + arg2 + arg3;
}

; The assembly code
extern sum_of_three     ; the function sum_of_three is defined elsewhere

push dword 3            ; arg3
push dword 2            ; arg2
push dword 1            ; arg1
call sum_of_three       ; call the function, the result will be in eax
add esp, 12             ; cdecl: the caller removes the three 4-byte arguments
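The right-to-left push order can look backwards at first. The sketch below models the caller's stack with a small array to show why it works: after pushing arg3, arg2, and arg1, the first argument ends up at the lowest address, which is where the callee expects it. Everything here (demo_frame, frame_push, digits_of_three, call_with_cdecl_order) is hypothetical and only illustrates the ordering; digits_of_three is used instead of sum_of_three because its result depends on the argument order.

```c
#include <stdint.h>

#define FRAME_SLOTS 3   /* hypothetical: room for exactly three arguments */

/* Hypothetical function whose result reveals the argument order. */
int digits_of_three(int arg1, int arg2, int arg3)
{
    return arg1 * 100 + arg2 * 10 + arg3;
}

/* Model of the caller's stack: `top` is the index of the lowest used slot. */
struct demo_frame {
    int32_t slots[FRAME_SLOTS];
    int top;
};

/* Like x86 `push`: move down one slot, then store.
   No bounds check: this sketch pushes exactly FRAME_SLOTS values. */
void frame_push(struct demo_frame *frame, int32_t value)
{
    frame->top--;
    frame->slots[frame->top] = value;
}

int call_with_cdecl_order(void)
{
    struct demo_frame frame = { {0, 0, 0}, FRAME_SLOTS };

    frame_push(&frame, 3);   /* arg3 is pushed first  */
    frame_push(&frame, 2);   /* arg2                  */
    frame_push(&frame, 1);   /* arg1 is pushed last   */

    /* The callee reads its arguments upward from the lowest address. */
    return digits_of_three(frame.slots[frame.top],
                           frame.slots[frame.top + 1],
                           frame.slots[frame.top + 2]);
}
```

With this model, call_with_cdecl_order() returns 123, meaning the value pushed last was received as arg1.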
After adding the C function to the kmain.c file and the assembly code to the loader.s file, the code looks like this:
loader.s file
kmain.c file
Packing Structs
When structures are defined, the compiler is allowed to add paddings (spaces without actual data) so that members fall in address boundaries that are easier to access for the CPU.
For example, on a 32-bit CPU, 32-bit members should start at addresses that are multiples of 4 bytes in order to be accessed (read and written) efficiently. The following structure definition adds 16 bits of padding between the two members, so that the second member falls on a proper address boundary:
struct S {
int16_t member1;
int32_t member2;
};
The layout in memory of the above structure on a 32-bit architecture is (~ = padding):
+---------+---------+
| m1 |~~~~| m2 |
+---------+---------+
When a structure is packed, these paddings are not inserted. The compiler has to generate more code (which runs slower) to extract the non-aligned data members, and also to write to them.
The same structure, when packed, will appear in memory as something like:
+---------+---------+
| m1 | m2 |~~~~
+---------+---------+
__attribute__((__packed__)) tells GCC not to insert any padding between the members of the structure, so the members are no longer placed on their natural alignment boundaries.
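The effect of packing can be checked with sizeof and offsetof. The sketch below assumes GCC (or a compatible compiler such as Clang) on x86, where int32_t is aligned to 4 bytes; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after member1. */
struct padded_example {
    int16_t member1;
    int32_t member2;
};

/* Packed layout: GCC places member2 directly after member1. */
struct packed_example {
    int16_t member1;
    int32_t member2;
} __attribute__((packed));

/* Offset of member2 in each layout, in bytes. */
size_t padded_member2_offset(void)
{
    return offsetof(struct padded_example, member2);
}

size_t packed_member2_offset(void)
{
    return offsetof(struct packed_example, member2);
}
```

Under these assumptions, member2 sits at offset 4 in the padded struct (8 bytes in total) and at offset 2 in the packed struct (6 bytes in total).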
Compiling C code
When compiling the C code for the OS, a lot of flags to GCC need to be used. This is because the C code should not assume the presence of a standard library, since there is no standard library available for our OS. For more information about the flags, see the GCC manual.
The flags used for compiling the C code are:
-m32 -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles
-nodefaultlibs
As always when writing C programs, we recommend turning on all warnings and treating warnings as errors:
-Wall -Wextra -Werror
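One practical consequence of -nostdlib and -nostdinc is that even small helpers such as strlen and memset are not available and have to be written by hand. The sketch below shows what that could look like; the names ksize_t, kstrlen, and kmemset are hypothetical and are not part of any library.

```c
/* With -nostdinc there is no <stddef.h>, so this hypothetical typedef
   stands in for size_t (an unsigned int is 32 bits wide with -m32). */
typedef unsigned int ksize_t;

/* Length of a NUL-terminated string, not counting the terminator. */
ksize_t kstrlen(const char *s)
{
    ksize_t len = 0;
    while (s[len] != '\0') {
        len++;
    }
    return len;
}

/* Fill the first n bytes of dst with the low byte of value. */
void *kmemset(void *dst, int value, ksize_t n)
{
    unsigned char *p = dst;
    for (ksize_t i = 0; i < n; i++) {
        p[i] = (unsigned char)value;
    }
    return dst;
}
```

For example, kstrlen("kernel") returns 6. Neither function includes a header or calls into a library, so both compile with the flags listed above.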
Build Tools
Now is also a good time to set up some build tools to make it easier to compile and test-run the OS. We use Make for this. Make searches the current directory for the makefile to use: GNU Make looks for a file named GNUmakefile, makefile, or Makefile, in that order, and then runs the specified target from that file.
Makefile
The contents of your working directory should now look like the following figure:
You should now be able to start the OS with the simple command “make run”, which will compile the kernel and boot it up in Bochs.
This is the week 02 article of the Implement Your Own Operating System series. See you in next week's article.
Reference: Helin, E., & Renberg, A. (2015). The little book about OS development
Thanks for reading……
— Kasun Madhumal —