Notes on CS170

So finally, I started a proper date with Miss OS. I have read some code from Serenity OS and done some intro-level kernel challenges, but I feel I should definitely ace the OS course in school. To show my respect, I am going to take notes of this course on my website.

Intro to OS

I believe a pwn player would definitely have an overview of what a kernel is more or less. After all, syscall is a familiar concept when writing shellcode or doing a ret2sys trick. To my understanding, the kernel is the layer between userland and hardware. The kernel is the raw program running on the CPU to manage other sub-programs.

The OS provides abstractions to userland programs. Programs think they are running continuously; they are not. Files think they are contiguous bytes; they are not. Programs think they have their own memory; they do not.

Intro to processes

Definition: the instance of a running program

OS uses fork() to create processes. The root process of Linux is init.

notes on init
I believe in macOS the root process is launchd, and in CentOS it’s systemd. Older Debian releases use SysV init, while newer ones default to systemd as well.
notes on fork

I am a little bit surprised by the fact that processes are actually created only using fork(). After searching, the fact is, initially there was only fork(). exec() does not create a process by itself: it replaces the program running in the current process, so a new program is usually started with fork() followed by exec().

why fork and exec are kept as 2 separate calls

do_fork source code

The OS will first create a task_struct* p and set up the new information for it. Then it copies all information from the parent process. One thing to notice is that many OSes use a technique called COW (copy-on-write): parent and child share the same physical pages at first, and a page is only copied when one of them writes to it.
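To make the "child gets its own copy" part concrete, here is a minimal sketch. It only shows the visible semantics of fork() (a write in the child never reaches the parent); whether the kernel implements that with COW is not observable from this code. The helper name parent_value_after_child_write is made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's copy of `value` after the
 * child has overwritten its own copy, or -1 if fork() fails. */
int parent_value_after_child_write(void) {
    volatile int value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;              /* the write lands in the child's private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);      /* make sure the child has done its write */
    return value;               /* the parent still sees 1 */
}
```

Calling parent_value_after_child_write() from main should give 1, since the child's write stays in the child.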

The code for exec is quite different from fork. The main difference is that exec replaces the current process image instead of copying it. I was not sure why exec exists; the reason seems to be that fork can only duplicate the program that is already running, so loading a different program needs a separate call.
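The usual way the two calls are combined can be sketched like this: fork() to get a child, exec in the child to swap in another program, and wait in the parent. This is a minimal sketch, not how any particular shell does it; run_and_wait is a hypothetical helper name and execvp is just one member of the exec family.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as a child process and return its
 * exit status, or -1 if something went wrong. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* replaces the child's image; returns only on failure */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing {"true", NULL} should return 0. The pid of the child stays the same across the exec; only the program inside it changes.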

interesting security issue

When I am reading Linux’s exec.c, I notice an interesting function.

/*
 * determine how safe it is to execute the proposed program
 * - the caller must hold ->cred_guard_mutex to protect against
 *   PTRACE_ATTACH or seccomp thread-sync
 */
static void check_unsafe_exec(struct linux_binprm *bprm)

As far as I can tell, this is part of what keeps a ptraced process from gaining setuid privileges. Say we have a process running as nobody. We attach to that process and make it do an exec("/usr/bin/passwd"); without this check, that would allow us to write shellcode into it before the role changes. See hxp CTF 2020: wisdom2.

The structure that contains information about a process is the PCB (process control block). A PCB can contain:

	    |   process id  |
	    |   state       |   (ready, running, blocked, etc.)
	    |   user id     |
	    |   IP          |
	    |   open file   |
	    | VM structures |
	    |   registers   |
	    |   .....       |  (signal mask, terminal, priority, ...) 

Privileged vs. unprivileged mode

Three ways to switch to kernel mode:

  • Exceptions (e.g., divide by 0)
  • Trap (e.g., int3, int 0x80, syscall)
  • Interrupt (e.g., ctrl+c)
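The trap case is the one userland code triggers on purpose. A minimal sketch, assuming Linux and glibc: the syscall() wrapper issues the trap instruction for us, so we can ask the kernel for our pid without going through getpid(). raw_getpid is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration on glibc */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enter the kernel through the generic syscall
 * trampoline instead of the getpid() wrapper. */
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

raw_getpid() should agree with getpid(), because both end up in the same kernel handler.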
Is SIGSEGV/SIGABRT a trap or an exception?
office hours about exceptions

Q: “How are the 3 ways of switching to kernel mode different? Suppose we have a customized kernel that handles a null-ref as a syscall. Then how do we distinguish an ‘exception’ from a ‘trap’?”

A: “From a kernel’s view, all ‘exceptions’, ‘interrupts’, ‘traps’ are just methods to get into kernel mode.”

My understanding: “So, really, it’s just a categorizing thing. A kernel developer should write code to handle all three ways of entering kernel mode”

Answer to the previous question about signals

A: “Signals are just an IPC method. There may or may not exist a real, say, null-ref when a SIGSEGV is raised.”

Again, hidden behind the kernel.

Return from kernel mode to user mode:

  • iret
notes on iret

On an x64 platform, iretq is used instead of iret. The operation can be found at Interrupt Return. Basically, it restores the interrupted context: the instruction pointer, code segment, flags, stack pointer and stack segment are popped from the kernel stack. There is also a swapgs instruction before iretq: it swaps the current GS base with the value stored in the IA32_KERNEL_GS_BASE MSR, so the kernel and userland each get their own GS base.

One concept I am not familiar with is the “real mode” vs. “protected mode” distinction. I will come back to this note when I know more about it.

The ret2usr attack corresponds to ret2shellcode in userland. Say we have a simple stack buffer overflow and we can overwrite the return address with any address. With no mitigation enabled, we can simply return to a piece of memory in userland, where we have written shellcode. However, when executing the shellcode, we are still in kernel mode, and a kernel cannot simply execve("/bin/sh", 0, 0). Instead, we need to set the privilege of some process to root (because that’s our purpose), and gracefully return to the execution context of the process to spawn a shell.

Process’s view of memory; stacks

system calls, file descriptors

These parts are too trivial to be noted.

How a computer is started

  1. Physically the machine is on.
  2. The BIOS is loaded from ROM and executed.
  3. BIOS finds the boot loader and runs it.
  4. The boot loader loads and runs the kernel.
  5. The OS is started. Have fun.

Shell (we love shells, unexpected ones)

How does the shell start programs

fork & exec


ls > tmp1

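In the child, after fork() and before exec(), the shell closes fd 1 and then does a

open("tmp1", O_TRUNC | O_CREAT | O_WRONLY, 0666) // Returns 1, the lowest free fd

so that ls inherits the file as its stdout.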
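A minimal sketch of how a shell could implement this kind of redirection. Real shells differ in the details; here the file is opened on whatever fd is free and then moved onto fd 1 with dup2, which has the same effect as getting fd 1 back from open directly. run_with_stdout_to is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `argv... > path` and return the exit status
 * of the command, or -1 if fork/wait failed. */
int run_with_stdout_to(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_TRUNC | O_CREAT | O_WRONLY, 0666);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            dup2(fd, STDOUT_FILENO);    /* fd 1 now refers to the file */
            close(fd);
        }
        execvp(argv[0], argv);          /* the new program inherits fd 1 */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection happens only in the child, so the shell's own stdout is untouched. The program being run never learns that its stdout is a file.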


ls | head: the stdout of ls will be the stdin of head
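A minimal sketch of how a two-command pipeline could be wired up with pipe, fork, dup2 and exec. This is an illustration under simplifying assumptions (two commands, little error handling), not the code of any real shell; run_pipeline is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `left | right` and return the exit status of
 * `right`, or -1 if setting up the pipeline failed. */
int run_pipeline(char *const left[], char *const right[]) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fds[0]); close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);    /* stdout goes into the write end */
        close(fds[0]); close(fds[1]);
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(fds[0]); close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);     /* stdin comes from the read end */
        close(fds[0]); close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or `right` never sees EOF. */
    close(fds[0]); close(fds[1]);

    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The easy mistake here is forgetting to close the unused pipe ends: as long as any process still holds the write end open, the reader will block instead of getting EOF.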


Experiment on `pipe`

For a long time I feared pipe for some unknown reason (maybe because of its weird calling convention). So I wrote this test code to help me understand how to use pipe.

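#include <stdlib.h>
#include <unistd.h>

int main(){
    int fdarray[2];  // What feared me is this strange line. What does it even do?
    char* buffer = malloc(0x10);
    pipe(fdarray);   // fills in fdarray[0] (read end) and fdarray[1] (write end)
    write(fdarray[1], "hello", 5);
    read(fdarray[0], buffer, 5);
    write(1, buffer, 5);
    return 0;
}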

The thing is, pipe takes in a 2-int-long array: it will set the first int to the “read end” and the second int to the “write end”. Simple as that. And the two ints are actually file descriptors, here with fd numbers 3 and 4. If you go to /proc/$(pidof a.out)/fd while the program is still running and run ls -la, you will see:

/tmp # ls -la /proc/$(pidof a.out)/fd
dr-x------ 2 root root  0 Jan 17 20:39 .
dr-xr-xr-x 9 root root  0 Jan 17 20:39 ..
lrwx------ 1 root root 64 Jan 17 20:39 0 -> /dev/pts/1
lrwx------ 1 root root 64 Jan 17 20:39 1 -> /dev/pts/1
lrwx------ 1 root root 64 Jan 17 20:39 2 -> /dev/pts/1
lr-x------ 1 root root 64 Jan 17 20:39 3 -> 'pipe:[157751]'
l-wx------ 1 root root 64 Jan 17 20:39 4 -> 'pipe:[157751]'

Of course, everything is a file.

dup2 and dup

dup2 manipulates the fd table. Say 3 points to file1 and 4 points to file2. If we do a dup2(3, 4), fd 4 is first closed and then made a copy of fd 3, so now 4 also points to file1.

dup will copy the fd. Say 3 points to file1; dup(3) returns the lowest unused fd (4, if it is free), and it points to file1 as well.
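Both calls can be checked with a small experiment on two pipes, A and B. In POSIX, dup2(oldfd, newfd) makes newfd refer to whatever oldfd refers to, and dup(oldfd) returns a fresh fd for the same target. The sketch below writes through the duplicated fds and reads everything back from pipe A; the helper name dup_and_dup2_share_target is made up.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if fds produced by dup() and dup2() refer
 * to the same pipe as the original fd, 0 if not, -1 on setup errors. */
int dup_and_dup2_share_target(void) {
    int a[2], b[2];
    char buf[2] = {0, 0};
    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }

    int copy = dup(a[1]);           /* lowest unused fd, refers to pipe A */
    int moved = dup2(a[1], b[1]);   /* b[1] is closed, then refers to pipe A too */
    int ok = 0;
    if (copy >= 0 && moved == b[1] &&
        write(copy, "x", 1) == 1 &&
        write(b[1], "y", 1) == 1 &&
        read(a[0], buf, 2) == 2)
        ok = (buf[0] == 'x' && buf[1] == 'y');

    if (copy >= 0)
        close(copy);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

Note that the byte written to b[1] arrives on pipe A, not pipe B: after dup2 the number b[1] is just another name for A's write end.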


Lightweight processes


From what I understand and what the professor explained, a thread is an abstraction. That means, from a kernel’s perspective, everything is still sequential (not exactly, but mostly yes). From a user’s view, things are running simultaneously. Just like the async programming stuff.

Will read some more code on this topic. SerenityOS would be a good target.

To correct the above statement: it is only true if there’s just one CPU. With multiple cores, threads really do run in parallel, so real multi-threaded programs exist. The world is not too deceptive.

Security 🎉

“Race condition” is never an unfamiliar term for CTF players or security researchers. The basic idea is that when a shared object/memory/stuff is accessed by multiple threads/programs/people, bad code can result in unwanted behavior.
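A minimal sketch of such a race, using two processes that share one page of memory (so no thread library is needed). It assumes Linux-style mmap with MAP_ANONYMOUS and that atomic_long is lock-free on the machine, which is what makes it usable across processes. The names race_demo, shared_counters and INCREMENTS are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on glibc */
#endif
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define INCREMENTS 100000

struct shared_counters {
    volatile long racy;         /* plain read-modify-write, updates can be lost */
    atomic_long safe;           /* atomic increment, no update is lost */
};

static void bump(struct shared_counters *c) {
    for (int i = 0; i < INCREMENTS; i++) {
        c->racy = c->racy + 1;          /* load, add, store: not atomic */
        atomic_fetch_add(&c->safe, 1);
    }
}

/* Hypothetical helper: parent and child bump the same counters at the same
 * time. Returns 0 on success and stores the final values via the pointers. */
int race_demo(long *racy, long *safe) {
    struct shared_counters *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return -1;
    c->racy = 0;
    atomic_init(&c->safe, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(c, sizeof *c);
        return -1;
    }
    if (pid == 0) {
        bump(c);
        _exit(0);
    }
    bump(c);
    waitpid(pid, NULL, 0);

    *racy = c->racy;
    *safe = atomic_load(&c->safe);
    munmap(c, sizeof *c);
    return 0;
}
```

The atomic counter always ends at 2 * INCREMENTS. The plain counter can end anywhere up to that value, and on a multi-core machine it often ends lower, because two increments that read the same old value only count once.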

I strongly suggest watching this video from LiveOverflow. It explains TOCTOU (a kind of race condition) very well using a small example.