Notes on CS170
So finally, I started a proper date with Miss OS. I have read some code from Serenity OS and done some intro-level kernel challenges, but I feel I should definitely ace the OS course in school. To show my respect, I am going to take notes on this course on my website.
Intro to OS
syscall is a familiar concept when writing shellcode or doing a ret2sys trick. To my understanding, the kernel is the layer between userland and hardware. The kernel is the raw program running on the CPU that manages all other programs.
The OS provides an abstraction to userland programs. Programs think they are running continuously; they are not. Files think they are continuous bytes; they are not. Programs think they have individual memory; they do not.
Intro to processes
Definition: the instance of a running program
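The OS uses fork() to create processes. The root process of Linux is init. On macOS it is launchd, and in CentOS it’s systemd. But yeah, in Debian it is init.
I am a little bit surprised by the fact that processes are actually created only using fork(). After searching, it seems that initially there was only fork(). exec() is often mentioned as another way to start a program, but strictly speaking it does not create a process: it replaces the program running in the calling process, so a new program is usually started with fork() followed by exec().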
Why fork and exec are kept as 2 separate calls
The OS will first create a task_struct* p and set up the new information for it. Then it copies all information from the parent process. One thing to notice is that many OSes use a technique called COW (copy-on-write): parent and child keep sharing the same memory pages, and the OS delays allocating and copying a page for the new process until one of them writes to it.
https://elixir.bootlin.com/linux/latest/source/fs/exec.c#L1703
The code for exec is quite different from fork. The main difference should be that exec will replace instead of copy. I am not sure about the reason why exec exists, maybe for performance (fork would need to kill the original process if it wanted to do what exec does).
While reading Linux’s exec.c, I noticed an interesting function.
/*
 * determine how safe it is to execute the proposed program
 * - the caller must hold ->cred_guard_mutex to protect against
 *   PTRACE_ATTACH or seccomp thread-sync
 */
static void check_unsafe_exec(struct linux_binprm *bprm)
This actually prevents a setuid process from being ptraced. Say we have a process running as nobody. We attach to that process and make it do an exec("/usr/bin/passwd"); without this check, that would allow us to write shellcode before the role changes. See hxp CTF 2020: wisdom2.
The structure that contains information about a process is the PCB (process control block). A PCB can contain:
-----------------
| process id    |
| state         | (ready, runnable, blocked, etc.)
| user id       |
| IP            |
| open files    |
| VM structures |
| registers     |
| .....         | (signal mask, terminal, priority, ...)
-----------------
Privileged vs. unprivileged mode
Three ways to switch to kernel mode:
- Exceptions (e.g., divide by 0)
- Trap (e.g., int3, int 0x80, syscall)
- Interrupt (e.g., ctrl+c)
Is SIGSEGV/SIGABRT a trap or an exception?
Q: “How are the 3 ways of switching to kernel mode different? Suppose we have a customized kernel that handles a null-ref as a syscall. Then how do we distinguish an ‘exception’ from a ‘trap’?”
A: “From a kernel’s view, all ‘exceptions’, ‘interrupts’, ‘traps’ are just methods to get into kernel mode.”
My understanding: “So, really, it’s just a categorizing thing. A kernel developer should write code to handle all three ways of entering kernel mode”
Answer to the previous question about signals
A: “Signals are just IPC methods. There may or may not exist a real, say, null-ref when a SIGSEGV is raised.”
Again, hidden behind the kernel.
Return from kernel mode to user mode: iret
On an x64 platform, iretq is used instead of iret. The operation can be found at Interrupt Return. Basically, it restores the saved user context, for example the stack pointer, program counter, and flags. There is also a swapgs instruction before iretq: it swaps the current gs base with the value saved in the IA32_KERNEL_GS_BASE MSR, so the kernel and userland each get their own gs base.
One concept I am not familiar with is the “real mode” versus “protected mode” distinction. I will come back to this note when I know more about it.
The ret2usr attack corresponds to ret2shellcode in userland. Say we have a simple stack buffer overflow in the kernel and we can overwrite the return address with any address. With no mitigation enabled, we can simply return to a piece of memory in userland where we wrote shellcode. However, when executing the shellcode, we are still in kernel mode, and the kernel cannot simply execve("/bin/sh", 0, 0). Instead, we need to set the privilege of some process to root (cause that’s our purpose), and gracefully return to the execution context of the process to spawn a shell.
Process’s view of memory; stacks
system calls, file descriptors
These parts are too trivial to be noted.
How a computer is started
- The machine is physically powered on.
- The BIOS is loaded from ROM and executed.
- The BIOS finds the boot loader and runs it.
- The boot loader loads and runs the kernel.
- The OS is started. Have fun.
Shell (we love shells, unexpected ones)
How does the shell start programs
fork & exec
Redirection
ls > tmp1
This will do a
close(1)
open("tmp1", O_TRUNC | O_CREAT | O_WRONLY, 0666) // Returns 1, the lowest free fd (assuming 0 is still open)
in the child, after fork() and before exec()
Pipelines
ls | head: the stdout of ls will be the stdin of head
pipe()
For a long time I feared pipe for some unknown reason (maybe because of its weird calling convention). So I wrote this test code to help me understand how to use pipe.
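#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fdarray[2]; // What feared me is this strange line. What does it even do?
    if (pipe(fdarray) == -1) return 1;   // fills in fdarray[0] (read end) and fdarray[1] (write end)
    char *buffer = malloc(0x10);
    if (buffer == NULL) return 1;
    write(fdarray[1], "hello", 5);       // push 5 bytes into the pipe
    read(fdarray[0], buffer, 5);         // pull the same 5 bytes back out
    write(1, buffer, 5);                 // print them to stdout
    free(buffer);
    return 0;
}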
The thing is, pipe takes in a 2-int-long array: it will set the first int to the “read end” and the second int to the “write end”. Simple as that. And the two ints are actually file descriptors, here with fd numbers 3 and 4. If you go to /proc/$(pidof a.out)/fd and run ls -la while the program is still running, you will see:
/tmp # ls -la /proc/$(pidof a.out)/fd
dr-x------ 2 root root 0 Jan 17 20:39 .
dr-xr-xr-x 9 root root 0 Jan 17 20:39 ..
lrwx------ 1 root root 64 Jan 17 20:39 0 -> /dev/pts/1
lrwx------ 1 root root 64 Jan 17 20:39 1 -> /dev/pts/1
lrwx------ 1 root root 64 Jan 17 20:39 2 -> /dev/pts/1
lr-x------ 1 root root 64 Jan 17 20:39 3 -> 'pipe:[157751]'
l-wx------ 1 root root 64 Jan 17 20:39 4 -> 'pipe:[157751]'
Of course, everything is a file.
dup2 and dup
dup2 manipulates the fd table. Say 3 points to file1 and 4 points to file2. If we do a dup2(3, 4), the old 4 is closed and now 4 also points to file1.
dup will copy the fd. Say 3 points to file1; dup(3) will create a new fd using the lowest free number (4 here), and it points to file1 as well.
Threads
* Lightweight processes
From my understanding and the professor’s explanations, the thread is an abstraction. That means, from a kernel’s perspective, everything is still sequential (not exactly, but mostly yes). From a user’s view, things are running simultaneously. Just like the async programming stuff.
Will read some more code on this topic. SerenityOS would be a good target.
To correct the above statement: it is only true if there is a single CPU. With multiple CPUs, threads of a multi-threaded program really do run in parallel. The world is not too deceptive.
Security 🎉
“Race condition” is never an unfamiliar term for CTF players or security researchers. The basic idea is that when a shared object/memory/stuff is accessed by multiple threads/programs/people, bad code can result in unwanted behavior.
I strongly suggest watching this video from LiveOverflow. It explains a TOCTOU (time-of-check to time-of-use, a kind of race condition) very well using a small example.