This is a special README for AT&T syntax connoisseurs; all other readers are advised to read the normal README, unless they can read instructions like this without hesitation:
AT&T syntax warning, viewer discretion is advised
```asm
movq -0x7e3ffde0(, %rax, 8), %rax
```
This is a simple anti-rootkit Linux Kernel Module written for an Operating Systems Security course.
It contains 7 different kernel integrity checks.
It is only compatible with the x86_64 architecture, but some of the non-architecture-specific checks could be used on other architectures.
Table of Contents
WARNING: This is not a stable module; it could cause a kernel panic at any time, and I would not recommend loading it on your main system.
Use a virtual machine!
To build the module you will need the header files for your kernel. Depending on the distribution they might be in different packages.
```sh
pacman -S linux-headers                   # Arch Linux
apt install linux-headers-$(uname -r)     # Debian / Ubuntu
yum install kernel-devel                  # RHEL / CentOS / Fedora
```
The module can be configured by modifying the `config.h` file in the module's directory.
Each check can be disabled separately from its recovery. Note that disabling detection also disables recovery. Additionally, the periodic check interval can be changed, as well as the experimental option to unload suspect modules using the `free_module` function.
- Go to the `modules/anti_rootkit` directory
- Configure the module's checks by modifying the `config.h` file
- Build the module with `make`
- Install the module with `insmod anti_rootkit.ko` (again, use a virtual machine)
Once loaded the module will attempt to initialize the enabled checks and log some information about them. Since the module is only a proof of concept the logs may contain information that can be considered confidential to some degree, such as addresses of kernel structures.
The module provides a simple sysfs interface that allows a system check-up to be triggered on demand, as well as allows the user to display the time of the latest check.
- `/sys/kernel/antirootkit/last_check` - read to get the time of the last check
- `/sys/kernel/antirootkit/check` - write a `1` to trigger all enabled checks
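For illustration, here is a minimal sketch of how such an interface could be wired up with a kobject under `/sys/kernel/`; the attribute names match the paths above, while `run_all_checks` is a hypothetical stand-in for the module's actual check dispatch.

```c
#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/ktime.h>
#include <linux/sysfs.h>

static struct kobject *ar_kobj;

static ssize_t last_check_show(struct kobject *kobj, struct kobj_attribute *attr,
			       char *buf)
{
	/* The real module would report the timestamp of the last finished check. */
	return sprintf(buf, "%lld\n", (long long)ktime_get_real_seconds());
}

static ssize_t check_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t count)
{
	if (count && buf[0] == '1')
		pr_info("check requested\n");	/* run_all_checks(); -- hypothetical helper */
	return count;
}

static struct kobj_attribute last_check_attr = __ATTR_RO(last_check);
static struct kobj_attribute check_attr = __ATTR_WO(check);

static int ar_sysfs_init(void)
{
	ar_kobj = kobject_create_and_add("antirootkit", kernel_kobj);
	if (!ar_kobj)
		return -ENOMEM;

	if (sysfs_create_file(ar_kobj, &last_check_attr.attr) ||
	    sysfs_create_file(ar_kobj, &check_attr.attr))
		return -ENOMEM;

	return 0;
}
```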
The first and simplest check is the `PINNED_BITS` check.
It checks up to 5 bits in 2 control registers, `cr0` and `cr4`:

`cr0` bits:

- Write Protect (WP) - When cleared, the CPU can write to read-only pages while running in ring 0. When set and a write is attempted, the CPU generates a page fault.

`cr4` bits:

- User-Mode Instruction Prevention (UMIP) - When set, the `SGDT`, `SIDT`, `SLDT`, `SMSW` and `STR` instructions cannot be executed in user mode. Those instructions are mostly related to the different descriptor tables. The `SIDT` instruction is explained in more detail in the Interrupt Descriptor Table section.
- Supervisor Mode Execution Prevention (SMEP) - When set, an attempt to execute code from userspace while in kernel mode generates a fault.
- Supervisor Mode Access Prevention (SMAP) - Similar to SMEP, but generates a fault when kernel-mode code attempts any data access to userspace memory.
Those bits should always be set during normal operation. In kernel 5.3 the common functions `native_write_cr0` and `native_write_cr4` were changed to always set those bits and to warn when an attempt was made to clear any of them. Two related commits (`8dbec27a242cd3e2` and `873d50d58f67ef15`) implemented these checks for the CR4 and CR0 registers respectively.
x86/asm: Pin sensitive CR4 bits
Several recent exploits have used direct calls to the native_write_cr4() function to disable SMEP and SMAP before then continuing their exploits using userspace memory access.
Direct calls of this form can be mitigated by pinning bits of CR4 so that they cannot be changed through a common function. This is not intended to be a general ROP protection (which would require CFI to defend against properly), but rather a way to avoid trivial direct function calling (or CFI bypasses via a matching function prototype) as seen in:
https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
x86/asm: Pin sensitive CR0 bits
With sensitive CR4 bits pinned now, it's possible that the WP bit for CR0 might become a target as well.
The PoC exploit mentioned in the first commit was able to call the `native_write_cr4` function after bypassing KASLR and calculating the offset to the function. It then cleared the SMEP and SMAP bits, which in turn allowed the payload to be executed with ring 0 privileges.
It is, of course, still possible to write to these registers in kernel mode by reimplementing the write function without any checks:

```c
static inline void _write_cr0(unsigned long val)
{
	asm volatile("mov %0,%%cr0" : "+r"(val) : : "memory");
}
```
A malicious module might clear the WP bit to overwrite important kernel structures, such as (but not limited to):

- system call entries of the `sys_call_table` (see the Syscall table section)
- interrupt handler entries of the `idt_table` (see the Interrupt Descriptor Table section)

or to hook kernel functions (later checks confirm the integrity of those structures and of selected functions to make sure they have not been overwritten or hooked).
Usually the bit will be set back after the write completes, but there is a chance that it won't be. Those bits are an integral part of kernel security, so we have to make sure they stay set.
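What follows is a minimal sketch of such a check, not the module's exact code: it reads the control registers and verifies the pinned bits, leaving the recovery (writing the expected values back) out.

```c
#include <linux/kernel.h>
#include <asm/cpufeature.h>
#include <asm/processor-flags.h>
#include <asm/special_insns.h>

static void pinned_bits_check(void)
{
	unsigned long cr0 = native_read_cr0();
	unsigned long cr4 = native_read_cr4();

	if (!(cr0 & X86_CR0_WP))
		pr_alert("cr0.WP is cleared - possible rootkit activity\n");

	if (!(cr4 & X86_CR4_SMEP) || !(cr4 & X86_CR4_SMAP))
		pr_alert("cr4 SMEP/SMAP cleared - possible rootkit activity\n");

	/* UMIP is not supported by every CPU, so only check it when present. */
	if (boot_cpu_has(X86_FEATURE_UMIP) && !(cr4 & X86_CR4_UMIP))
		pr_alert("cr4.UMIP is cleared\n");
}
```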
The next check is also simple, but one of the most important checks in the whole project.
On x86_64 the `syscall` instruction is used to enter kernel mode from user mode. The Intel Software Developer Manual describes its operation as such:
SYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR (after saving the address of the instruction following SYSCALL into RCX).
So, the CPU will enter ring 0 (kernel mode) and then jump to the address specified by the model-specific register `LSTAR`. The kernel sets the value of `MSR_LSTAR` in the `syscall_init` function to point to `entry_SYSCALL_64`:

```c
wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
```
A malicious module might want to overwrite the register's value to intercept all system calls (it
is very easy to do so).
If we are loaded into an unmodified kernel, we can store the address of `entry_SYSCALL_64` and detect any potential attempts to overwrite it by periodically checking the value of the register.
If, however, we are loaded after a malicious module and suspect it has overwritten the register, we can still detect, and potentially recover, its value by using kallsyms. The `sprint_symbol` function allows us to check the symbol name of the function pointed to by `MSR_LSTAR`; if it doesn't match, the register may have been overwritten.
We could also check the alignment of `entry_SYSCALL_64`; it should be page-aligned (the last three hex digits of the address are 0).
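Below is a minimal sketch of both strategies; the variable names are assumed, and `saved_lstar` would be captured when the module is loaded into a clean kernel.

```c
#include <linux/kernel.h>
#include <linux/kallsyms.h>
#include <linux/string.h>
#include <asm/msr.h>

static unsigned long saved_lstar;	/* read once at module load */

static void lstar_check(void)
{
	char sym[KSYM_SYMBOL_LEN];
	unsigned long lstar;

	rdmsrl(MSR_LSTAR, lstar);

	if (saved_lstar && lstar != saved_lstar)
		pr_alert("MSR_LSTAR changed: %px -> %px\n",
			 (void *)saved_lstar, (void *)lstar);

	/* Works even without a clean baseline: resolve the symbol name. */
	sprint_symbol(sym, lstar);
	if (strncmp(sym, "entry_SYSCALL_64", 16) != 0)
		pr_alert("MSR_LSTAR does not point to entry_SYSCALL_64: %s\n", sym);
}
```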
Having protected the syscall handler entry used by the `syscall` instruction (at least in long mode; we address compatibility mode in the Interrupt Descriptor Table section), the next logical step is to protect the syscall table itself.
A hypothetical rootkit can very easily hook any syscall without patching the existing handlers. All it needs to do is replace the address of a specific syscall handler inside the syscall table with the address of its own function with the same signature.
In older kernel versions (prior to 4.17) it was possible to locate the syscall table by searching through the kernel address space and looking for references to the `sys_close` function (more precisely, its address), which is the handler of the `close` syscall and can thus be found in the syscall table.
However, commit `2ca2a09d6215fd96` removed the export of `sys_close` and replaced its usages with a `ksys_close` wrapper, making this approach unviable. The syscall table is also not available through kallsyms.
We have to get a bit clever to get access to the table. Let's follow what the CPU does to call a specific syscall handler.
We already know that when the `syscall` instruction gets executed, we jump to `entry_SYSCALL_64`, and the address of the next instruction is stored in the RCX register.
This procedure is actually written in assembly (in the `/arch/x86/entry/entry_64.S` file).
```asm
entry_SYSCALL_64:
	; Prepare the stack
	...
	; Construct struct pt_regs on the stack
	...
	pushq %rcx		; pt_regs->ip (CPU stored it here on syscall)
	pushq %rax		; pt_regs->orig_ax
	; Pushes and clears (using xor %r, %r) all registers except RAX, since it holds the syscall number
	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
	; IRQs are off
	movq %rax, %rdi		; unsigned long nr
	movq %rsp, %rsi		; struct pt_regs *regs
	callq do_syscall_64	; returns with IRQs disabled
	; Now RAX has the return value of the handler
	...
```
It will first push the registers onto the stack (the `struct pt_regs` structure) and then call `do_syscall_64` with the first argument, stored in the RDI register, being the syscall number (originally stored in the RAX register), and the second argument being the `struct pt_regs` structure holding the other registers (`mov rsi, rsp`; remember that `struct pt_regs` is now on the stack).
The `do_syscall_64` function will check whether the syscall number is valid and then call the appropriate handler, like this:

```c
regs->ax = sys_call_table[nr](regs);
```
There it is, a reference to the `sys_call_table`. If the CPU knows where to look up the address to jump to, we can too!
Since `entry_SYSCALL_64` is written in assembly, and it is not changed too often, we can calculate a static offset to the `call do_syscall_64` instruction like this:

```c
// The first byte of the call is the opcode, the following 4 bytes are the signed offset
offset = *((int *)(entry_syscall + ENTRY_DO_CALL_OFFSET + 1));
```
The `do_syscall_64` function will then be located at:

```c
// The call offset is relative to the end of the 5-byte call instruction
do_syscall = entry_syscall + offset + ENTRY_DO_CALL_OFFSET + 5;
```
The next part is a bit tricky, since `do_syscall_64` is written in C, so we have to apply some heuristics, as we cannot be certain what the compiler will generate.
This is how the jump is compiled with my `gcc`:

```asm
movq -0x7e3ffde0(, %rax, 8), %rax
callq sym.__x86_indirect_thunk_rax
```
We can now attempt to locate this part by pattern matching, looking for the `mov` instruction - specifically `MOV r64,r/m64`. Additionally, the ModR/M byte has to be equal to `04`, meaning `mov rax, ?` with a SIB byte following the ModR/M byte to describe the source operand. Moreover, the SIB byte has to be equal to `c5`, since we multiply RAX by 8 (`sizeof(void *)`; RAX is the syscall number) and combine it with a 32-bit displacement.
The format of the `mov` instruction is as follows:
```
48 8b 04 c5 ????????

48        - REX.W prefix
8b        - MOV r64,r/m64 opcode
04        - ModR/M byte - destination operand is RAX, a SIB byte follows
c5        - SIB byte - source operand is rax*8 + disp32
????????  - disp32
```
Armed with this knowledge, we can now attempt to find this instruction in the first few hundred bytes of the `do_syscall_64` function.
Then it's as simple as extracting the offset (`disp32`) and calculating the address of the syscall table, remembering to sign-extend the displacement.
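Here is a rough illustration of that pattern matching, assuming `do_syscall` points at the bytes of `do_syscall_64`; the scan length and helper name are my own, not necessarily the module's.

```c
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

/* 48 8b 04 c5 = mov rax, [rax*8 + disp32] */
static const u8 mov_pattern[] = { 0x48, 0x8b, 0x04, 0xc5 };

static unsigned long *find_syscall_table(const u8 *do_syscall)
{
	int i;

	for (i = 0; i < 256; i++) {
		if (memcmp(do_syscall + i, mov_pattern, sizeof(mov_pattern)) == 0) {
			/* disp32 follows the pattern; sign-extend it to get the kernel address */
			s32 disp = *(const s32 *)(do_syscall + i + sizeof(mov_pattern));

			return (unsigned long *)(long)disp;
		}
	}

	return NULL;
}
```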
This is by no means a perfect approach, since the compiler can generate completely different code. However, it should be fairly easy to add more patterns by analyzing the `do_syscall_64` assembly.
More advanced techniques are possible, such as dynamically instrumenting the syscall handler and analyzing its memory accesses, but that was beyond the scope of this project.
When we finally get the address of the syscall table, we can create a copy of it, and check for any anomalies. Since we have a full copy, we can not only detect any attempts to tamper with the table, but also recover it, if we notice anything out of place.
That does require, however, that we are loaded into a clean kernel. If we suspect a potential rootkit has already been loaded before us and has replaced the entries in the syscall table, we can still attempt to detect it (but it's going to be very difficult to recover the table). We can check whether the syscall table entries are close to each other. If a few entries reside in a completely unrelated memory region, we can suspect they have been hooked.
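A minimal sketch of the snapshot-and-compare idea follows; all names are assumed, and recovery is only hinted at because writing the table back requires dealing with the WP bit first.

```c
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/string.h>

static unsigned long *sct;		/* address found by the pattern matching above */
static unsigned long *sct_copy;		/* snapshot taken right after the table is located */
static unsigned int sct_entries;	/* number of entries we decided to track */

static int syscall_table_snapshot(void)
{
	sct_copy = kmemdup(sct, sct_entries * sizeof(*sct), GFP_KERNEL);
	return sct_copy ? 0 : -ENOMEM;
}

static void syscall_table_check(void)
{
	unsigned int i;

	for (i = 0; i < sct_entries; i++) {
		if (sct[i] == sct_copy[i])
			continue;
		pr_alert("syscall %u hooked: %px -> %px\n",
			 i, (void *)sct_copy[i], (void *)sct[i]);
		/* Recovery would write sct_copy[i] back, which needs CR0.WP
		 * temporarily cleared or a text-poking helper. */
	}
}
```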
Already mentioned a couple of times, the Interrupt Descriptor Table, or IDT for short, is a data structure used by x86. It associates a list of interrupts and exceptions with their respective handlers. Many important interrupt handlers are referenced in the IDT, in particular the `0x80` 32-bit syscall handler.
The kernel initializes the IDT very early on in `trap_init`, just before calling `cpu_init`.
The `struct desc_ptr idt_descr` structure contains the size and address of the IDT, and is loaded into the 80-bit `idtr` register using the `lidt` instruction.

Definition of the 80-bit `desc_ptr` structure:

```c
struct desc_ptr {
	unsigned short size;
	unsigned long address;
} __attribute__((packed));
```
The operation of the `lidt` instruction in 64-bit mode:

```
IDTR(Limit) ← SRC[0:15];
IDTR(Base)  ← SRC[16:79];
```
Getting the address of the IDT is much simpler than accessing the syscall table. We can either directly use the `sidt` instruction to load the contents of `idtr` into a `struct desc_ptr`, or, even simpler, just use the `void store_idt(struct desc_ptr *dtr)` function included from `asm/desc.h`.
Once we have the address of the IDT, we can create a copy just like we did for the syscall table and monitor it for any changes.
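A small sketch of how that could look using `store_idt`; the names are assumed, and the copy fits in one page because the x86 IDT holds 256 16-byte gates.

```c
#include <linux/string.h>
#include <linux/types.h>
#include <asm/desc.h>
#include <asm/page.h>

static struct desc_ptr idt_ptr;		/* IDTR contents captured at load time */
static u8 idt_copy[PAGE_SIZE];		/* 256 gates * 16 bytes = 4096 bytes */

static void idt_snapshot(void)
{
	store_idt(&idt_ptr);
	memcpy(idt_copy, (void *)idt_ptr.address, idt_ptr.size + 1);
}

static bool idt_check(void)
{
	struct desc_ptr cur;

	store_idt(&cur);
	if (cur.address != idt_ptr.address || cur.size != idt_ptr.size)
		return false;	/* the IDTR itself was changed */

	return memcmp(idt_copy, (void *)cur.address, cur.size + 1) == 0;
}
```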
Most rootkit modules don't only go on the offence; they also need to protect themselves from being found and unloaded. The most common way they achieve this is by removing themselves from the module list.
This not only guarantees that the module cannot be seen by the user calling `lsmod`, but also that it cannot be seen by other kernel structures, so the module cannot be unloaded at all.
Let's try to detect that. The kernel uses a doubly linked list to store the list of modules.
We can get access to that list using the `THIS_MODULE` macro, which returns the `struct module` structure describing our module. It contains a `struct list_head list` entry; that's the module list.
Granted, we are somewhere in the middle of the list, or at its end just after being loaded, but thanks to the structure of a doubly linked list we can access the other entries from there.
Now we have to assume, again, that we are loaded before any malicious modules.
We can hook the `do_init_module` and `free_module` functions using ftrace, the built-in kernel tracing and hooking engine. This allows us to independently keep track of the loaded modules and detect when a module disappears from the real list.
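A minimal sketch of the bookkeeping side follows; the names are assumed, and the `tracked_modules` list would be maintained by the `do_init_module`/`free_module` hooks.

```c
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/module.h>

struct tracked_module {
	struct list_head node;
	struct module *mod;
};

static LIST_HEAD(tracked_modules);	/* updated from the ftrace hooks */

static bool on_module_list(struct module *target)
{
	struct list_head *pos;

	/* THIS_MODULE->list is part of the kernel's circular module list,
	 * so every other entry is reachable from it. */
	list_for_each(pos, &THIS_MODULE->list)
		if (list_entry(pos, struct module, list) == target)
			return true;

	return false;
}

static void module_list_check(void)
{
	struct tracked_module *tm;

	list_for_each_entry(tm, &tracked_modules, node)
		if (tm->mod != THIS_MODULE && !on_module_list(tm->mod))
			pr_alert("module %s is hidden from the module list\n",
				 tm->mod->name);
}
```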
We can assume a module is malicious when it removes itself from the list, and even forcefully unload it using the `free_module` function. This might, however, leave some kernel structures in a broken state, as the proper way to remove the module would be to call the `delete_module` syscall. We could do that with a userspace helper, but this project does not implement that functionality (TODO?).
"Everything is a file" is one of the defining features of every Unix operating system. Linux, being its representative, also follows this convention. Devices such as a keyboard, a USB flash drive or even a printer can be accessed by reading and writing certain files in the filesystem.
The structure backing these actions is called `file_operations` and it is a struct of function pointers implementing read, write and many other operations for a given file (a directory is also a file). It contains more than 30 operations, but the most commonly implemented ones are:

- `read`
- `read_iter`
- `write`
- `write_iter`
- `llseek`
- `fsync`
- `iterate`
- `iterate_shared`
It is possible to hook a few selected file operations of certain filesystems to further conceal a rootkit and its components.
One example is hooking the procfs, sysfs and rootfs file operations (`iterate_shared` in particular) to hide the rootkit's files stored on disk, its processes running in userspace, as well as the interface used to interact with it.
We make an attempt at protecting a few selected file operations of a few filesystems by detecting and preventing hooking of the functions handling these operations.
We do that by storing a copy of the first dozen or so bytes of each function, which, when ftrace is enabled, will contain the 5-byte `nop` instruction used by ftrace for hooking functions.
We periodically compare our copy with the original function and make sure it has not been hooked. We can restore the hooked functions by copying the data from our backup back to the original function.
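A minimal sketch of the snapshot-and-compare part; the names are assumed, and restoring the bytes would additionally require handling the WP bit or using a text-poking helper.

```c
#include <linux/string.h>
#include <linux/types.h>

#define PROLOGUE_LEN 16		/* enough to cover the 5-byte ftrace nop */

struct protected_fn {
	void *addr;			/* e.g. some fops->iterate_shared */
	u8 orig[PROLOGUE_LEN];		/* snapshot taken at load time */
};

static void protected_fn_save(struct protected_fn *p, void *addr)
{
	p->addr = addr;
	memcpy(p->orig, addr, PROLOGUE_LEN);
}

static bool protected_fn_check(const struct protected_fn *p)
{
	/* A difference here usually means the nop was patched into an ftrace call. */
	return memcmp(p->addr, p->orig, PROLOGUE_LEN) == 0;
}
```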
This check is very similar to the File operations check in the way it checks the integrity of functions. It does however use a different approach to finding locations of functions.
Many rootkits like to hook networking functions in the Linux Kernel. This allows them to create backdoors, as well as hide their open communication channels (similarly to how hooking file operations works).
By hooking the `ip_rcv` function, a malicious module can parse IP packets before they're processed by the kernel (as per the comment in the Linux kernel source, "IP receive entry point") and react accordingly when it sees a specially crafted packet.
They can also hook `tcp4_seq_show` and `udp4_seq_show`, the procfs handlers that allow listing the currently active TCP and UDP connections. The procfs is used by `netstat` to list active connections. While the File operations check allowed disguising specific files and directories in the procfs, hooking these methods allows the rootkit to modify the contents of those files.
The detection and recovery work exactly like the File operations protection.
Those 3 functions are just examples; many more functions can easily be added to this module to be protected as well.
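One common way to resolve the address of such a kernel function, though not necessarily the approach this module takes, is to briefly register a kprobe on the symbol and read back the resolved address:

```c
#include <linux/kprobes.h>

static void *resolve_symbol(const char *name)
{
	struct kprobe kp = { .symbol_name = name };
	void *addr = NULL;

	if (register_kprobe(&kp) == 0) {
		addr = (void *)kp.addr;
		unregister_kprobe(&kp);
	}

	return addr;
}
```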
- Clone this repo.
- Initialize and update the submodules with `git submodule init && git submodule update` (or pass the `--recurse-submodules` flag when cloning).
- Copy the `config/kernel.config` to the `kernel/` directory.
- Build the kernel with `make` (only needs to be done once).
- Build the modules and create a rootfs with `./build.sh`.
- Run the QEMU VM with `./run.sh`.
- ftrace hooking - GPLv2