eBPF for Linux Admins: Part VII
eBPF - This article is part of a series.
Let’s look at kprobes.
Kprobes are one of the dynamic tracing facilities available in the Linux kernel.
But why are we learning about kprobes, and how are they related to eBPF?
As I said earlier, things will get interesting going forward. Please be patient and keep learning.
Here is the basic working principle of kprobes:
- Identify the kernel function you want to probe.
- Register a kprobe on that function.
- The first instruction of the function is replaced with a breakpoint, and the original instruction is copied.
- The user-defined routine, the pre-handler, gets called. It can inspect everything passed to the original function.
- Once the pre-handler completes, the original instruction that was copied earlier gets executed.
- After that, the optional post-handler, which is also a user-defined routine, gets executed.
- Finally, control goes back to the original flow and the next instructions get executed.
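To make the order of events concrete, here is a toy userspace model. It does not patch any instructions or set real breakpoints; the hypothetical `toy_probe` struct and helper functions below only mimic the sequence in which a kprobe runs its callbacks around the copied instruction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy callback type: each step appends its name to a trace. */
typedef void (*toy_handler)(char *trace, size_t cap);

/* Hypothetical toy probe: just the two user-defined callbacks. */
struct toy_probe {
    toy_handler pre_handler;  /* runs before the copied instruction */
    toy_handler post_handler; /* optional, runs after it */
};

/* Append `step` to `trace` without overflowing a buffer of `cap` bytes. */
static void trace_step(char *trace, size_t cap, const char *step)
{
    strncat(trace, step, cap - strlen(trace) - 1);
}

/* Stands in for the single original instruction copied at registration. */
static void copied_instruction(char *trace, size_t cap)
{
    trace_step(trace, cap, "original;");
}

/* Simulated breakpoint hit: pre-handler, copied instruction,
 * post-handler, then the function resumes with its next instruction. */
static void toy_breakpoint_hit(const struct toy_probe *p, char *trace, size_t cap)
{
    if (p->pre_handler)
        p->pre_handler(trace, cap);
    copied_instruction(trace, cap);
    if (p->post_handler)
        p->post_handler(trace, cap);
    trace_step(trace, cap, "resume;");
}

/* Example user-defined handlers */
static void my_pre(char *trace, size_t cap)  { trace_step(trace, cap, "pre;"); }
static void my_post(char *trace, size_t cap) { trace_step(trace, cap, "post;"); }
```

With both handlers set, the trace reads `pre;original;post;resume;`; leaving `post_handler` as NULL simply drops the `post;` step, which matches the post-handler being optional.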
Most of our focus will be on the pre-handler, where we can examine the data passed to the function.
Let’s write a kernel module that traces openat and prints the program name, PID, and file name.
We are not going to trace all programs; instead, we will trace the function only when the program name matches sample_write, which we wrote earlier.
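Matching on the program name means comparing against `current->comm`, where the kernel keeps the task's name truncated to 15 characters plus a terminating NUL (`TASK_COMM_LEN` is 16). Two details matter for such a filter: a long program name is cut short, and a length-bounded comparison such as `strncmp(comm, "sample_write", 12)` only checks a prefix, so a program named `sample_writer` would match too. The userspace sketch below mimics the truncation with a local `TASK_COMM_LEN` constant (assumed to mirror the kernel value) and compares a prefix check with an exact `strcmp()` check; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* assumed to mirror the kernel's task->comm size */

/* Hypothetical helper: store `name` the way task->comm holds it,
 * at most TASK_COMM_LEN - 1 characters, always NUL-terminated. */
static void set_comm(char comm[TASK_COMM_LEN], const char *name)
{
    strncpy(comm, name, TASK_COMM_LEN - 1);
    comm[TASK_COMM_LEN - 1] = '\0';
}

/* Prefix check, equivalent to strncmp(comm, "sample_write", 12) == 0 */
static bool prefix_match(const char *comm, const char *target)
{
    return strncmp(comm, target, strlen(target)) == 0;
}

/* Exact check: the whole name must be equal */
static bool exact_match(const char *comm, const char *target)
{
    return strcmp(comm, target) == 0;
}
```

For `sample_write` both checks agree; for `sample_writer` only the prefix check matches; and a name such as `sample_write_long_name` is stored as `sample_write_lo`.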
Let’s write the kprobe module and compile it.
mkdir -p lkmpg/kprobes
cd !$
vi kprobe_example.c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/uaccess.h>

static struct kprobe kp = {
	// The actual function that implements the openat syscall.
	// A detailed explanation of this part is at the end of the article.
	.symbol_name = "do_sys_openat2",
};

static int
handler_pre (struct kprobe *p, struct pt_regs *regs)
{
	const char __user *param_fname_reg;
	char fname[256];
	long len;

	// We are only interested in the 'sample_write' program we wrote in the previous chapter
	if (strcmp (current->comm, "sample_write"))
		return 0;

	// x86-64 only: the second argument of do_sys_openat2() (filename) is passed in RSI
	param_fname_reg = (const char __user *) regs->si;

	// filename points to user-space memory, so copy it into a kernel buffer
	// instead of dereferencing it directly. Page faults are disabled because a
	// kprobe handler must not sleep; a non-resident page makes the copy fail.
	pagefault_disable ();
	len = strncpy_from_user (fname, param_fname_reg, sizeof (fname) - 1);
	pagefault_enable ();
	if (len < 0)
		return 0;
	fname[len] = '\0';

	// Print the information to the kernel log (visible with 'dmesg' or journalctl)
	pr_info ("do_sys_openat2 called by:%s pid=%i fname=%s\n", current->comm,
		 current->pid, fname);
	return 0;
}

static void
handler_post (struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
	/* Optional handler */
}

static int __init
kprobe_init (void)
{
	int ret;

	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
	ret = register_kprobe (&kp);
	if (ret < 0) {
		pr_err ("register_kprobe failed, returned %d\n", ret);
		return ret;
	}
	pr_info ("Kprobe attached to do_sys_openat2\n");
	return 0;
}

static void __exit
kprobe_exit (void)
{
	unregister_kprobe (&kp);
	pr_info ("Kprobe detached from do_sys_openat2\n");
}

module_init (kprobe_init);
module_exit (kprobe_exit);
MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Ansil H");
MODULE_DESCRIPTION ("Simple Kprobe to trace file open operations from sample_write program");
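The pre-handler reads the file name from `regs->si`. At the entry of a kernel function on x86-64, arguments sit in registers according to the System V calling convention, so the second argument of `do_sys_openat2(int dfd, const char __user *filename, struct open_how *how)` is in RSI. `man 2 syscall` documents a slightly different order used at syscall entry. The userspace sketch below uses a hypothetical `fake_pt_regs` struct (field names borrowed from the kernel's x86-64 `struct pt_regs`) to show both orders; it is x86-64 specific, and other architectures name and lay out these registers differently.

```c
/* Hypothetical userspace stand-in for the argument registers in the kernel's
 * x86-64 struct pt_regs (field names follow the kernel: di, si, dx, ...). */
struct fake_pt_regs {
    unsigned long di, si, dx, cx, r8, r9, r10;
};

/* Kernel function calls (System V AMD64 convention): rdi, rsi, rdx, rcx, r8, r9.
 * This is the layout a kprobe at the entry of do_sys_openat2() sees.
 * n is 0-based; returns 0 for an out-of-range n. */
static unsigned long func_arg(const struct fake_pt_regs *r, unsigned int n)
{
    switch (n) {
    case 0: return r->di;
    case 1: return r->si;
    case 2: return r->dx;
    case 3: return r->cx;
    case 4: return r->r8;
    case 5: return r->r9;
    default: return 0;
    }
}

/* Syscall entry ("man 2 syscall"): rdi, rsi, rdx, r10, r8, r9.
 * Only the 4th argument differs, because the syscall instruction
 * overwrites rcx with the return address. */
static unsigned long syscall_arg(const struct fake_pt_regs *r, unsigned int n)
{
    return n == 3 ? r->r10 : func_arg(r, n);
}
```

For the second argument (n = 1) both conventions use RSI, which is why `regs->si` holds `filename` either way.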
You might be wondering why we are tracing do_sys_openat2 instead of the openat syscall.
Long story short, when the openat syscall gets executed, the function inside the kernel that actually does the work is do_sys_openat2. More details are available at the end of this article if you are interested.
Now create a Makefile and compile the code:
vi Makefile
obj-m += kprobe_example.o
PWD := $(CURDIR)

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
make
Load the module.
sudo insmod ./kprobe_example.ko
Now run journalctl -f in another terminal, and you should see the message Kprobe attached to do_sys_openat2:
Jan 22 18:23:19 ebpf sudo[4176]: ansil : TTY=pts/0 ; PWD=/home/ansil/lkmpg/kprobes ; USER=root ; COMMAND=/usr/sbin/insmod ./kprobe_example.ko
Jan 22 18:23:19 ebpf sudo[4176]: pam_unix(sudo:session): session opened for user root(uid=0) by ansil(uid=1000)
Jan 22 18:23:19 ebpf kernel: Kprobe attached to do_sys_openat2
Jan 22 18:23:19 ebpf sudo[4176]: pam_unix(sudo:session): session closed for user root
Good, the module is loaded.
The next step is to run the sample_write program we wrote in the previous chapter.
./sample_write
The journalctl output below shows that the openat syscall was made three times: the first two calls come from the dynamic loader (reading /etc/ld.so.cache and loading libc.so.6), and the last one opens our text file sample.txt.
Jan 22 18:25:17 ebpf kernel: do_sys_openat2 called by:sample_write pid=4182 fname=/etc/ld.so.cache
Jan 22 18:25:18 ebpf kernel: do_sys_openat2 called by:sample_write pid=4182 fname=/lib/x86_64-linux-gnu/libc.so.6
Jan 22 18:25:18 ebpf kernel: do_sys_openat2 called by:sample_write pid=4182 fname=./sample.txt
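The third line is the call our program makes itself. The previous chapter's sample_write is not reproduced here; the function below is a hypothetical stand-in showing the kind of code that produces such a line. glibc's `open()` wrapper has used the openat syscall since glibc 2.26, which is why a plain `open()` call shows up as `do_sys_openat2` in the log.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for sample_write: create (or truncate) `path` and
 * write `msg` into it. Returns the number of bytes written, or -1 on error. */
static ssize_t write_sample(const char *path, const char *msg)
{
    /* glibc turns this open() into an openat(AT_FDCWD, path, ...) syscall */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}
```

Built as `sample_write` and run while the module is loaded, a call such as `write_sample("./sample.txt", ...)` would be expected to produce a `fname=./sample.txt` line like the last one above.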
Yay!!! 🎉
Now we know how to write a module that uses kprobes to trace a kernel function.
Instead of tracing the program (as we did with strace), we traced the kernel function that implements the syscall!!
You can unload the module using the command below.
sudo rmmod kprobe_example
journalctl will then show the following:
Jan 22 18:28:14 ebpf sudo[4183]: ansil : TTY=pts/0 ; PWD=/home/ansil/lkmpg/kprobes ; USER=root ; COMMAND=/usr/sbin/rmmod kprobe_example
Jan 22 18:28:14 ebpf kernel: Kprobe detached from do_sys_openat2
The topic below is completely optional, but understanding how to navigate the Linux kernel source code will make your life easier.
Finding the function call/symbol of a syscall
Look at https://elixir.bootlin.com/linux/v6.5/source/include/linux/syscalls.h#L446, which declares all syscalls.
There you will see the following line:
asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
			   umode_t mode);
Then click on sys_openat and select the function definition that points to fs/open.c:
https://elixir.bootlin.com/linux/v6.5/source/fs/open.c#L1433
Here you will see:
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
		umode_t, mode)
{
	if (force_o_largefile())
		flags |= O_LARGEFILE;
	return do_sys_open(dfd, filename, flags, mode);
}
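SYSCALL_DEFINE4 is a macro: it takes the syscall name followed by type/argument pairs and generates the actual function definition. The real macros in include/linux/syscalls.h do more than this (on x86-64, for example, they also generate a wrapper that unpacks the arguments from struct pt_regs). The `TOY_SYSCALL_DEFINE2` macro and helper below are hypothetical and heavily simplified; they only illustrate the name-pasting idea and the "define, delegate, return" shape of the openat definition.

```c
/* Hypothetical toy macro: pastes `name` onto a prefix and turns the
 * type/argument pairs into a parameter list, like SYSCALL_DEFINEx does. */
#define TOY_SYSCALL_DEFINE2(name, t1, a1, t2, a2) \
    long toy_sys_##name(t1 a1, t2 a2)

/* Hypothetical helper playing the role of do_sys_open(): it does the work. */
static long toy_do_add(int a, int b)
{
    return (long)a + b;
}

/* Expands to: long toy_sys_add(int x, int y) { ... } */
TOY_SYSCALL_DEFINE2(add, int, x, int, y)
{
    /* Like the openat definition, the body just delegates and returns. */
    return toy_do_add(x, y);
}
```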
Inside SYSCALL_DEFINE4(openat, ...), the return value comes from do_sys_open. Now click on do_sys_open.
That will take you to the function definition in the same file: https://elixir.bootlin.com/linux/v6.5/source/fs/open.c#L1419
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
	struct open_how how = build_open_how(flags, mode);
	return do_sys_openat2(dfd, filename, &how);
}
do_sys_open, in turn, returns the result of another function, do_sys_openat2.
If you click on do_sys_openat2, you can see that its return statement no longer delegates to another function; this is where the open is actually carried out.
static long do_sys_openat2(int dfd, const char __user *filename,
			   struct open_how *how)
{
	struct open_flags op;
	int fd = build_open_flags(how, &op);
	struct filename *tmp;

	if (fd)
		return fd;

	tmp = getname(filename);
	if (IS_ERR(tmp))
		return PTR_ERR(tmp);

	fd = get_unused_fd_flags(how->flags);
	if (fd >= 0) {
		struct file *f = do_filp_open(dfd, tmp, &op);
		if (IS_ERR(f)) {
			put_unused_fd(fd);
			fd = PTR_ERR(f);
		} else {
			fd_install(fd, f);
		}
	}
	putname(tmp);
	return fd;
}
As a final step, we can check that this function is present in the kernel symbol table, which is what the kprobe uses to resolve symbol_name:
sudo grep -w do_sys_openat2 /proc/kallsyms
Output:
ffffffff8e2aadf0 t do_sys_openat2
Yes, it’s available, so we are good to use do_sys_openat2. The lowercase t means it is a local (static) symbol in the kernel's text (code) section.
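The same check can be scripted. Each /proc/kallsyms line has the form address, type letter, name, optionally followed by `[module]` for symbols that belong to a loadable module. A minimal C sketch with hypothetical helper names, using exact name matching similar to the `grep -w` above:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Check one /proc/kallsyms line ("<address> <type> <name> [module]").
 * On a match, store the one-letter symbol type in *type. */
static bool kallsyms_line_matches(const char *line, const char *symbol, char *type)
{
    char t;
    char name[256];

    /* %*s skips the address; " %c" skips whitespace, then reads the type */
    if (sscanf(line, "%*s %c %255s", &t, name) != 2)
        return false;
    if (strcmp(name, symbol) != 0)
        return false;
    *type = t;
    return true;
}

/* Scan a kallsyms-formatted stream for an exact symbol name. */
static bool kallsyms_has_symbol(FILE *fp, const char *symbol, char *type)
{
    char line[512];

    while (fgets(line, sizeof(line), fp))
        if (kallsyms_line_matches(line, symbol, type))
            return true;
    return false;
}
```

Passing it a stream from `fopen("/proc/kallsyms", "r")` mirrors the grep above. Depending on the `kernel.kptr_restrict` setting, unprivileged readers may see the addresses as zeros, but the type letters and names are still listed.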
So this confirms that the function the kernel executes for the openat syscall is do_sys_openat2!! 🎉
If it’s too much to digest, you can come back to this article later. There are tools that make these steps easier, and we will look at them going forward.