
eBPF for Linux Admins: Part I

Author: Ansil H, DevOps Guy
Tags: ebpf
eBPF - This article is part of a series.
Part 1: This Article
This article series is based on my journey to demystify eBPF.

Pre-requisites

To get the most out of this article, it’s helpful to have some background in Linux networking and packet tracing with tcpdump. Some of the internals were intentionally excluded to simplify the topic.

Classic BPF

Take a scenario where you want to observe every ARP packet arriving at the NIC. The packet first lands in the network device hardware and is then placed in a receive queue (RX_RING) inside the kernel.

For a user to see ARP packets, the packets need to be copied from kernel space to user space. Each packet then has to be filtered by its packet type, which here is ARP.

If the system copies every packet that lands in RX_RING to user space and only then checks for a matching packet type, it pays for a kernel-to-user copy of each packet, including the ones that end up discarded. Copying packets across the kernel/user boundary and switching the CPU between the two modes is inefficient and hurts system performance.
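
To make the cost concrete, here is a minimal sketch that compares how many bytes cross the kernel/user boundary under the two strategies. The traffic mix (nine IPv4 frames and one ARP frame) and the helper names `make_frame` and `ethertype` are made up for illustration; nothing here talks to a real NIC.

```python
def make_frame(etype: int, payload_len: int) -> bytes:
    # 12 zero bytes stand in for the two MAC addresses, then the 2-byte type.
    return bytes(12) + etype.to_bytes(2, "big") + bytes(payload_len)

def ethertype(frame: bytes) -> int:
    # The packet type sits at byte offsets 12 and 13 of the frame.
    return int.from_bytes(frame[12:14], "big")

# Hypothetical traffic: nine IPv4 frames (0x0800) and a single ARP frame (0x0806).
frames = [make_frame(0x0800, 100)] * 9 + [make_frame(0x0806, 28)]

# Strategy 1: copy every frame to user space, then filter there.
copied_all = sum(len(f) for f in frames)

# Strategy 2: filter first, copy only the frames that match.
copied_matching = sum(len(f) for f in frames if ethertype(f) == 0x0806)

print(copied_all, copied_matching)  # 1068 42
```

In this toy mix, filtering before the copy moves 42 bytes instead of 1068.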

So how can we filter packets while they are still on their way through kernel space, and copy only the matching ones to user space?

This is where BPF, the Berkeley Packet Filter, comes in.

The BPF virtual machine is a pseudo VM inside the Linux kernel. For the sake of simplicity, you can think of it as something like the JavaScript engine inside your browser.

One Linux tool that uses BPF for packet filtering is tcpdump.

BPF

The BPF VM supports a limited set of instructions, and there are many restrictions on how they can be used.

Below are the registers of the BPF VM (or pseudo-machine):

  • A 32-bit wide accumulator [A] where the contents of the packet get loaded.
  • A 32-bit wide index register [X].
  • A scratch memory area of sixteen 32-bit registers.
  • A program counter.
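
As a rough mental model, the machine state listed above can be written down as a small Python class. This is only an illustrative sketch: the class name `BpfState` and the method `load_a` are hypothetical and do not correspond to any kernel data structure.

```python
from dataclasses import dataclass, field

@dataclass
class BpfState:
    """Hypothetical model of the classic BPF machine state."""
    a: int = 0    # 32-bit accumulator [A]
    x: int = 0    # 32-bit index register [X]
    mem: list = field(default_factory=lambda: [0] * 16)  # scratch memory, 16 slots
    pc: int = 0   # program counter

    def load_a(self, value: int) -> None:
        # Mask to 32 bits to mimic the fixed register width.
        self.a = value & 0xFFFFFFFF

state = BpfState()
state.load_a(0x1_0000_0806)  # bits above the low 32 are discarded
print(hex(state.a))          # 0x806
```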

The filter expression we pass to the tcpdump command is converted into “byte code” and then injected directly into the kernel. (More about byte code later in this article.)

The load instructions load packet data into the accumulator, where the BPF VM can then examine it.

Let’s examine the code that tcpdump generates for a filter matching ARP packets arriving on interface ens33.

[root@localhost ~]# tcpdump -i ens33 arp -d
(000) ldh      [12]
(001) jeq      #0x806           jt 2    jf 3
(002) ret      #262144
(003) ret      #0
[root@localhost ~]#

Explanation

(000) ldh - Load a half word (16 bits) from byte offset 12 of the packet into the accumulator; this skips the 6-byte dst MAC and the 6-byte src MAC.
(001) jeq - If the accumulator value is 0x806, i.e. an ARP packet, jump to 2; otherwise jump to 3.
(002) ret - Accept the packet and return up to 262144 bytes of it, i.e. the entire packet or the [max snapshot length](https://github.com/the-tcpdump-group/tcpdump/blob/tcpdump-4.9/netdissect.h#L263).
(003) ret - Return 0, i.e. pass nothing to user space.
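
The four instructions above can be traced with a toy interpreter. This is a sketch for building intuition, not how the kernel implements BPF: the tuple layout and the function name `run_filter` are made up, the jump targets are the absolute instruction numbers printed by `tcpdump -d`, and only the three opcodes used here are handled.

```python
import struct

# The listing from `tcpdump -d`, as (mnemonic, k, jt, jf) tuples.
PROGRAM = [
    ("ldh", 12, None, None),
    ("jeq", 0x806, 2, 3),
    ("ret", 262144, None, None),
    ("ret", 0, None, None),
]

def run_filter(program, packet: bytes) -> int:
    """Return how many bytes of the packet to keep; 0 means drop it."""
    a = 0   # accumulator
    pc = 0  # program counter
    while pc < len(program):
        op, k, jt, jf = program[pc]
        if op == "ldh":
            if k + 2 > len(packet):
                return 0  # a load past the end of the packet rejects it
            a = struct.unpack_from("!H", packet, k)[0]  # big-endian half word
            pc += 1
        elif op == "jeq":
            pc = jt if a == k else jf
        elif op == "ret":
            return min(k, len(packet))  # keep at most k bytes
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return 0

arp_frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(28)
ipv4_frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + bytes(28)
print(run_filter(PROGRAM, arp_frame), run_filter(PROGRAM, ipv4_frame))  # 42 0
```

The ARP frame is accepted in full (42 bytes), while the IPv4 frame falls through to instruction 3 and is dropped.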

You can find more details on the inner workings of BPF in this USENIX paper.

So the filter above skips the destination and source MAC fields and then loads 16 bits from offset 12, which is where the packet type (EtherType) is stored.

Those 16 bits are then compared with 0x806 (0000100000000110); a match means the packet is ARP.

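A few points to note:

  • An Ethernet II frame begins with a 6-byte destination MAC address, a 6-byte source MAC address, and a 2-byte EtherType, followed by the payload.

  • Ethernet header fields are transmitted in big-endian byte order.

  • In BPF, as on a 32-bit system, a full word is 32 bits and a half word is 16 bits.

  • 1 byte = 8 bits, 2 bytes = 16 bits.

  • You can find the hex representation of each EtherType in the IANA registry: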

    ----------------------------------------------------------------------------------------------------------------------------------
    Ethertype (decimal)  Ethertype (hex)  Exp. Ethernet (decimal)  Exp. Ethernet (octal)  Description                        Reference
    ----------------------------------------------------------------------------------------------------------------------------------
    2054                 0806             -                        -                      Address Resolution Protocol (ARP)  [RFC7042]
    ----------------------------------------------------------------------------------------------------------------------------------
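
The points above can be checked with a short sketch that splits the first 14 bytes of a frame into its fields. The sample frame and the function name `parse_ethernet_header` are made up for illustration.

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Split the first 14 bytes of an Ethernet II frame into three fields."""
    # "!" selects network (big-endian) byte order.
    # 6s = 6 raw bytes, H = one unsigned 16-bit half word.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype

# Hypothetical broadcast ARP frame: dst MAC, src MAC, EtherType, zeroed payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(28)
dst, src, ethertype = parse_ethernet_header(frame)
print(dst.hex(), src.hex(), hex(ethertype), ethertype)
# ffffffffffff 001122334455 0x806 2054
```

The decimal value 2054 and the hex value 0x806 are the same EtherType, matching the ARP row in the table.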
    

The Byte Code

The BPF program we discussed above can be converted to byte code.

What is byte code?

Byte code is a compact, platform-independent instruction set designed for execution by a virtual machine rather than directly by a physical CPU. In this case the VM is the BPF pseudo VM sitting inside the kernel.

User space can inject this bytecode into the BPF pseudo VM, where the kernel either interprets it or, if the BPF JIT compiler is enabled, translates it into architecture-dependent machine code that runs directly on the hardware.

We can generate the bytecode for the BPF instructions with tcpdump itself.

[root@localhost ~]# tcpdump -i ens33 arp -ddd
4
40 0 0 12
21 0 1 2054
6 0 0 262144
6 0 0 0
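
To relate these numbers to the earlier listing, here is a small decoding sketch. The first line of the output is the instruction count, and each following line holds the opcode, the jump-if-true offset, the jump-if-false offset, and the operand k. The jump offsets are relative to the next instruction. The names `decode` and `disassemble` are made up, and only the three opcodes that appear in this program are mapped.

```python
DDD_OUTPUT = """4
40 0 0 12
21 0 1 2054
6 0 0 262144
6 0 0 0"""

# Only the opcodes used by this program (0x28, 0x15, 0x06).
MNEMONICS = {40: "ldh", 21: "jeq", 6: "ret"}

def decode(text: str):
    lines = text.strip().splitlines()
    count = int(lines[0])  # first line is the instruction count
    return [tuple(int(n) for n in line.split()) for line in lines[1:1 + count]]

def disassemble(instructions):
    out = []
    for pc, (code, jt, jf, k) in enumerate(instructions):
        name = MNEMONICS.get(code, f"op{code}")
        if name == "ldh":
            text = f"ldh [{k}]"
        elif name == "jeq":
            # Convert relative offsets to absolute instruction numbers.
            text = f"jeq #{k:#x} jt {pc + 1 + jt} jf {pc + 1 + jf}"
        elif name == "ret":
            text = f"ret #{k}"
        else:
            text = f"{name} k={k}"
        out.append(f"({pc:03d}) {text}")
    return out

listing = disassemble(decode(DDD_OUTPUT))
print("\n".join(listing))
```

The printed lines carry the same instructions, jump targets and operands as the `tcpdump -d` output shown earlier.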

The bytecode can be injected into the kernel in different ways. The tcpdump utility has its own logic for this operation.
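
One such way, on Linux, is the `SO_ATTACH_FILTER` socket option. The sketch below is an assumption-laden illustration and not a description of what tcpdump does internally: the constant value 26 is the usual value on common Linux architectures but is not exported by Python's `socket` module, and the helper names `pack_filter` and `attach_filter` are hypothetical. Only the packing step is executed here; `attach_filter` is defined but not called.

```python
import ctypes
import socket
import struct

# Assumption: 26 is SO_ATTACH_FILTER on common Linux architectures.
SO_ATTACH_FILTER = 26

# (code, jt, jf, k) rows taken from the `tcpdump -ddd` output.
ARP_FILTER = [
    (40, 0, 0, 12),
    (21, 0, 1, 2054),
    (6, 0, 0, 262144),
    (6, 0, 0, 0),
]

def pack_filter(instructions) -> bytes:
    # Each instruction is 8 bytes: 16-bit code, 8-bit jt, 8-bit jf, 32-bit k,
    # in host byte order.
    return b"".join(struct.pack("HBBI", *insn) for insn in instructions)

def attach_filter(sock: socket.socket, instructions) -> None:
    """Attach the filter to an existing socket (Linux only, not called here)."""
    blob = pack_filter(instructions)
    buf = ctypes.create_string_buffer(blob, len(blob))
    # The option value is an instruction count followed by a pointer to the array.
    fprog = struct.pack("HP", len(instructions), ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)

packed = pack_filter(ARP_FILTER)
print(len(packed))  # 32
```

Once a filter is attached, the socket only receives packets for which the program returns a non-zero value.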

With that, we conclude Part 1 of eBPF for Linux Admins.

In the next part, we will discuss eXpress Data Path (XDP) and eBPF.
