eBPF for Production Debugging: Kernel Instrumentation Without Rebooting

Before eBPF, debugging kernel-level performance problems in production required rebooting with debug parameters, adding invasive instrumentation, or hoping perf stat gave enough signal. eBPF changed that.

What eBPF Is

eBPF is a sandboxed bytecode VM inside the Linux kernel. You write restricted C, compile to eBPF bytecode with clang/LLVM, and load via the bpf() syscall. The kernel verifier checks the program — no infinite loops, bounded memory access, safe stack — then JIT-compiles it to native code.

Result: arbitrary instrumentation running live in kernel context, without the crash risk of a loadable kernel module, and with no kernel recompile or reboot.

Attachment Points

  • kprobes/kretprobes: Any kernel function entry/return
  • tracepoints: Stable kernel instrumentation points (a stable ABI, so preferred over kprobes when one exists)
  • uprobes: User-space function entry/return
  • XDP: Network driver level, before the kernel network stack
  • perf events: Hardware counter sampling
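With libbpf, the attachment point is selected by the program's ELF section name. A minimal sketch of that convention (the hook targets here are illustrative; SEC() normally comes from <bpf/bpf_helpers.h> and is defined inline below so the snippet is self-contained):

```c
/* SEC() places a function in a named ELF section; libbpf reads the
 * section name to decide where to attach. Normally provided by
 * <bpf/bpf_helpers.h>; defined inline here for self-containment. */
#define SEC(name) __attribute__((section(name), used))

SEC("kprobe/vfs_read")                      /* kernel function entry */
int on_vfs_read(void *ctx) { return 0; }

SEC("tracepoint/syscalls/sys_enter_read")   /* stable tracepoint */
int on_sys_enter_read(void *ctx) { return 0; }

SEC("xdp")                                  /* driver-level packet hook */
int on_packet(void *ctx) { return 0; }
```

The program bodies are stubs; in a real program each would be compiled to eBPF bytecode with clang -target bpf and loaded with libbpf, which attaches each function according to its section name.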

A Real Debugging Case

We had intermittent 50ms latency spikes. strace showed time in read() syscalls. eBPF gave us the distribution:

```c
// BCC program: log2 histogram of read() latency in microseconds.
BPF_HISTOGRAM(read_latency_us);
BPF_HASH(start_times, u64);

int trace_read_entry(struct pt_regs *ctx) {
    // Key by the full pid_tgid so concurrent threads in one process
    // don't overwrite each other's start timestamps.
    u64 id = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start_times.update(&id, &ts);
    return 0;
}

int trace_read_return(struct pt_regs *ctx) {
    u64 id   = bpf_get_current_pid_tgid();
    u64 *tsp = start_times.lookup(&id);
    if (!tsp)   // the verifier rejects the program without this NULL check
        return 0;
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    read_latency_us.increment(bpf_log2l(delta_us));
    start_times.delete(&id);
    return 0;
}
```

Output showed a bimodal distribution: most reads under 100μs, 0.1% hitting 40-60ms. The outliers were reads from a forgotten NFS mount. Mystery solved in 10 minutes with zero application changes.
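The log2 bucketing behind that histogram is easy to sketch in ordinary C. This is a userspace stand-in for illustration only (BCC's actual bpf_log2l helper may differ in its exact slot numbering): the slot is the index of the highest set bit, so a ~100 μs read lands in slot 6 while a 50 ms (50000 μs) outlier lands in slot 15.

```c
#include <stdint.h>

/* Userspace sketch of log2 histogram bucketing: slot = floor(log2(v)),
 * so values 64..127 share slot 6, and 32768..65535 share slot 15. */
static unsigned int log2_slot(uint64_t v) {
    unsigned int slot = 0;
    while (v >>= 1)
        slot++;
    return slot;
}
```

Power-of-two buckets are what make the bimodal shape obvious: sub-100 μs reads and 40-60 ms outliers land in widely separated slot clusters instead of being averaged together.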

XDP for Line-Rate Packet Processing

XDP runs eBPF at the network driver layer — before socket buffers, before protocol stacks.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Ethernet header: check bounds before touching any field.
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

    // IPv4 header follows immediately; check bounds again.
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol == IPPROTO_ICMP) return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

Notice the mandatory bounds checks — the verifier requires proof that every pointer dereference is safe.
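The same parse-then-check discipline can be exercised in ordinary userspace C. Here is a hypothetical re-implementation of the classifier above against a raw byte buffer (the struct layouts are simplified stand-ins for illustration, not the kernel's <linux/if_ether.h> and <linux/ip.h> definitions):

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons */

#define XDP_DROP 1
#define XDP_PASS 2

/* Simplified stand-ins for the kernel's ethhdr/iphdr (illustrative only). */
struct eth { uint8_t dst[6], src[6]; uint16_t proto; } __attribute__((packed));
struct ip4 { uint8_t ver_ihl, tos; uint16_t len, id, frag;
             uint8_t ttl, protocol; uint16_t csum;
             uint32_t saddr, daddr; } __attribute__((packed));

int classify(const uint8_t *data, const uint8_t *data_end) {
    const struct eth *e = (const struct eth *)data;
    if ((const uint8_t *)(e + 1) > data_end) return XDP_PASS; /* too short */
    if (e->proto != htons(0x0800)) return XDP_PASS;           /* not IPv4 */

    const struct ip4 *ip = (const struct ip4 *)(e + 1);
    if ((const uint8_t *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol == 1) return XDP_DROP;                   /* ICMP */

    return XDP_PASS;
}
```

Unlike this userspace sketch, where a missing check is just a latent bug, the verifier statically rejects any path in which a dereference is not preceded by a comparison against data_end, so the repetition is mandatory, not stylistic.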

Verifier Constraints

  • No unbounded loops (bounded for loops are accepted since kernel 5.3; 5.17 adds bpf_loop() for larger iteration counts)
  • 512-byte stack limit per eBPF frame
  • NULL check required after every bpf_map_lookup_elem()

The constraints are the price for the safety guarantee. On modern kernels (5.13+) they're much less restrictive.

One-Liners with bpftrace

```bash
# Syscall latency histogram by process name
bpftrace -e '
  tracepoint:syscalls:sys_enter_read { @start[tid] = nsecs; }
  tracepoint:syscalls:sys_exit_read /@start[tid]/ {
    @us[comm] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
  }'
```

When to Use eBPF

Right tool when:

  • Standard perf/strace don't give enough signal
  • You need to filter/aggregate inside the kernel, so only summaries cross into user space
  • You need packet processing at line rate
  • You can't modify application code

Start with perf stat and perf record. Reach for eBPF when you're out of signal from those.