eBPF for Production Debugging: Kernel Instrumentation Without Rebooting

Before eBPF, debugging kernel-level performance problems in production required rebooting with debug parameters, adding invasive instrumentation, or hoping perf stat gave enough signal. eBPF changed that.

What eBPF Is

eBPF is a sandboxed bytecode VM inside the Linux kernel. You write restricted C, compile to eBPF bytecode with clang/LLVM, and load via the bpf() syscall. The kernel verifier checks the program — no infinite loops, bounded memory access, safe stack — then JIT-compiles it to native code.

Result: arbitrary instrumentation running live in kernel context, without the crash risk of a loadable kernel module, and with no kernel recompile or reboot.

Attachment Points

  • kprobes/kretprobes: Any kernel function entry/return
  • tracepoints: Stable kernel instrumentation points (a stable ABI, so preferred over kprobes when one exists)
  • uprobes: User-space function entry/return
  • XDP: Network driver level, before the kernel network stack
  • perf events: Hardware counter sampling
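With libbpf, the attachment point is selected by the program's ELF section name. A minimal sketch of that convention (the hook targets here are illustrative; SEC() normally comes from <bpf/bpf_helpers.h> and is defined inline below so the snippet is self-contained):

```c
/* SEC() places a function in a named ELF section; libbpf reads the
 * section name to decide where to attach. Normally provided by
 * <bpf/bpf_helpers.h>; defined inline here for self-containment. */
#define SEC(name) __attribute__((section(name), used))

SEC("kprobe/vfs_read")                      /* kernel function entry */
int on_vfs_read(void *ctx) { return 0; }

SEC("tracepoint/syscalls/sys_enter_read")   /* stable tracepoint */
int on_sys_enter_read(void *ctx) { return 0; }

SEC("xdp")                                  /* driver-level packet hook */
int on_packet(void *ctx) { return 0; }
```

The program bodies are stubs; in a real program each would be compiled to eBPF bytecode with clang -target bpf and loaded with libbpf, which attaches each function according to its section name.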

A Real Debugging Case

We had intermittent 50ms latency spikes. strace showed time in read() syscalls. eBPF gave us the distribution:

```c
// BCC program: log2 histogram of read() latency in microseconds.
BPF_HISTOGRAM(read_latency_us);
BPF_HASH(start_times, u64);

int trace_read_entry(struct pt_regs *ctx) {
    // Key by the full pid_tgid so concurrent threads in one process
    // don't overwrite each other's start timestamps.
    u64 id = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start_times.update(&id, &ts);
    return 0;
}

int trace_read_return(struct pt_regs *ctx) {
    u64 id   = bpf_get_current_pid_tgid();
    u64 *tsp = start_times.lookup(&id);
    if (!tsp)   // the verifier rejects the program without this NULL check
        return 0;
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    read_latency_us.increment(bpf_log2l(delta_us));
    start_times.delete(&id);
    return 0;
}
```

Output showed a bimodal distribution: most reads under 100μs, 0.1% hitting 40-60ms. The outliers were reads from a forgotten NFS mount. Mystery solved in 10 minutes with zero application changes.
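The log2 bucketing behind that histogram is easy to sketch in ordinary C. This is a userspace stand-in for illustration only (BCC's actual bpf_log2l helper may differ in its exact slot numbering): the slot is the index of the highest set bit, so a ~100 μs read lands in slot 6 while a 50 ms (50000 μs) outlier lands in slot 15.

```c
#include <stdint.h>

/* Userspace sketch of log2 histogram bucketing: slot = floor(log2(v)),
 * so values 64..127 share slot 6, and 32768..65535 share slot 15. */
static unsigned int log2_slot(uint64_t v) {
    unsigned int slot = 0;
    while (v >>= 1)
        slot++;
    return slot;
}
```

Power-of-two buckets are what make the bimodal shape obvious: sub-100 μs reads and 40-60 ms outliers land in widely separated slot clusters instead of being averaged together.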

XDP for Line-Rate Packet Processing

XDP runs eBPF at the network driver layer — before socket buffers, before protocol stacks.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Ethernet header: check bounds before touching any field.
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

    // IPv4 header follows immediately; check bounds again.
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol == IPPROTO_ICMP) return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

Notice the mandatory bounds checks — the verifier requires proof that every pointer dereference is safe.
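The same parse-then-check discipline can be exercised in ordinary userspace C. Here is a hypothetical re-implementation of the classifier above against a raw byte buffer (the struct layouts are simplified stand-ins for illustration, not the kernel's <linux/if_ether.h> and <linux/ip.h> definitions):

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons */

#define XDP_DROP 1
#define XDP_PASS 2

/* Simplified stand-ins for the kernel's ethhdr/iphdr (illustrative only). */
struct eth { uint8_t dst[6], src[6]; uint16_t proto; } __attribute__((packed));
struct ip4 { uint8_t ver_ihl, tos; uint16_t len, id, frag;
             uint8_t ttl, protocol; uint16_t csum;
             uint32_t saddr, daddr; } __attribute__((packed));

int classify(const uint8_t *data, const uint8_t *data_end) {
    const struct eth *e = (const struct eth *)data;
    if ((const uint8_t *)(e + 1) > data_end) return XDP_PASS; /* too short */
    if (e->proto != htons(0x0800)) return XDP_PASS;           /* not IPv4 */

    const struct ip4 *ip = (const struct ip4 *)(e + 1);
    if ((const uint8_t *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol == 1) return XDP_DROP;                   /* ICMP */

    return XDP_PASS;
}
```

Unlike this userspace sketch, where a missing check is just a latent bug, the verifier statically rejects any path in which a dereference is not preceded by a comparison against data_end, so the repetition is mandatory, not stylistic.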

Verifier Constraints

  • No unbounded loops (bounded for loops are accepted since kernel 5.3; 5.17 adds bpf_loop() for larger iteration counts)
  • 512-byte stack limit per eBPF frame
  • NULL check required after every bpf_map_lookup_elem()

The constraints are the price for the safety guarantee. On modern kernels (5.13+) they're much less restrictive.

One-Liners with bpftrace

```bash
# Syscall latency histogram by process name
bpftrace -e '
  tracepoint:syscalls:sys_enter_read { @start[tid] = nsecs; }
  tracepoint:syscalls:sys_exit_read /@start[tid]/ {
    @us[comm] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
  }'
```

When to Use eBPF

Right tool when:

  • Standard perf/strace don't give enough signal
  • You need to filter/aggregate inside the kernel, so only summaries cross into user space
  • You need packet processing at line rate
  • You can't modify application code

Start with perf stat and perf record. Reach for eBPF when you're out of signal from those.