eBPF for Production Debugging: Kernel Instrumentation Without Rebooting
Before eBPF, debugging kernel-level performance problems in production required rebooting with debug parameters, adding invasive instrumentation, or hoping perf stat gave enough signal. eBPF changed that.
What eBPF Is
eBPF is a sandboxed bytecode VM inside the Linux kernel. You write restricted C, compile to eBPF bytecode with clang/LLVM, and load via the bpf() syscall. The kernel verifier checks the program — no infinite loops, bounded memory access, safe stack — then JIT-compiles it to native code.
Result: arbitrary instrumentation running in kernel context, with no risk of a kernel panic and no recompile or reboot.
Attachment Points
- kprobes/kretprobes: Any kernel function entry/return
- tracepoints: Stable kernel instrumentation points (preferred over kprobes, which can break across kernel releases)
- uprobes: User-space function entry/return
- XDP: Network driver level, before the kernel network stack
- perf events: Hardware counter sampling
A Real Debugging Case
We had intermittent 50ms latency spikes. strace showed the time going into read() syscalls. A BCC eBPF program gave us the latency distribution:
BPF_HISTOGRAM(read_latency_us);
BPF_HASH(start_times, u32);

int trace_read_entry(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 ts = bpf_ktime_get_ns();
    /* Keyed by process (tgid): concurrent reads from threads of the
     * same process can overwrite each other's start timestamp. Key by
     * the full pid_tgid if per-thread accuracy matters. */
    start_times.update(&pid, &ts);
    return 0;
}

int trace_read_return(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 *tsp = start_times.lookup(&pid);
    if (!tsp) return 0;  /* missed the entry probe */
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    read_latency_us.increment(bpf_log2l(delta_us));
    start_times.delete(&pid);
    return 0;
}
Output showed a bimodal distribution: most reads under 100μs, 0.1% hitting 40-60ms. The outliers were reads from a forgotten NFS mount. Mystery solved in 10 minutes with zero application changes.
XDP for Line-Rate Packet Processing
XDP runs eBPF at the network driver layer — before socket buffers, before protocol stacks.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;  /* bpf_htons, not htons, in kernel code */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol == IPPROTO_ICMP) return XDP_DROP;
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
Notice the mandatory bounds checks — the verifier requires proof that every pointer dereference is safe.
Verifier Constraints
- No unbounded loops (use bpf_loop(), added in kernel 5.17, for bounded iteration)
- 512-byte stack limit per eBPF frame
- NULL check required after every bpf_map_lookup_elem()
The constraints are the price for the safety guarantee. On modern kernels (5.13+) they're much less restrictive.
One-Liners with bpftrace
# Syscall latency histogram by process name
bpftrace -e '
tracepoint:syscalls:sys_enter_read { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_read {
@us[comm] = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}'
When to Use eBPF
Right tool when:
- Standard perf/strace don't give enough signal
- You need to filter/aggregate inside the kernel (no user-space overhead)
- You need packet processing at line rate
- You can't modify application code
Start with perf stat and perf record. Reach for eBPF when you're out of signal from those.