Thread-Safe Logging in C: Lock-Free Queues and Why They're Hard
Every production C application needs logging. The most common mistake is to wrap fprintf in a mutex. It works, but it serializes every thread at every log call, so the logging path becomes a global bottleneck.
ThreadSafeCLogger is a logging library that keeps locking and I/O off the calling thread's critical path.
The Core Problem
Logging has two competing requirements:
- Low write latency: The writing thread must not block
- Ordered output: Messages should appear in roughly chronological order
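The naive approach satisfies the second requirement but destroys the first. The ideal is a wait-free producer, where every log call completes in a bounded number of steps, paired with an asynchronous consumer that does the actual I/O.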
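For contrast, the mutex-around-fprintf approach described above can be sketched as follows. The names `naive_log` and `log_lock` are hypothetical and are not part of ThreadSafeCLogger; the point is only that formatting and I/O both happen while the global lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical baseline: one global lock around every formatted write. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

static int naive_log(FILE *out, const char *fmt, ...) {
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    pthread_mutex_lock(&log_lock);    /* every logging thread queues up here */
    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);       /* formatting and I/O run under the lock */
    va_end(ap);
    pthread_mutex_unlock(&log_lock);
    return n;                         /* characters written, or negative on error */
}
```

The time a thread spends inside the lock includes the full cost of the write, which is why contention grows with both thread count and message rate.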
Architecture: SPSC Ring Buffer per Thread
Each thread gets its own single-producer, single-consumer (SPSC) ring buffer, and a dedicated background logger thread drains all of them.
Thread 1 → [ SPSC Ring Buffer 1 ] ─┐
Thread 2 → [ SPSC Ring Buffer 2 ] ─┤→ Logger Thread → File/Stdout
Thread N → [ SPSC Ring Buffer N ] ─┘
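SPSC needs only acquire/release memory ordering, not compare-and-swap (CAS), because each index has exactly one writer. The producer writes entries at tail and is the only thread that advances it; the consumer reads entries at head and is the only thread that advances it.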
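#include <stdint.h>

/* RING_SIZE must be a power of two; log_entry_t is defined elsewhere. */
typedef struct {
    uint64_t head;      /* next slot to read; advanced only by the consumer */
    char _pad1[56];     /* pad to 64 bytes: separate cache lines prevent false sharing */
    uint64_t tail;      /* next slot to write; advanced only by the producer */
    char _pad2[56];
    log_entry_t entries[RING_SIZE];
    uint32_t size;      /* must equal RING_SIZE; ring_push uses size - 1 as a mask */
} spsc_ring_t;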
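The layout above relies on two things the struct itself does not enforce: the size must be a power of two for the index mask to work, and the struct must start on a cache-line boundary for the padding to help. A minimal sketch of an allocator that checks both is shown below. `ring_create`, the `CACHE_LINE` value, the `RING_SIZE` value, and the `log_entry_t` layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE  1024u    /* assumed capacity; must be a power of two */
#define CACHE_LINE 64u      /* assumed cache-line size */

typedef struct {            /* hypothetical entry layout */
    uint64_t ts_ns;
    char     msg[120];
} log_entry_t;

typedef struct {
    uint64_t head;
    char _pad1[56];
    uint64_t tail;
    char _pad2[56];
    log_entry_t entries[RING_SIZE];
    uint32_t size;
} spsc_ring_t;

/* Compile-time checks for the assumptions the push path relies on. */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");
static_assert(offsetof(spsc_ring_t, tail) == CACHE_LINE,
              "tail must start on its own cache line");

/* Allocates a zeroed, cache-line-aligned ring; returns NULL on failure. */
static spsc_ring_t *ring_create(void) {
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    size_t bytes = (sizeof(spsc_ring_t) + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    spsc_ring_t *r = aligned_alloc(CACHE_LINE, bytes);
    if (r == NULL)
        return NULL;
    memset(r, 0, sizeof *r);    /* head == tail == 0 means empty */
    r->size = RING_SIZE;
    return r;
}
```

The ring is released with `free`. Initialization uses plain stores, which is safe only because the ring has not yet been shared with another thread.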
The Producer Path (Zero Locks)
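/* Returns 0 on success, -1 if the ring is full. Called only by the owning thread. */
static inline int ring_push(spsc_ring_t *r, const log_entry_t *e) {
    /* Only this thread writes tail, so a relaxed load is enough. */
    uint64_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);
    uint64_t next = (tail + 1) & (r->size - 1); /* wrap; needs power-of-two size */
    /* Acquire: slots the consumer has released are safe to overwrite. */
    if (next == __atomic_load_n(&r->head, __ATOMIC_ACQUIRE))
        return -1; /* full: drop */
    r->entries[tail] = *e;
    /* Release: publish the entry before the new tail becomes visible. */
    __atomic_store_n(&r->tail, next, __ATOMIC_RELEASE);
    return 0;
}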
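__ATOMIC_RELEASE on the tail store pairs with the __ATOMIC_ACQUIRE load of tail in the consumer, so the entry write is visible before the updated tail is, without a full fence. Symmetrically, the producer's acquire load of head must pair with a release store of head in the consumer, so a slot is never overwritten while the consumer is still reading it. One slot always stays empty to distinguish a full ring from an empty one, so the usable capacity is size - 1 entries.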
Benchmarks
On a 16-core machine with 8 producer threads:
| Implementation | Throughput | P99 latency |
|---|---|---|
| Mutex + fprintf | 1.2M msg/s | 8.4μs |
| ThreadSafeCLogger | 18.7M msg/s | 0.12μs |
That is roughly 15.6× higher throughput (18.7M vs 1.2M msg/s) and 70× lower P99 latency (0.12μs vs 8.4μs), from removing the shared lock from the write path.
Edge Cases
Buffer full: Drop the message and increment a counter so the loss is visible. Blocking defeats the purpose, so size the ring so that it never fills in practice.
Timestamp ordering: The logger sees messages in drain order, not in the order they were written. Record clock_gettime(CLOCK_MONOTONIC) at write time and sort by that timestamp in the logger.
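A minimal sketch of both halves, assuming a hypothetical `log_entry_t` with a `ts_ns` field: the producer stamps each entry with `now_ns()`, and the logger sorts each drained batch before writing it. Sorting a batch restores order only within that batch; a message that arrives in a later drain pass can still carry an earlier timestamp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct { uint64_t ts_ns; char msg[56]; } log_entry_t;  /* hypothetical */

/* Producer side: monotonic timestamp in nanoseconds, taken at write time. */
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* qsort comparator on ts_ns; avoids subtraction so it cannot overflow. */
static int cmp_ts(const void *a, const void *b) {
    const log_entry_t *x = a, *y = b;
    return (x->ts_ns > y->ts_ns) - (x->ts_ns < y->ts_ns);
}

/* Logger side: order one drained batch by write time. */
static void sort_batch(log_entry_t *batch, size_t n) {
    qsort(batch, n, sizeof batch[0], cmp_ts);
}
```

Because each ring is already ordered internally, a k-way merge across rings would do less work than a full sort; `qsort` is used here only to keep the sketch short.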
Shutdown: The logger thread must drain all rings before the process exits. A shutdown flag, plus a final drain loop that runs after the producers have stopped logging, handles this.
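The shutdown sequence can be sketched as below. Everything outside the ring itself is an assumption made for illustration: `logger_t`, `drain_all`, `logger_main`, `logger_shutdown`, and the entry layout. The sketch also assumes producers have stopped logging before `logger_shutdown` is called.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 64u   /* assumed capacity, power of two */
#define MAX_RINGS 4     /* assumed upper bound on producer threads */

typedef struct { uint64_t ts_ns; char msg[56]; } log_entry_t;  /* hypothetical */

typedef struct {
    uint64_t head; char _pad1[56];
    uint64_t tail; char _pad2[56];
    log_entry_t entries[RING_SIZE];
    uint32_t size;
} spsc_ring_t;

typedef struct {        /* hypothetical logger state */
    spsc_ring_t *rings[MAX_RINGS];
    int nrings;
    int stop;           /* shutdown flag, set once by logger_shutdown */
    uint64_t written;   /* touched only by the logger thread */
    FILE *out;
} logger_t;

static int ring_push(spsc_ring_t *r, const log_entry_t *e) {
    uint64_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);
    uint64_t next = (tail + 1) & (r->size - 1);
    if (next == __atomic_load_n(&r->head, __ATOMIC_ACQUIRE))
        return -1;
    r->entries[tail] = *e;
    __atomic_store_n(&r->tail, next, __ATOMIC_RELEASE);
    return 0;
}

static int ring_pop(spsc_ring_t *r, log_entry_t *out) {
    uint64_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
    if (head == __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE))
        return -1;
    *out = r->entries[head];
    __atomic_store_n(&r->head, (head + 1) & (r->size - 1), __ATOMIC_RELEASE);
    return 0;
}

/* One pass over every ring; returns the number of entries written. */
static uint64_t drain_all(logger_t *lg) {
    uint64_t n = 0;
    log_entry_t e;
    for (int i = 0; i < lg->nrings; i++) {
        while (ring_pop(lg->rings[i], &e) == 0) {
            fprintf(lg->out, "%llu %s\n", (unsigned long long)e.ts_ns, e.msg);
            n++;
        }
    }
    return n;
}

static void *logger_main(void *arg) {
    logger_t *lg = arg;
    while (!__atomic_load_n(&lg->stop, __ATOMIC_ACQUIRE)) {
        uint64_t n = drain_all(lg);
        lg->written += n;
        if (n == 0)
            sched_yield();          /* idle; a real logger might sleep instead */
    }
    lg->written += drain_all(lg);   /* final drain after the flag is seen */
    fflush(lg->out);
    return NULL;
}

/* Call after all producers have finished logging. */
static void logger_shutdown(logger_t *lg, pthread_t tid) {
    __atomic_store_n(&lg->stop, 1, __ATOMIC_RELEASE);
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
}
```

The release store of `stop` pairs with the logger's acquire load, so every entry pushed before shutdown is visible to the final drain. Joining the logger thread before the process exits is what guarantees the drain actually completes.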
Source: ThreadSafeCLogger on GitHub.