Thread-Safe Logging in C: Lock-Free Queues and Why They're Hard

Every production C application needs logging, and the most common mistake is to wrap fprintf in a single mutex. That works, but it serializes every thread at every log call, so the logging path becomes a global bottleneck.

ThreadSafeCLogger addresses this by keeping locks and file I/O off the writing thread's critical path.

The Core Problem

Logging has two competing requirements:

1. Low write latency: the writing thread must not block.
2. Ordered output: messages should appear in roughly chronological order.

The naive approach satisfies #2 but destroys #1. The ideal is a wait-free producer with an asynchronous consumer.
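For comparison, the naive approach can be sketched roughly as follows. This is an illustration only, not code from ThreadSafeCLogger; naive_log and log_lock are hypothetical names.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical naive logger: one global mutex serializes every caller. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the number of characters written, as vfprintf does. */
static int naive_log(FILE *out, const char *fmt, ...) {
    va_list ap;
    int n;

    pthread_mutex_lock(&log_lock);    /* every logging thread contends here */
    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);       /* formatting and I/O both run under the lock */
    va_end(ap);
    pthread_mutex_unlock(&log_lock);
    return n;
}
```

Output order matches lock acquisition order, which is why this satisfies the ordering requirement, but every caller waits for the slowest write ahead of it.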

Architecture: SPSC Ring Buffer per Thread

One single-producer, single-consumer (SPSC) ring buffer per thread, drained by a dedicated background logger thread.

Thread 1  →  [ SPSC Ring Buffer 1 ] ─┐
Thread 2  →  [ SPSC Ring Buffer 2 ] ─┤→  Logger Thread → File/Stdout
Thread N  →  [ SPSC Ring Buffer N ] ─┘

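Because each ring has exactly one writer and one reader, SPSC needs only acquire/release memory ordering, not compare-and-swap (CAS). The producer alone advances tail, and the consumer alone advances head.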

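#include <stdint.h>

typedef struct {
    volatile uint64_t head;   /* next slot to read; advanced only by the consumer */
    char _pad1[56];           /* pads head to 64 bytes so head and tail sit on separate cache lines (prevents false sharing) */
    volatile uint64_t tail;   /* next slot to write; advanced only by the producer */
    char _pad2[56];
    log_entry_t entries[RING_SIZE];
    uint32_t    size;         /* slot count; must be a power of two and <= RING_SIZE */
} spsc_ring_t;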
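The article does not show how a thread obtains its own ring. One possible sketch, assuming C11 thread-local storage and a fixed-size registry that the logger thread scans, is below. MAX_THREADS, registry, get_thread_ring, and registered_ring are hypothetical names, and ring_t is a stand-in for spsc_ring_t.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_THREADS 64   /* hypothetical upper bound on producer threads */

/* Stand-in for spsc_ring_t, reduced to keep the example short. */
typedef struct { uint64_t head, tail; } ring_t;

static ring_t  *registry[MAX_THREADS];      /* slots are read by the logger thread */
static uint32_t registry_count;             /* next free slot */
static _Thread_local ring_t *my_ring;       /* each producer's private ring */

/* Returns the calling thread's ring, creating and publishing it on first use.
 * Returns NULL if allocation fails or the registry is full. */
static ring_t *get_thread_ring(void) {
    if (my_ring)
        return my_ring;
    ring_t *r = calloc(1, sizeof *r);
    if (!r)
        return NULL;
    uint32_t slot = __atomic_fetch_add(&registry_count, 1, __ATOMIC_RELAXED);
    if (slot >= MAX_THREADS) {
        free(r);
        return NULL;
    }
    /* Release store: the zeroed ring is visible before the pointer is. */
    __atomic_store_n(&registry[slot], r, __ATOMIC_RELEASE);
    my_ring = r;
    return r;
}

/* Logger side: the ring in a slot, or NULL if nothing is published there yet. */
static ring_t *registered_ring(uint32_t slot) {
    if (slot >= MAX_THREADS)
        return NULL;
    return __atomic_load_n(&registry[slot], __ATOMIC_ACQUIRE);
}
```

The only shared write on this path is the one-time registration; after that, each thread touches only its own ring. This sketch never frees a ring when its thread exits, which a real implementation would need to handle.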

The Producer Path (Zero Locks)

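/* Returns 0 on success, -1 if the ring is full. Called only by the owning thread. */
static inline int ring_push(spsc_ring_t *r, const log_entry_t *e) {
    uint64_t tail = r->tail;                     /* only this thread writes tail */
    uint64_t next = (tail + 1) & (r->size - 1);  /* wrap around; size must be a power of two */
    if (next == __atomic_load_n(&r->head, __ATOMIC_ACQUIRE))
        return -1;  /* full: drop (one slot always stays empty) */
    r->entries[tail] = *e;                               /* fill the slot first */
    __atomic_store_n(&r->tail, next, __ATOMIC_RELEASE);  /* then publish it */
    return 0;
}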

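The __ATOMIC_RELEASE store to tail pairs with the consumer's __ATOMIC_ACQUIRE load of tail, so the entry write is visible before the index update, without a full fence. Symmetrically, the producer's acquire load of head pairs with the consumer's update of head (assuming the consumer publishes it with a release store), so a slot is never overwritten while it is still being read.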
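The consumer side is not shown above. A minimal sketch of what a matching ring_pop might look like follows, mirroring ring_push with the roles of head and tail swapped. The log_entry_t layout and the RING_SIZE value are assumptions made only so the example compiles.

```c
#include <stdint.h>

#define RING_SIZE 1024   /* assumed value; must be a power of two */

/* Hypothetical entry layout; the article does not define log_entry_t. */
typedef struct { uint64_t ts_ns; char msg[120]; } log_entry_t;

typedef struct {
    volatile uint64_t head;
    char _pad1[56];
    volatile uint64_t tail;
    char _pad2[56];
    log_entry_t entries[RING_SIZE];
    uint32_t    size;
} spsc_ring_t;

/* Called only by the logger thread. Returns 0 on success, -1 if the ring is empty. */
static inline int ring_pop(spsc_ring_t *r, log_entry_t *out) {
    uint64_t head = r->head;   /* only the consumer writes head */
    /* Acquire pairs with the producer's release store to tail. */
    if (head == __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE))
        return -1;             /* empty */
    *out = r->entries[head];   /* copy the entry out before freeing the slot */
    /* Release: the copy above completes before the producer can reuse the slot. */
    __atomic_store_n(&r->head, (head + 1) & (r->size - 1), __ATOMIC_RELEASE);
    return 0;
}
```

The logger thread would call this in a loop for each ring until it returns -1, then move on to the next ring.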

Benchmarks

On a 16-core machine with 8 producer threads:

Implementation	Throughput	P99 latency
Mutex + fprintf	1.2M msg/s	8.4μs
ThreadSafeCLogger	18.7M msg/s	0.12μs

That is about 15.6× higher throughput (18.7M vs. 1.2M msg/s) and 70× lower P99 latency (8.4 μs vs. 0.12 μs), from taking locks off the write path.

Edge Cases

Buffer full: Drop the message and increment a dropped-message counter. Blocking would defeat the purpose, so size the ring so that it never fills in practice.
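A drop counter can be sketched as below. This is an assumption about how it could be done, not ThreadSafeCLogger's actual code: the dropped field, log_write, and take_dropped are hypothetical, and the ring is deliberately tiny and unpadded so the overflow is easy to see.

```c
#include <stdint.h>

#define RING_SIZE 4   /* tiny on purpose so the example overflows; power of two */

typedef struct { uint64_t ts_ns; char msg[56]; } log_entry_t;   /* hypothetical layout */

typedef struct {
    uint64_t head, tail;      /* cache-line padding omitted for brevity */
    uint64_t dropped;         /* incremented by the producer, reset by the logger */
    log_entry_t entries[RING_SIZE];
} ring_t;

static int ring_push(ring_t *r, const log_entry_t *e) {
    uint64_t tail = r->tail;
    uint64_t next = (tail + 1) & (RING_SIZE - 1);
    if (next == __atomic_load_n(&r->head, __ATOMIC_ACQUIRE))
        return -1;            /* full */
    r->entries[tail] = *e;
    __atomic_store_n(&r->tail, next, __ATOMIC_RELEASE);
    return 0;
}

/* Never blocks: a full ring costs one relaxed atomic increment. */
static void log_write(ring_t *r, const log_entry_t *e) {
    if (ring_push(r, e) != 0)
        __atomic_fetch_add(&r->dropped, 1, __ATOMIC_RELAXED);
}

/* Logger side: fetch and reset the count, e.g. to emit "N messages dropped". */
static uint64_t take_dropped(ring_t *r) {
    return __atomic_exchange_n(&r->dropped, 0, __ATOMIC_RELAXED);
}
```

With four slots the ring holds three entries (one slot stays empty), so five writes with no consumer leave a count of two. Reporting the count in the log itself makes silent loss visible.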

Timestamp ordering: The logger sees messages in drain order, not timestamp order. Record a clock_gettime(CLOCK_MONOTONIC) timestamp in each entry at write time and sort by it in the logger.
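One simple way to do this is sketched below, assuming each entry carries a nanosecond timestamp and the logger collects a batch from all rings before writing. The ts_ns field, now_ns, and sort_batch are hypothetical names, and qsort is used here only because it is the shortest option.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict C modes */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct { uint64_t ts_ns; char msg[56]; } log_entry_t;   /* hypothetical layout */

/* Producer side: current time on the monotonic clock, in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Orders entries by ascending timestamp without overflowing on subtraction. */
static int cmp_ts(const void *a, const void *b) {
    uint64_t x = ((const log_entry_t *)a)->ts_ns;
    uint64_t y = ((const log_entry_t *)b)->ts_ns;
    return (x > y) - (x < y);
}

/* Logger side: sort one batch drained from all rings before writing it out. */
static void sort_batch(log_entry_t *batch, size_t n) {
    qsort(batch, n, sizeof *batch, cmp_ts);
}
```

Sorting orders entries only within a batch, so an entry drained late can still land after a newer one from an earlier batch, which is consistent with the "roughly chronological" goal. Since each ring is already in timestamp order, a k-way merge across rings would avoid the full sort.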

Shutdown: The logger thread must drain all rings before the process exits. A shutdown flag, set once the producers have stopped, plus a final drain loop handles this.
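The flag-plus-final-drain pattern might look like the following sketch. It uses a simplified ring of ints rather than the real entry type, and stop_flag, drain_all, logger_main, and request_stop are hypothetical names.

```c
#include <sched.h>
#include <stdint.h>

#define NUM_RINGS 2   /* hypothetical: one ring per producer thread */
#define RING_SIZE 8   /* must be a power of two */

/* Simplified stand-in for the article's ring: entries are plain ints. */
typedef struct { uint64_t head, tail; int entries[RING_SIZE]; } ring_t;

static ring_t   rings[NUM_RINGS];
static int      stop_flag;   /* set once by the shutdown path */
static uint64_t written;     /* entries the logger has consumed */

static int ring_pop(ring_t *r, int *out) {
    uint64_t head = r->head;
    if (head == __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE))
        return -1;           /* empty */
    *out = r->entries[head];
    __atomic_store_n(&r->head, (head + 1) & (RING_SIZE - 1), __ATOMIC_RELEASE);
    return 0;
}

/* One pass over every ring; returns how many entries were consumed. */
static int drain_all(void) {
    int n = 0, v;
    for (int i = 0; i < NUM_RINGS; i++)
        while (ring_pop(&rings[i], &v) == 0) {
            written++;       /* real code would format and write the entry here */
            n++;
        }
    return n;
}

/* Logger thread body; the signature matches pthread_create's start routine. */
static void *logger_main(void *arg) {
    (void)arg;
    while (!__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE))
        if (drain_all() == 0)
            sched_yield();   /* nothing to do: give up the CPU */
    drain_all();             /* final drain after the flag is seen */
    return NULL;
}

/* Call only after every producer has stopped logging. */
static void request_stop(void) {
    __atomic_store_n(&stop_flag, 1, __ATOMIC_RELEASE);
}
```

In use, the main thread would start logger_main with pthread_create, join the producer threads at exit, call request_stop, and then pthread_join the logger. Because the flag is set only after producers stop, a single final pass is enough to empty every ring.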

Source: ThreadSafeCLogger on GitHub.