Building a Software TCP/IP Stack from Scratch in C

When I started the SimpleSoftwareBasedTCPIPStack project, the goal was simple: understand every layer of the network stack by building it. Not using it — building it. The kind of understanding you only get when a packet doesn't arrive and you have to figure out why.

Why Build One?

Every network engineer eventually hits a moment where the abstraction breaks. A mysterious drop that tcpdump can't explain. A retransmit storm that shouldn't exist. A connection that hangs in CLOSE_WAIT forever.

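Debugging problems like these takes an accurate mental model of what the stack does underneath the socket API, and building a stack from scratch gives you that model. You understand why TIME_WAIT exists (so delayed segments from a closed connection are not mistaken for part of a new one). Why Nagle's algorithm exists and when to disable it. What "three-way handshake" means at the byte level.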

Architecture

Application Layer  →  read()/write() socket API
Transport Layer    →  TCP (state machine + retransmit timer)
Network Layer      →  IP (routing table, fragmentation, ICMP)
Link Layer         →  ARP + Ethernet frame construction
Physical Layer     →  Raw socket / Linux TAP device

The TAP Device Interface

Instead of a kernel module, I used Linux TAP devices — virtual network interfaces that let user-space programs send and receive raw Ethernet frames.

c
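#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* dev: buffer of at least IFNAMSIZ bytes; flags: e.g. IFF_TAP | IFF_NO_PI. Returns fd, or -1 on error. */
int tun_alloc(char *dev, int flags) {
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) return -1;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = flags;
    if (*dev) strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);  /* keep the trailing NUL */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { close(fd); return -1; }
    strcpy(dev, ifr.ifr_name);  /* the kernel may have chosen the name */
    return fd;
}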
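
Every read() on the TAP descriptor returns one raw Ethernet frame, so the first thing the stack has to do is split off the 14-byte header. The following is a minimal sketch of that step, assuming the device was opened with IFF_TAP | IFF_NO_PI so no extra packet-info bytes precede the frame. The names eth_view and eth_parse are hypothetical and are not taken from the project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed view of an Ethernet II header (illustrative names). */
struct eth_view {
    uint8_t        dst[6];
    uint8_t        src[6];
    uint16_t       ethertype;    /* converted to host byte order */
    const uint8_t *payload;      /* points into the caller's buffer */
    size_t         payload_len;
};

/* Parse a raw frame. Returns 0 on success, -1 if the buffer is too
 * short to contain the 14-byte Ethernet header. */
int eth_parse(const uint8_t *buf, size_t len, struct eth_view *out) {
    if (len < 14) return -1;
    memcpy(out->dst, buf, 6);
    memcpy(out->src, buf + 6, 6);
    /* EtherType is big-endian on the wire. */
    out->ethertype   = (uint16_t)((buf[12] << 8) | buf[13]);
    out->payload     = buf + 14;
    out->payload_len = len - 14;
    return 0;
}
```

An ethertype of 0x0806 would then be handed to the ARP code and 0x0800 to the IP layer.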

ARP

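ARP is where most "why doesn't my packet arrive?" problems live. The protocol broadcasts "who has IP X?" and caches the reply, but the edge cases are not simple: cache entries go stale, replies arrive that nobody asked for, and outbound packets need somewhere to wait while resolution is still pending.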

c
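void arp_send_request(struct netdev *dev, uint32_t target_ip) {
    /* Assumption: frame is a zeroed buffer big enough for the Ethernet header
     * plus the ARP payload (ETH_FRAME_LEN comes from <linux/if_ether.h>). */
    uint8_t frame[ETH_FRAME_LEN] = {0};
    struct arp_hdr *arp = (struct arp_hdr *)eth_frame_payload(frame);
    arp->hwtype    = htons(ARP_ETHERNET);
    arp->protype   = htons(ETH_P_IP);
    arp->hwsize    = 6;                  /* MAC address length; field name assumed */
    arp->prosize   = 4;                  /* IPv4 address length; field name assumed */
    arp->opcode    = htons(ARP_REQUEST);
    memcpy(arp->sender_hw, dev->hwaddr, 6);
    arp->sender_ip = dev->addr;          /* assumed to be stored in network byte order */
    memset(arp->target_hw, 0x00, 6);     /* unknown: this is what we are asking for */
    arp->target_ip = target_ip;          /* network byte order */
    eth_tx(dev, frame, ETH_P_ARP, broadcast_mac);
}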
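
The "caches the reply" half can be sketched as a small fixed-size table. This is only an illustration under stated assumptions: the names arp_entry, arp_cache_update, and arp_cache_lookup and the table size are hypothetical, and expiry of stale entries is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_CACHE_SIZE 32   /* arbitrary size chosen for this sketch */

struct arp_entry {
    int      in_use;
    uint32_t ip;            /* network byte order */
    uint8_t  mac[6];
};

static struct arp_entry arp_cache[ARP_CACHE_SIZE];

/* Insert a new mapping or refresh an existing one.
 * Returns 0 on success, -1 if the table is full. */
int arp_cache_update(uint32_t ip, const uint8_t mac[6]) {
    struct arp_entry *free_slot = NULL;
    for (int i = 0; i < ARP_CACHE_SIZE; i++) {
        if (arp_cache[i].in_use && arp_cache[i].ip == ip) {
            memcpy(arp_cache[i].mac, mac, 6);   /* refresh existing entry */
            return 0;
        }
        if (!arp_cache[i].in_use && free_slot == NULL)
            free_slot = &arp_cache[i];          /* remember first free slot */
    }
    if (free_slot == NULL) return -1;
    free_slot->in_use = 1;
    free_slot->ip = ip;
    memcpy(free_slot->mac, mac, 6);
    return 0;
}

/* Returns the cached MAC for ip, or NULL on a miss (the caller would
 * then send an ARP request and hold the packet until the reply). */
const uint8_t *arp_cache_lookup(uint32_t ip) {
    for (int i = 0; i < ARP_CACHE_SIZE; i++)
        if (arp_cache[i].in_use && arp_cache[i].ip == ip)
            return arp_cache[i].mac;
    return NULL;
}
```

A real cache would also timestamp each entry and evict old ones, since a host can change its MAC address or leave the network.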

TCP State Machine

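TCP has 11 states. Miss one transition and you get a half-open connection that hangs indefinitely. The diagram below follows the passive-open path and both ways out of ESTABLISHED; it leaves out SYN_SENT (active open) and CLOSING (simultaneous close).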

CLOSED → LISTEN → SYN_RCVD → ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED
                             ↘ FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED
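
The transitions in the diagram above can be written as a pure function from (state, event) to next state. This is a sketch, not the project's code: the enum and function names are hypothetical, only the nine states drawn above are covered, and RST handling, SYN_SENT, and CLOSING are omitted.

```c
enum tcp_state {
    TCP_CLOSED, TCP_LISTEN, TCP_SYN_RCVD, TCP_ESTABLISHED,
    TCP_CLOSE_WAIT, TCP_LAST_ACK,
    TCP_FIN_WAIT_1, TCP_FIN_WAIT_2, TCP_TIME_WAIT
};

enum tcp_event {
    EV_LISTEN,        /* application opens passively */
    EV_RCV_SYN,       /* SYN segment received */
    EV_RCV_ACK,       /* ACK of our SYN or FIN received */
    EV_RCV_FIN,       /* FIN segment received */
    EV_CLOSE,         /* application closes, we send FIN */
    EV_2MSL_TIMEOUT   /* TIME_WAIT timer expired */
};

/* Returns the next state; unknown (state, event) pairs leave the
 * state unchanged. */
enum tcp_state tcp_next_state(enum tcp_state s, enum tcp_event ev) {
    switch (s) {
    case TCP_CLOSED:     return ev == EV_LISTEN  ? TCP_LISTEN      : s;
    case TCP_LISTEN:     return ev == EV_RCV_SYN ? TCP_SYN_RCVD    : s;
    case TCP_SYN_RCVD:   return ev == EV_RCV_ACK ? TCP_ESTABLISHED : s;
    case TCP_ESTABLISHED:
        if (ev == EV_RCV_FIN) return TCP_CLOSE_WAIT;   /* peer closed first */
        if (ev == EV_CLOSE)   return TCP_FIN_WAIT_1;   /* we closed first */
        return s;
    case TCP_CLOSE_WAIT: return ev == EV_CLOSE   ? TCP_LAST_ACK    : s;
    case TCP_LAST_ACK:   return ev == EV_RCV_ACK ? TCP_CLOSED      : s;
    case TCP_FIN_WAIT_1: return ev == EV_RCV_ACK ? TCP_FIN_WAIT_2  : s;
    case TCP_FIN_WAIT_2: return ev == EV_RCV_FIN ? TCP_TIME_WAIT   : s;
    case TCP_TIME_WAIT:  return ev == EV_2MSL_TIMEOUT ? TCP_CLOSED : s;
    }
    return s;
}
```

Keeping the transition table free of side effects makes it easy to test in isolation; sending the SYN+ACK, ACK, or FIN that goes with each transition would live in the caller.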

The retransmission timer uses exponential backoff: the RTO doubles each time the timer expires without an ACK, capped at 64 seconds.
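
The backoff rule itself is tiny. A sketch, assuming the RTO is tracked in milliseconds; the function name and the RTO_MAX_MS constant are hypothetical, and the RTT-based calculation of the initial RTO is not shown.

```c
#include <stdint.h>

#define RTO_MAX_MS 64000u   /* 64-second cap, as described above */

/* Called when the retransmission timer fires without an ACK:
 * double the RTO, saturating at the cap. */
uint32_t rto_backoff(uint32_t rto_ms) {
    if (rto_ms >= RTO_MAX_MS / 2) return RTO_MAX_MS;
    return rto_ms * 2;
}
```

Starting from a 1-second RTO, successive timeouts give 2, 4, 8, 16, 32, and then 64 seconds, where the value stays.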

What I Learned

Byte order is everywhere. htons(), ntohl(), htonl() — miss one and the packet looks fine in the debugger but is garbage on the wire.
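
One habit that helps is funnelling every multi-byte header field through a pair of helpers, so the conversion happens in exactly one place. A sketch with hypothetical helper names, built on the standard htons()/ntohs()/htonl()/ntohl() calls:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit host value at p in network (big-endian) order. */
void put_net16(uint8_t *p, uint16_t host_val) {
    uint16_t n = htons(host_val);
    memcpy(p, &n, sizeof n);      /* memcpy avoids unaligned access */
}

/* Load a 16-bit network-order value from p into host order. */
uint16_t get_net16(const uint8_t *p) {
    uint16_t n;
    memcpy(&n, p, sizeof n);
    return ntohs(n);
}

/* Same pair for 32-bit fields such as sequence numbers. */
void put_net32(uint8_t *p, uint32_t host_val) {
    uint32_t n = htonl(host_val);
    memcpy(p, &n, sizeof n);
}

uint32_t get_net32(const uint8_t *p) {
    uint32_t n;
    memcpy(&n, p, sizeof n);
    return ntohl(n);
}
```

With these, put_net16(buf, 80) always writes the bytes 0x00 0x50 regardless of host endianness, which is what a packet capture will show for port 80.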

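Checksums must be perfect. The IP checksum covers only the IP header. The TCP checksum covers the TCP header and payload plus a pseudo-header built from IP fields: source address, destination address, protocol number, and TCP length. Get either one wrong and the receiving kernel silently drops your packets.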
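
A minimal sketch of the TCP checksum over IPv4, using the standard 16-bit one's-complement Internet checksum. The function names are hypothetical, addresses are passed as the 4 bytes that appear on the wire, the segment is assumed to be at most 65535 bytes, and the checksum field inside the segment must be zero while computing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add len bytes to a running sum, reading them as big-endian 16-bit
 * words. A trailing odd byte is padded with a zero low byte. */
uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len) {
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len  -= 2;
    }
    if (len) sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Fold the carries back in and take the one's complement. */
uint16_t csum_finish(uint32_t sum) {
    while (sum >> 16) sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over pseudo-header + TCP header + payload. The result is
 * in host order and must be stored big-endian in the header. */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, size_t seg_len) {
    uint8_t pseudo[12];
    memcpy(pseudo, src, 4);
    memcpy(pseudo + 4, dst, 4);
    pseudo[8]  = 0;                         /* reserved, always zero */
    pseudo[9]  = 6;                         /* protocol number for TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);   /* TCP length, big-endian */
    pseudo[11] = (uint8_t)(seg_len & 0xff);
    uint32_t sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, seg, seg_len);
    return csum_finish(sum);
}
```

A handy self-check: once the computed value is written into the segment, running tcp_checksum over it again should return 0. The IP header checksum is just csum_finish(csum_add(0, header, header_len)) with no pseudo-header.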

TCP is a byte stream, not a message protocol. You can send 100 bytes in one write and receive them as two 50-byte reads, because message boundaries are not preserved. Applications that assume otherwise break in subtle ways.
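
Because one write can surface as several reads, code that needs a fixed number of bytes has to loop. A sketch using plain POSIX read(); the helper name read_exact is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from fd, looping over short reads.
 * Returns 0 on success, -1 on error or if the peer closes early. */
int read_exact(int fd, void *buf, size_t len) {
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            return -1;
        }
        if (n == 0) return -1;              /* EOF before the full amount */
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A length-prefixed protocol would call this once for the fixed-size length field and once more for the body, instead of trusting a single read() to return a whole message.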

TIME_WAIT is your friend. Tune it away and, at high connection rates, a delayed segment from an old connection can be accepted by a new connection that reuses the same address and port pair.

Source: SimpleSoftwareBasedTCPIPStack on GitHub.