Building a Software TCP/IP Stack from Scratch in C
When I started the SimpleSoftwareBasedTCPIPStack project, the goal was simple: understand every layer of the network stack by building it. Not using it — building it. The kind of understanding you only get when a packet doesn't arrive and you have to figure out why.
Why Build One?
Every network engineer eventually hits a moment where the abstraction breaks. A mysterious drop that tcpdump can't explain. A retransmit storm that shouldn't exist. A connection that hangs in CLOSE_WAIT forever.
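Debugging those cases takes a working mental model of what the stack is doing underneath, and building one from scratch gives you that model. You understand why TIME_WAIT exists (delayed segments from an old connection have to expire before the same address and port pair is reused), why Nagle's algorithm exists and when to disable it, and what "three-way handshake" means at the byte level.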
Architecture
Application Layer → read()/write() socket API
Transport Layer → TCP (state machine + retransmit timer)
Network Layer → IP (routing table, fragmentation, ICMP)
Link Layer → ARP + Ethernet frame construction
Physical Layer → Raw socket / Linux TAP device
The TAP Device Interface
Instead of a kernel module, I used Linux TAP devices — virtual network interfaces that let user-space programs send and receive raw Ethernet frames.
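#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

int tun_alloc(char *dev, int flags) {
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) return -1;                               /* no TUN/TAP support or no permission */
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = flags;                               /* e.g. IFF_TAP | IFF_NO_PI for raw Ethernet frames */
    if (*dev) strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);  /* keep the trailing NUL from memset */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { close(fd); return -1; }
    strcpy(dev, ifr.ifr_name);                           /* kernel-chosen name; dev must hold IFNAMSIZ bytes */
    return fd;
}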
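Each read() on the TAP file descriptor then returns one raw Ethernet frame (assuming the device was opened with IFF_TAP | IFF_NO_PI). As a rough sketch of the first thing the link layer has to do with that buffer, the code below decodes the 14-byte Ethernet II header. The names eth_hdr_view and eth_parse are hypothetical and are not taken from the project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define ETH_HDR_LEN  14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Hypothetical decoded view of an Ethernet II header. */
struct eth_hdr_view {
    uint8_t        dst[ETH_ADDR_LEN];
    uint8_t        src[ETH_ADDR_LEN];
    uint16_t       ethertype;     /* converted to host byte order */
    const uint8_t *payload;       /* points into the caller's buffer */
    size_t         payload_len;
};

/* Returns 0 on success, -1 if the buffer cannot hold a full header. */
int eth_parse(const uint8_t *buf, size_t len, struct eth_hdr_view *out)
{
    if (buf == NULL || out == NULL || len < ETH_HDR_LEN)
        return -1;

    memcpy(out->dst, buf, ETH_ADDR_LEN);
    memcpy(out->src, buf + ETH_ADDR_LEN, ETH_ADDR_LEN);

    /* EtherType is big-endian on the wire; assemble it byte by byte so the
     * result is correct regardless of host endianness. */
    out->ethertype = (uint16_t)((buf[12] << 8) | buf[13]);

    out->payload = buf + ETH_HDR_LEN;
    out->payload_len = len - ETH_HDR_LEN;
    return 0;
}
```

An EtherType of 0x0806 would be handed to the ARP code and 0x0800 to the IP code; anything else can be dropped.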
ARP
ARP is where most "why doesn't my packet arrive?" problems live. The protocol broadcasts "who has IP X?" and caches the reply — but the edge cases are not simple.
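void arp_send_request(struct netdev *dev, uint32_t target_ip) {
    /* Assumed layout: 14-byte Ethernet header followed by the ARP payload;
     * eth_tx() is expected to fill in the Ethernet header itself. */
    uint8_t frame[14 + sizeof(struct arp_hdr)] = {0};
    static const uint8_t broadcast_mac[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    struct arp_hdr *arp = (struct arp_hdr *)eth_frame_payload(frame);

    arp->hwtype = htons(ARP_ETHERNET);
    arp->protype = htons(ETH_P_IP);
    arp->hwsize = 6;                      /* MAC length; field name assumed */
    arp->prosize = 4;                     /* IPv4 length; field name assumed */
    arp->opcode = htons(ARP_REQUEST);
    memcpy(arp->sender_hw, dev->hwaddr, 6);
    arp->sender_ip = dev->addr;           /* assumed to be stored in network byte order */
    memset(arp->target_hw, 0x00, 6);      /* unknown: this is what we are asking for */
    arp->target_ip = target_ip;           /* caller passes network byte order */
    eth_tx(dev, frame, ETH_P_ARP, broadcast_mac);
}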
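The other half of ARP is caching the reply. The sketch below shows one simple way to do that, assuming a small fixed-size table with round-robin eviction. The names arp_cache_update and arp_cache_lookup are hypothetical, and entry expiry (one of the edge cases mentioned above) is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_CACHE_SIZE 32

struct arp_entry {
    uint32_t ip;       /* network byte order, used only as an opaque key */
    uint8_t  mac[6];
    int      valid;
};

static struct arp_entry arp_cache[ARP_CACHE_SIZE];
static unsigned arp_next_victim;   /* round-robin eviction index */

/* Insert a mapping, or overwrite the MAC if the IP is already cached. */
void arp_cache_update(uint32_t ip, const uint8_t mac[6])
{
    struct arp_entry *slot = NULL;

    for (int i = 0; i < ARP_CACHE_SIZE; i++) {
        if (arp_cache[i].valid && arp_cache[i].ip == ip) {
            slot = &arp_cache[i];          /* existing entry wins */
            break;
        }
        if (!arp_cache[i].valid && slot == NULL)
            slot = &arp_cache[i];          /* remember the first free slot */
    }
    if (slot == NULL) {                    /* table full: evict round-robin */
        slot = &arp_cache[arp_next_victim];
        arp_next_victim = (arp_next_victim + 1) % ARP_CACHE_SIZE;
    }
    slot->ip = ip;
    memcpy(slot->mac, mac, 6);
    slot->valid = 1;
}

/* Returns a pointer to the cached 6-byte MAC, or NULL on a miss. */
const uint8_t *arp_cache_lookup(uint32_t ip)
{
    for (int i = 0; i < ARP_CACHE_SIZE; i++)
        if (arp_cache[i].valid && arp_cache[i].ip == ip)
            return arp_cache[i].mac;
    return NULL;
}
```

On a miss, the sender would queue the outgoing packet, call arp_send_request(), and retry once the reply has been passed to arp_cache_update().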
TCP State Machine
TCP has 11 states. Miss one transition and you get a half-open connection that hangs indefinitely.
Peer closes first: CLOSED → LISTEN → SYN_RCVD → ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED
We close first: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED
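One way to keep transitions from being missed is to put them all in a single function. The following is a simplified sketch of the transitions in the diagram plus CLOSING, not the project's actual code: the enum and function names are hypothetical, the active-open path through SYN_SENT is omitted, and the segments that would be sent on each transition are left out.

```c
/* The 11 TCP connection states. */
enum tcp_state {
    TCP_CLOSED, TCP_LISTEN, TCP_SYN_SENT, TCP_SYN_RCVD, TCP_ESTABLISHED,
    TCP_FIN_WAIT_1, TCP_FIN_WAIT_2, TCP_CLOSE_WAIT, TCP_CLOSING,
    TCP_LAST_ACK, TCP_TIME_WAIT
};

/* Hypothetical event set, reduced to what the diagram needs. */
enum tcp_event {
    EV_PASSIVE_OPEN,   /* application starts listening */
    EV_RCV_SYN,
    EV_RCV_ACK,        /* ACK of our SYN-ACK or of our FIN */
    EV_RCV_FIN,
    EV_APP_CLOSE,
    EV_TIMEOUT_2MSL    /* TIME_WAIT timer expired */
};

/* Returns the next state; unknown (state, event) pairs leave it unchanged. */
enum tcp_state tcp_next_state(enum tcp_state s, enum tcp_event ev)
{
    switch (s) {
    case TCP_CLOSED:     if (ev == EV_PASSIVE_OPEN) return TCP_LISTEN;      break;
    case TCP_LISTEN:     if (ev == EV_RCV_SYN)      return TCP_SYN_RCVD;    break;
    case TCP_SYN_RCVD:   if (ev == EV_RCV_ACK)      return TCP_ESTABLISHED; break;
    case TCP_ESTABLISHED:
        if (ev == EV_RCV_FIN)   return TCP_CLOSE_WAIT;   /* peer closes first */
        if (ev == EV_APP_CLOSE) return TCP_FIN_WAIT_1;   /* we close first */
        break;
    case TCP_CLOSE_WAIT: if (ev == EV_APP_CLOSE)    return TCP_LAST_ACK;    break;
    case TCP_LAST_ACK:   if (ev == EV_RCV_ACK)      return TCP_CLOSED;      break;
    case TCP_FIN_WAIT_1:
        if (ev == EV_RCV_ACK) return TCP_FIN_WAIT_2;
        if (ev == EV_RCV_FIN) return TCP_CLOSING;        /* simultaneous close */
        break;
    case TCP_FIN_WAIT_2: if (ev == EV_RCV_FIN)      return TCP_TIME_WAIT;   break;
    case TCP_CLOSING:    if (ev == EV_RCV_ACK)      return TCP_TIME_WAIT;   break;
    case TCP_TIME_WAIT:  if (ev == EV_TIMEOUT_2MSL) return TCP_CLOSED;      break;
    case TCP_SYN_SENT:   break;                          /* active open omitted */
    }
    return s;
}
```

Because every state is a case in one switch, a compiler with enum-switch warnings enabled can point out a state that was forgotten entirely.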
The retransmission timer uses exponential backoff: the RTO (retransmission timeout) doubles each time it expires without an ACK, capped at 64 seconds.
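The backoff rule itself is only a few lines. This sketch works in milliseconds and assumes a 1-second initial RTO, a value the text above does not specify; the 64-second cap comes from the text. The names are hypothetical.

```c
#include <stdint.h>

#define RTO_INITIAL_MS 1000u    /* assumed starting value */
#define RTO_MAX_MS     64000u   /* 64-second cap */

/* Returns the RTO to use after the current one expired without an ACK. */
uint32_t rto_backoff(uint32_t rto_ms)
{
    /* Compare before doubling so the multiplication cannot overflow. */
    if (rto_ms >= RTO_MAX_MS / 2)
        return RTO_MAX_MS;
    return rto_ms * 2;
}
```

Starting from RTO_INITIAL_MS, successive timeouts give 2 s, 4 s, 8 s, 16 s, 32 s, 64 s, and then stay at 64 s.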
What I Learned
Byte order is everywhere. htons(), ntohl(), htonl() — miss one and the packet looks fine in the debugger but is garbage on the wire.
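Checksums must be perfect. The IP checksum covers only the IP header. The TCP checksum covers the TCP header, the payload, and a pseudo-header built from the source and destination IP addresses, the protocol number, and the TCP length. Get either one wrong and the receiving kernel silently drops your packets.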
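To make the difference between the two concrete, here is a minimal sketch of the Internet checksum (16-bit one's-complement sum) applied to an IPv4 header and to a TCP segment with its pseudo-header. The function names are hypothetical. Both functions expect the checksum field in the input to be zero and return the value in host order, so the caller has to store it big-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds the buffer to acc as big-endian 16-bit words, without folding. */
static uint32_t csum_add(uint32_t acc, const uint8_t *data, size_t len)
{
    while (len > 1) {
        acc += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)(data[0] << 8);   /* odd length: pad with a zero byte */
    return acc;
}

/* Folds the carries back into 16 bits and takes the one's complement. */
static uint16_t csum_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* IP checksum: the IP header only. */
uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len)
{
    return csum_fold(csum_add(0, hdr, hdr_len));
}

/* TCP checksum: 12-byte pseudo-header + TCP header + payload.
 * seg_len is the TCP header plus payload length and must fit in 16 bits. */
uint16_t tcp_checksum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                      const uint8_t *segment, size_t seg_len)
{
    uint8_t pseudo[12];

    memcpy(pseudo, src_ip, 4);
    memcpy(pseudo + 4, dst_ip, 4);
    pseudo[8]  = 0;                        /* reserved, always zero */
    pseudo[9]  = 6;                        /* protocol number for TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);  /* TCP length, big-endian */
    pseudo[11] = (uint8_t)(seg_len & 0xff);

    uint32_t acc = csum_add(0, pseudo, sizeof(pseudo));
    acc = csum_add(acc, segment, seg_len);
    return csum_fold(acc);
}
```

A handy self-check when debugging: running either function over data that already contains a correct checksum returns 0.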
TCP is a byte stream, not a sequence of messages. You can send 100 bytes in one write and receive them as two 50-byte reads. Applications that assume each read returns one complete message break in subtle ways.
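The usual application-side fix is to loop until the expected number of bytes has arrived. Below is a minimal sketch using POSIX read(), assuming a blocking file descriptor; read_full is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keeps reading until len bytes have arrived, the peer closes, or an
 * error occurs. Returns the number of bytes read (less than len only
 * if EOF came first), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;                 /* EOF: peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;          /* short read: keep going */
    }
    return (ssize_t)got;
}
```

A protocol with a fixed-size header would call read_full() for the header, parse the length field, and then call it again for the body.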
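TIME_WAIT is your friend. Tuning it away lets delayed segments from an old connection be mistaken for data on a new connection that reuses the same address and port pair, a problem that shows up at high connection rates.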