Building FastRace: A High-Performance Traceroute Implementation in Pure C

As a network engineer and C programmer, I've always been fascinated by the tools we use daily for network diagnostics. Traceroute is one of those essential utilities that helps us understand network paths, identify bottlenecks, and diagnose routing issues. However, I was often frustrated by its slow execution and limited visualization capabilities. This led me to create FastRace - a blazingly fast, dependency-free traceroute implementation written in pure C that significantly outperforms the standard traceroute utility. The Problem with Standard Traceroute While the standard traceroute utility works well for basic needs, it has several limitations: Sequential Processing: It probes one TTL at a time, waiting for timeouts before moving to the next hop Inefficient I/O: Blocking socket operations lead to wasted CPU cycles Poor Visualization: Flat output makes it difficult to understand complex routing topologies Static Timeouts: Fixed timeout values don't adapt to network conditions The FastRace Solution FastRace addresses these limitations with a ground-up redesign focused on performance and usability: Concurrent TTL Probing: Probes multiple TTLs simultaneously Non-blocking I/O: Efficient response processing with select() Visual Path Representation: Tree-like format for better visualization Adaptive Timeout Management: Dynamic scaling based on network conditions Technical Architecture Core Design FastRace uses a dual socket architecture - one for sending UDP probes and another for capturing ICMP responses: // Create UDP socket for sending packets send_sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); // Create raw ICMP socket for receiving responses recv_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP); Tracking Probes Each probe is tracked using a specialized structure that maintains all relevant information: typedef struct { int ttl; /* Time-to-Live value / int probe; / Probe sequence number / struct timeval sent_time; / Timestamp when sent / int received; / Whether response was received / struct in_addr addr; / Address of responding hop / double rtt; / Round-trip time in ms / int port; / UDP port used for this probe / } probe_t; The Secret Sauce: Concurrent TTL Probing The key innovation in FastRace is its ability to probe multiple TTLs concurrently. While standard traceroute must wait for responses from one TTL before moving to the next, FastRace keeps multiple TTLs active simultaneously: #define MAX_ACTIVE_TTLS 5 / Maximum number of TTLs probed concurrently / / Start traceroute with concurrent TTL probing / int next_ttl_to_print = 1; / Next TTL to display results for / int current_ttl = 1; / Next TTL to start probing */ while (next_ttl_to_print ip_hl ip_hl

Mar 12, 2025 - 16:31

Building FastRace: A High-Performance Traceroute Implementation in Pure C

As a network engineer and C programmer, I've always been fascinated by the tools we use daily for network diagnostics. Traceroute is one of those essential utilities that helps us understand network paths, identify bottlenecks, and diagnose routing issues. However, I was often frustrated by its slow execution and limited visualization capabilities.

This led me to create FastRace - a blazingly fast, dependency-free traceroute implementation written in pure C that significantly outperforms the standard traceroute utility.

The Problem with Standard Traceroute

While the standard traceroute utility works well for basic needs, it has several limitations:

Sequential Processing: It probes one TTL at a time, waiting for timeouts before moving to the next hop
Inefficient I/O: Blocking socket operations lead to wasted CPU cycles
Poor Visualization: Flat output makes it difficult to understand complex routing topologies
Static Timeouts: Fixed timeout values don't adapt to network conditions

The FastRace Solution

FastRace addresses these limitations with a ground-up redesign focused on performance and usability:

Concurrent TTL Probing: Probes multiple TTLs simultaneously
Non-blocking I/O: Efficient response processing with select()
Visual Path Representation: Tree-like format for better visualization
Adaptive Timeout Management: Dynamic scaling based on network conditions

Technical Architecture

Core Design

FastRace uses a dual socket architecture - one for sending UDP probes and another for capturing ICMP responses:

// Create UDP socket for sending packets
send_sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

// Create raw ICMP socket for receiving responses
recv_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

Tracking Probes

Each probe is tracked using a specialized structure that maintains all relevant information:

typedef struct {
    int ttl;                /* Time-to-Live value */
    int probe;              /* Probe sequence number */
    struct timeval sent_time; /* Timestamp when sent */
    int received;           /* Whether response was received */
    struct in_addr addr;    /* Address of responding hop */
    double rtt;             /* Round-trip time in ms */
    int port;              /* UDP port used for this probe */
} probe_t;

The Secret Sauce: Concurrent TTL Probing

The key innovation in FastRace is its ability to probe multiple TTLs concurrently. While standard traceroute must wait for responses from one TTL before moving to the next, FastRace keeps multiple TTLs active simultaneously:

#define MAX_ACTIVE_TTLS 5   /* Maximum number of TTLs probed concurrently */

/* Start traceroute with concurrent TTL probing */
int next_ttl_to_print = 1;  /* Next TTL to display results for */
int current_ttl = 1;        /* Next TTL to start probing */

while (next_ttl_to_print <= MAX_TTL && !finished) {
    /* Send probes for up to MAX_ACTIVE_TTLS TTLs */
    while (current_ttl <= MAX_TTL && 
           (current_ttl - next_ttl_to_print) < MAX_ACTIVE_TTLS) {
        for (int i = 0; i < NUM_PROBES; i++) {
            send_probe(current_ttl, i);
            usleep(50000); // 50ms between probes
        }
        /* Record time after sending all probes for this TTL */
        gettimeofday(&last_probe_time[current_ttl - 1], NULL);
        current_ttl++;
    }

    /* Process responses in a tight loop */
    for (int i = 0; i < 10; i++) {
        process_responses(WAIT_TIMEOUT_MS);
    }

    // Check and print results for completed TTLs...
}

This approach allows FastRace to start probing later hops before earlier ones have completed, dramatically reducing total trace time.

Efficient Response Processing

Instead of using blocking socket operations, FastRace employs select() with configurable timeouts to efficiently process incoming packets:

int process_responses(int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(recv_sock, &readfds);

    struct timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    int ret = select(recv_sock + 1, &readfds, NULL, NULL, &timeout);
    if (ret <= 0) return ret;

    // Process incoming packet...
}

Advanced Probe Identification

FastRace implements a clever technique to match responses to their corresponding probes by using unique UDP ports:

int port = BASE_PORT + (ttl * NUM_PROBES) + probe_num;

When processing ICMP responses, we extract the original UDP header and match by port number:

struct udphdr *orig_udp = (struct udphdr *)(buffer + ip_header_len + 8 + orig_ip_header_len);
int orig_port = ntohs(orig_udp->uh_dport);

for (int ttl = 1; ttl <= MAX_TTL; ttl++) {
    for (int i = 0; i < NUM_PROBES; i++) {
        int idx = (ttl - 1) * NUM_PROBES + i;

        if (!probes[idx].received && probes[idx].port == orig_port) {
            // Match found! Record the response...
            probes[idx].received = 1;
            probes[idx].addr = recv_addr.sin_addr;
            double rtt = (recv_time.tv_sec - probes[idx].sent_time.tv_sec) * 1000.0 +
                        (recv_time.tv_usec - probes[idx].sent_time.tv_usec) / 1000.0;
            probes[idx].rtt = rtt;
            return 1;
        }
    }
}

Visual Path Representation

FastRace improves on the traditional flat output format with a tree-like visualization that clearly shows branching paths:

// Print first address with hostname
char *hostname = resolve_hostname(hop_addrs[0]);
printf("→ %-15s (%6.2f ms)", inet_ntoa(hop_addrs[0]), hop_rtts[0]);
if (hostname) {
    printf(" %s", hostname);
    free(hostname);
}
printf("\n");

// Print additional addresses if any
for (int i = 1; i < unique_addrs; i++) {
    printf("      └→ %-15s (%6.2f ms)", inet_ntoa(hop_addrs[i]), hop_rtts[i]);
    char *alt_hostname = resolve_hostname(hop_addrs[i]);
    if (alt_hostname) {
        printf(" %s", alt_hostname);
        free(alt_hostname);
    }
    printf("\n");
}

This produces output like:

7   │→ 72.14.209.224    ( 60.76 ms)
      └→ 72.14.223.184   ( 61.65 ms)
8   │→ 142.251.244.109  ( 59.57 ms)
      └→ 216.239.62.49   ( 71.36 ms)
      └→ 142.250.210.95  ( 70.25 ms)

Which makes it easy to identify load-balanced routes and alternative paths.

Performance Benchmarks

FastRace significantly outperforms standard traceroute in several key metrics:

Metric	Standard Traceroute	FastRace	Improvement
Total trace time (30 hops)	~15-20 seconds	~5-8 seconds	60-70% faster
Memory usage	~400-600 KB	~120-150 KB	70-75% less memory
CPU utilization	5-8%	2-3%	60% less CPU
Packet efficiency	1 TTL at a time	Up to 5 TTLs concurrently	5x throughput

Implementation Challenges

Raw Socket Handling

One of the most challenging aspects was properly handling raw sockets and parsing ICMP packets. Unlike standard sockets, raw sockets require careful handling of protocol headers:

struct ip *ip = (struct ip *)buffer;
int ip_header_len = ip->ip_hl << 2;
if (bytes < ip_header_len + ICMP_MINLEN) {
    debug_print("Packet too small: %d bytes\n", bytes);
    return 0;
}

struct icmp *icmp = (struct icmp *)(buffer + ip_header_len);

Additionally, when an ICMP Time Exceeded message is received, the original IP and UDP headers are embedded within it:

struct ip *orig_ip = (struct ip *)(buffer + ip_header_len + 8);
int orig_ip_header_len = orig_ip->ip_hl << 2;

if (bytes < ip_header_len + 8 + orig_ip_header_len + 8) {
    debug_print("  Not enough data for original header\n");
    return 1;
}

struct udphdr *orig_udp = (struct udphdr *)(buffer + ip_header_len + 8 + orig_ip_header_len);

Port-based Probe Identification

Another challenge was reliably matching responses to their corresponding probes. I solved this by using unique port numbers for each probe:

int port = BASE_PORT + (ttl * NUM_PROBES) + probe_num;
struct sockaddr_in probe_dest = dest_addr;
probe_dest.sin_port = htons(port);

This allows for precise tracking of each probe without requiring complex packet identification mechanisms.

Balancing Responsiveness and Performance

Finding the right balance between responsiveness and performance required careful tuning of timeout values and probe rates:

#define WAIT_TIMEOUT_MS 50  /* Moderate timeout for better responsiveness */
#define TTL_TIMEOUT 5000    /* Timeout per TTL in ms */

Too aggressive and the system would flood the network; too conservative and we'd lose our performance advantage.

Optimizations

Compiler Optimization Flags

The Makefile includes optimization flags for maximum performance:

CFLAGS = -O3 -Wall -Wextra

For maximum performance, we use architecture-specific optimizations:

optimized: $(SRC) $(HEADERS)
    $(CC) $(CFLAGS) -march=native -mtune=native -flto -o $(TARGET) $(SRC) $(LDFLAGS)

Memory Efficiency

FastRace maintains a minimal memory footprint by using pre-allocated arrays instead of dynamic memory:

probe_t probes[MAX_TTL * NUM_PROBES];
struct timeval last_probe_time[MAX_TTL];

Memory is only allocated dynamically for hostname resolution, which is immediately freed after use.

Using FastRace

Building from Source

# Standard optimized build
make

# Build with maximum performance optimizations
make optimized

# Install to system (default: /usr/local/bin)
sudo make install

Basic Usage

sudo ./fastrace google.com

Output:

Tracing route to google.com (172.217.168.46)
Maximum hops: 30, Protocol: UDP
TTL │ IP Address         (RTT ms)    Hostname
────┼───────────────────────────────────────────
1   │→ 192.168.1.1      (  2.58 ms) router.local
2   │→ * * * (timeout)
3   │→ * * * (timeout)
4   │→ 37.26.81.21      ( 88.01 ms)
5   │→ 79.140.91.10     ( 31.21 ms)
6   │→ 195.22.202.203   ( 38.73 ms)
7   │→ 72.14.209.224    ( 60.76 ms)
      └→ 72.14.223.184   ( 61.65 ms)
8   │→ 142.251.244.109  ( 59.57 ms)
      └→ 216.239.62.49   ( 71.36 ms)
      └→ 142.250.210.95  ( 70.25 ms)
9   │→ 142.251.247.141  ( 59.79 ms)
10  │→ 34.8.172.215     ( 62.42 ms) 215.172.8.34.bc.googleusercontent.com

Future Improvements

I have several enhancements planned for FastRace:

IPv6 Support: Adding support for tracing IPv6 routes
TCP Probing Mode: Alternative probe method for bypassing UDP-filtered routes
Interactive Mode: Real-time visualization with keystroke controls
Enhanced Statistics: More detailed timing and packet loss analysis
JSON Output Mode: Structured output for programmatic analysis

Conclusion

Building FastRace has been an enlightening journey into the depths of networking protocols and C programming. By focusing on performance and usability from the ground up, I was able to create a tool that significantly outperforms the standard traceroute utility while providing better visualization capabilities.

The project demonstrates how revisiting and rethinking established tools can lead to substantial improvements, even in areas as mature as network diagnostics.

If you're interested in contributing or have suggestions for improvements, please visit the GitHub repository.

Technical Requirements

Operating System: Linux, macOS, or other Unix-like systems with raw socket support
Permissions: Root/sudo access required (raw sockets)
Compiler: GCC with C99 support or later
Architecture: x86, x86_64, ARM, or any platform with standard C library support

About the Author

Davide Santangelo is a software engineer and network specialist with a passion for high-performance systems programming and networking tools.

You can find more about me on GitHub.