Building FastRace: A High-Performance Traceroute Implementation in Pure C
As a network engineer and C programmer, I've always been fascinated by the tools we use daily for network diagnostics. Traceroute is one of those essential utilities that helps us understand network paths, identify bottlenecks, and diagnose routing issues. However, I was often frustrated by its slow execution and limited visualization capabilities. This led me to create FastRace - a blazingly fast, dependency-free traceroute implementation written in pure C that significantly outperforms the standard traceroute utility. The Problem with Standard Traceroute While the standard traceroute utility works well for basic needs, it has several limitations: Sequential Processing: It probes one TTL at a time, waiting for timeouts before moving to the next hop Inefficient I/O: Blocking socket operations lead to wasted CPU cycles Poor Visualization: Flat output makes it difficult to understand complex routing topologies Static Timeouts: Fixed timeout values don't adapt to network conditions The FastRace Solution FastRace addresses these limitations with a ground-up redesign focused on performance and usability: Concurrent TTL Probing: Probes multiple TTLs simultaneously Non-blocking I/O: Efficient response processing with select() Visual Path Representation: Tree-like format for better visualization Adaptive Timeout Management: Dynamic scaling based on network conditions Technical Architecture Core Design FastRace uses a dual socket architecture - one for sending UDP probes and another for capturing ICMP responses: // Create UDP socket for sending packets send_sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); // Create raw ICMP socket for receiving responses recv_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP); Tracking Probes Each probe is tracked using a specialized structure that maintains all relevant information: typedef struct { int ttl; /* Time-to-Live value */ int probe; /* Probe sequence number */ struct timeval sent_time; /* Timestamp when sent */ int received; /* Whether response was received */ struct in_addr addr; /* Address of responding hop */ double rtt; /* Round-trip time in ms */ int port; /* UDP port used for this probe */ } probe_t; The Secret Sauce: Concurrent TTL Probing The key innovation in FastRace is its ability to probe multiple TTLs concurrently. While standard traceroute must wait for responses from one TTL before moving to the next, FastRace keeps multiple TTLs active simultaneously: #define MAX_ACTIVE_TTLS 5 /* Maximum number of TTLs probed concurrently */ /* Start traceroute with concurrent TTL probing */ int next_ttl_to_print = 1; /* Next TTL to display results for */ int current_ttl = 1; /* Next TTL to start probing */ while (next_ttl_to_print ip_hl ip_hl

As a network engineer and C programmer, I've always been fascinated by the tools we use daily for network diagnostics. Traceroute is one of those essential utilities that helps us understand network paths, identify bottlenecks, and diagnose routing issues. However, I was often frustrated by its slow execution and limited visualization capabilities.
This led me to create FastRace - a blazingly fast, dependency-free traceroute implementation written in pure C that significantly outperforms the standard traceroute utility.
The Problem with Standard Traceroute
While the standard traceroute utility works well for basic needs, it has several limitations:
- Sequential Processing: It probes one TTL at a time, waiting for timeouts before moving to the next hop
- Inefficient I/O: Blocking socket operations lead to wasted CPU cycles
- Poor Visualization: Flat output makes it difficult to understand complex routing topologies
- Static Timeouts: Fixed timeout values don't adapt to network conditions
The FastRace Solution
FastRace addresses these limitations with a ground-up redesign focused on performance and usability:
- Concurrent TTL Probing: Probes multiple TTLs simultaneously
-
Non-blocking I/O: Efficient response processing with
select()
- Visual Path Representation: Tree-like format for better visualization
- Adaptive Timeout Management: Dynamic scaling based on network conditions
Technical Architecture
Core Design
FastRace uses a dual socket architecture - one for sending UDP probes and another for capturing ICMP responses:
// Create UDP socket for sending packets
send_sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
// Create raw ICMP socket for receiving responses
recv_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
Tracking Probes
Each probe is tracked using a specialized structure that maintains all relevant information:
typedef struct {
int ttl; /* Time-to-Live value */
int probe; /* Probe sequence number */
struct timeval sent_time; /* Timestamp when sent */
int received; /* Whether response was received */
struct in_addr addr; /* Address of responding hop */
double rtt; /* Round-trip time in ms */
int port; /* UDP port used for this probe */
} probe_t;
The Secret Sauce: Concurrent TTL Probing
The key innovation in FastRace is its ability to probe multiple TTLs concurrently. While standard traceroute must wait for responses from one TTL before moving to the next, FastRace keeps multiple TTLs active simultaneously:
#define MAX_ACTIVE_TTLS 5 /* Maximum number of TTLs probed concurrently */
/* Start traceroute with concurrent TTL probing */
int next_ttl_to_print = 1; /* Next TTL to display results for */
int current_ttl = 1; /* Next TTL to start probing */
while (next_ttl_to_print <= MAX_TTL && !finished) {
/* Send probes for up to MAX_ACTIVE_TTLS TTLs */
while (current_ttl <= MAX_TTL &&
(current_ttl - next_ttl_to_print) < MAX_ACTIVE_TTLS) {
for (int i = 0; i < NUM_PROBES; i++) {
send_probe(current_ttl, i);
usleep(50000); // 50ms between probes
}
/* Record time after sending all probes for this TTL */
gettimeofday(&last_probe_time[current_ttl - 1], NULL);
current_ttl++;
}
/* Process responses in a tight loop */
for (int i = 0; i < 10; i++) {
process_responses(WAIT_TIMEOUT_MS);
}
// Check and print results for completed TTLs...
}
This approach allows FastRace to start probing later hops before earlier ones have completed, dramatically reducing total trace time.
Efficient Response Processing
Instead of using blocking socket operations, FastRace employs select()
with configurable timeouts to efficiently process incoming packets:
int process_responses(int timeout_ms) {
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(recv_sock, &readfds);
struct timeval timeout;
timeout.tv_sec = timeout_ms / 1000;
timeout.tv_usec = (timeout_ms % 1000) * 1000;
int ret = select(recv_sock + 1, &readfds, NULL, NULL, &timeout);
if (ret <= 0) return ret;
// Process incoming packet...
}
Advanced Probe Identification
FastRace implements a clever technique to match responses to their corresponding probes by using unique UDP ports:
int port = BASE_PORT + (ttl * NUM_PROBES) + probe_num;
When processing ICMP responses, we extract the original UDP header and match by port number:
struct udphdr *orig_udp = (struct udphdr *)(buffer + ip_header_len + 8 + orig_ip_header_len);
int orig_port = ntohs(orig_udp->uh_dport);
for (int ttl = 1; ttl <= MAX_TTL; ttl++) {
for (int i = 0; i < NUM_PROBES; i++) {
int idx = (ttl - 1) * NUM_PROBES + i;
if (!probes[idx].received && probes[idx].port == orig_port) {
// Match found! Record the response...
probes[idx].received = 1;
probes[idx].addr = recv_addr.sin_addr;
double rtt = (recv_time.tv_sec - probes[idx].sent_time.tv_sec) * 1000.0 +
(recv_time.tv_usec - probes[idx].sent_time.tv_usec) / 1000.0;
probes[idx].rtt = rtt;
return 1;
}
}
}
Visual Path Representation
FastRace improves on the traditional flat output format with a tree-like visualization that clearly shows branching paths:
// Print first address with hostname
char *hostname = resolve_hostname(hop_addrs[0]);
printf("→ %-15s (%6.2f ms)", inet_ntoa(hop_addrs[0]), hop_rtts[0]);
if (hostname) {
printf(" %s", hostname);
free(hostname);
}
printf("\n");
// Print additional addresses if any
for (int i = 1; i < unique_addrs; i++) {
printf(" └→ %-15s (%6.2f ms)", inet_ntoa(hop_addrs[i]), hop_rtts[i]);
char *alt_hostname = resolve_hostname(hop_addrs[i]);
if (alt_hostname) {
printf(" %s", alt_hostname);
free(alt_hostname);
}
printf("\n");
}
This produces output like:
7 │→ 72.14.209.224 ( 60.76 ms)
└→ 72.14.223.184 ( 61.65 ms)
8 │→ 142.251.244.109 ( 59.57 ms)
└→ 216.239.62.49 ( 71.36 ms)
└→ 142.250.210.95 ( 70.25 ms)
Which makes it easy to identify load-balanced routes and alternative paths.
Performance Benchmarks
FastRace significantly outperforms standard traceroute in several key metrics:
Metric | Standard Traceroute | FastRace | Improvement |
---|---|---|---|
Total trace time (30 hops) | ~15-20 seconds | ~5-8 seconds | 60-70% faster |
Memory usage | ~400-600 KB | ~120-150 KB | 70-75% less memory |
CPU utilization | 5-8% | 2-3% | 60% less CPU |
Packet efficiency | 1 TTL at a time | Up to 5 TTLs concurrently | 5x throughput |
Implementation Challenges
Raw Socket Handling
One of the most challenging aspects was properly handling raw sockets and parsing ICMP packets. Unlike standard sockets, raw sockets require careful handling of protocol headers:
struct ip *ip = (struct ip *)buffer;
int ip_header_len = ip->ip_hl << 2;
if (bytes < ip_header_len + ICMP_MINLEN) {
debug_print("Packet too small: %d bytes\n", bytes);
return 0;
}
struct icmp *icmp = (struct icmp *)(buffer + ip_header_len);
Additionally, when an ICMP Time Exceeded message is received, the original IP and UDP headers are embedded within it:
struct ip *orig_ip = (struct ip *)(buffer + ip_header_len + 8);
int orig_ip_header_len = orig_ip->ip_hl << 2;
if (bytes < ip_header_len + 8 + orig_ip_header_len + 8) {
debug_print(" Not enough data for original header\n");
return 1;
}
struct udphdr *orig_udp = (struct udphdr *)(buffer + ip_header_len + 8 + orig_ip_header_len);
Port-based Probe Identification
Another challenge was reliably matching responses to their corresponding probes. I solved this by using unique port numbers for each probe:
int port = BASE_PORT + (ttl * NUM_PROBES) + probe_num;
struct sockaddr_in probe_dest = dest_addr;
probe_dest.sin_port = htons(port);
This allows for precise tracking of each probe without requiring complex packet identification mechanisms.
Balancing Responsiveness and Performance
Finding the right balance between responsiveness and performance required careful tuning of timeout values and probe rates:
#define WAIT_TIMEOUT_MS 50 /* Moderate timeout for better responsiveness */
#define TTL_TIMEOUT 5000 /* Timeout per TTL in ms */
Too aggressive and the system would flood the network; too conservative and we'd lose our performance advantage.
Optimizations
Compiler Optimization Flags
The Makefile includes optimization flags for maximum performance:
CFLAGS = -O3 -Wall -Wextra
For maximum performance, we use architecture-specific optimizations:
optimized: $(SRC) $(HEADERS)
$(CC) $(CFLAGS) -march=native -mtune=native -flto -o $(TARGET) $(SRC) $(LDFLAGS)
Memory Efficiency
FastRace maintains a minimal memory footprint by using pre-allocated arrays instead of dynamic memory:
probe_t probes[MAX_TTL * NUM_PROBES];
struct timeval last_probe_time[MAX_TTL];
Memory is only allocated dynamically for hostname resolution, which is immediately freed after use.
Using FastRace
Building from Source
# Standard optimized build
make
# Build with maximum performance optimizations
make optimized
# Install to system (default: /usr/local/bin)
sudo make install
Basic Usage
sudo ./fastrace google.com
Output:
Tracing route to google.com (172.217.168.46)
Maximum hops: 30, Protocol: UDP
TTL │ IP Address (RTT ms) Hostname
────┼───────────────────────────────────────────
1 │→ 192.168.1.1 ( 2.58 ms) router.local
2 │→ * * * (timeout)
3 │→ * * * (timeout)
4 │→ 37.26.81.21 ( 88.01 ms)
5 │→ 79.140.91.10 ( 31.21 ms)
6 │→ 195.22.202.203 ( 38.73 ms)
7 │→ 72.14.209.224 ( 60.76 ms)
└→ 72.14.223.184 ( 61.65 ms)
8 │→ 142.251.244.109 ( 59.57 ms)
└→ 216.239.62.49 ( 71.36 ms)
└→ 142.250.210.95 ( 70.25 ms)
9 │→ 142.251.247.141 ( 59.79 ms)
10 │→ 34.8.172.215 ( 62.42 ms) 215.172.8.34.bc.googleusercontent.com
Future Improvements
I have several enhancements planned for FastRace:
- IPv6 Support: Adding support for tracing IPv6 routes
- TCP Probing Mode: Alternative probe method for bypassing UDP-filtered routes
- Interactive Mode: Real-time visualization with keystroke controls
- Enhanced Statistics: More detailed timing and packet loss analysis
- JSON Output Mode: Structured output for programmatic analysis
Conclusion
Building FastRace has been an enlightening journey into the depths of networking protocols and C programming. By focusing on performance and usability from the ground up, I was able to create a tool that significantly outperforms the standard traceroute utility while providing better visualization capabilities.
The project demonstrates how revisiting and rethinking established tools can lead to substantial improvements, even in areas as mature as network diagnostics.
If you're interested in contributing or have suggestions for improvements, please visit the GitHub repository.
Technical Requirements
- Operating System: Linux, macOS, or other Unix-like systems with raw socket support
- Permissions: Root/sudo access required (raw sockets)
- Compiler: GCC with C99 support or later
- Architecture: x86, x86_64, ARM, or any platform with standard C library support
About the Author
Davide Santangelo is a software engineer and network specialist with a passion for high-performance systems programming and networking tools.
You can find more about me on GitHub.