TL;DR: TikTok’s payments team kept most of their Go services, but surgically rewrote a few CPU-bound traffic APIs (like “get user balance/stats”) in Rust. The result: 33% lower CPU, 72% lower memory, and 76% lower p99 latency — with about $300K/year saved on infra. This is a story about targeted performance work, not a language war.
The Setup: Go Was Great… Until 100K QPS
TikTok’s payments system started in Go for the right reasons:
- Simplicity & Dev Speed: Go’s clean syntax and tooling helped small teams ship quickly.
- Concurrency: Goroutines + channels are excellent for I/O-heavy services.
- Maturity: Good ecosystem, observability, and ops story.
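That concurrency story is worth seeing concretely. Here's a minimal sketch of the goroutines-plus-channels pattern that makes Go so pleasant for I/O-heavy services: fan out one goroutine per lookup, collect results over a channel. The `fetchBalance` function and its doubling logic are purely illustrative stand-ins for a real datastore call, not TikTok's code.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchBalance stands in for an I/O-bound lookup (a real service would
// hit a datastore here); the doubling logic is illustrative only.
func fetchBalance(userID int) int {
	return userID * 2
}

// fanOut queries many users concurrently: one goroutine per request,
// with results collected over a buffered channel.
func fanOut(userIDs []int) map[int]int {
	type result struct{ id, balance int }
	ch := make(chan result, len(userIDs))

	var wg sync.WaitGroup
	for _, id := range userIDs {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ch <- result{id, fetchBalance(id)}
		}(id)
	}
	wg.Wait()
	close(ch)

	out := make(map[int]int, len(userIDs))
	for r := range ch {
		out[r.id] = r.balance
	}
	return out
}

func main() {
	balances := fanOut([]int{1, 2, 3})
	fmt.Println(balances[2]) // balance for user 2
}
```

While goroutines are blocked on I/O they cost almost nothing, which is exactly why this pattern scales so well for network-bound work; the catch, as the next section shows, is that the same runtime conveniences have a price on CPU-bound paths.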
But success brings volume. At ~100,000 queries per second, the APIs that return user balances and usage stats became CPU hot paths. As traffic climbed, CPU utilization spiked, threatening tail latencies and SLOs.
Where the CPU Went
Profiling the hot paths turned up three usual suspects:
- Heavy serialization/deserialization (marshaling structs to JSON/Protobuf and back).
- Garbage collection pauses (even small, steady pauses add up at 100K QPS).
- Runtime overhead from allocations (escape analysis misses + short-lived objects).
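The first and third suspects often show up together: every `json.Marshal` on the hot path returns a fresh byte slice, and those short-lived allocations keep the GC busy. Below is a hedged sketch of one common Go-side mitigation, reusing encode buffers via `sync.Pool`. The `BalanceResponse` type is a hypothetical response shape, not TikTok's actual API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// BalanceResponse is a hypothetical shape for the balance API response.
type BalanceResponse struct {
	UserID  int64 `json:"user_id"`
	Balance int64 `json:"balance"`
}

// naiveMarshal allocates a fresh byte slice on every call; at 100K QPS
// these short-lived allocations add steady GC pressure.
func naiveMarshal(r *BalanceResponse) ([]byte, error) {
	return json.Marshal(r)
}

// bufPool reuses encode buffers across requests, a common way to cut
// per-request allocations on a hot path.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// pooledMarshal encodes into a pooled buffer, then copies the bytes out
// so the buffer can be safely returned to the pool and reused.
func pooledMarshal(r *BalanceResponse) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	if err := json.NewEncoder(buf).Encode(r); err != nil {
		return nil, err
	}
	// json.Encoder appends a trailing newline; trim it so the output
	// matches json.Marshal byte-for-byte.
	out := make([]byte, buf.Len()-1)
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	r := &BalanceResponse{UserID: 42, Balance: 1000}
	a, _ := naiveMarshal(r)
	b, _ := pooledMarshal(r)
	fmt.Println(string(a) == string(b)) // both encoders produce the same JSON
}
```

Tricks like this reduce the garbage, but they don't eliminate it: the final copy still allocates, reflection-based encoding still burns CPU, and the pool itself adds complexity. That gap between "fewer allocations" and "no GC at all" is the opening Rust exploits on these paths.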