Slow compilation and massive target/ directories — the classic Rust complaints. This post documents the techniques I’m currently using to make things better, using my project ClewdR (an async web service with 394 crate dependencies) as an example.
Environment: Rust 1.94.1, CachyOS (Arch-based), NVMe SSD, Btrfs.
rust-lld
Linking is the final step of a Rust build and, for large binaries, has traditionally been one of the slowest. GNU ld performs particularly poorly here, especially with LTO enabled.
The old approach was to manually install lld or mold and configure .cargo/config.toml:
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
But since Rust 1.90 (2025-09-18), rust-lld is the default linker on x86_64-unknown-linux-gnu, so no configuration is needed:
$ readelf -p .comment target/release/clewdr
String dump of section '.comment':
[ 1] Linker: LLD 21.1.8
[ 5f] rustc version 1.94.1 (e408947bf 2026-03-25)
Just upgrade Rust. Free lunch.
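If readelf isn't at hand, the same check can be roughly approximated in a few lines of Rust. This is a crude sketch, not a real ELF parser: it assumes the "Linker: LLD" string shown in the readelf output above is present somewhere in the file, and scans the raw bytes for it. The function names (`contains_bytes`, `linked_with_lld`) are made up for this example.

```rust
use std::{env, fs};

/// Returns true if `haystack` contains `needle` as a contiguous byte run.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows` panics on a zero size, so an empty needle is rejected first.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Crude check: look for the "Linker: LLD" marker anywhere in the file,
/// instead of locating and parsing the `.comment` section properly.
fn linked_with_lld(binary: &[u8]) -> bool {
    contains_bytes(binary, b"Linker: LLD")
}

fn main() {
    let Some(path) = env::args().nth(1) else {
        eprintln!("usage: check_lld <path-to-binary>");
        return;
    };
    match fs::read(&path) {
        Ok(bytes) => println!("{path}: LLD marker found = {}", linked_with_lld(&bytes)),
        Err(e) => eprintln!("cannot read {path}: {e}"),
    }
}
```

Don't point it at its own executable: the search string is embedded in this program's data, so it would always report a match.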
As a side note, there are two other linkers worth keeping an eye on:
- mold: A linker that prioritizes speed. Generally faster than lld in non-LTO scenarios, though its LTO support is limited.
- wild: An experimental linker written in Rust, aiming to be the fastest Linux ELF linker with heavy multithreading optimizations. Still in active development — worth watching if you’re interested.
For most people, the default rust-lld is good enough.
sccache
sccache is a compilation cache, similar to ccache but with Rust support. It caches each crate’s compilation artifacts — identical inputs get reused without recompilation.
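The "identical inputs get reused" idea can be illustrated with a toy content-addressed cache. This is only a sketch of the general technique, not sccache's actual implementation: the `CompileInputs` fields are a hypothetical stand-in for whatever really determines a crate's output, and `DefaultHasher` is used to keep the example dependency-free where a real cache would want a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for everything that determines a crate's output.
#[derive(Hash)]
struct CompileInputs<'a> {
    rustc_version: &'a str,
    args: &'a [&'a str],
    source: &'a str,
}

/// Derive a cache key by hashing all inputs together.
fn cache_key(inputs: &CompileInputs) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}

struct ToyCache {
    store: HashMap<u64, Vec<u8>>,
    hits: u32,
    misses: u32,
}

impl ToyCache {
    fn new() -> Self {
        ToyCache { store: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached artifact, or run `compile` and remember its output.
    fn get_or_compile(
        &mut self,
        inputs: &CompileInputs,
        compile: impl FnOnce() -> Vec<u8>,
    ) -> Vec<u8> {
        let key = cache_key(inputs);
        if let Some(artifact) = self.store.get(&key) {
            self.hits += 1;
            return artifact.clone();
        }
        self.misses += 1;
        let artifact = compile();
        self.store.insert(key, artifact.clone());
        artifact
    }
}

fn main() {
    let mut cache = ToyCache::new();
    let args = ["-C", "opt-level=z"];
    let inputs = CompileInputs { rustc_version: "1.94.1", args: &args, source: "fn main() {}" };

    // First call compiles; the second call with identical inputs is served from the cache.
    cache.get_or_compile(&inputs, || b"artifact".to_vec());
    cache.get_or_compile(&inputs, || unreachable!("served from cache"));
    println!("hits = {}, misses = {}", cache.hits, cache.misses);
}
```

Changing any field of the inputs (compiler version, flags, or source) produces a different key, which is why a toolchain upgrade effectively starts from a cold cache.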
Setup is simple. After installing sccache, add one line to ~/.cargo/config.toml:
[build]
rustc-wrapper = "sccache"
Using ClewdR’s release build as an example (opt-level = "z", lto = true, codegen-units = 1):
| Scenario | Time |
|---|---|
| No sccache, clean build | 48.4s |
| sccache cold cache, clean build | 55.7s |
| sccache warm cache, clean build | 34.2s |
The first build is about 15% slower because the cache has to be populated, but subsequent clean builds are about 30% faster. Cache hit rate with a warm cache:
Cache hits rate 52.00 %
Cache hits rate (Rust) 54.43 %
Cache hits rate (C/C++) 50.00 %
sccache pays off mainly when rebuilding after cargo clean, switching branches, or sharing dependencies across multiple projects. Proc-macro crates won’t be cached. The local cache defaults to 10 GiB, and sccache also supports remote caches such as S3 and GCS.
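To put the timing table above into relative terms, here is a small sketch that computes the change against the no-sccache baseline. The helper name `percent_change` is made up for this example; the numbers are the ones from the table.

```rust
/// Percentage change from `baseline` to `measured`; negative means faster.
fn percent_change(baseline: f64, measured: f64) -> f64 {
    (measured - baseline) / baseline * 100.0
}

fn main() {
    let no_cache = 48.4; // seconds, no sccache
    let cold = 55.7;     // seconds, sccache with a cold cache
    let warm = 34.2;     // seconds, sccache with a warm cache

    println!("cold cache vs. no sccache: {:+.1}%", percent_change(no_cache, cold));
    println!("warm cache vs. no sccache: {:+.1}%", percent_change(no_cache, warm));
}
```

This works out to roughly 15% slower for the cold-cache build and roughly 29% faster for the warm-cache one.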
Btrfs Transparent Compression
A single ClewdR release build produces 5,121 files totaling 840 MiB. Multiple Rust projects’ target/ directories can easily add up to tens of gigabytes.

My filesystem is Btrfs, mounted with compress=zstd:3:
/dev/nvme0n1p2 on /home type btrfs (rw,noatime,compress=zstd:3,ssd,discard=async,space_cache=v2)
Completely transparent to everything above — no build configuration changes needed. Using compsize to check the actual disk usage of target/:
Processed 5121 files, 7468 regular extents (7782 refs), 2710 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 41% 320M 768M 814M
none 100% 82M 82M 96M
zstd 34% 237M 685M 717M
768 MiB of data occupies only 320 MiB on disk — less than half the original size. Compilation intermediates (.o, .rlib, .rmeta) compress especially well.
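For scripting this check across many projects, a compsize row can be turned into a ratio with a little parsing. The sketch below assumes the five-column layout shown above and that the size suffixes are binary multiples (K/M/G/T as powers of 1024); both function names are made up for this example.

```rust
/// Parse a compsize-style size such as "320M" or "9.8G" into bytes.
/// Assumes binary multiples: K/M/G/T are powers of 1024.
fn parse_size(s: &str) -> Option<f64> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'B' => (&s[..s.len() - 1], 1.0),
        'K' => (&s[..s.len() - 1], 1024.0),
        'M' => (&s[..s.len() - 1], 1024.0 * 1024.0),
        'G' => (&s[..s.len() - 1], 1024.0 * 1024.0 * 1024.0),
        'T' => (&s[..s.len() - 1], 1024.0 * 1024.0 * 1024.0 * 1024.0),
        _ => (s, 1.0), // no suffix: treat as plain bytes
    };
    digits.parse::<f64>().ok().map(|n| n * multiplier)
}

/// Disk usage as a fraction of the uncompressed size for one row,
/// e.g. "TOTAL 41% 320M 768M 814M".
fn disk_ratio(row: &str) -> Option<f64> {
    // Columns: Type, Perc, Disk Usage, Uncompressed, Referenced
    let cols: Vec<&str> = row.split_whitespace().collect();
    let disk = parse_size(cols.get(2)?)?;
    let uncompressed = parse_size(cols.get(3)?)?;
    if uncompressed > 0.0 {
        Some(disk / uncompressed)
    } else {
        None
    }
}

fn main() {
    for row in ["TOTAL 41% 320M 768M 814M", "zstd 34% 237M 685M 717M"] {
        match disk_ratio(row) {
            Some(ratio) => println!("{row} -> {:.1}% of uncompressed size", ratio * 100.0),
            None => println!("{row} -> could not parse"),
        }
    }
}
```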
One thing to note: sccache’s cache is already compressed, so Btrfs compression on top has virtually no effect:
# ~/.cache/sccache/
Type Perc Disk Usage Uncompressed Referenced
TOTAL 99% 9.8G 9.8G 10G
zstd:3 on an NVMe SSD has no noticeable performance impact, which makes it a good balance point.
If you’re using ZFS, it also supports transparent compression — just set compression=zstd for a similar effect.
File Deduplication
Multiple Rust projects’ target/ directories often contain massive amounts of duplicate content — identical dependency versions produce byte-for-byte identical .rlib and .rmeta files. Transparent compression can shrink individual files, but it can’t do anything about this cross-project duplication. That’s where filesystem-level deduplication comes in.
Across my 18 local Rust projects, target/ directories totaled 51 GiB referenced. After Btrfs deduplication (reflink) + zstd compression, actual disk usage is just 15 GiB:
Processed 165605 files, 274042 regular extents (562708 refs), 101289 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 47% 15G 32G 51G
none 100% 7.7G 7.7G 9.0G
zstd 30% 7.3G 24G 42G
Deduplication brought 51 GiB down to 32 GiB (saving ~37%), and compression then brought 32 GiB down to 15 GiB (saving ~53%). The combined effect is quite impressive: roughly 70% less than the referenced size.
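The two stages multiply rather than add, which is easy to get wrong when eyeballing the numbers. A short sketch of the arithmetic, using the sizes from the compsize output above (the helper name `saved` is made up for this example):

```rust
/// Fraction saved going from `before` to `after` (same unit for both).
fn saved(before: f64, after: f64) -> f64 {
    1.0 - after / before
}

fn main() {
    let referenced = 51.0;   // GiB, before deduplication
    let uncompressed = 32.0; // GiB, after deduplication
    let on_disk = 15.0;      // GiB, after zstd compression

    println!("dedup:       {:.1}%", saved(referenced, uncompressed) * 100.0);
    println!("compression: {:.1}%", saved(uncompressed, on_disk) * 100.0);
    println!("combined:    {:.1}%", saved(referenced, on_disk) * 100.0);
}
```

The combined figure is measured against the original 51 GiB, so it lands at about 71% rather than the 90% you would get by adding the two stage percentages.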
Btrfs supports offline deduplication, which works by merging identical extents into a single physical copy (reflink). Two commonly used tools:
- duperemove: Scans specified directories, finds duplicate extents, and submits them to the kernel for deduplication. Good for running manually or on a schedule.
- bees: A background daemon that continuously monitors filesystem changes and deduplicates automatically. Better suited for a “set it and forget it” setup, though it uses some CPU and memory continuously.
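To make the "merge identical extents" idea concrete, here is a toy model of it. It is a sketch of the general principle only, not how duperemove or bees are implemented: files are in-memory byte slices, the block size is an arbitrary parameter, and the function name `block_stats` is made up. It counts how many fixed-size blocks exist in total and how many are distinct; the difference is what a deduplicator could share.

```rust
use std::collections::HashSet;

/// Count (total, unique) fixed-size blocks across a set of in-memory "files".
/// Blocks with identical bytes are counted once in `unique`.
fn block_stats(files: &[&[u8]], block_size: usize) -> (usize, usize) {
    assert!(block_size > 0, "block_size must be non-zero");
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut total = 0;
    for file in files {
        // The last chunk of a file may be shorter than `block_size`.
        for block in file.chunks(block_size) {
            total += 1;
            seen.insert(block);
        }
    }
    (total, seen.len())
}

fn main() {
    // Two hypothetical artifacts that share their first two 4-byte blocks.
    let a: &[u8] = b"AAAABBBBCCCC";
    let b: &[u8] = b"AAAABBBBDDDD";
    let (total, unique) = block_stats(&[a, b], 4);
    println!("{total} blocks, {unique} unique, {} could be shared", total - unique);
}
```

Real tools work on on-disk extents and hand candidate ranges to the kernel, which verifies that they are byte-for-byte identical before sharing them.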
ZFS has built-in inline deduplication — just set dedup=on. More aggressive than Btrfs’s offline approach — it compares data at write time, so duplicates never hit disk at all. The cost is that every block needs a DDT (Dedup Table) entry in memory, which can become very expensive at scale. Generally recommended only when you have plenty of RAM (NAS/server scenarios) — use with caution on desktops.
Deduplication and sccache might seem similar, but they focus on different things: sccache saves time by skipping redundant compilation and pulling from cache, but each project’s target/ still has its own independent copies of files; deduplication saves space by merging those identical copies on disk. The two are complementary.
Final Thoughts
Rust’s compilation time and disk usage have been long-standing community complaints. The Rust team has been making continuous improvements — rust-lld enabled by default, incremental compilation improvements, frontend parallelization, and more — but community feedback is always “still not fast enough.” The techniques above are ultimately workarounds — things you can do from the outside when the compiler itself can’t get there in one step. And honestly, even with all of them applied, Rust’s compilation experience compared to most other languages is still among the slowest and most disk-hungry. I suppose that’s the price you pay for zero-cost abstractions and the borrow checker.