How I Improved My Rust Compilation Experience

Slow compilation and massive target/ directories — the classic Rust complaints. This post documents the techniques I’m currently using to make things better, using my project ClewdR (an async web service with 394 crate dependencies) as an example.

Environment: Rust 1.94.1, CachyOS (Arch-based), NVMe SSD, Btrfs.

rust-lld

Linking is the final step of Rust compilation, and traditionally one of the slowest. GNU ld performs terribly here, especially with LTO enabled.

The old approach was to manually install lld or mold and configure .cargo/config.toml:

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]

But since Rust 1.90 (2025-09-18), rust-lld is the default linker on x86_64-unknown-linux-gnu, so no configuration is needed:

$ readelf -p .comment target/release/clewdr

String dump of section '.comment':
  [     1]  Linker: LLD 21.1.8
  [    5f]  rustc version 1.94.1 (e408947bf 2026-03-25)

Just upgrade Rust. Free lunch.

As a side note, there are two other linkers worth keeping an eye on:

mold: A linker that prioritizes speed. Generally faster than lld in non-LTO scenarios, though its LTO support is limited.
wild: An experimental linker written in Rust, aiming to be the fastest Linux ELF linker with heavy multithreading optimizations. Still in active development — worth watching if you’re interested.

For most people, the default rust-lld is good enough.

sccache

sccache is a compilation cache, similar to ccache but with Rust support. It caches each crate’s compilation artifacts — identical inputs get reused without recompilation.
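To make "identical inputs get reused" concrete, here is a minimal sketch of a cache keyed by a hash of the inputs. It is not sccache's actual key derivation or storage format: the `Inputs` fields, the `Cache` type, and the string "artifact" are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Everything that, in this sketch, determines a compilation result.
/// (Hypothetical: a real cache considers more than these three fields.)
#[derive(Hash)]
struct Inputs<'a> {
    compiler_version: &'a str,
    args: &'a [&'a str],
    source: &'a str,
}

/// Derive a cache key by hashing all inputs together.
fn cache_key(inputs: &Inputs<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}

struct Cache {
    entries: HashMap<u64, String>,
    hits: u32,
    misses: u32,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached artifact, or run `compile` and store its result.
    fn get_or_compile(&mut self, inputs: &Inputs<'_>, compile: impl FnOnce() -> String) -> String {
        let key = cache_key(inputs);
        if let Some(artifact) = self.entries.get(&key).cloned() {
            self.hits += 1;
            return artifact;
        }
        self.misses += 1;
        let artifact = compile();
        self.entries.insert(key, artifact.clone());
        artifact
    }
}

fn main() {
    let mut cache = Cache::new();
    let release = Inputs { compiler_version: "1.94.1", args: &["-C", "opt-level=z"], source: "fn f() {}" };
    let debug = Inputs { compiler_version: "1.94.1", args: &["-C", "opt-level=0"], source: "fn f() {}" };

    cache.get_or_compile(&release, || "artifact (release)".to_string()); // miss: first time seen
    cache.get_or_compile(&release, || "artifact (release)".to_string()); // hit: same inputs
    cache.get_or_compile(&debug, || "artifact (debug)".to_string()); // miss: flags differ

    println!("hits: {}, misses: {}", cache.hits, cache.misses);
}
```

Changing any input, such as a compiler flag, produces a different key and therefore a fresh compile, which is why a cache like this only helps when the inputs really are identical.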

Setup is simple. After installing sccache, add the following to ~/.cargo/config.toml:

[build]
rustc-wrapper = "sccache"

Using ClewdR’s release build as an example (opt-level = "z", lto = true, codegen-units = 1):

Scenario	Time
No sccache, clean build	48.4s
sccache cold cache, clean build	55.7s
sccache warm cache, clean build	34.2s

The first build is about 15% slower because the cache has to be populated, but subsequent clean builds are about 30% faster. Cache hit rate with a warm cache:

Cache hits rate                   52.00 %
Cache hits rate (Rust)            54.43 %
Cache hits rate (C/C++)           50.00 %
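For anyone who wants to check the relative numbers, the timing table above can be turned into percentage changes with a few lines of arithmetic. This is only a sketch; `percent_change` is a helper name made up for this example.

```rust
/// Relative change of `new` versus `baseline`, in percent.
/// Negative means less time (faster), positive means more time (slower).
fn percent_change(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

fn main() {
    // Timings in seconds, copied from the table above.
    let no_sccache = 48.4;
    let cold_cache = 55.7;
    let warm_cache = 34.2;

    println!("cold cache vs. no sccache: {:+.1}%", percent_change(no_sccache, cold_cache));
    println!("warm cache vs. no sccache: {:+.1}%", percent_change(no_sccache, warm_cache));
}
```

With these inputs the cold-cache build comes out roughly 15% slower and the warm-cache build roughly 29% faster than the build without sccache.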

The main benefit scenarios: rebuilding after cargo clean, switching branches, and sharing dependencies across multiple projects. Crates that invoke the system linker (bin, dylib, cdylib, and proc-macro crates) cannot be cached. The local cache defaults to 10 GiB, and sccache also supports remote storage backends such as S3 and GCS.

Btrfs Transparent Compression

A single ClewdR release build produces 5,121 files totaling 840 MiB. Multiple Rust projects’ target/ directories can easily add up to tens of gigabytes.

[Image: Rust target black hole]

My filesystem is Btrfs, mounted with compress=zstd:3:

/dev/nvme0n1p2 on /home type btrfs (rw,noatime,compress=zstd:3,ssd,discard=async,space_cache=v2)

Compression is completely transparent to everything above the filesystem, so no build configuration changes are needed. Using compsize to check the actual disk usage of target/:

Processed 5121 files, 7468 regular extents (7782 refs), 2710 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       41%      320M         768M         814M
none       100%       82M          82M          96M
zstd        34%      237M         685M         717M

768 MiB of data occupies only 320 MiB on disk — less than half the original size. Compilation intermediates (.o, .rlib, .rmeta) compress especially well.
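If you want to track these numbers over time, a row of the compsize table is easy to parse. The sketch below assumes the column layout shown above (Type, Perc, Disk Usage, Uncompressed, Referenced) and treats the size suffixes as binary multiples; `parse_size`, `parse_row`, and `Row` are hypothetical names, not part of compsize.

```rust
/// One row of the compsize table, with sizes converted to bytes.
struct Row {
    kind: String,
    disk: f64,
    uncompressed: f64,
    referenced: f64,
}

/// Parse a size such as "320M" or "9.8G" into bytes.
/// Assumes binary multiples (K = 1024 bytes, M = 1024 K, and so on).
fn parse_size(s: &str) -> Option<f64> {
    let s = s.trim();
    let multiplier = match s.chars().last()? {
        'K' => 1024f64,
        'M' => 1024f64.powi(2),
        'G' => 1024f64.powi(3),
        'T' => 1024f64.powi(4),
        // No known suffix: plain bytes, optionally written with a trailing "B".
        _ => return s.trim_end_matches('B').parse().ok(),
    };
    s[..s.len() - 1].parse::<f64>().ok().map(|n| n * multiplier)
}

/// Parse a line like "TOTAL  41%  320M  768M  814M".
fn parse_row(line: &str) -> Option<Row> {
    let mut fields = line.split_whitespace();
    let kind = fields.next()?.to_string();
    let _perc = fields.next()?; // e.g. "41%"; recomputed from the sizes instead
    Some(Row {
        kind,
        disk: parse_size(fields.next()?)?,
        uncompressed: parse_size(fields.next()?)?,
        referenced: parse_size(fields.next()?)?,
    })
}

fn main() {
    let line = "TOTAL       41%      320M         768M         814M";
    if let Some(row) = parse_row(line) {
        let mib = 1024.0 * 1024.0;
        println!(
            "{}: {:.0} MiB on disk for {:.0} MiB of data ({:.1}%), {:.0} MiB referenced",
            row.kind,
            row.disk / mib,
            row.uncompressed / mib,
            row.disk / row.uncompressed * 100.0,
            row.referenced / mib,
        );
    }
}
```

Recomputing the percentage from the two sizes, rather than trusting the rounded Perc column, keeps a bit more precision when comparing runs.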

One thing to note: sccache’s cache is already compressed, so Btrfs compression on top has virtually no effect:

# ~/.cache/sccache/
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       99%      9.8G         9.8G          10G

zstd:3 on NVMe SSDs has basically no noticeable performance impact — a good balance point.

If you’re using ZFS, it also supports transparent compression — just set compression=zstd for a similar effect.

File Deduplication

Multiple Rust projects’ target/ directories often contain massive amounts of duplicate content — identical dependency versions produce byte-for-byte identical .rlib and .rmeta files. Transparent compression can shrink individual files, but it can’t do anything about this cross-project duplication. That’s where filesystem-level deduplication comes in.

Across my 18 local Rust projects, target/ directories totaled 51 GiB referenced. After Btrfs deduplication (reflink) + zstd compression, actual disk usage is just 15 GiB:

Processed 165605 files, 274042 regular extents (562708 refs), 101289 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       47%       15G          32G          51G
none       100%      7.7G         7.7G         9.0G
zstd        30%      7.3G          24G          42G

Deduplication brought 51 GiB down to 32 GiB (saving ~37%), and compression then brought 32 GiB down to 15 GiB (saving ~53%). The combined effect is quite impressive.
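The two stages compound, which is easy to verify from the TOTAL row above. A small sketch of the arithmetic, with `percent_saved` as a helper name invented for this example:

```rust
/// Percentage of space saved when going from `before` to `after`.
fn percent_saved(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}

fn main() {
    // Sizes in GiB from the TOTAL row of the compsize output above.
    let referenced = 51.0; // what the files add up to logically
    let uncompressed = 32.0; // after shared extents are counted only once
    let on_disk = 15.0; // after zstd compression as well

    println!("deduplication alone: {:.0}% saved", percent_saved(referenced, uncompressed));
    println!("compression on top:  {:.0}% saved", percent_saved(uncompressed, on_disk));
    println!("combined:            {:.0}% saved", percent_saved(referenced, on_disk));
}
```

Taken together, the 51 GiB of referenced data ends up using roughly 70% less space on disk.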

Btrfs supports offline deduplication, which works by merging identical extents into a single physical copy (reflink). Two commonly used tools:

duperemove: Scans specified directories, finds duplicate extents, and submits them to the kernel for deduplication. Good for running manually or on a schedule.
bees: A background daemon that continuously monitors filesystem changes and deduplicates automatically. Better suited for a “set it and forget it” setup, though it uses some CPU and memory continuously.
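Conceptually, the first half of offline deduplication is just finding content that exists more than once. The sketch below illustrates that idea on whole in-memory byte strings. It is not how duperemove or bees work internally (they operate on extents or blocks on disk and ask the kernel to share them), and the file names and contents are made up.

```rust
use std::collections::HashMap;

/// Group names by identical content. Only groups with more than one member
/// are returned, i.e. the candidates for deduplication.
fn duplicate_groups<'a>(files: &[(&'a str, &[u8])]) -> Vec<Vec<&'a str>> {
    let mut by_content: HashMap<&[u8], Vec<&'a str>> = HashMap::new();
    for (name, content) in files {
        by_content.entry(*content).or_default().push(*name);
    }
    let mut groups: Vec<Vec<&'a str>> = by_content
        .into_values()
        .filter(|names| names.len() > 1)
        .collect();
    groups.sort(); // HashMap iteration order is arbitrary; sort for stable output
    groups
}

fn main() {
    // Hypothetical artifacts: two projects built the same dependency version.
    let files: Vec<(&str, &[u8])> = vec![
        ("project-a/target/libfoo.rlib", "same bytes".as_bytes()),
        ("project-b/target/libfoo.rlib", "same bytes".as_bytes()),
        ("project-b/target/libbar.rlib", "other bytes".as_bytes()),
    ];

    for group in duplicate_groups(&files) {
        println!("identical content: {:?}", group);
    }
}
```

A real tool would then replace each group with a single physical copy shared via reflink, which is the part only the filesystem can do.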

ZFS has built-in inline deduplication: just set dedup=on. This is more aggressive than Btrfs’s offline approach because it compares data at write time, so duplicates never hit disk at all. The cost is that every unique block needs an entry in the DDT (dedup table), which has to stay in memory to perform well and can become very expensive at scale. It is generally recommended only when you have plenty of RAM (NAS/server scenarios); use with caution on desktops.

Deduplication and sccache might seem similar, but they focus on different things: sccache saves time by skipping redundant compilation and pulling from cache, but each project’s target/ still has its own independent copies of files; deduplication saves space by merging those identical copies on disk. The two are complementary.

Final Thoughts

Rust’s compilation time and disk usage have been long-standing community complaints. The Rust team has been making continuous improvements — rust-lld enabled by default, incremental compilation improvements, frontend parallelization, and more — but community feedback is always “still not fast enough.” The techniques above are ultimately workarounds — things you can do from the outside when the compiler itself can’t get there in one step. And honestly, even with all of them applied, Rust’s compilation experience compared to most other languages is still among the slowest and most disk-hungry. I suppose that’s the price you pay for zero-cost abstractions and the borrow checker.
