A heavily annotated Rust implementation of fuzzy search. Understand every line — the Levenshtein algorithm, the optimisations, and the intuition behind each. Beginner friendly.
Normal search requires an exact match. Fuzzy search finds results even when the input has typos, missing letters, or small differences.
The core idea: count the minimum number of single-character changes (insert, delete, substitute) to transform one string into another. Fewer changes = better match.
We turn the raw distance into a 0–100% score. Identical strings score 100%. Completely different strings score 0%. A threshold (e.g. 60%) filters out poor matches.
Multiple candidates often meet the threshold. We sort by score (best first) so the most relevant result always appears at the top.
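The distance → score → threshold → sort pipeline described above can be sketched in a few lines. This is an illustrative standalone sketch, not the library's API; the distance values are supplied by hand, and `score` assumes the formula 1 − distance / max(len):

```rust
// Hypothetical sketch of the scoring pipeline (not the library's API).
// score = 1 - distance / max(len_a, len_b), as described above.
fn score(distance: usize, a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - distance as f64 / max_len as f64
}

fn main() {
    // "apple" vs "aple" is one deletion; max length 5, so 1 - 1/5 = 80%
    assert!((score(1, "apple", "aple") - 0.8).abs() < 1e-9);

    // Distances here are precomputed by hand: aple/banana is 5 edits.
    let mut matches = vec![
        ("apple", score(1, "apple", "aple")),
        ("banana", score(5, "aple", "banana")),
    ];
    // Filter by a 60% threshold, then sort best-first
    matches.retain(|&(_, s)| s >= 0.6);
    matches.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    assert_eq!(matches[0].0, "apple");
    println!("best match: {} ({:.0}%)", matches[0].0, matches[0].1 * 100.0);
}
```

Sorting descending by score is what guarantees the most relevant result appears first.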
Every fuzzy search is built on this single idea: how many single-character edits separate two strings? Here are the three allowed operations.
Turn cat → cart by inserting r. Cost: 1.
cat → crat → cart
Turn cart → cat by deleting r. Cost: 1.
cart → cat
Turn cat → bat by replacing c → b. Cost: 1.
cat → bat
When two characters at the same position match, no edit is needed. Cost: 0.
cat ↔ bat
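These three operations translate directly into a naive recursive definition of edit distance. The helper below is written for illustration only (it is not part of the library, and it runs in exponential time); the matrix-based `compute` further down memoises exactly this recurrence:

```rust
// Naive recursive edit distance: at every mismatch, try all three
// operations and keep the cheapest. Exponential time, for intuition only.
fn edit_distance(src: &[char], tgt: &[char]) -> usize {
    match (src, tgt) {
        ([], t) => t.len(), // source empty: insert the rest of target
        (s, []) => s.len(), // target empty: delete the rest of source
        ([s_rest @ .., s_last], [t_rest @ .., t_last]) => {
            if s_last == t_last {
                edit_distance(s_rest, t_rest) // match: cost 0
            } else {
                1 + edit_distance(s_rest, tgt)              // delete
                        .min(edit_distance(src, t_rest))    // insert
                        .min(edit_distance(s_rest, t_rest)) // substitute
            }
        }
    }
}

fn main() {
    let to_chars = |s: &str| s.chars().collect::<Vec<char>>();
    assert_eq!(edit_distance(&to_chars("cat"), &to_chars("cart")), 1); // insert
    assert_eq!(edit_distance(&to_chars("cart"), &to_chars("cat")), 1); // delete
    assert_eq!(edit_distance(&to_chars("cat"), &to_chars("bat")), 1);  // substitute
    assert_eq!(edit_distance(&to_chars("kitten"), &to_chars("sitting")), 3);
}
```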
Each variant produces identical results; the faster variants trade readability for memory and speed.
Start with compute, understand it fully, then read the optimised versions.
The classic Wagner-Fischer algorithm. Builds a complete (m+1)×(n+1) grid where every cell represents the edit distance for a pair of prefixes. Start here — every cell, every loop, every step is annotated in the source.
See LevenshteinWithOperations for a variant that also reports which edits were applied.

```rust
use std::cmp::min;

pub fn compute(source: &str, target: &str) -> usize {
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    let (m, n) = (src.len(), tgt.len());

    // Edge cases
    if m == 0 { return n; }
    if n == 0 { return m; }

    // Full (m+1)×(n+1) matrix
    let mut matrix = vec![vec![0usize; n + 1]; m + 1];

    // Base cases: empty prefix conversions
    for i in 0..=m { matrix[i][0] = i; }
    for j in 0..=n { matrix[0][j] = j; }

    for i in 1..=m {
        for j in 1..=n {
            // 0 if chars match, 1 if they differ
            let cost = if src[i - 1] == tgt[j - 1] { 0 } else { 1 };
            matrix[i][j] = min(
                matrix[i - 1][j] + 1, // deletion
                min(
                    matrix[i][j - 1] + 1,        // insertion
                    matrix[i - 1][j - 1] + cost, // substitution
                ),
            );
        }
    }
    matrix[m][n]
}
```
Key insight: each cell only reads from the row directly above. Once row i is complete we never need rows 0..i−1 again. So we keep just two rows and swap them each iteration.
std::mem::swap is free — just a pointer swap, no copy — and the cells keep the same usize size as compute.

```rust
use std::cmp::min;

pub fn compute_optimized(source: &str, target: &str) -> usize {
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    let (m, n) = (src.len(), tgt.len());
    if m == 0 { return n; }
    if n == 0 { return m; }

    // Put shorter string in column dim → smaller row allocation
    let (src, tgt, m, n) = if m < n { (tgt, src, n, m) } else { (src, tgt, m, n) };

    let mut prev: Vec<usize> = (0..=n).collect();
    let mut curr: Vec<usize> = vec![0; n + 1];

    for i in 1..=m {
        curr[0] = i;
        for j in 1..=n {
            let cost = usize::from(src[i - 1] != tgt[j - 1]);
            curr[j] = min(curr[j - 1] + 1, min(prev[j] + 1, prev[j - 1] + cost));
        }
        // Swap — O(1), just pointer exchange
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[n]
}
```
Two optimisations stack on top of compute_optimized:
ASCII strings skip the Vec<char> allocation entirely,
and cells shrink to the narrowest integer that can't overflow —
u8 for strings under 256 chars, u16 under 65 536,
otherwise u32. Used internally by similarity().
- ASCII inputs go through as_bytes() — no heap alloc for Vec<char>
- Cells never exceed max(m, n), so we pick u8/u16/u32 accordingly (up to 8× cache density)
- The u8 inner loop on short inputs runs at near-streaming memory throughput

```rust
pub fn compute_fast(source: &str, target: &str) -> usize {
    // ASCII: every char = 1 byte, skip Vec<char> allocation
    if source.is_ascii() && target.is_ascii() {
        return fast_bytes(source.as_bytes(), target.as_bytes());
    }
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    fast_chars(&src, &tgt)
}

fn fast_bytes(src: &[u8], tgt: &[u8]) -> usize {
    let (m, n) = (src.len(), tgt.len());
    let (src, tgt) = if m < n { (tgt, src) } else { (src, tgt) };
    // Gap cost = 1 ⇒ no cell exceeds max(m, n). Pick the narrowest
    // integer type that can hold it — u8 fits 99% of fuzzy-search inputs.
    let max_dist = src.len();
    if max_dist <= u8::MAX as usize {
        fast_bytes_dp::<u8>(src, tgt)  // 1-byte cells: entire DP fits in L1
    } else if max_dist <= u16::MAX as usize {
        fast_bytes_dp::<u16>(src, tgt) // 2-byte cells
    } else {
        fast_bytes_dp::<u32>(src, tgt) // 4-byte cells (4 B strings…)
    }
}

// Monomorphised per cell type — three tight specialised loops.
// DpCell is a small internal trait (from_usize, to_usize, ONE, add, min).
fn fast_bytes_dp<T: DpCell>(src: &[u8], tgt: &[u8]) -> usize {
    let (m, n) = (src.len(), tgt.len());
    let mut prev: Vec<T> = (0..=n).map(T::from_usize).collect();
    let mut curr: Vec<T> = vec![T::from_usize(0); n + 1];
    for i in 1..=m {
        curr[0] = T::from_usize(i);
        let s = src[i - 1];
        for j in 1..=n {
            let cost = if s == tgt[j - 1] { T::from_usize(0) } else { T::ONE };
            curr[j] = curr[j - 1].add(T::ONE)
                .min(prev[j].add(T::ONE))
                .min(prev[j - 1].add(cost));
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[n].to_usize()
}
```
Gene Myers (1999) observed that the deltas between consecutive cells
in a DP column are each just −1, 0, or +1. That means an entire column
can be encoded in two bit-vectors Pv (positive deltas)
and Mv (negative deltas), packed into one 64-bit word
per vector. Advancing an entire column takes a constant handful of bitwise ops —
no inner loop over cells. The running score of the bottom cell is
kept as a plain integer, updated by peeking at the high bit of the
horizontal deltas.
- The whole DP column packs into one u64 — no per-cell loop
- Stack-allocated Peq table for ASCII; no heap
- Ties the u8 DP on ASCII short inputs (both are near the memory ceiling), but with fundamentally different asymptotic behaviour
- Falls back to compute_fast for patterns > 64 chars or Unicode

```rust
pub fn compute_myers(source: &str, target: &str) -> usize {
    if source.is_ascii() && target.is_ascii() {
        let (a, b) = (source.as_bytes(), target.as_bytes());
        if a.len().min(b.len()) <= 64 {
            return myers_bytes(a, b);
        }
    }
    compute_fast(source, target)
}

fn myers_bytes(a: &[u8], b: &[u8]) -> usize {
    // Shorter string becomes the pattern `p` (fits in one u64)
    let (p, t) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let m = p.len();
    if m == 0 { return t.len(); }

    // Peq[c] = bitmask of positions where byte c occurs in p
    let mut peq = [0u64; 128]; // ASCII only — 128 slots are enough
    for (i, &c) in p.iter().enumerate() {
        peq[(c & 0x7F) as usize] |= 1u64 << i;
    }

    let mask = if m == 64 { u64::MAX } else { (1u64 << m) - 1 };
    let top = 1u64 << (m - 1);
    let mut pv = mask; // all deltas +1 initially (column = 0,1,…,m)
    let mut mv = 0u64;
    let mut score = m;

    for &c in t {
        let eq = peq[(c & 0x7F) as usize];
        let xv = eq | mv;
        // The carry trick: propagates match runs in one addition
        let xh = (((eq & pv).wrapping_add(pv)) ^ pv) | eq;
        let mut ph = mv | !(xh | pv);
        let mut mh = pv & xh;
        if ph & top != 0 { score += 1; }
        if mh & top != 0 { score -= 1; }
        ph = (ph << 1) | 1;
        mh <<= 1;
        pv = (mh & mask) | !((xv | ph) & mask);
        pv &= mask;
        mv = ph & xv & mask;
    }
    score
}
```
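As a sanity check that the bit-parallel word dance really computes edit distance, myers_bytes can be compared against a plain two-row DP. The snippet below is self-contained: the reference `dp_distance` is written just for this test, and `myers_bytes` is repeated verbatim so it compiles standalone:

```rust
// Reference implementation written only for this comparison: the
// classic two-row Levenshtein DP over byte slices.
fn dp_distance(a: &[u8], b: &[u8]) -> usize {
    let (m, n) = (a.len(), b.len());
    let mut prev: Vec<usize> = (0..=n).collect();
    for i in 1..=m {
        let mut curr = vec![i; n + 1];
        for j in 1..=n {
            let cost = (a[i - 1] != b[j - 1]) as usize;
            curr[j] = (curr[j - 1] + 1).min(prev[j] + 1).min(prev[j - 1] + cost);
        }
        prev = curr;
    }
    prev[n]
}

// myers_bytes as shown above, repeated so this snippet is standalone.
fn myers_bytes(a: &[u8], b: &[u8]) -> usize {
    let (p, t) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let m = p.len();
    if m == 0 { return t.len(); }
    let mut peq = [0u64; 128];
    for (i, &c) in p.iter().enumerate() {
        peq[(c & 0x7F) as usize] |= 1u64 << i;
    }
    let mask = if m == 64 { u64::MAX } else { (1u64 << m) - 1 };
    let top = 1u64 << (m - 1);
    let (mut pv, mut mv, mut score) = (mask, 0u64, m);
    for &c in t {
        let eq = peq[(c & 0x7F) as usize];
        let xv = eq | mv;
        let xh = (((eq & pv).wrapping_add(pv)) ^ pv) | eq;
        let mut ph = mv | !(xh | pv);
        let mut mh = pv & xh;
        if ph & top != 0 { score += 1; }
        if mh & top != 0 { score -= 1; }
        ph = (ph << 1) | 1;
        mh <<= 1;
        pv = ((mh & mask) | !((xv | ph) & mask)) & mask;
        mv = ph & xv & mask;
    }
    score
}

fn main() {
    let pairs = [("kitten", "sitting", 3), ("cat", "cart", 1), ("", "hello", 5)];
    for (a, b, expected) in pairs {
        let fast = myers_bytes(a.as_bytes(), b.as_bytes());
        let slow = dp_distance(a.as_bytes(), b.as_bytes());
        assert_eq!(fast, slow, "mismatch on {a:?} vs {b:?}");
        assert_eq!(fast, expected);
    }
    println!("bit-parallel and DP results agree");
}
```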
All four implementations run head-to-head on five stress tiers, measured with Criterion.rs (40 samples, release build, x86-64 Linux). Reproduce with `cargo bench`.
| Tier | String lengths | compute | compute_optimized | compute_fast | compute_myers | Speedup (fast vs compute) |
|---|---|---|---|---|---|---|
| tiny | 6 vs 7 chars | 332 ns | 286 ns | 39 ns | 40 ns | 8.5× faster |
| short | 22 vs 24 chars | 2,478 ns | 1,342 ns | 105 ns | 106 ns | 23× faster |
| medium | 66 vs 68 chars | 14,003 ns | 9,497 ns | 10,085 ns | 10,011 ns | 1.4× faster |
| long | 182 vs 191 chars | 99,178 ns | 104,900 ns | 79,650 ns | 79,973 ns | 1.2× faster |
| stress | 270 vs 268 chars | 119,310 ns | 79,560 ns | 96,125 ns | 94,962 ns | 1.2× faster |
Tiny & short tiers: compute_fast's adaptive u8 DP cells fit the entire working set in a few cache lines, and LLVM auto-vectorises the inner loop — delivering 8–23× speedups over the baseline.
Stress tier: compute_optimized edges ahead because the full matrix in compute exceeds L2 cache, while the rolling two rows stay warm.
compute_myers ties compute_fast on ASCII inputs: the auto-vectorised u8 DP is already memory-bandwidth-bound at these sizes. Its value is algorithmic — O(n) 64-bit words of work per candidate instead of O(m·n) cells — and it is the foundation for future block-Myers and SIMD work. It falls back to compute_fast for patterns longer than 64 chars.
Full HTML reports are generated at target/criterion/levenshtein/report/index.html after running cargo bench.
Clone the repo to read the annotated source, or add it as a library dependency.
```toml
# Add to your Cargo.toml
[dependencies]
fuzzly = { git = "https://github.com/Fanaperana/fuzzy-search-rs" }
```
```rust
use fuzzly::{FuzzySearcher, LevenshteinDistance};

fn main() {
    // Raw edit distance
    let d = LevenshteinDistance::compute("kitten", "sitting");
    println!("distance: {d}"); // 3

    // Similarity score (0.0 – 1.0)
    let sim = LevenshteinDistance::similarity("apple", "aple");
    println!("{:.0}% similar", sim * 100.0); // 80%

    // Fuzzy search with threshold
    let searcher = FuzzySearcher::new(0.6);
    let candidates = vec!["apple", "application", "banana"];
    let results = searcher.search("aple", &candidates);
    for r in results {
        println!("{}: {:.0}%", r.text, r.score * 100.0);
    }
    // apple: 80%
    // application: 67%
}
```
```sh
git clone https://github.com/Fanaperana/fuzzy-search-rs
cd fuzzy-search-rs
cargo run          # interactive demo
cargo test         # run all tests
cargo bench        # run benchmarks
cargo doc --open   # open annotated docs
```
```rust
use fuzzly::{FuzzySearcher, LevenshteinWithOperations};

// See exactly what edits are needed
let result = LevenshteinWithOperations::compute("cat", "cart");
println!("{}", result); // insert 'r' — distance: 1 — similarity: 75%

// Builder API for search configuration
let searcher = FuzzySearcher::new(0.5)
    .case_insensitive(true)
    .max_results(5);
let words = ["apple", "apply", "ample"]; // any slice of candidate strings
let best = searcher.find_best("APLE", &words);
```