Educational Open Source · Rust

Fuzzy Search,
Built to Learn From

A heavily annotated Rust implementation of fuzzy search. Understand every line — the Levenshtein algorithm, the optimisations, and the intuition behind each. Beginner friendly.

fuzzly — live demo

What is Fuzzy Search?

Normal search requires an exact match. Fuzzy search finds results even when the input has typos, missing letters, or small differences.

📏

Edit Distance

The core idea: count the minimum number of single-character changes (insert, delete, substitute) to transform one string into another. Fewer changes = better match.

📊

Similarity Score

We turn the raw distance into a 0–100% score. Identical strings score 100%. Completely different strings score 0%. A threshold (e.g. 60%) filters out poor matches.

🗂️

Ranking Results

Multiple candidates often meet the threshold. We sort by score (best first) so the most relevant result always appears at the top.
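Put together, score-then-threshold-then-sort is only a few lines. A minimal sketch, assuming the common normalisation 1 - distance / longest length (the exact scoring formula lives in fuzzly's annotated source) and hand-computed distances:

```rust
// Normalise a raw edit distance into a 0.0-1.0 score. This is the common
// convention, not necessarily fuzzly's exact formula.
fn similarity(distance: usize, a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - distance as f64 / longest as f64
}

// Drop poor matches, then sort best-first (descending score).
fn rank(mut scored: Vec<(&str, f64)>, threshold: f64) -> Vec<(&str, f64)> {
    scored.retain(|&(_, score)| score >= threshold);
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}

fn main() {
    // Query "aple" against three candidates, distances computed by hand
    let scored = vec![
        ("apple", similarity(1, "aple", "apple")),   // 1 edit  -> 0.80
        ("apply", similarity(2, "aple", "apply")),   // 2 edits -> 0.60
        ("banana", similarity(5, "aple", "banana")), // 5 edits -> well below 0.60
    ];
    for (text, score) in rank(scored, 0.60) {
        println!("{text}: {:.0}%", score * 100.0);
    }
}
```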

Live Demo — type anything

Start typing to see fuzzy matches in real time

The Levenshtein Distance Algorithm

Every fuzzy search is built on this single idea: how many single-character edits separate two strings? Here are the three allowed operations.

Insert Add a missing character

Turn cat into cart by inserting r. Cost: 1.

cat → cart

Delete Remove an extra character

Turn cart into cat by deleting r. Cost: 1.

cart → cat

Substitute Replace a wrong character

Turn cat into bat by replacing c with b. Cost: 1.

cat → bat

Match Characters are the same

When two characters at the same position match, no edit is needed. Cost: 0.

cat ↔ bat
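The four cases above translate directly into a naive recursive definition. This sketch is for intuition only; it is exponential-time, which is exactly why the real implementations use dynamic programming:

```rust
// Naive recursive Levenshtein: one arm per case above.
// Exponential-time, for intuition only.
fn lev(a: &[char], b: &[char]) -> usize {
    match (a, b) {
        ([], _) => b.len(), // only insertions remain
        (_, []) => a.len(), // only deletions remain
        _ if a[0] == b[0] => lev(&a[1..], &b[1..]), // match: cost 0
        _ => 1 + lev(&a[1..], b)              // delete a[0]
                .min(lev(a, &b[1..]))         // insert b[0]
                .min(lev(&a[1..], &b[1..])),  // substitute
    }
}

fn main() {
    let d = |s: &str, t: &str| {
        lev(&s.chars().collect::<Vec<_>>(), &t.chars().collect::<Vec<_>>())
    };
    assert_eq!(d("cat", "cart"), 1);       // one insertion
    assert_eq!(d("cat", "bat"), 1);        // one substitution
    assert_eq!(d("kitten", "sitting"), 3); // two substitutions + one insertion
}
```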

🧮 Interactive DP Matrix Visualizer

Each cell (i, j) shows the edit distance between the first i characters of one string and the first j characters of the other.


Click Visualize to build the matrix, or Animate to watch it fill in step by step.

Legend: base case · match (cost 0) · substitute (cost 1) · insert · delete · optimal path
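Outside the browser you can rebuild and print the same grid in plain Rust. A standalone sketch that mirrors the full-matrix variant described below:

```rust
use std::cmp::min;

// Build the full Wagner-Fischer matrix: cell (i, j) is the edit distance
// between the first i chars of `source` and the first j chars of `target`.
fn build_matrix(source: &str, target: &str) -> Vec<Vec<usize>> {
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    let (m, n) = (src.len(), tgt.len());
    let mut matrix = vec![vec![0usize; n + 1]; m + 1];
    for i in 0..=m { matrix[i][0] = i; } // deleting down to the empty string
    for j in 0..=n { matrix[0][j] = j; } // inserting up from the empty string
    for i in 1..=m {
        for j in 1..=n {
            let cost = if src[i - 1] == tgt[j - 1] { 0 } else { 1 };
            matrix[i][j] = min(matrix[i - 1][j] + 1,       // delete
                           min(matrix[i][j - 1] + 1,       // insert
                               matrix[i - 1][j - 1] + cost)); // substitute/match
        }
    }
    matrix
}

fn main() {
    let (src, tgt) = ("cat", "cart");
    let matrix = build_matrix(src, tgt);

    print!("     ");
    for c in tgt.chars() { print!("{c:>3}"); } // header: target characters
    println!();
    for (i, row) in matrix.iter().enumerate() {
        let label = if i == 0 { ' ' } else { src.chars().nth(i - 1).unwrap() };
        print!("{label:>2}");
        for cell in row { print!("{cell:>3}"); }
        println!();
    }
    // Bottom-right cell is the final edit distance
    println!("distance = {}", matrix[src.len()][tgt.chars().count()]);
}
```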

Four Ways to Implement It

All four variants produce identical results; they differ only in how they trade readability for memory and speed. Start with compute, understand it fully, then read the optimised versions.

compute — The Full Matrix

The classic Wagner-Fischer algorithm. Builds a complete (m+1)×(n+1) grid where every cell represents the edit distance for a pair of prefixes. Start here — every cell, every loop, every step is annotated in the source.

Time: O(m·n) Space: O(m·n)
  • Easiest to understand and trace through by hand
  • Full matrix preserved — supports backtracking to get the edit path
  • Great for learning: you can print the matrix and follow along
  • Used by LevenshteinWithOperations to show which edits were applied
use std::cmp::min;

pub fn compute(source: &str, target: &str) -> usize {
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    let (m, n) = (src.len(), tgt.len());

    // Edge cases
    if m == 0 { return n; }
    if n == 0 { return m; }

    // Full (m+1)×(n+1) matrix
    let mut matrix = vec![vec![0usize; n+1]; m+1];

    // Base cases: empty prefix conversions
    for i in 0..=m { matrix[i][0] = i; }
    for j in 0..=n { matrix[0][j] = j; }

    for i in 1..=m {
        for j in 1..=n {
            // 0 if chars match, 1 if they differ
            let cost = if src[i-1] == tgt[j-1] { 0 } else { 1 };

            matrix[i][j] = min(
                matrix[i-1][j]   + 1,     // deletion
                min(
                matrix[i][j-1]   + 1,     // insertion
                matrix[i-1][j-1] + cost,  // substitution
                )
            );
        }
    }
    matrix[m][n]
}

compute_optimized — Two-Row Buffer

Key insight: each cell only reads from the row directly above. Once row i is complete we never need rows 0..i−1 again. So we keep just two rows and swap them each iteration.

Time: O(m·n) Space: O(min(m,n))
  • Uses O(min(m,n)) memory instead of O(m·n) — great for long strings
  • Puts the shorter string in the column dimension for smallest rows
  • std::mem::swap is free — just pointer swap, no copy
  • Same usize cell size as compute
use std::cmp::min;

pub fn compute_optimized(source: &str, target: &str) -> usize {
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    let (m, n) = (src.len(), tgt.len());
    if m == 0 { return n; }
    if n == 0 { return m; }

    // Put shorter string in column dim → smaller row allocation
    let (src, tgt, m, n) = if m < n {
        (tgt, src, n, m)
    } else {
        (src, tgt, m, n)
    };

    let mut prev: Vec<usize> = (0..=n).collect();
    let mut curr: Vec<usize> = vec![0; n+1];

    for i in 1..=m {
        curr[0] = i;
        for j in 1..=n {
            let cost = usize::from(src[i-1] != tgt[j-1]);
            curr[j] = min(curr[j-1]+1,
                       min(prev[j]+1, prev[j-1]+cost));
        }
        // Swap — O(1), just pointer exchange
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[n]
}

compute_fast — Production Path

Two optimisations stack on top of compute_optimized: ASCII strings skip the Vec<char> allocation entirely, and cells shrink to the narrowest integer that can't overflow — u8 for strings under 256 chars, u16 under 65 536, otherwise u32. Used internally by similarity().

Time: O(m·n) Space: O(min(m,n))
  • ASCII fast path: as_bytes() — no heap alloc for Vec<char>
  • Adaptive cell width: gap cost 1 ⇒ no cell exceeds max(m,n), so we pick u8/u16/u32 accordingly (up to 8× cache density)
  • LLVM auto-vectorises the u8 inner loop on short inputs — near-streaming memory throughput
  • Automatically falls back to char slices for Unicode
  • Read this last — the optimisations only make sense once you know what's being optimised
pub fn compute_fast(source: &str, target: &str) -> usize {
    // ASCII: every char = 1 byte, skip Vec<char> allocation
    if source.is_ascii() && target.is_ascii() {
        return fast_bytes(source.as_bytes(), target.as_bytes());
    }
    let src: Vec<char> = source.chars().collect();
    let tgt: Vec<char> = target.chars().collect();
    fast_chars(&src, &tgt)
}

fn fast_bytes(src: &[u8], tgt: &[u8]) -> usize {
    let (m, n) = (src.len(), tgt.len());
    let (src, tgt) = if m < n { (tgt, src) } else { (src, tgt) };

    // Gap cost = 1 ⇒ no cell exceeds max(m, n). Pick the narrowest
    // integer type that can hold it — u8 fits 99% of fuzzy-search inputs.
    let max_dist = src.len();
    if max_dist <= u8::MAX as usize {
        fast_bytes_dp::<u8>(src, tgt)   // 1-byte cells: entire DP fits in L1
    } else if max_dist <= u16::MAX as usize {
        fast_bytes_dp::<u16>(src, tgt)  // 2-byte cells
    } else {
        fast_bytes_dp::<u32>(src, tgt)  // 4-byte cells (strings over 65 535 chars)
    }
}

// Monomorphised per cell type — three tight specialised loops. `DpCell` is a
// small internal trait (from_usize, to_usize, add, min, ONE) defined elsewhere
// in the source and implemented for u8, u16 and u32.
fn fast_bytes_dp<T: DpCell>(src: &[u8], tgt: &[u8]) -> usize {
    let (m, n) = (src.len(), tgt.len());
    let mut prev: Vec<T> = (0..=n).map(T::from_usize).collect();
    let mut curr: Vec<T> = vec![T::from_usize(0); n+1];

    for i in 1..=m {
        curr[0] = T::from_usize(i);
        let s = src[i-1];
        for j in 1..=n {
            let cost = if s == tgt[j-1] { T::from_usize(0) } else { T::ONE };
            curr[j] = curr[j-1].add(T::ONE)
                .min(prev[j].add(T::ONE))
                .min(prev[j-1].add(cost));
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[n].to_usize()
}

compute_myers — Bit-Parallel Levenshtein

Gene Myers (1999) observed that the deltas between consecutive cells in a DP column are each just −1, 0, or +1. That means an entire column can be encoded in two bit-vectors Pv (positive deltas) and Mv (negative deltas), packed into one 64-bit word per vector. Advancing a column is ~7 bitwise ops — no inner loop over cells. The running score of the bottom cell is kept as a plain integer, updated by peeking at the high bit of the horizontal deltas.

Time: O(n) words Space: O(1) words
  • Entire DP column packed into a single u64 — no per-cell loop
  • ~7 bitwise ops per text character (a fixed schedule, not dependent on m)
  • Stack-allocated 128-entry Peq table for ASCII; no heap
  • Ties the auto-vectorised u8 DP on ASCII short inputs (both are near the memory ceiling), but with fundamentally different asymptotic behaviour
  • Foundation for future block-Myers (patterns > 64) and SIMD expansion
  • Falls back to compute_fast for patterns > 64 chars or Unicode
pub fn compute_myers(source: &str, target: &str) -> usize {
    if source.is_ascii() && target.is_ascii() {
        let (a, b) = (source.as_bytes(), target.as_bytes());
        if a.len().min(b.len()) <= 64 {
            return myers_bytes(a, b);
        }
    }
    Self::compute_fast(source, target)
}

fn myers_bytes(a: &[u8], b: &[u8]) -> usize {
    // Shorter string becomes the pattern `p` (fits in one u64)
    let (p, t) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let m = p.len();
    if m == 0 { return t.len(); }

    // Peq[c] = bitmask of positions where byte c occurs in p
    let mut peq = [0u64; 128];  // ASCII only — 128 slots are enough
    for (i, &c) in p.iter().enumerate() {
        peq[(c & 0x7F) as usize] |= 1u64 << i;
    }

    let mask   = if m == 64 { u64::MAX } else { (1u64 << m) - 1 };
    let top    = 1u64 << (m - 1);
    let mut pv = mask;   // all deltas +1 initially (column = 0,1,…,m)
    let mut mv = 0u64;
    let mut score = m;

    for &c in t {
        let eq = peq[(c & 0x7F) as usize];
        let xv = eq | mv;
        // The carry trick: propagates match runs in one addition
        let xh = (((eq & pv).wrapping_add(pv)) ^ pv) | eq;

        let mut ph = mv | !(xh | pv);
        let mut mh = pv & xh;

        if ph & top != 0 { score += 1; }
        if mh & top != 0 { score -= 1; }

        ph = (ph << 1) | 1;   mh <<= 1;
        pv = (mh & mask) | !((xv | ph) & mask);
        pv &= mask;
        mv = ph & xv & mask;
    }
    score
}

Benchmark Results

All four implementations run head-to-head on five stress tiers. Measured with Criterion.rs (40 samples, release build, x86-64 Linux). Reproduce: cargo bench

Tier   | String lengths   | compute    | compute_optimized | compute_fast | compute_myers | Speedup (fast vs compute)
tiny   | 6 vs 7 chars     | 332 ns     | 286 ns            | 39 ns        | 40 ns         | 8.5× faster
short  | 22 vs 24 chars   | 2,478 ns   | 1,342 ns          | 105 ns       | 106 ns        | 23× faster
medium | 66 vs 68 chars   | 14,003 ns  | 9,497 ns          | 10,085 ns    | 10,011 ns     | 1.4× faster
long   | 182 vs 191 chars | 99,178 ns  | 104,900 ns        | 79,650 ns    | 79,973 ns     | 1.2× faster
stress | 270 vs 268 chars | 119,310 ns | 79,560 ns         | 96,125 ns    | 94,962 ns     | 1.2× faster
💡 tiny & short tiers: compute_fast's adaptive u8 DP cells fit the entire working set in a few cache lines and LLVM auto-vectorises the inner loop — delivering 8–23× speedups over the baseline.
💡 stress tier: compute_optimized edges ahead because the full matrix in compute exceeds L2 cache, while the rolling two rows stay warm.
compute_myers ties compute_fast on ASCII inputs: the auto-vectorised u8 DP is already memory-bandwidth-bound at these sizes. Its value is algorithmic — O(n) bitwise words per candidate instead of O(m·n) cells — and it's the foundation for future block-Myers and SIMD work. Falls back to compute_fast for patterns longer than 64 chars.
📈 HTML reports with per-sample charts: target/criterion/levenshtein/report/index.html after running cargo bench.
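Criterion handles warm-up, outliers and statistics properly, so cargo bench is the right way to reproduce these numbers. For a dependency-free smoke test, std::time::Instant works. A rough sketch (the two-row body mirrors compute_optimized above; the loop count is arbitrary):

```rust
use std::cmp::min;
use std::time::Instant;

// Two-row Levenshtein, same shape as compute_optimized
fn lev(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let (m, n) = (a.len(), b.len());
    let mut prev: Vec<usize> = (0..=n).collect();
    let mut curr = vec![0usize; n + 1];
    for i in 1..=m {
        curr[0] = i;
        for j in 1..=n {
            let cost = usize::from(a[i - 1] != b[j - 1]);
            curr[j] = min(curr[j - 1] + 1, min(prev[j] + 1, prev[j - 1] + cost));
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[n]
}

fn main() {
    let (s, t) = ("a".repeat(200), "b".repeat(200));
    let iters: u32 = 1_000;
    let start = Instant::now();
    let mut sink = 0;
    for _ in 0..iters {
        sink += lev(&s, &t); // keep the result live so the loop isn't optimised away
    }
    let per_call = start.elapsed() / iters;
    println!("total {sink}, ~{per_call:?} per call");
}
```

Run it with `cargo run --release`; a debug build will be an order of magnitude slower and not comparable to the table above.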

Use fuzzly in Your Project

Clone the repo to read the annotated source, or add it as a library dependency.

Cargo.toml
# Add to your Cargo.toml
[dependencies]
fuzzly = { git = "https://github.com/Fanaperana/fuzzy-search-rs" }
main.rs — basic usage
use fuzzly::{FuzzySearcher, LevenshteinDistance};

fn main() {
    // Raw edit distance
    let d = LevenshteinDistance::compute("kitten", "sitting");
    println!("distance: {d}"); // 3

    // Similarity score (0.0 – 1.0)
    let sim = LevenshteinDistance::similarity("apple", "aple");
    println!("{:.0}% similar", sim * 100.0); // 80%

    // Fuzzy search with threshold
    let searcher = FuzzySearcher::new(0.6);
    let candidates = vec!["apple", "application", "banana"];
    let results = searcher.search("aple", &candidates);

    for r in results {
        println!("{}: {:.0}%", r.text, r.score * 100.0);
    }
    // apple: 80%   application: 67%
}
Clone & explore
git clone https://github.com/Fanaperana/fuzzy-search-rs
cd fuzzy-search-rs

cargo run           # interactive demo
cargo test          # run all tests
cargo bench         # run benchmarks
cargo doc --open    # open annotated docs
Advanced usage
use fuzzly::{FuzzySearcher, LevenshteinWithOperations};

// See exactly what edits are needed
let result = LevenshteinWithOperations::compute("cat", "cart");
println!("{}", result);
// insert 'r'  — distance: 1  — similarity: 75%

// Builder API for search configuration
let searcher = FuzzySearcher::new(0.5)
    .case_insensitive(true)
    .max_results(5);

let words = vec!["apple", "maple", "grape"]; // any slice of candidate strings
let best = searcher.find_best("APLE", &words);

Enjoying fuzzly? Give it a star!

If this project helped you understand fuzzy search or Levenshtein algorithms, a GitHub star means the world. It helps others discover the project and motivates continued work.

⭐ Star on GitHub