
Understanding Similarity Scores

This tool uses lexical similarity metrics, which are fast, run privately, and need no API key. Real embedding models (OpenAI, Cohere, etc.) capture deeper semantic meaning, but lexical metrics remain useful for understanding and debugging RAG pipelines.

Score Interpretation

Score Range	Label	RAG Interpretation
0.85 – 1.00	Very High	Near-duplicate or near-identical content. Very likely to be retrieved.
0.65 – 0.85	High	Strongly related. Very likely to appear in the top-k results.
0.45 – 0.65	Moderate	Topically related. May appear in top-k with a good embedding model.
0.25 – 0.45	Low	Weak overlap. Unlikely to be retrieved unless k is large.
0.00 – 0.25	Very Low	Almost no lexical overlap. Unlikely to be retrieved on lexical signal alone; an embedding model may still match it on meaning.
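
As a rough illustration, the table can be turned into a lookup. The function name `score_label` is hypothetical, and because the table lists each boundary value (0.85, 0.65, ...) in two rows, this sketch assumes a boundary score belongs to the higher band.

```python
# Lower bounds and labels copied from the Score Interpretation table.
# Assumption: a score exactly on a boundary falls into the higher band.
BANDS = [
    (0.85, "Very High"),
    (0.65, "High"),
    (0.45, "Moderate"),
    (0.25, "Low"),
    (0.00, "Very Low"),
]


def score_label(score: float) -> str:
    """Return the table label for a similarity score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # BANDS is sorted from highest to lowest, so the first match wins.
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Very Low"  # unreachable for valid input; kept for safety


print(score_label(0.72))
```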

Similarity Methods

TF-IDF Cosine

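Weights each term by its frequency in the text and its inverse document frequency across the texts being compared, so rare shared terms count more than common ones. Of the four methods here, it is usually the closest proxy for embedding similarity, although it still cannot match synonyms.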
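
A minimal sketch of TF-IDF cosine similarity, using only the standard library. The tokenizer and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are assumptions, not necessarily what this tool uses; smoothing is chosen here because with only two texts an unsmoothed `log(N / df)` would give every shared term a weight of zero.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (assumed variant): rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vec_a, vec_b = tfidf_vectors(["the cat sat on the mat", "the dog sat on the log"])
print(round(cosine(vec_a, vec_b), 3))
```

Because the terms unique to each text ("cat", "mat", "dog", "log") receive a higher IDF weight than the shared ones, this score comes out lower than a raw-count cosine on the same pair.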

Bag-of-Words Cosine

Computes cosine similarity over raw term counts. Simpler than TF-IDF, but because no term is down-weighted, high-frequency common words have more influence on the score.
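
The following sketch computes cosine similarity over raw term counts. The function name `bow_cosine` and the simple regex tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-count vectors of two texts."""
    # Assumption: lowercase alphanumeric tokens, no stop-word removal.
    counts_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    # A Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(n * counts_b[t] for t, n in counts_a.items())
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


score = bow_cosine("the cat sat on the mat", "the dog sat on the log")
print(round(score, 2))  # 0.75
```

In this pair the word "the" appears twice in each text and contributes 4 of the 6 units in the dot product, which shows how common words dominate when no weighting is applied.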

Jaccard Similarity

|A ∩ B| / |A ∪ B|, where A and B are the sets of distinct tokens in each text. Pure set overlap that ignores term frequency. Well suited to deduplication.
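
A short sketch of the formula above. The tokenizer is an assumption, as is the convention of returning 0.0 when both texts are empty (the ratio is undefined in that case).

```python
import re


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens; duplicates collapse in the set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(text_a: str, text_b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the distinct tokens of each text."""
    a, b = token_set(text_a), token_set(text_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty texts
    return len(a & b) / len(union)


# 3 shared tokens (the, sat, on) out of 7 distinct tokens overall.
print(jaccard("the cat sat on the mat", "the dog sat on the log"))
# Repeated words do not change the result: both texts reduce to {a, b}.
print(jaccard("a a a b", "a b"))
```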

Overlap Coefficient

|A ∩ B| / min(|A|, |B|). Useful when the texts differ greatly in length: a short text whose tokens all appear in a longer one scores 1.0.
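
A sketch of the overlap coefficient under the same assumed tokenizer as a simple regex split. Returning 0.0 when either text has no tokens is an assumed convention, since the ratio is undefined there.

```python
import re


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_coefficient(text_a: str, text_b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over the distinct tokens of each text."""
    a, b = token_set(text_a), token_set(text_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0  # assumed convention when a text has no tokens
    return len(a & b) / smaller


query = "vector search"
chunk = "vector search finds the nearest neighbours in an index"
# Every query token occurs in the chunk, so the coefficient is 1.0
# even though the chunk is much longer than the query.
print(overlap_coefficient(query, chunk))
```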

Lexical vs. Semantic Similarity

This tool computes lexical similarity: it matches texts by the words they share. Real embedding models (like text-embedding-3-small or BGE-M3) capture semantic similarity, so they can match texts that express the same meaning in different words.

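Example: "dog" vs "canine" → lexical score = 0.0 (no shared tokens), while an embedding model would typically rate the pair as highly similar (the figure ≈ 0.85+ is illustrative; actual values depend on the model).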

Use this tool to understand the structural/lexical component of your RAG pipeline and to identify obvious mismatches. For production, combine it with an evaluation on a real embedding model.

RAG Retrieval Tips

If query–chunk similarity is consistently below 0.3, your chunks may be too large (diluting the signal) or your queries may use very different vocabulary from your documents. Consider semantic chunking or query expansion.
If all chunks score above 0.8 against a query, your corpus may be too homogeneous for the scores to discriminate between chunks, and re-ranking won't help much.
Use the Matrix tab to find near-duplicate chunks in your index. Very high pairwise scores indicate redundancy you could remove.
Hybrid search (BM25 + dense vectors) leverages both lexical and semantic signals. Use this tool to understand the lexical side; BM25 belongs to the same family of term-weighting schemes as TF-IDF.
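
The near-duplicate tip can be sketched as an all-pairs comparison. This example assumes Jaccard similarity over token sets and a 0.85 cut-off taken from the "Very High" band; the function name `near_duplicates`, the tokenizer, and the sample chunks are hypothetical.

```python
import re
from itertools import combinations


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens, punctuation ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_sets(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(
    chunks: list[str], threshold: float = 0.85
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every chunk pair scoring at or above threshold."""
    sets = [token_set(c) for c in chunks]
    pairs = []
    # combinations() visits each unordered pair once, i.e. the upper
    # triangle of the similarity matrix.
    for i, j in combinations(range(len(sets)), 2):
        score = jaccard_sets(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


chunks = [
    "Reset your password from the account settings page.",
    "Reset your password from the account settings page",
    "Invoices are emailed on the first day of each month.",
]
print(near_duplicates(chunks))  # [(0, 1, 1.0)]
```

The comparison is quadratic in the number of chunks, which is fine for a handful of passages but would need blocking or hashing techniques for a large index.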


Embedding Similarity Calculator
