🔢 Pair Comparison
📊 Similarity Matrix
📖 Score Guide
Passages
Add 2–8 passages to compute an all-pairs similarity matrix.
Understanding Similarity Scores
This tool uses lexical similarity metrics, which are fast, private, and need no API key. Real embedding models (OpenAI, Cohere, etc.) capture deeper semantic meaning, but lexical metrics remain useful for understanding and debugging RAG pipelines.
Score Interpretation
| Score Range | Label | RAG Interpretation |
|---|---|---|
| 0.85 – 1.00 | Very High | Near-duplicate or near-identical content. Very likely to be retrieved. |
| 0.65 – 0.85 | High | Strongly related. Chunk is very likely to be in the top-k results. |
| 0.45 – 0.65 | Moderate | Topically related. May appear in top-k with a good embedding model. |
| 0.25 – 0.45 | Low | Weak overlap. Likely not retrieved unless top-k is large. |
| 0.00 – 0.25 | Very Low | Almost no lexical overlap. Unlikely to be retrieved on lexical signals alone, though an embedding model may still match it on meaning. |
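The bands above can be applied programmatically. The following is a minimal sketch, not the tool's own code: `score_label` and `BANDS` are hypothetical names, and treating each lower bound as inclusive is an assumption, since adjacent ranges in the table share an endpoint.

```python
# Thresholds copied from the score table. Treating each lower bound as
# inclusive is an assumption: the table's adjacent ranges share endpoints.
BANDS = [
    (0.85, "Very High"),
    (0.65, "High"),
    (0.45, "Moderate"),
    (0.25, "Low"),
    (0.00, "Very Low"),
]


def score_label(score: float) -> str:
    """Return the table label for a similarity score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "Very Low"  # not reached for valid input; keeps the function total


print(score_label(0.72))  # High
```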
Similarity Methods
TF-IDF Cosine
Weights terms by frequency and inverse document frequency, so rare shared terms count more. Of the four methods here, usually the closest proxy for embedding similarity.
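A rough sketch of TF-IDF cosine in plain Python follows. Several details are assumptions rather than the tool's documented behavior: the lowercase alphanumeric tokenizer, the smoothed IDF formula, and the use of an explicit `corpus` argument to supply document frequencies. The function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; the tool's tokenizer may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_cosine(a: str, b: str, corpus: list[str]) -> float:
    """Cosine similarity of TF-IDF vectors for a and b, with IDF taken from corpus."""
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)

    def idf(term: str) -> float:
        df = sum(term in d for d in docs)
        # Smoothed IDF (an assumption): stays positive and handles unseen terms.
        return math.log((1 + n) / (1 + df)) + 1.0

    def vector(text: str) -> dict[str, float]:
        return {t: count * idf(t) for t, count in Counter(tokenize(text)).items()}

    va, vb = vector(a), vector(b)
    dot = sum(w * vb[t] for t, w in va.items() if t in vb)
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


corpus = ["the kernel panics", "the build passes", "the tests run", "a kernel module"]
# Sharing the rarer term "kernel" scores higher than sharing the common term "the".
print(tfidf_cosine("the kernel", "a kernel", corpus) > tfidf_cosine("the kernel", "the build", corpus))  # True
```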
Bag-of-Words Cosine
Counts raw term occurrences. Simpler than TF-IDF, but high-frequency common words have more influence.
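As an illustration, bag-of-words cosine can be sketched with raw term counts. The tokenizer is an assumption (lowercase alphanumeric runs), and `bow_cosine` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; the tool's tokenizer may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between raw term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# "the" dominates both vectors, so the score is high even though the
# content words ("cat" vs "dog") differ.
print(round(bow_cosine("the the the cat", "the the the dog"), 2))  # 0.9
```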
Jaccard Similarity
|A ∩ B| / |A ∪ B|. Pure token set overlap — ignores frequency. Great for deduplication.
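The formula translates directly into set operations. This sketch assumes a simple lowercase alphanumeric tokenizer, and returning 0.0 for two empty texts is a convention chosen here, not something the tool specifies.

```python
import re


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens; the tool's tokenizer may differ.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets."""
    sa, sb = token_set(a), token_set(b)
    union = sa | sb
    if not union:
        return 0.0  # convention chosen here for two empty texts
    return len(sa & sb) / len(union)


print(jaccard("the cat sat", "the cat ran"))  # 0.5 (2 shared of 4 distinct tokens)
print(jaccard("cat cat cat", "cat"))          # 1.0 (repetition is ignored)
```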
Overlap Coefficient
|A ∩ B| / min(|A|, |B|). Good when texts differ greatly in length: a short text fully contained in a long one scores 1.0.
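A short sketch of the overlap coefficient shows the containment behavior. As before, the tokenizer and the function name are assumptions made for illustration.

```python
import re


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens; the tool's tokenizer may differ.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_coefficient(a: str, b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over token sets."""
    sa, sb = token_set(a), token_set(b)
    smaller = min(len(sa), len(sb))
    if smaller == 0:
        return 0.0  # convention chosen here when either text is empty
    return len(sa & sb) / smaller


query = "vector index"
chunk = "A vector index stores embeddings for fast lookup"
# Every query token appears in the chunk, so the score is 1.0 even though
# the chunk is four times longer.
print(overlap_coefficient(query, chunk))  # 1.0
```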
Lexical vs. Semantic Similarity
This tool computes lexical similarity — matching by shared words. Real embedding models (like text-embedding-3-small or BGE-M3) capture semantic similarity — meaning even when different words are used.
Example: "dog" vs "canine" → lexical score ≈ 0.0, semantic score ≈ 0.85+.
Use this tool to understand the structural/lexical component of your RAG pipeline and to identify obvious mismatches. For production, combine it with an evaluation against a real embedding model.
RAG Retrieval Tips
- If query–chunk similarity is consistently < 0.3, your chunks may be too large (diluting the signal) or your queries may use very different vocabulary. Consider semantic chunking or query expansion.
- If all chunks score > 0.8 against a query, your corpus may be too homogeneous, and re-ranking on lexical scores won't separate the chunks much.
- Use the Matrix tab to find near-duplicate chunks in your index — very high pairwise scores indicate redundancy you could remove.
- Hybrid search (BM25 + dense vectors) leverages both lexical and semantic signals. Use this tool to get a feel for the lexical (BM25-like) side.
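The near-duplicate check in the tips above can be approximated offline with an all-pairs comparison. This is a sketch under stated assumptions: it uses Jaccard over a simple lowercase tokenizer, the 0.85 default threshold is borrowed from the "Very High" band, and `near_duplicates` is a hypothetical helper, not part of the tool.

```python
import re
from itertools import combinations


def token_set(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens; the tool's tokenizer may differ.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def near_duplicates(chunks: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every chunk pair whose Jaccard score meets the threshold."""
    sets = [token_set(c) for c in chunks]
    pairs = []
    for i, j in combinations(range(len(sets)), 2):
        union = sets[i] | sets[j]
        score = len(sets[i] & sets[j]) / len(union) if union else 0.0
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


chunks = [
    "Reset your password from the settings page.",
    "Reset your password from the settings page!",
    "Invoices are emailed monthly.",
]
print(near_duplicates(chunks))  # [(0, 1, 1.0)]
```

The loop is quadratic in the number of chunks, which is fine for a handful of passages but would need blocking or hashing (for example MinHash) on a large index.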