Embedding Similarity Calculator

Understanding Similarity Scores

This tool uses lexical similarity metrics, which are fast, private, and need no API key. Real embedding models (OpenAI, Cohere, etc.) capture deeper semantic meaning, but lexical metrics are still useful for understanding and debugging RAG pipelines.

Score Interpretation

Score Range    Label       RAG Interpretation
0.85 – 1.00    Very High   Near-duplicate or near-identical content. Very likely to be retrieved.
0.65 – 0.85    High        Strongly related. Chunk will almost certainly be in top-k results.
0.45 – 0.65    Moderate    Topically related. May appear in top-k with a good embedding model.
0.25 – 0.45    Low         Weak overlap. Likely not retrieved unless top-k is large.
0.00 – 0.25    Very Low    Almost no lexical overlap. Unlikely to be retrieved on lexical signal alone.
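
The score bands above can be turned into a small lookup helper. This is a minimal sketch, not the tool's own code: the function name `score_label` is hypothetical, and because adjacent ranges share their boundary values, it assumes a boundary score (for example 0.85) belongs to the higher band.

```python
def score_label(score: float) -> str:
    """Map a similarity score in [0, 1] to the label from the score guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Assumption: a boundary value counts toward the higher band.
    bands = [
        (0.85, "Very High"),
        (0.65, "High"),
        (0.45, "Moderate"),
        (0.25, "Low"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Very Low"
```

For example, `score_label(0.5)` returns `"Moderate"` and `score_label(0.1)` returns `"Very Low"`.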

Similarity Methods

TF-IDF Cosine
Weights terms by frequency and inverse document frequency, so rare shared terms count more. Of the four methods here, it is usually the closest proxy for embedding similarity.
Bag-of-Words Cosine
Counts raw term occurrences. Simpler than TF-IDF — high-frequency common words have more influence.
Jaccard Similarity
|A ∩ B| / |A ∪ B|. Pure token set overlap — ignores frequency. Great for deduplication.
Overlap Coefficient
|A ∩ B| / min(|A|, |B|). Good when texts differ greatly in length: a short text fully contained in a longer one scores 1.0.
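
The four methods above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the tool's actual implementation: the tokenizer (lowercased alphanumeric runs), the smoothed IDF formula, and all function names are choices made for this example. When no larger corpus is passed in, IDF is computed over just the two texts being compared.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the real tokenizer may differ.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def bow_cosine(a, b):
    # Raw term counts as vector weights.
    return cosine(Counter(tokenize(a)), Counter(tokenize(b)))

def tfidf_cosine(a, b, corpus=None):
    # Document frequencies come from `corpus`, or from the pair itself.
    docs = [tokenize(d) for d in (corpus or [a, b])]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))

    def vectorize(text):
        tf = Counter(tokenize(text))
        # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so rarer terms weigh more.
        return {
            term: count * (math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0)
            for term, count in tf.items()
        }

    return cosine(vectorize(a), vectorize(b))

def jaccard(a, b):
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / min(len(sa), len(sb)) if sa and sb else 0.0
```

With this tokenizer, `jaccard("the cat sat", "the cat ran")` is 0.5 (two shared tokens out of four distinct ones), and `overlap_coefficient("vector search", "vector search for RAG pipelines")` is 1.0 because the shorter text is fully contained in the longer one.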

Lexical vs. Semantic Similarity

This tool computes lexical similarity — matching by shared words. Real embedding models (like text-embedding-3-small or BGE-M3) capture semantic similarity — meaning even when different words are used.

Example: "dog" vs "canine" → lexical score = 0.0 (no shared tokens), semantic score ≈ 0.85+ (the exact value varies by model).
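
The lexical half of this example is easy to reproduce. The sketch below uses a hypothetical whitespace tokenizer and Jaccard similarity; the semantic half would need a real embedding model and is not shown.

```python
def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Synonyms share no tokens, so the lexical score is zero.
print(jaccard_tokens("dog", "canine"))  # 0.0

# Surrounding shared words raise the score, but the synonym pair still adds nothing.
print(jaccard_tokens("the dog barked", "the canine barked"))  # 0.5
```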

Use this tool to understand the structural/lexical component of your RAG pipeline and identify obvious mismatches. For production, combine with a real embedding model evaluation.

RAG Retrieval Tips

  • If query–chunk similarity is consistently < 0.3, your chunks may be too large (diluting signal) or your queries use very different vocabulary → consider semantic chunking or query expansion.
  • If all chunks score > 0.8 against a query, your corpus may be too homogeneous — re-ranking won't help much.
  • Use the Matrix tab to find near-duplicate chunks in your index — very high pairwise scores indicate redundancy you could remove.
  • Hybrid search (BM25 + dense vectors) leverages both lexical and semantic signals — use this tool to understand the BM25 side.
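
The near-duplicate check from the tips above can be approximated offline with an all-pairs comparison. This is a rough sketch: the function names are hypothetical, tokenization is plain whitespace splitting, and the 0.85 default threshold is borrowed from the "Very High" band in the score guide rather than tuned on any data.

```python
from itertools import combinations

def token_set(text):
    # Assumption: lowercase whitespace tokens are good enough for a first pass.
    return set(text.lower().split())

def jaccard_sets(sa, sb):
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def near_duplicates(chunks, threshold=0.85):
    """Return (i, j, score) for every chunk pair scoring at or above threshold."""
    sets = [token_set(chunk) for chunk in chunks]
    pairs = []
    for i, j in combinations(range(len(chunks)), 2):
        score = jaccard_sets(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

chunks = [
    "reset your password from the settings page",
    "reset your password from the settings page today",
    "billing is charged monthly",
]
print(near_duplicates(chunks))  # [(0, 1, 0.875)]
```

The comparison is quadratic in the number of chunks, which is fine for a handful of passages but would need blocking or hashing (for example MinHash) on a large index.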