Comprehensive Metrics
Go beyond simple click-through rates. Understand search performance with industry-standard IR metrics and interactive visualizations.
Quantifying Search Relevance
Metrics are calculated against a ground truth, called "Expected Results" in our platform. You can maintain one or more expected-results sets and use all of them, or any subset, in a test run. Each set is typically a CSV file you provide, mapping queries to lists of known relevant document URLs. The platform uses this data to automatically calculate a suite of metrics for every test run.
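For illustration, the sketch below loads an expected-results file into a query-to-URLs mapping. The column names (`query`, `relevant_url`) are hypothetical placeholders, not a required schema:

```python
import csv
from collections import defaultdict

def load_expected_results(path: str) -> dict[str, set[str]]:
    """Map each query to its set of known relevant document URLs.

    Assumes a hypothetical two-column layout, one row per
    (query, relevant URL) pair:

        query,relevant_url
        wireless headphones,https://example.com/products/123
    """
    expected: dict[str, set[str]] = defaultdict(set)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            expected[row["query"].strip()].add(row["relevant_url"].strip())
    return dict(expected)
```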
Key Metrics Supported
Precision@K
Measures the exactness of the results: the proportion of the top K results that are relevant. A higher score indicates that more of the results at the top of the page are relevant.
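A minimal sketch of the textbook calculation (the platform's implementation may differ in edge-case handling):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Proportion of the top k retrieved documents that are relevant.

    Assumes k >= 1; a run returning fewer than k documents is
    penalized, since the denominator stays k.
    """
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k
```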
Recall
Measures the completeness of the results, indicating the fraction of ALL relevant documents that were successfully retrieved.
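Sketched the same way, measured against the full set of known relevant documents for a query:

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of ALL known relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # no ground truth recorded for this query
    hits = len(set(retrieved) & relevant)
    return hits / len(relevant)
```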
nDCG@K (Normalized Discounted Cumulative Gain)
A sophisticated, rank-aware metric that evaluates the quality of the ranking. It rewards relevant documents that appear higher in the search results, making it a powerful indicator of overall performance.
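A sketch assuming binary relevance labels; graded relevance judgments would plug larger gains (e.g. 2^rel - 1) into the same formula:

```python
import math

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """nDCG@k with binary relevance: the ranking's DCG divided by the
    DCG of an ideal ranking that puts all relevant documents first."""
    dcg = sum(
        1.0 / math.log2(rank + 1)  # rank 1 gets discount log2(2) = 1
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```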
Mean Reciprocal Rank (MRR)
Focuses on how quickly the *first* correct answer is found. It's the average, across a set of queries, of the reciprocal rank (1/position) of the first relevant document.
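A sketch over a batch of (results, relevant set) pairs, scoring 0 for queries where no relevant document is retrieved:

```python
def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant document per query;
    a query with no relevant result retrieved contributes 0."""
    def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                return 1.0 / rank
        return 0.0

    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```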
Overlap & Diversity
Analyze the similarity between result sets from different configurations using the Jaccard index and other overlap metrics. You can also measure result diversity by counting unique domains and titles.
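Both ideas are simple to sketch; treating a result list's distinct domains as a diversity proxy is one possible reading of "unique domains", not necessarily the platform's exact definition:

```python
from urllib.parse import urlparse

def jaccard(results_a: list[str], results_b: list[str]) -> float:
    """Jaccard index of two result lists: intersection over union."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def unique_domains(results: list[str]) -> int:
    """Diversity proxy: count of distinct domains among result URLs."""
    return len({urlparse(url).netloc for url in results})
```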
Statistical Significance
The platform automatically runs pairwise statistical tests (like the Wilcoxon signed-rank test) to determine if the difference in performance between two configurations is statistically significant, helping you avoid making decisions based on random noise.
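For intuition, here is how such a test looks with SciPy's `wilcoxon` on paired per-query scores (the numbers are made up for illustration):

```python
from scipy.stats import wilcoxon

# Paired per-query scores (e.g. nDCG@10) for the SAME queries under
# two configurations; the values here are made up for illustration.
scores_a = [0.62, 0.48, 0.71, 0.55, 0.80, 0.43, 0.66, 0.59]
scores_b = [0.58, 0.45, 0.69, 0.51, 0.77, 0.44, 0.60, 0.55]

stat, p_value = wilcoxon(scores_a, scores_b)
print(f"p = {p_value:.3f} -> "
      + ("likely a real difference" if p_value < 0.05 else "could be noise"))
```

The test is paired: each query contributes one score under each configuration, which lets it detect consistent per-query differences even when the overall means are close.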