Dense dual-encoder retrievers outperform BM25 by 9-19% absolute in top-20 passage retrieval accuracy across open-domain QA datasets and enable new state-of-the-art end-to-end QA results.
hub Mixed citations
Billion-scale similarity search with GPUs
Mixed citation behavior. Most common role is method (60%).
abstract
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
BEIR is a heterogeneous zero-shot benchmark showing BM25 as a robust baseline while re-ranking and late-interaction models perform best on average at higher cost, with dense and sparse models lagging in generalization.
Sentence-BERT adapts BERT with siamese and triplet networks to produce sentence embeddings for efficient cosine-similarity comparisons, cutting computation time from hours to seconds on similarity search while matching BERT accuracy.
HERMES provides a reusable hierarchical labeling substrate for pre-training data that reveals granularity-specific effects in data mixing rules during model training.
RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.
A 527-item GDPR-aligned privacy preference item bank was developed by extracting 669 statements from 99 GDPR articles and validating them through multi-round expert consensus and semantic clustering.
MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.
ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.
RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
A per-component SimHash fingerprint supplies structural identity for AI agent skills, recovering family membership under paraphrase and refactoring with AUC 0.974 while localizing changes.
Large-scale HPC evaluation of Qdrant, Milvus, and Weaviate reveals that workload patterns limit scaling and extra cores can reduce throughput, exposing a cloud-to-HPC design mismatch.
NTILC replaces in-context tool registry lookup with learned latent retrieval using a signature-aware composite loss, reducing context consumption by over 95% and latency by up to 74%.
A pipeline generates 1,732 valid FHIR bundles from clinician cases, revealing lower LLM diagnostic accuracy on structured inputs than on plain text.
CourseBlueprint builds a typed pipeline over a 23-lecture biomedical imaging corpus to generate prerequisite-aware, learner-adaptive videos with auditable engagement contracts and slide grounding.
TGQ-Former uses metadata-guided hybrid queries and dual-gated modulation to improve visual token selection in multimodal e-commerce retrieval, raising average Hit Rate@100 by 6.04% over baselines.
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
UAGA aligns two graph embedding spaces via adversarial training in a fully unsupervised setting, with an incremental extension iUAGA that uses discovered pseudo-anchors to refine both embeddings and alignments.
Pyramid is a distributed similarity search framework based on HNSW that partitions datasets into similar-item sub-datasets for efficient query processing and includes failure recovery and straggler mitigation.
QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.
A multi-agent system for explainable fake news detection that decomposes claims, retrieves evidence, verifies with calibrated confidence, and aggregates logic verdicts, showing better interpretability than BERT/RoBERTa on the LIAR benchmark despite lower raw accuracy.
A hybrid graph-text retrieval system for cyber threat intelligence improves multi-hop question answering by up to 35% over vector-based RAG on a 3,300-question benchmark.
CICL scores and compresses context evidence for LLM agents via action-shift and outcome-uplift metrics, lifting hit@1 from 0.58 to 0.78 on 50 SWE-bench retrieval tasks.
ESGLens applies RAG and LLM embeddings to extract GRI-aligned information from ESG reports and achieves 0.48 Pearson correlation when regressing environmental scores on 300 company reports.
citing papers explorer
-
Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval
A hybrid graph-text retrieval system for cyber threat intelligence improves multi-hop question answering by up to 35% over vector-based RAG on a 3,300-question benchmark.