super hub Mixed citations

Text Embeddings by Weakly-Supervised Contrastive Pre-training

Binxing Jiao, Daxin Jiang, Liang Wang, Linjun Yang, Nan Yang, Xiaolong Huang · 2022 · cs.CL · arXiv 2212.03533

Mixed citation behavior. Most common role is method (39%).

168 Pith papers citing it

Method 39% of classified citations

open full Pith review browse 168 citing papers more from Binxing Jiao arXiv PDF

abstract

This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot settings, E5 is the first model that outperforms the strong BM25 baseline on the BEIR retrieval benchmark without using any labeled data. When fine-tuned, E5 obtains the best results on the MTEB benchmark, beating existing embedding models with 40x more parameters.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 method 9 other 2 baseline 1 dataset 1

citation-polarity summary

use method 9 background 8 support 2 unclear 2 baseline 1 use dataset 1

claims ledger

abstract This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot se

authors

Binxing Jiao Daxin Jiang Liang Wang Linjun Yang Nan Yang Xiaolong Huang

co-cited works

representative citing papers

Is Dimensionality a Barrier for Retrieval Models?

cs.LG · 2026-05-22 · unverdicted · novelty 8.0

Dimension d = O(m^{-2} log n) nearly achieves the optimal margin m^rd(+∞, A) for retrieval embeddings, with matching lower bounds showing d = O(k log(n/k)) suffices and is necessary for m = Θ(k^{-1/2}) on k-sparse query matrices.

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

FollowTable: A Benchmark for Instruction-Following Table Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 8.0

FollowTable is the first large-scale benchmark for instruction-following table retrieval, paired with an Instruction Responsiveness Score, showing that existing models fail to adapt to fine-grained constraints beyond topical similarity.

ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

ALEE generates AMR-based English minimal pairs with fine-grained semantic shifts, translates them, and evaluates embedding models on 275+ languages to expose cross-lingual gaps linked to training data and tokenization.

Diagnosing and Mitigating Retrieval Bottlenecks in LLM-Based Cold-Start Recommendation

cs.IR · 2026-06-29 · conditional · novelty 7.0

Retrieval coverage limits LLM rerankers in cold-start recommendation; a learned hybrid fusion improves pool quality but LLM reranking often degrades end-to-end performance while simpler rankers exploit the pool.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

LLM agents often fail to abstain at the right time in uncertain multi-turn tasks, and the CONVOLVE context engineering method raises timely abstention rates on WebShop from 26.7 to 57.4 without parameter updates.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

The Voronoi Bottleneck: Capacity-Aware Dense Retrieval for Product Search

cs.IR · 2026-06-09 · unverdicted · novelty 7.0

Proves Voronoi complexity equals sign-rank for top-1 retrieval, introduces CUS diagnostic predicting retrieval failure at AUC >0.8 without labels, and AT-DW-InfoNCE objective with derived alpha^*=2.0 that improves Recall@100 on synthetic data.

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

OpAI-Bench provides a new benchmark for evaluating AI-text detectors on progressively human-AI co-edited documents at multiple granularities, revealing non-monotonic detection patterns.

ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation

cs.CR · 2026-06-02 · unverdicted · novelty 7.0

ImageAuditor is the first MIA for IRAG that achieves over 80% AUROC with four queries by using reward-guided policy optimization for cross-modal retrieval and task-specific prompting for signal extraction.

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

MaskForge reaches 79.3% average attack success rate on five dLLMs by adaptively searching and accumulating structural attack patterns with a UCB bandit, improving 17.6% over baselines and transferring to 88.2% on AdvBench.

Test-Time Training for Zero-Resource Dense Retrieval Reranking

cs.IR · 2026-05-31 · unverdicted · novelty 7.0

DART adapts a scoring matrix at inference time via gradient updates on pseudo-labels from top/bottom documents to gain +2.1% mean NDCG@10 on six BEIR benchmarks with under 10ms added latency.

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

cs.CR · 2026-05-28 · unverdicted · novelty 7.0

MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.

Towards Cost-effective LLMs Routing with Batch Prompting

cs.DB · 2026-05-27 · unverdicted · novelty 7.0

RoBatch is a two-stage framework that formulates and solves the joint Route with Batching Problem via a batch-aware proxy utility model and greedy scheduling, outperforming separate routing or batching baselines on six benchmarks.

The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

Generative Conversational Recommender System

cs.IR · 2026-05-21 · unverdicted · novelty 7.0

A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.

Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

A new linked multimodal dataset of Russian domestic and foreign policy speeches with texts, images, captions, harmonized metadata, and expert-refined topic annotations is introduced to support analyses in political communication and LLM applications.

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

TWN attaches separate reasoning and embedding LoRA adapters to a frozen backbone with gradient detachment and a self-supervised gate that decides per input whether to generate CoT, achieving SOTA on MMEB-V2 with 3-5% added parameters and up to 50% fewer reasoning tokens.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Breaking $\textit{Winner-Takes-All}$: Cooperative Policy Optimization Improves Diverse LLM Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

GCPO uses team-level credit assignment via determinant volume over reward-weighted semantic embeddings to promote non-redundant correct reasoning paths, improving both accuracy and diversity in LLM training.

citing papers explorer

Showing 50 of 168 citing papers.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries cs.CL · 2024-01-27 · accept · none · ref 42 · internal anchor
MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.
C-Pack: Packed Resources For General Chinese Embeddings cs.CL · 2023-09-14 · accept · none · ref 58 · internal anchor
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
KidnapRAG: A Black-Box Attack for Hijacking Reasoning in Agentic Retrieval-Augmented Generation Systems cs.CR · 2026-07-01 · unverdicted · none · ref 21 · internal anchor
KidnapRAG is a sequential black-box poisoning attack on Agentic RAG systems using Bait, Chain-Link, and Mal-Ins documents to redirect retrieval and reasoning, outperforming prior baselines.
The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills cs.CR · 2026-06-30 · unverdicted · none · ref 50 · internal anchor
A per-component SimHash fingerprint supplies structural identity for AI agent skills, recovering family membership under paraphrase and refactoring with AUC 0.974 while localizing changes.
Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval cs.CL · 2026-06-29 · unverdicted · none · ref 15 · internal anchor
Permutation-invariant fine-tuning (PI-FT) randomizes field order and applies dropout during embedding model training to eliminate sensitivity to serialization order, reducing order-change penalty from 7.4 to 0.2 nDCG@10 on a generated multilingual DevDataBench while outperforming zero-shot baselines
Semantic insurance pricing with large language models stat.AP · 2026-06-28 · unverdicted · none · ref 57 · internal anchor
LLM embeddings from policy text outperform hand-engineered features in a GLM for French motor insurance claim frequency, with larger gains at small sample sizes and further improvement from insurance-specific fine-tuning.
SHARD: cell-keyed residual splitting for alignment-resistant private dense retrieval cs.CR · 2026-06-26 · unverdicted · none · ref 2 · 2 links · internal anchor
SHARD introduces cell-keyed residual splitting that turns dense retrieval embeddings into revocable, renewable, unlinkable templates resistant to alignment attacks while preserving exact utility under CKKS reranking.
Toward a Hybrid Digital Twin of Society: Quantifying Cognitive-Spatial Linkages Through Online-Offline Feedback Networks physics.soc-ph · 2026-06-25 · unverdicted · none · ref 12 · internal anchor
A Feedback Network model is developed showing online semantic exploration is more concentrated than physical mobility, with stable retail-business linkages and greater COVID disruption to spatial than cognitive routines, as a step toward hybrid digital twins of society.
Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors cs.IR · 2026-06-01 · unverdicted · none · ref 8 · internal anchor
Supervised bi-encoder retrievers encode document-level relevance priors from annotation biases, producing a findability gap for documents lacking favored features such as comprehensiveness and mainstream topic coverage.
Policy and World Modeling Co-Training for Language Agents cs.LG · 2026-06-01 · unverdicted · none · ref 58 · internal anchor
PaW co-trains policy and world modeling on standard RL rollouts using action-entropy data selection, noise-tolerant loss, and reward-adaptive balancing, yielding consistent gains on three agent benchmarks.
Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs cs.CL · 2026-05-31 · unverdicted · none · ref 28 · internal anchor
A graph-based MIS prompt selection method on embedding similarity graphs yields reduced benchmark subsets with highly consistent LLM rankings (Kendall's W ≥ 0.90 in 99.2% of cases) and 25-48% size reduction at higher thresholds.
On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets cs.CL · 2026-05-29 · unverdicted · none · ref 25 · internal anchor
Meta-study of MTEB rankings introduces dataset-composition and ranking-scheme robustness indicators and finds only a small subset of models stay consistently strong across tasks, languages, and evaluation variations.
Xetrieval: Mechanistically Explaining Dense Retrieval cs.AI · 2026-05-28 · unverdicted · none · ref 9 · internal anchor
Xetrieval enriches sentence embeddings with a single-pass reasoning internalizer and decomposes the result into sparse interpretable features whose overlaps explain individual dense-retrieval decisions.
ReasonOps: Operator Segmentation for LLM Reasoning Traces cs.AI · 2026-05-28 · unverdicted · none · ref 17 · internal anchor
Unsupervised clustering on sentence-initial 3-token pivots extracts 7 universal reasoning operators from 44k traces across 12 LLMs that enable model fingerprinting and answer-correctness prediction.
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning cs.AI · 2026-05-21 · unverdicted · none · ref 17 · 2 links · internal anchor
Search-E1 uses GRPO interleaved with on-policy self-distillation to reach 0.440 average EM on seven QA benchmarks with Qwen2.5-3B, outperforming open-source baselines.
Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance cs.CL · 2026-05-21 · unverdicted · none · ref 129 · internal anchor
Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.
When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering cs.CL · 2026-05-20 · unverdicted · none · ref 3 · internal anchor
OGCaReBench is a new retrieval-focused benchmark for evaluating LLMs on off-guideline clinical questions from real case reports, showing retrieval augmentation raises accuracy from 56% to 82%.
When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR cs.LG · 2026-05-19 · unverdicted · none · ref 32 · internal anchor
Dynamic Gradient Gating monitors lm_head gradient norms to safely reuse rollout batches in RLVR, achieving up to 2.93x sample efficiency and 2.14x wall-clock speedup across math, ALFWorld, WebShop, and QA tasks.
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs cs.LG · 2026-05-18 · unverdicted · none · ref 35 · 2 links · internal anchor
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning cs.AI · 2026-05-18 · unverdicted · none · ref 29 · internal anchor
SD-Search derives step-level supervision for search queries in reasoning agents via on-policy hindsight self-distillation using the policy as both student and teacher.
Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs cs.LG · 2026-05-12 · unverdicted · none · ref 72 · internal anchor
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
Test-Time Compute for Frozen Embedding Models through Agentic Program Search cs.LG · 2026-05-12 · unverdicted · none · ref 8 · 2 links · internal anchor
Agentic program search over a frozen encoder API yields retrieval programs that improve nDCG@10 on held-out tasks and unseen encoder families with no per-domain training.
UTS at PsyDefDetect: Multi-Agent Councils and Absence-Based Reasoning for Defense Mechanism Classification cs.AI · 2026-05-10 · unverdicted · none · ref 21 · 2 links · internal anchor
A deliberative council of Gemini agents using absence-based clinical rules achieves 0.382 F1 without fine-tuning and second place overall at 0.406 F1 on defense mechanism classification, with minority-class overrides adding 2.4pp.
TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning cs.AI · 2026-05-10 · unverdicted · none · ref 34 · internal anchor
TIDE-Bench is a new benchmark for tool-integrated reasoning that combines diverse tasks, multi-aspect metrics covering answer quality, process reliability, efficiency and cost, plus filtered challenging test sets.
PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning cs.AI · 2026-05-10 · unverdicted · none · ref 37 · 2 links · internal anchor
PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks cs.AI · 2026-05-09 · unverdicted · none · ref 32 · 3 links · internal anchor
SearchSkill improves LLM query planning on knowledge QA by using explicit skill selection from an evolving SkillBank and a two-stage SFT process that aligns training with inference-time skill-grounded execution.
Do not copy and paste! Rewriting strategies for code retrieval cs.SE · 2026-05-08 · conditional · none · ref 14 · internal anchor
Full natural-language rewriting of code and queries boosts retrieval on code benchmarks while corpus-only rewriting often hurts, with token entropy difference serving as a cheap predictor of gains.
RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation cs.IR · 2026-05-08 · unverdicted · none · ref 35 · internal anchor
RRCM trains an LLM to dynamically retrieve from collaborative and meta memories using group relative policy optimization driven by final top-k recommendation quality.
Superintelligent Retrieval Agent: The Next Frontier of Agentic Retrieval cs.IR · 2026-05-07 · unverdicted · none · ref 12 · 2 links · internal anchor
SIRA compresses multi-round exploratory retrieval into one corpus-discriminative BM25 action via LLM document enrichment, query-time term prediction, and corpus-statistic filtering, reporting top average performance on ten BEIR benchmarks and strong results on BrowseComp-Wikipedia without relevance
Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization cs.AI · 2026-05-07 · unverdicted · none · ref 16 · internal anchor
Trajectory geometry in embedding space fused with coverage and verbalization yields better black-box CoT confidence estimation than self-consistency at lower sample counts across six benchmark-reasoner pairs.
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping cs.CL · 2026-05-07 · unverdicted · none · ref 35 · internal anchor
A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by sqrt of term count, and modulating clipping ranges per turn's normalized IG.
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment cs.AI · 2026-05-05 · unverdicted · none · ref 34 · internal anchor
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
NH-CROP: Robust Pricing for Governed Language Data Assets under Cost Uncertainty cs.AI · 2026-05-03 · unverdicted · none · ref 5 · internal anchor
NH-CROP introduces a robust online pricing method for governed language data with uncertain costs, using a selective verification gate that improves or matches baselines without relying heavily on paid information acquisition.
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning cs.CL · 2026-05-02 · unverdicted · none · ref 54 · internal anchor
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces cs.LG · 2026-05-01 · unverdicted · none · ref 48 · 2 links · internal anchor
KAHM yields a compute-efficient query encoder that outperforms matched learned adapters in reconstructing a frozen Mixedbread embedding space on an Austrian-law retrieval task while delivering an 8.53x CPU speedup.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 74 · internal anchor
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Iterative Definition Refinement for Zero-Shot Classification via LLM-Based Semantic Prototype Optimization cs.CV · 2026-04-30 · unverdicted · none · ref 28 · internal anchor
Iterative LLM-based refinement of category definitions improves zero-shot classification performance across 13 embedding models on a new 10-category web URL benchmark.
AutoSearch: Adaptive Search Depth for Efficient Agentic RAG via Reinforcement Learning cs.AI · 2026-04-19 · unverdicted · none · ref 39 · internal anchor
AutoSearch applies RL with a self-answering reward to adaptively determine minimal sufficient search depth in agentic RAG, reducing over-searching while maintaining answer quality on complex questions.
RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation cs.CL · 2026-04-19 · unverdicted · none · ref 40 · internal anchor
RoTRAG retrieves Rules of Thumb to ground LLM reasoning for harm detection and severity classification in multi-turn dialogues, reporting roughly 40% relative F1 gains and 8.4% lower distributional error on two safety benchmarks while cutting redundant retrieval.
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning cs.CL · 2026-04-19 · unverdicted · none · ref 64 · internal anchor
REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation cs.LG · 2026-04-17 · unverdicted · none · ref 52 · internal anchor
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.
ViLL-E: Video LLM Embeddings for Retrieval cs.CV · 2026-04-13 · unverdicted · none · ref 50 · internal anchor
ViLL-E introduces a dynamic embedding mechanism and joint contrastive-generative training for VideoLLMs, delivering up to 7% gains in temporal localization and 4% in video retrieval while enabling new zero-shot capabilities.
DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking cs.IR · 2026-04-13 · unverdicted · none · ref 31 · internal anchor
DualView fuses local cross-attention and global context aggregation via adaptive gating to rerank fixed candidate sets for multi-hop QA, reporting 99.4% Top-4 Recall on MuSiQue at 4 ms latency while beating larger cross-encoders.
Rag Performance Prediction for Question Answering cs.CL · 2026-04-09 · unverdicted · none · ref 26 · internal anchor
A novel supervised predictor modeling semantic relationships among question, retrieved passages, and generated answer best forecasts when RAG improves QA performance.
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs cs.CL · 2026-04-08 · unverdicted · none · ref 50 · internal anchor
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views cs.DB · 2026-04-08 · unverdicted · none · ref 44 · internal anchor
AV-SQL uses a pipeline of LLM agents to generate intermediate CTE views that decompose complex Text-to-SQL queries, reaching 70.38% execution accuracy on Spider 2.0.
Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers cs.IR · 2026-04-07 · unverdicted · none · ref 22 · internal anchor
Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
JU\'A -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections cs.IR · 2026-04-07 · accept · none · ref 19 · internal anchor
JU'A is a new heterogeneous benchmark for Brazilian legal IR that distinguishes retrieval methods and shows domain-adapted models excel on aligned subsets while BM25 stays competitive elsewhere.
Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead cs.IR · 2026-04-04 · accept · none · ref 52 · internal anchor
Empirical comparison across 14 retrievers on the BRIGHT benchmark shows reasoning-specialized models can match strong accuracy with competitive speed while many large LLM bi-encoders add latency for small gains and confidence scores remain poorly calibrated.
Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT) cs.DB · 2026-03-29 · unverdicted · none · ref 1 · 2 links · internal anchor
Stream2LLM introduces adaptive scheduling and preemption for append-mode and update-mode context streaming in disaggregated LLM deployments, delivering up to 11x TTFT improvements on real-world workloads while preserving throughput.

Text Embeddings by Weakly-Supervised Contrastive Pre-training

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer