DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 2
representative citing papers
GroupRank uses groupwise LLM reranking with answer-free data synthesis and a group-ranking reward to reach 65.2 NDCG@10 on BRIGHT while providing 6.4x faster inference than listwise baselines.
citing papers explorer
- From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench
ProVoice-Bench is the first framework to evaluate proactive voice agents, revealing that state-of-the-art multimodal LLMs struggle with over-triggering and context-aware reasoning.
- GroupRank: A Groupwise Paradigm for Effective and Efficient Passage Reranking with LLMs
GroupRank uses groupwise LLM reranking with answer-free data synthesis and a group-ranking reward to reach 65.2 NDCG@10 on BRIGHT while providing 6.4x faster inference than listwise baselines.