EDEN adaptively sets branching factor proportional to next-token entropy, achieving better accuracy per expansion than fixed beam search while providing a proof that monotone entropy-based branching outperforms any fixed budget allocation.
hub
ISBN 979-8-89176-251-0
14 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2representative citing papers
DialectLLM generates parallel multi-dialect dialog data and a 50k-dialog benchmark showing frontier LLMs achieve under 70% accuracy on dialect tasks while the generated data can improve post-training.
Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.
ConfLayers dynamically skips LLM layers based on confidence scores to create adaptive draft models for self-speculative decoding, reporting up to 1.4x speedup over standard generation.
Long-context LLMs refuse explicit harmful requests but often comply when the same harmful goals must be inferred from distributed fragments in long contexts.
WorldCup is a new multi-bit LLM watermarking framework that models token sampling as a communication channel and uses hierarchical competition with entropy-aware modulation for robust message embedding and recovery.
RE-TAB uses a deterministic LCS-based table-state reward for stepwise guidance and test-time scaling, raising LLM table-reasoning accuracy by 26.7 pp on average across six backbones and three benchmarks.
SenSE adds language-model semantic guidance to flow-matching generative speech enhancement via a dual-path masked conditioning strategy and reports SOTA results on distorted speech.
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
LIFT fine-tunes short-context LLMs on long inputs with synthetic tasks to absorb information into parameters, enabling answers without the input present at inference.
LLMs share task-specific attention heads across prompting styles, with activation strength explaining performance differences and failures arising from competing representations.
A survey synthesizing LLM methods for peer review generation, post-review tasks like rebuttals and meta-reviews, evaluation approaches, datasets, and future directions in AI-assisted academic publishing.
citing papers explorer
-
Entropy-informed Decoding: Adaptive Information-Driven Branching
EDEN adaptively sets branching factor proportional to next-token entropy, achieving better accuracy per expansion than fixed beam search while providing a proof that monotone entropy-based branching outperforms any fixed budget allocation.
-
DialectLLM: A Dialect-Aware Dialog[ue] Generation Framework Beyond Standard American English
DialectLLM generates parallel multi-dialect dialog data and a 50k-dialog benchmark showing frontier LLMs achieve under 70% accuracy on dialect tasks while the generated data can improve post-training.
-
Towards Direct Evaluation of Harness Optimizers via Priority Ranking
Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.
-
ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
ConfLayers dynamically skips LLM layers based on confidence scores to create adaptive draft models for self-speculative decoding, reporting up to 1.4x speedup over standard generation.
-
Do Reasoning LLMs Refuse What They Infer in Long Contexts?
Long-context LLMs refuse explicit harmful requests but often comply when the same harmful goals must be inferred from distributed fragments in long contexts.
-
WorldCup Sampling for Multi-bit LLM Watermarking
WorldCup is a new multi-bit LLM watermarking framework that models token sampling as a communication channel and uses hierarchical competition with entropy-aware modulation for robust message embedding and recovery.
-
Enhancing Table Reasoning with Deterministic Table-State Rewards
RE-TAB uses a deterministic LCS-based table-state reward for stepwise guidance and test-time scaling, raising LLM table-reasoning accuracy by 26.7 pp on average across six backbones and three benchmarks.
-
SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
SenSE adds language-model semantic guidance to flow-matching generative speech enhancement via a dual-path masked conditioning strategy and reports SOTA results on distorted speech.
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
-
LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning
LIFT fine-tunes short-context LLMs on long inputs with synthetic tasks to absorb information into parameters, enabling answers without the input present at inference.
-
Shared Lexical Task Representations Explain Behavioral Variability In LLMs
LLMs share task-specific attention heads across prompting styles, with activation strength explaining performance differences and failures arising from competing representations.
-
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future
A survey synthesizing LLM methods for peer review generation, post-review tasks like rebuttals and meta-reviews, evaluation approaches, datasets, and future directions in AI-assisted academic publishing.
- ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models
- Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook