Rank analysis of incomplete block designs: I. The method of paired comparisons
8 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
-
StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video
StreamPro introduces a benchmark and training method using CB-Stream Loss and GRPO to enable proactive decision-making in streaming videos, achieving 41.5 on StreamPro-Bench compared to 10.4 previously.
-
Story Point Estimation Using Large Language Models
With zero-shot prompting, LLMs predict story points better than supervised deep learning models trained on 80% of the project data; few-shot examples and comparative judgments improve performance further.
-
Prefer-DAS: Learning from Local Preferences and Sparse Prompts for Domain Adaptive Segmentation of Electron Microscopy
Prefer-DAS integrates self-training, prompt-guided contrastive learning, local direct preference optimization (LPO), and unsupervised preference optimization (UPO) to achieve effective domain adaptive segmentation in electron microscopy using sparse prompts and local preferences.
-
PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning
PrefMoE learns multiple reward experts with adaptive soft routing and a load-balancing regularizer to capture diverse latent preferences under noisy supervision, improving robustness over single-model baselines on D4RL locomotion and MetaWorld manipulation tasks.
-
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
LCPO reduces average LRM output length by over 50% across benchmarks via targeted preference optimization while preserving reasoning performance.
-
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
A new CoVQD-guided retrieval-augmented generation framework improves multimodal LLMs on visual question answering by using structured reasoning to retrieve better external knowledge.
-
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
This work advances reinforcement learning for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
-
RAS: a Reliability Oriented Metric for Automatic Speech Recognition