Omni-think: Scaling cross-domain generalization in LLMs via multi-task RL with hybrid rewards

Derek Li, Jiaming Zhou, Amirreza Kazemi, Qianyi Sun, Abbas Ghaddar, Mohammad Ali Alomrani, Liheng Ma, Yu Luo, Dong Li, Feng Wen, et al · 2025 · arXiv 2507.14783

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

cs.DC · 2026-02-10 · unverdicted · novelty 6.0

SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.

SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility

cs.AI · 2026-04-09 · unverdicted · novelty 4.0

SPARD dynamically tunes multi-objective reward weights and data importance in LLM reinforcement learning alignment using a self-paced curriculum driven by reward dynamics and data utility.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding · cs.DC · 2026-02-10 · unverdicted · none · ref 23
SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility · cs.AI · 2026-04-09 · unverdicted · none · ref 3
A Survey of Reinforcement Learning for Large Reasoning Models · cs.CL · 2025-09-10 · accept · none · ref 270
