GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?

Zhou, Y · 2025 · arXiv 2502.05252

5 Pith papers cite this work.

representative citing papers

LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning

cs.AI · 2026-03-06 · unverdicted · novelty 7.0

LEAD lets LLMs solve checkers jumping puzzles up to size 13 by using lookahead to recover from irreversible errors on hard steps that break extreme decomposition.

Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

A graph-grounded Combined Road Substrate framework generates traceable QA pairs from road maps to improve small VLMs on compositional road reasoning tasks.

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.

Factored Causal Representation Learning for Robust Reward Modeling in RLHF

cs.LG · 2026-01-29 · unverdicted · novelty 6.0

A factored causal representation learning method improves robustness of reward models in RLHF by isolating causal factors from biases like length and sycophancy using adversarial gradient reversal.

MiMo-V2-Flash Technical Report

cs.CL · 2026-01-06 · unverdicted · novelty 5.0

MiMo-V2-Flash is a mixture-of-experts (MoE) model with 309B total and 15B active parameters, trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation. It matches larger models such as DeepSeek-V3.2 while enabling 2.6x faster decoding by repurposing its multi-token prediction (MTP) layers.