International Conference on Learning Representations (ICLR), year =

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

SpectraNet delivers stable autoregressive PDE rollouts with lower error and 2.3x fewer parameters than FNO by embedding spectral convolutions in a U-Net and training a residual-target block under semigroup-consistency loss.

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.

Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

A 149M-parameter distributional energy-based verifier with low-rank adapter ensemble reduces constraint violations in structured LLM reasoning and outperforms or matches much larger models on five benchmarks.

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Introduces the RevCI benchmark and the IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into the efficient TIDE model.

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Future-rhyme information is linearly decodable at line boundaries across model families and strengthens with scale, yet only Gemma-3-27B causally depends on it, with the driver migrating to the boundary around layer 30 and localizing to five attention heads.

Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Transformers show a sharp, task-specific critical window for weight decay application that determines reasoning versus memorization, with middle placement optimal and boundaries as narrow as 100 steps.

Adaptive Computation Depth via Learned Token Routing in Transformers

cs.LG · 2026-04-18 · unverdicted · novelty 5.0

TSA adds end-to-end differentiable per-token halting gates to transformers, enabling learned adaptive depth that saves 14-23% token-layer operations with under 0.5% quality loss on language modeling.

citing papers explorer

Showing 7 of 7 citing papers.

Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates · cs.LG · 2026-05-09 · unverdicted · none · ref 24
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation · cs.CV · 2026-05-02 · unverdicted · none · ref 87
Distributional Energy-Based Models for Uncertainty-Aware Structured LLM Reasoning · cs.LG · 2026-05-15 · unverdicted · none · ref 37
When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews · cs.CL · 2026-05-11 · unverdicted · none · ref 28
Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions · cs.LG · 2026-05-08 · unverdicted · none · ref 17
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize · cs.LG · 2026-05-06 · unverdicted · none · ref 23
Adaptive Computation Depth via Learned Token Routing in Transformers · cs.LG · 2026-04-18 · unverdicted · none · ref 11
