super hub Mixed citations

Human-level control through deep reinforcement learning

Mixed citation behavior. Most common role is background (43%).

56 Pith papers citing it
22.6k external citations · Crossref
Background 43% of classified citations

citation-role summary

background: 4 · method: 2 · baseline: 1

representative citing papers

Heavy-Ball Q-Learning with Residual Weighting Correction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.

CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

CHORUS adapts a single VLA backbone for decentralized control of diverse robot teams, achieving 64-point gains over from-scratch decentralized baselines and outperforming centralized methods in real-world tasks using only local observations.

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

Inline Critic Steers Image Editing

cs.CV · 2026-05-12 · conditional · novelty 7.0

Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.

Parametric Open Source Games

cs.GT · 2026-06-25 · unverdicted · novelty 6.0

Introduces parametric open-source games as continuous analogues of program equilibria, proves equilibrium existence, and derives an exact coupling threshold for cooperation in symmetric 2x2 games under gradient ascent.

NASDAQ: Normalized Observation Space Dynamics-Augmented Q-Learning

cs.LG · 2026-06-19 · unverdicted · novelty 6.0

NASDAQ normalizes observations in an online RL setting so that dynamics prediction losses are balanced across dimensions, yielding competitive performance with lower wall-time than prior model-based and self-predictive methods.

Formalizing Task-Space Complexity for Zero-Shot Generalization

cs.LG · 2026-06-18 · unverdicted · novelty 6.0

Introduces signed divergence to bound generalization gaps and defines task-space complexity as the minimum number of source contexts needed for ε-coverage under local smoothness, with a set-cover reduction and empirical validation on LQR and DRL systems.

Rollout-Level Advantage-Prioritized Experience Replay for GRPO

cs.LG · 2026-06-03 · conditional · novelty 6.0

Rollout-level advantage-prioritized experience replay for GRPO recycles high-advantage individual rollouts with age eviction and fresh-anchored batches to outperform standard GRPO on math benchmarks, with gains increasing with model size.

Understanding Goal Generalisation in Sequential Reinforcement Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

An empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds that salient features drive generalization and that early goals persist, with latent policy gradients simulating latent variable evolution to predict OOD behavior from training history.
