Title resolution pending

Sascha Lange, Thomas Gabel, Martin A · 2012 · DOI 10.1007/978-3-642-27645-3

3 Pith papers cite this work. Polarity classification is still indexing.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

cs.LG · 2025-07-02 · unverdicted · novelty 7.0

Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.

$\pi^{*}_{0.6}$: a VLA That Learns From Experience

cs.LG · 2025-11-18 · unverdicted · novelty 6.0

RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.

AWAC: Accelerating Online Reinforcement Learning with Offline Datasets

cs.LG · 2020-06-16 · unverdicted · novelty 6.0

AWAC combines offline data with online RL via advantage-weighted actor-critic updates to enable faster acquisition of robotic skills such as dexterous manipulation.

citing papers explorer

Showing 3 of 3 citing papers.

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling cs.LG · 2025-07-02 · unverdicted · none · ref 19
$\pi^{*}_{0.6}$: a VLA That Learns From Experience cs.LG · 2025-11-18 · unverdicted · none · ref 2
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets cs.LG · 2020-06-16 · unverdicted · none · ref 32
