MA-RLHF: Reinforcement learning from human feedback with macro actions. arXiv preprint arXiv:2410.02743, 2024

Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu · 2024 · arXiv 2410.02743

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

SFT on LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, explaining inconsistent effectiveness across model scales.

citing papers explorer

Showing 1 of 1 citing paper.

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

cs.AI · 2026-05-18 · unverdicted · none · ref 4

SFT on LLMs removes noise-like token interactions in a brief early phase before introducing overfitted ones, explaining inconsistent effectiveness across model scales.
