Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , author= · 2022

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

cs.CL · 2026-05-08 · conditional · novelty 6.0

Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

cs.CL · 2023-09-01 · conditional · novelty 6.0

RLAIF matches RLHF on summarization and dialogue tasks, with a direct-RLAIF variant achieving superior results by using LLM rewards directly during training.

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

cs.AI · 2026-05-18

citing papers explorer

Showing 4 of 4 citing papers.

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts cs.CL · 2026-05-08 · conditional · none · ref 24
Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 124
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback cs.CL · 2023-09-01 · conditional · none · ref 117
RLAIF matches RLHF on summarization and dialogue tasks, with a direct-RLAIF variant achieving superior results by using LLM rewards directly during training.
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models cs.AI · 2026-05-18 · unreviewed · ref 60

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

fields

years

verdicts

representative citing papers

citing papers explorer