super hub Canonical reference

Training language models to follow instructions with human feedback

Carroll L. Wainwright, Diogo Almeida, Jeff Wu, Long Ouyang, Pamela Mishkin, Xu Jiang · 2022 · cs.CL · arXiv 2203.02155

Canonical reference. 93% of citing Pith papers cite this work as background.

217 Pith papers citing it

Background 93% of classified citations

open full Pith review browse 217 citing papers more from Carroll L. Wainwright arXiv PDF

abstract

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 54 method 1 other 1

citation-polarity summary

background 52 unclear 3 use method 1

claims ledger

abstract Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we u

authors

Carroll L. Wainwright Diogo Almeida Jeff Wu Long Ouyang Pamela Mishkin Xu Jiang

co-cited works

representative citing papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

cs.CL · 2022-01-28 · accept · novelty 9.0

Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

cs.SE · 2026-05-20 · conditional · novelty 8.0

RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

cs.MA · 2024-10-09 · unverdicted · novelty 8.0 · 2 refs

Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

cs.CL · 2023-10-05 · conditional · novelty 8.0

DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

Generative Agents: Interactive Simulacra of Human Behavior

cs.HC · 2023-04-07 · accept · novelty 8.0

Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Code as Policies: Language Model Programs for Embodied Control

cs.RO · 2022-09-16 · accept · novelty 8.0

Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.

Low-Resource Safety Failures Are Action Failures, Not Representation Failures

cs.CL · 2026-05-31 · conditional · novelty 7.0

Low-resource safety failures are action failures because the harmfulness representation transfers but the decision calibration does not; this is fixed by recalibrating a high-resource gate with 1-4 target-language examples.

EST-PRM: Stress-Testing Process Reward Models Before They Become Load-Bearing

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

EST-PRM stress-tests five PRM models on 4,687 reasoning chains from MATH-500, GSM8K, and PRMBench using three label-preserving transformations and reports model-specific vulnerability patterns.

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

cs.AI · 2026-05-28 · unverdicted · novelty 7.0

A hybrid first-order then zeroth-order optimization approach improves robustness of safety-aligned LLMs while preserving utility, with layer-wise sensitivity estimation for efficiency.

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.

Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DecisionBench supplies a fixed task suite, model pool, delegation interface, and multi-axis metrics to evaluate emergent delegation, showing similar quality across awareness conditions but 15-31 point headroom under perfect delegation.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics

cond-mat.stat-mech · 2026-05-11 · unverdicted · novelty 7.0

LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

math.OC · 2026-05-09 · unverdicted · novelty 7.0

Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends on intrinsic manifold dimension.

Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

ContextualJailbreak uses evolutionary search over simulated primed dialogues with novel mutations to reach 90-100% attack success on open LLMs and transfers to some closed frontier models at 15-90% rates.

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.

Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor

cs.AI · 2026-04-30 · conditional · novelty 7.0

Political bias audits of LLMs largely capture sycophantic accommodation to the inferred political identity of the asker rather than any fixed model ideology.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Latent Space Probing for Adult Content Detection in Video Generative Models

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.

citing papers explorer

Showing 50 of 217 citing papers.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models cs.CL · 2022-01-28 · accept · none · ref 45 · internal anchor
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts cs.SE · 2026-05-20 · conditional · none · ref 38 · internal anchor
RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems cs.MA · 2024-10-09 · unverdicted · none · ref 19 · 2 links · internal anchor
Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.
ORPO: Monolithic Preference Optimization without Reference Model cs.CL · 2024-03-12 · conditional · none · ref 37 · internal anchor
ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines cs.CL · 2023-10-05 · conditional · none · ref 38 · internal anchor
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
Generative Agents: Interactive Simulacra of Human Behavior cs.HC · 2023-04-07 · accept · none · ref 85 · internal anchor
Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.
Discovering Latent Knowledge in Language Models Without Supervision cs.CL · 2022-12-07 · conditional · none · ref 24 · internal anchor
An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.
Code as Policies: Language Model Programs for Embodied Control cs.RO · 2022-09-16 · accept · none · ref 22 · internal anchor
Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.
Low-Resource Safety Failures Are Action Failures, Not Representation Failures cs.CL · 2026-05-31 · conditional · none · ref 1 · internal anchor
Low-resource safety failures are action failures because the harmfulness representation transfers but the decision calibration does not; this is fixed by recalibrating a high-resource gate with 1-4 target-language examples.
EST-PRM: Stress-Testing Process Reward Models Before They Become Load-Bearing cs.LG · 2026-05-30 · unverdicted · none · ref 102 · internal anchor
EST-PRM stress-tests five PRM models on 4,687 reasoning chains from MATH-500, GSM8K, and PRMBench using three label-preserving transformations and reports model-specific vulnerability patterns.
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization cs.AI · 2026-05-28 · unverdicted · none · ref 4 · internal anchor
A hybrid first-order then zeroth-order optimization approach improves robustness of safety-aligned LLMs while preserving utility, with layer-wise sensitivity estimation for efficiency.
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression cs.LG · 2026-05-20 · unverdicted · none · ref 34 · internal anchor
Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.
Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents cs.CL · 2026-05-18 · unverdicted · none · ref 22 · internal anchor
The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows cs.AI · 2026-05-18 · unverdicted · none · ref 29 · internal anchor
DecisionBench supplies a fixed task suite, model pool, delegation interface, and multi-axis metrics to evaluate emergent delegation, showing similar quality across awareness conditions but 15-31 point headroom under perfect delegation.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 246 · internal anchor
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 41 · 2 links · internal anchor
Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.
Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics cond-mat.stat-mech · 2026-05-11 · unverdicted · none · ref 17 · internal anchor
LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.
Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets math.OC · 2026-05-09 · unverdicted · none · ref 205 · internal anchor
Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends on intrinsic manifold dimension.
Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion cs.LG · 2026-05-05 · unverdicted · none · ref 72 · internal anchor
Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming cs.CL · 2026-05-04 · unverdicted · none · ref 6 · internal anchor
ContextualJailbreak uses evolutionary search over simulated primed dialogues with novel mutations to reach 90-100% attack success on open LLMs and transfers to some closed frontier models at 15-90% rates.
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation cs.CV · 2026-05-02 · unverdicted · none · ref 77 · internal anchor
VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor cs.AI · 2026-04-30 · conditional · none · ref 10 · internal anchor
Political bias audits of LLMs largely capture sycophantic accommodation to the inferred political identity of the asker rather than any fixed model ideology.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 27 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Latent Space Probing for Adult Content Detection in Video Generative Models cs.CV · 2026-04-25 · unverdicted · none · ref 40 · internal anchor
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
Rates of forgetting for the sequentially Markov coalescent math.PR · 2026-04-22 · unverdicted · none · ref 92 · internal anchor
SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs cs.AI · 2026-04-22 · unverdicted · none · ref 10 · internal anchor
HiPO improves LLM reasoning performance by optimizing preferences separately on response segments rather than entire outputs.
Discrete Tilt Matching cs.LG · 2026-04-20 · unverdicted · none · ref 2 · internal anchor
Discrete Tilt Matching recasts dLLM fine-tuning as state-level matching of tilted local unmasking posteriors, producing a stable weighted cross-entropy loss that improves Sudoku and Countdown performance when applied to LLaDA-8B-Instruct.
S-GRPO: Unified Post-Training for Large Vision-Language Models cs.LG · 2026-04-17 · unverdicted · none · ref 33 · internal anchor
S-GRPO unifies SFT and RL for LVLMs via conditional ground-truth injection that supplies a maximal-reward anchor when group exploration fails completely.
Reinforcement Learning via Value Gradient Flow cs.LG · 2026-04-15 · unverdicted · none · ref 49 · internal anchor
VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation cs.CV · 2026-04-15 · conditional · none · ref 14 · internal anchor
Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models cs.CL · 2026-04-12 · unverdicted · none · ref 35 · internal anchor
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation cs.CL · 2026-04-10 · accept · none · ref 23 · internal anchor
SPASM introduces a stability-first framework with Egocentric Context Projection to maintain consistent personas and eliminate echoing in multi-turn LLM agent dialogues.
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security cs.CR · 2026-04-08 · conditional · none · ref 38 · internal anchor
MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception cs.AI · 2026-04-06 · unverdicted · none · ref 5 · internal anchor
Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent self-diagnosed bugs and maintained cross-channel context.
STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning cs.CL · 2026-04-06 · unverdicted · none · ref 1 · internal anchor
STEER represents videos as time-ordered event schemas and uses Pareto-Frontier guided Advantage Balancing in RL to train a 4B model that matches 7B baselines on video tasks with half the frames.
Alignment midtraining for animals cs.CL · 2026-03-21 · unverdicted · none · ref 4 · internal anchor
Midtraining on 3000 synthetic animal compassion documents raises compassionate reasoning scores to 77% on ANIMA benchmark versus 40% for instruction tuning, with generalization to human compassion but degradation after additional tuning.
Fast Single Nitrogen-Vacancy Center Ramsey Characterization using a Physics-Informed Neural Network quant-ph · 2026-03-14 · conditional · none · ref 43 · internal anchor
NVRNet uses pretrained simulation-based U-Nets with attention and parameter-efficient adapters, followed by a transformer estimator, to reconstruct clean Ramsey waveforms and infer hyperfine parameters from minimal-sweep experimental data, achieving 0.44-0.67x noise reduction and 0.10-0.19 FFT error
The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs? cs.CL · 2026-03-11 · unverdicted · none · ref 17 · internal anchor
The Stepwise Informativeness Assumption explains the correlation between LLM entropy dynamics and reasoning correctness by positing that correct traces accumulate answer-relevant information stepwise during generation.
CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training cs.LG · 2026-02-19 · unverdicted · none · ref 39 · internal anchor
CapTrack shows post-training causes drift beyond facts, with instruction fine-tuning producing stronger behavioral changes than preference optimization across model families.
"Tab, Tab, Bug": Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs cs.CR · 2026-02-06 · conditional · none · ref 37 · internal anchor
NES systems in AI IDEs expand attack surfaces via context poisoning from imperceptible actions and global codebase retrieval, with professional developers largely unaware of the risks.
ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation cs.AI · 2025-10-14 · unverdicted · none · ref 20 · internal anchor
ContractEval benchmark on 364 tasks shows code LLMs achieve 75-82% functional pass@1 but 0% contract satisfaction under standard prompting, rising only to 23-41% with explicit contracts.
EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention cs.SE · 2025-08-22 · unverdicted · none · ref 39 · internal anchor
EyeMulator augments CodeLLM fine-tuning loss with token weights derived from human eye-tracking scan paths, producing large gains on code translation and summarization across StarCoder, Llama-3.2 and DeepSeek-Coder.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 121 · internal anchor
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024-06-14 · conditional · none · ref 244 · internal anchor
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 141 · internal anchor
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads cs.LG · 2024-01-19 · conditional · none · ref 35 · internal anchor
Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models cs.RO · 2023-07-12 · unverdicted · none · ref 128 · internal anchor
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
Let's Verify Step by Step cs.LG · 2023-05-31 · accept · none · ref 15 · internal anchor
Process supervision significantly outperforms outcome supervision for training models on the MATH dataset, achieving 78% accuracy on a representative test subset with active learning and a released 800k step-label dataset.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 175 · internal anchor
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models cs.CV · 2023-03-08 · accept · none · ref 29 · internal anchor
Visual ChatGPT integrates visual foundation models with ChatGPT via prompts to enable multi-step image understanding, generation, and editing in conversational interactions.

Training language models to follow instructions with human feedback

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer