Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.
hub
Recursively Summarizing Books with Human Feedback
25 Pith papers cite this work. Polarity classification is still indexing.
abstract
A major challenge for scaling machine learning is training models to perform tasks that are very difficult or time-consuming for humans to evaluate. We present progress on this problem on the task of abstractive summarization of entire fiction novels. Our method combines learning from human feedback with recursive task decomposition: we use models trained on smaller parts of the task to assist humans in giving feedback on the broader task. We collect a large volume of demonstrations and comparisons from human labelers, and fine-tune GPT-3 using behavioral cloning and reward modeling to do summarization recursively. At inference time, the model first summarizes small sections of the book and then recursively summarizes these summaries to produce a summary of the entire book. Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves. Our resulting model generates sensible summaries of entire books, even matching the quality of human-written summaries in a few cases ($\sim5\%$ of books). We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization. A zero-shot question-answering model using these summaries achieves state-of-the-art results on the challenging NarrativeQA benchmark for answering questions about books and movie scripts. We release datasets of samples from our model.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.
GPT-3 can learn to express well-calibrated uncertainty about its answers using natural language phrases rather than logits.
Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.
Introduces robust estimators for linear Markov games in offline MARLHF that achieve O(ε^{1-o(1)}) or O(√ε) bounds on Nash or CCE gaps under uniform or unilateral coverage.
HER trains LLMs on reverse-engineered reasoning data and human preference rewards to improve cognitive persona simulation, reporting 30-point gains on CoSER and 15% on Minimax benchmarks over Qwen3-32B.
LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
QA-guided reasoning via a separate model producing structured traces improves faithfulness, informativeness, and grounding in character description generation from books over long-context LLM baselines.
LLMs show measurable self-recognition that linearly correlates with self-preference bias in evaluations, supported by fine-tuning experiments and controls for confounders.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
AutoDAN automatically generates semantically meaningful jailbreak prompts for aligned LLMs via a hierarchical genetic algorithm, achieving higher attack success, cross-model transferability, and universality than baselines while bypassing perplexity defenses.
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
ReST improves LLM translation quality on benchmarks via offline RL on self-generated data, achieving gains in a compute-efficient way compared to typical RLHF.
A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.
Humans chatting with an unreliable LLM assistant outperform both the model alone and unaided humans on MMLU and time-limited QuALITY tasks.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.
DTCRS dynamically builds summary trees only for suitable question types by using sub-question embeddings as cluster centers, cutting construction time while improving QA on three tasks.
Chain-of-thought monitorability provides a promising but fragile method for AI safety oversight that developers should actively preserve.
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
citing papers explorer
-
Generative Agents: Interactive Simulacra of Human Behavior
Generative agents with memory streams, reflection, and planning using LLMs exhibit believable individual and emergent social behaviors in a simulated town.
-
Code as Policies: Language Model Programs for Embodied Control
Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.
-
Teaching Models to Express Their Uncertainty in Words
GPT-3 can learn to express well-calibrated uncertainty about its answers using natural language phrases rather than logits.
-
Finetuned Language Models Are Zero-Shot Learners
Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.
-
Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
Introduces robust estimators for linear Markov games in offline MARLHF that achieve O(ε^{1-o(1)}) or O(√ε) bounds on Nash or CCE gaps under uniform or unilateral coverage.
-
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing
HER trains LLMs on reverse-engineered reasoning data and human preference rewards to improve cognitive persona simulation, reporting 30-point gains on CoSER and 15% on Minimax benchmarks over Qwen3-32B.
-
Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs
LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.
-
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
-
Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
-
Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books
QA-guided reasoning via a separate model producing structured traces improves faithfulness, informativeness, and grounding in character description generation from books over long-context LLM baselines.
-
LLM Evaluators Recognize and Favor Their Own Generations
LLMs show measurable self-recognition that linearly correlates with self-preference bias in evaluations, supported by fine-tuning experiments and controls for confounders.
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
-
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
AutoDAN automatically generates semantically meaningful jailbreak prompts for aligned LLMs via a hierarchical genetic algorithm, achieving higher attack success, cross-model transferability, and universality than baselines while bypassing perplexity defenses.
-
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
-
Reinforced Self-Training (ReST) for Language Modeling
ReST improves LLM translation quality on benchmarks via offline RL on self-generated data, achieving gains in a compute-efficient way compared to typical RLHF.
-
Aligning Text-to-Image Models using Human Feedback
A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.
-
Solving math word problems with process- and outcome-based feedback
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.
-
Measuring Progress on Scalable Oversight for Large Language Models
Humans chatting with an unreliable LLM assistant outperform both the model alone and unaided humans on MMLU and time-limited QuALITY tasks.
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
-
CALMem : Application-Layer Dual Memory for Conversational AI
CALMem delivers virtually unbounded effective context for LLM conversations via an application-layer dual memory architecture with intra-session retrieval and token-adaptive injection.
-
DTCRS: Dynamic Tree Construction for Recursive Summarization
DTCRS dynamically builds summary trees only for suitable question types by using sub-question embeddings as cluster centers, cutting construction time while improving QA on three tasks.
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Chain-of-thought monitorability provides a promising but fragile method for AI safety oversight that developers should actively preserve.
-
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
-
Reinforcement Learning from Human Feedback
The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.