A Diversity-Promoting Objective Function for Neural Conversation Models

Bill Dolan; Chris Brockett; Jianfeng Gao; Jiwei Li; Michel Galley

arxiv: 1510.03055 · v3 · pith:SFL6MYDFnew · submitted 2015-10-11 · 💻 cs.CL

A Diversity-Promoting Objective Function for Neural Conversation Models

Jiwei Li , Michel Galley , Chris Brockett , Jianfeng Gao , Bill Dolan This is my paper

classification 💻 cs.CL

keywords modelsfunctionneuralobjectiveresponsesconversationalgenerationinput

0 comments

read the original abstract

Sequence-to-sequence neural network models for generation of conversational responses tend to generate safe, commonplace responses (e.g., "I don't know") regardless of the input. We suggest that the traditional objective function, i.e., the likelihood of output (response) given input (message) is unsuited to response generation tasks. Instead we propose using Maximum Mutual Information (MMI) as the objective function in neural models. Experimental results demonstrate that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
cs.CL 2026-05 unverdicted novelty 7.0

TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
cs.LG 2026-04 unverdicted novelty 7.0

Distinct Leaf Enumeration (DLE) replaces stochastic self-consistency sampling with deterministic traversal of a truncated decoding tree to enumerate distinct leaves, increasing coverage and reducing redundant computat...
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
cs.CL 2025-04 unverdicted novelty 7.0

The authors generate and publicly release the first large-scale open dataset of three million structured moral fables produced by small open language models together with a reproducible LLM-judge evaluation pipeline.
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
cs.CL 2026-05 unverdicted novelty 6.0

Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
cs.CL 2026-05 unverdicted novelty 6.0

TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM
cs.CL 2026-05 unverdicted novelty 6.0

A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalizati...
Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy
cs.AI 2026-03 unverdicted novelty 6.0

Nano-EmoX is a compact 2.2B multimodal model that unifies six core affective tasks across perception, understanding, and interaction levels via a curriculum framework, achieving competitive benchmark performance.
Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling
cs.CL 2025-07 unverdicted novelty 6.0

REFORM uses reward-guided controlled decoding to generate adversarial failures and augments training data to improve reward model robustness on preference datasets.
LaMDA: Language Models for Dialog Applications
cs.CL 2022-01 unverdicted novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles
cs.CL 2026-05 unverdicted novelty 5.0

Re-evaluating controlled text generation systems under standardized conditions reveals that many published performance claims do not hold, highlighting the need for consistent evaluation practices.
Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts
cs.CL 2026-04 unverdicted novelty 5.0

Mainstream conversational models show escalating affective misalignments and ethical guidance failures during staged emotional trajectories, organized into a taxonomy of interactional breakdowns.
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
cs.LG 2023-04 unverdicted novelty 5.0

RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.