The Second Conversational Intelligence Challenge (ConvAI2)

Alan W Black; Alexander Miller; Alexander Rudnicky; Arthur Szlam; Douwe Kiela; Emily Dinan; Iulian Serban; Jack Urbanek; Jason Weston; Jason Williams


arXiv 1902.00098 v1 · pith:YN3ZSWSR · submitted 2019-01-31 · cs.AI cs.CL cs.HC


Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston



keywords: competition, convai2, conversations, performance, across, acts, aims, answered


Abstract

We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) -- in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).
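The abstract contrasts per-token metrics such as perplexity with conversation-level properties such as repetition and the balance of questions asked. The Python sketch below illustrates that contrast under stated assumptions. The function names, the trigram-based repetition measure, the whitespace tokenization, and the use of a trailing "?" as a stand-in for a question dialogue act are all hypothetical choices made for illustration. They are not the ConvAI2 evaluation code or the paper's exact definitions.

```python
import math

def perplexity(token_log_probs):
    """Per-token perplexity: exp of the mean negative natural-log probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def repeated_trigram_rate(utterances):
    """Fraction of word trigrams that already occurred earlier in the same
    speaker's turns (an assumed, illustrative repetition measure)."""
    seen = set()
    total = repeated = 0
    for utterance in utterances:
        words = utterance.lower().split()  # naive whitespace tokenization (assumption)
        for i in range(len(words) - 2):
            trigram = tuple(words[i:i + 3])
            total += 1
            if trigram in seen:
                repeated += 1
            seen.add(trigram)
    return repeated / total if total else 0.0

def question_ratio(utterances):
    """Share of turns ending in '?', a crude proxy for the question-asking dialogue act."""
    if not utterances:
        return 0.0
    return sum(u.strip().endswith("?") for u in utterances) / len(utterances)

# Hypothetical bot turns from a single conversation.
bot_turns = ["i love to read books .", "do you like to read ?", "i love to read books ."]
print(repeated_trigram_rate(bot_turns), question_ratio(bot_turns))
```

For these three hypothetical turns, the repeated-trigram rate and the question ratio both come out to 1/3. The repetition arises because the third turn repeats the first verbatim. A perplexity score averages over individual tokens and does not directly measure this kind of cross-turn behavior.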


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
cs.AI 2024-06 conditional novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
cs.CL 2019-10 accept novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE.