BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

· 2022 · arXiv 2208.03188

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.

Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules

cs.DB · 2026-05-28 · unverdicted · novelty 6.0

PMVisAgent uses multi-turn progressive interactions and a validation agent with ReAct-style verification to achieve up to 23.21% higher execution accuracy on the new PMVisBench dataset for text-to-vis tasks.

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

cs.CR · 2025-07-08 · unverdicted · novelty 6.0

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.

Chain-of-Verification Reduces Hallucination in Large Language Models

cs.CL · 2023-09-20 · unverdicted · novelty 6.0

Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.

Gorilla: Large Language Model Connected with Massive APIs

cs.CL · 2023-05-24 · conditional · novelty 6.0

Gorilla is a fine-tuned LLM that surpasses GPT-4 in accurate API call generation and uses retrieval to handle documentation updates.

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

cs.AI · 2023-03-31 · conditional · novelty 6.0

CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

cs.CL · 2026-05-15 · unverdicted · novelty 5.0

DebiasRAG uses a three-stage RAG process to generate and rerank query-specific debiasing contexts that act as fairness constraints for LLM outputs.

From 'Here' to 'There': Exploring Proximity Semantics in Multimodal Data Exploration

cs.HC · 2026-05-04 · unverdicted · novelty 5.0

A user study with 20 participants shows that closeness between sketches, annotations, and language in a shared space helps disambiguate multimodal queries, leading to the concept of proximity semantics for data exploration systems.

Retrieval-Augmented Generation for AI-Generated Content: A Survey

cs.CV · 2024-02-29 · accept · novelty 5.0

A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.

Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation

cs.SD · 2026-06-10 · unverdicted · novelty 4.0

StreamMUSE performs frame-synchronous streaming inference for language models by having a client send high-frequency requests and a server return outputs aligned to an external clock, shown on live music accompaniment with open-source code.

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

cs.HC · 2026-05-16 · unverdicted · novelty 3.0 · 2 refs

WhiteTesseract integrates XR diminished reality and LLM dialogue to increase viewing duration and interaction depth in physical cultural heritage exhibitions, shown in a 26-participant Monet exhibition study with statistically significant results.

citing papers explorer

Showing 9 of 9 citing papers after filters.

Evaluating Very Long-Term Conversational Memory of LLM Agents cs.CL · 2024-02-27 · unverdicted · none · ref 48
GAIA: a benchmark for General AI Assistants cs.CL · 2023-11-21 · unverdicted · none · ref 193
Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules cs.DB · 2026-05-28 · unverdicted · none · ref 34
Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI cs.CR · 2025-07-08 · unverdicted · none · ref 66
Chain-of-Verification Reduces Hallucination in Large Language Models cs.CL · 2023-09-20 · unverdicted · none · ref 165
DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation cs.CL · 2026-05-15 · unverdicted · none · ref 50
From 'Here' to 'There': Exploring Proximity Semantics in Multimodal Data Exploration cs.HC · 2026-05-04 · unverdicted · none · ref 60
Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation cs.SD · 2026-06-10 · unverdicted · none · ref 6
WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI cs.HC · 2026-05-16 · unverdicted · none · ref 79 · 2 links
