Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
hub
BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3representative citing papers
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
Gorilla is a fine-tuned LLM that surpasses GPT-4 in accurate API call generation and uses retrieval to handle documentation updates.
CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.
DebiasRAG uses a three-stage RAG process to generate and rerank query-specific debiasing contexts that act as fairness constraints for LLM outputs.
A user study with 20 participants shows that closeness between sketches, annotations, and language in a shared space helps disambiguate multimodal queries, leading to the concept of proximity semantics for data exploration systems.
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
citing papers explorer
-
GAIA: a benchmark for General AI Assistants
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
-
Gorilla: Large Language Model Connected with Massive APIs
Gorilla is a fine-tuned LLM that surpasses GPT-4 in accurate API call generation and uses retrieval to handle documentation updates.
-
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.