Title resolution pending

Liang, T · 2024 · DOI 10.18653/v1/2024.emnlp-main.992

20 Pith papers cite this work. Polarity classification is still indexing.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

background: 3 · baseline: 1

representative citing papers

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

cs.AI · 2026-04-09 · accept · novelty 7.0

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

cs.CR · 2026-05-21 · conditional · novelty 6.0

Domain-camouflaged injection attacks reduce detection rates from 93.8% to 9.7% on Llama 3.1 8B and 100% to 55.6% on Gemini 2.0 Flash, with the gap persisting in production classifiers and multi-agent debate setups.

Multi-agent AI systems outperform human teams in creativity

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.

Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · conditional · novelty 6.0 · 2 refs

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.

Bimanual Robot Manipulation via Multi-Agent In-Context Learning

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

BiCICLe frames bimanual robot control as a multi-agent leader-follower problem with Arms' Debate and an LLM judge, achieving up to 71.1% success on 13 TWIN benchmark tasks without fine-tuning.

ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

cs.CV · 2026-03-04 · unverdicted · novelty 6.0

ECHO reframes multimedia event extraction as multi-agent iterative refinement over an explicit Multimedia Event Hypergraph with a decoupled Link-then-Bind strategy, delivering 7.3 and 15.5 F1 gains on event mention and argument role.

Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

cs.AI · 2026-04-03 · conditional · novelty 5.0

A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.

Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI

cs.AI · 2026-03-12 · conditional · novelty 5.0

NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice and persona language.

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care

cs.AI · 2026-05-08 · unverdicted · novelty 4.0

Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

cs.CL · 2025-04-02 · unverdicted · novelty 3.0

A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.

citing papers explorer

Showing 20 of 20 citing papers.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · none · ref 35

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · none · ref 138

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · none · ref 73

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

cs.AI · 2026-04-09 · accept · none · ref 56

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · none · ref 271

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

cs.CR · 2026-05-21 · conditional · none · ref 3

Domain-camouflaged injection attacks reduce detection rates from 93.8% to 9.7% on Llama 3.1 8B and 100% to 55.6% on Gemini 2.0 Flash, with the gap persisting in production classifiers and multi-agent debate setups.

Multi-agent AI systems outperform human teams in creativity

cs.CL · 2026-05-18 · unverdicted · none · ref 17

Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.

Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

cs.CL · 2026-05-15 · unverdicted · none · ref 27

Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

cs.AI · 2026-05-12 · unverdicted · none · ref 15

LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · conditional · none · ref 24 · 2 links

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

cs.CL · 2026-05-11 · unverdicted · none · ref 34

Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.

Bimanual Robot Manipulation via Multi-Agent In-Context Learning

cs.RO · 2026-04-22 · unverdicted · none · ref 29

BiCICLe frames bimanual robot control as a multi-agent leader-follower problem with Arms' Debate and an LLM judge, achieving up to 71.1% success on 13 TWIN benchmark tasks without fine-tuning.

ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

cs.CV · 2026-03-04 · unverdicted · none · ref 18

ECHO reframes multimedia event extraction as multi-agent iterative refinement over an explicit Multimedia Event Hypergraph with a decoupled Link-then-Bind strategy, delivering 7.3 and 15.5 F1 gains on event mention and argument role.

Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

cs.AI · 2026-05-12 · unverdicted · none · ref 13

Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · none · ref 113

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

cs.AI · 2026-04-09 · unverdicted · none · ref 33

SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

cs.AI · 2026-04-03 · conditional · none · ref 11

A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.

Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI

cs.AI · 2026-03-12 · conditional · none · ref 34

NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice and persona language.

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care

cs.AI · 2026-05-08 · unverdicted · none · ref 25

Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

cs.CL · 2025-04-02 · unverdicted · none · ref 88

A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
