ReMMD presents ReMMDBench (500 samples, 2,756 images, five languages, five-way veracity labels) and ReMMD-Agent, which achieves 41.80% accuracy and 39.12% macro-F1 on five-way classification with GPT-5.2 at lower cost than prior agents.
Science 359(6380):1146–1151
11 Pith papers cite this work.
Representative citing papers
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
A scalable Aumann-Shapley attribution method for million-agent systems reveals that small-scale samples structurally misattribute emergence under nonlinear macro indicators, as shown by the Attribution Scaling Bias theorem.
A reproducible framework maps topics from separate corpora into a shared space anchored by the 94 IPTC Media Topics taxonomy via guided BERTopic, centroid scoring, and upward collapse to 17 parent topics.
BOUTEF is a publicly available multilingual corpus for fake news research in Algeria and Tunisia, with narratives, comments, and debunkings across multiple languages and dialects, accompanied by thematic and engagement analyses.
Guardrail sampling strategies embedded in line charts increase user trust, improve accuracy of performance judgments, and raise perceived completeness of context in persuasive visualizations for COVID-19 and stock data.
Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.
Survival analysis of three years of X posts shows that conspiracy claims with greater semantic mutation have substantially longer lifespans, a pattern linked to changes in pronouns, social words, cognitive terms, and actor-action-target structures.
Simulations show that information overload reduces the effectiveness of source localization in networks. Erdős-Rényi graphs are more resilient than Barabási-Albert graphs, and under strong overload the trend reverses, with less dense networks performing better.
AI alignment should target objective floors of competence, accuracy, honesty, and lawfulness rather than aggregated human preferences.
Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming the best zero-shot LLM (0.50), with the largest gap in detecting belief propagation.