Communications of the ACM

Datasheets for datasets · 2021

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · other 1

citation-polarity summary

background 2 · unclear 1

representative citing papers

TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning

cs.RO · 2026-05-08 · accept · novelty 8.0

TAVIS is a released benchmark showing active vision improves imitation learning in a task-dependent manner, multi-task policies struggle with shifts, and imitation produces human-like anticipatory gaze.

DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks

cs.SI · 2026-05-20 · unverdicted · novelty 7.0

DeTox-Fed uses federated graph neural networks on local conversation graphs to detect toxic discussions in the Fediverse while keeping all raw data and labels on individual instances.

When 'For You' Isn't For You: Measuring User Agency in TikTok's Algorithmic Feed

cs.CY · 2026-05-11 · unverdicted · novelty 7.0

TikTok's FYP algorithm changes content based on user signals yet reverts to unwanted topics once explicit disinterest stops, with the strongest signal buried in the interface.

Participatory provenance as representational auditing for AI-mediated public consultation

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.

PaLI: A Jointly-Scaled Multilingual Language-Image Model

cs.CV · 2022-09-14 · conditional · novelty 7.0

PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.

DisImpact: Quantifying the Physi-Social Impact of Natural Disasters Through Social Media

cs.SI · 2026-05-20 · unverdicted · novelty 6.0

DisImpact introduces a two-stage MLLM framework to classify disaster-related social media posts into ten impact categories and compute a unified physi-social impact index validated against FEMA and NASA ground-truth data.

Active Tabular Augmentation via Policy-Guided Diffusion Inpainting

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

TAP couples a learner-conditioned policy with diffusion inpainting to generate and selectively inject high-utility tabular augmentations, yielding up to 15.6 pp accuracy gains and 32% RMSE reduction on seven datasets under severe scarcity.

Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate

cs.LG · 2026-05-08 · conditional · novelty 6.0

Mage shows compile-pass rate is anti-correlated with functional correctness in LLM game scene generation; direct NL-to-C# yields 43% runtime but F1~0.12 structure, while IR conditioning recovers structure (F1 up to 1.0) but halves runtime, with granularity levels statistically equivalent.

Towards an AI co-scientist

cs.AI · 2025-02-26 · unverdicted · novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

cs.CL · 2024-02-20 · conditional · novelty 6.0

DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods, outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

Towards Expert-Level Medical Question Answering with Large Language Models

cs.CL · 2023-05-16 · unverdicted · novelty 6.0

Med-PaLM 2 achieves 86.5% accuracy on MedQA and approaches or exceeds prior state-of-the-art on other medical QA benchmarks while receiving higher physician preference ratings than human answers on consumer questions.

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

cs.CL · 2023-04-13 · accept · novelty 6.0

AGIEval shows GPT-4 exceeding average human scores on SAT Math at 95% and Chinese college entrance English at 92.5%, while revealing weaker results on complex reasoning tasks.

StarCoder: may the source be with you!

cs.CL · 2023-05-09 · accept · novelty 5.0

StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

citing papers explorer

Showing 13 of 13 citing papers.

TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning
cs.RO · 2026-05-08 · accept · none · ref 32

DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks
cs.SI · 2026-05-20 · unverdicted · none · ref 99

When 'For You' Isn't For You: Measuring User Agency in TikTok's Algorithmic Feed
cs.CY · 2026-05-11 · unverdicted · none · ref 33

Participatory provenance as representational auditing for AI-mediated public consultation
cs.AI · 2026-04-22 · unverdicted · none · ref 36

PaLI: A Jointly-Scaled Multilingual Language-Image Model
cs.CV · 2022-09-14 · conditional · none · ref 178

DisImpact: Quantifying the Physi-Social Impact of Natural Disasters Through Social Media
cs.SI · 2026-05-20 · unverdicted · none · ref 109

Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
cs.LG · 2026-05-11 · unverdicted · none · ref 68

Mage: Multi-Axis Evaluation of LLM-Generated Executable Game Scenes Beyond Compile-Pass Rate
cs.LG · 2026-05-08 · conditional · none · ref 26

Towards an AI co-scientist
cs.AI · 2025-02-26 · unverdicted · none · ref 173

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
cs.CL · 2024-02-20 · conditional · none · ref 199

Towards Expert-Level Medical Question Answering with Large Language Models
cs.CL · 2023-05-16 · unverdicted · none · ref 91

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
cs.CL · 2023-04-13 · accept · none · ref 4

StarCoder: may the source be with you!
cs.CL · 2023-05-09 · accept · none · ref 171
