Title resolution pending

GPT-4o System Card · 2024

17 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation

cs.CV · 2026-05-16 · unverdicted · novelty 7.0 · 2 refs

Introduces the UCSF-PDGM-VQA dataset of 2387 QA pairs from 473 glioma MRI studies and demonstrates that state-of-the-art VLMs exhibit modality collapse on multi-sequence 3D medical images.

Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

Surprisal minimization over goal-directed alternatives generated by language models provides the strongest account of production choices in open-ended dialogue compared to uniform information density or length-based costs.

Structure Guided Retrieval-Augmented Generation for Factual Queries

cs.IR · 2026-04-21 · unverdicted · novelty 7.0

SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

VLMs as judges exhibit informativeness bias by favoring detailed but image-inconsistent answers; BIRCH mitigates it by first correcting answers against the image, reducing bias by up to 17% and improving performance by up to 9.8%.

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

cs.SE · 2025-02-25 · unverdicted · novelty 7.0

SWE-RL applies RL to software evolution data to train LLMs that reach 41% on SWE-bench Verified and generalize to other reasoning tasks.

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.

DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

DMN achieves over 90% attack success rate on GPT-4o, Gemini-2.5-pro and Claude Sonnet 4 by distributing instructions, supplying multimodal evidence, and adding number chain tasks across multiple images.

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Current audio-language models fail to use clinical multimodal context for dysarthric speech recognition, but context-aware LoRA fine-tuning delivers large accuracy gains on the SAP dataset.

DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.

Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Filtering job posting data before LLM-assisted clustering and hierarchical labeling yields taxonomies with better AI skill coverage than unfiltered approaches.

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

cs.AI · 2026-05-19 · unverdicted · novelty 5.0

The paper proposes Strategic Prior-data Fitted Network (SPN), an inference-time method that adapts pretrained tabular foundation models to strategic feature manipulation by constructing aligned in-context examples.

Traditional statistical representations outperform generative AI in identifying expert peer reviewers

cs.IR · 2026-05-18 · unverdicted · novelty 5.0

TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

cs.LG · 2026-05-18

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning

cs.AI · 2026-05-13

R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling

cs.LG · 2026-04-22

citing papers explorer

Showing 17 of 17 citing papers.

UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation cs.CV · 2026-05-16 · unverdicted · none · ref 47 · 2 links
Introduces the UCSF-PDGM-VQA dataset of 2387 QA pairs from 473 glioma MRI studies and demonstrates that state-of-the-art VLMs exhibit modality collapse on multi-sequence 3D medical images.
Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue cs.CL · 2026-05-01 · unverdicted · none · ref 69
Surprisal minimization over goal-directed alternatives generated by language models provides the strongest account of production choices in open-ended dialogue compared to uniform information density or length-based costs.
Structure Guided Retrieval-Augmented Generation for Factual Queries cs.IR · 2026-04-21 · unverdicted · none · ref 54
SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.
When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias cs.AI · 2026-04-20 · unverdicted · none · ref 13
VLMs as judges exhibit informativeness bias by favoring detailed but image-inconsistent answers; BIRCH mitigates it by first correcting answers against the image, reducing bias by up to 17% and improving performance by up to 9.8%.
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution cs.SE · 2025-02-25 · unverdicted · none · ref 171
SWE-RL applies RL to software evolution data to train LLMs that reach 41% on SWE-bench Verified and generalize to other reasoning tasks.
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study cs.CL · 2026-05-20 · unverdicted · none · ref 46
Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs cs.CR · 2026-05-18 · unverdicted · none · ref 13
DMN achieves over 90% attack success rate on GPT-4o, Gemini-2.5-pro and Claude Sonnet 4 by distributing instructions, supplying multimodal evidence, and adding number chain tasks across multiple images.
Prefix-Adaptive Block Diffusion for Efficient Document Recognition cs.CV · 2026-05-16 · unverdicted · none · ref 64
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition cs.AI · 2026-05-04 · unverdicted · none · ref 6
Current audio-language models fail to use clinical multimodal context for dysarthric speech recognition, but context-aware LoRA fine-tuning delivers large accuracy gains on the SAP dataset.
DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents cs.CV · 2026-04-21 · unverdicted · none · ref 17
DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking cs.CL · 2026-04-20 · unverdicted · none · ref 66
PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.
Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings cs.CL · 2026-05-20 · unverdicted · none · ref 59
Filtering job posting data before LLM-assisted clustering and hierarchical labeling yields taxonomies with better AI skill coverage than unfiltered approaches.
When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach cs.AI · 2026-05-19 · unverdicted · none · ref 94
The paper proposes Strategic Prior-data Fitted Network (SPN), an inference-time method that adapts pretrained tabular foundation models to strategic feature manipulation by constructing aligned in-context examples.
Traditional statistical representations outperform generative AI in identifying expert peer reviewers cs.IR · 2026-05-18 · unverdicted · none · ref 62
TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning cs.LG · 2026-05-18 · unreviewed · ref 22
Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning cs.AI · 2026-05-13 · unreviewed · ref 80
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling cs.LG · 2026-04-22 · unreviewed · ref 36
