Title resolution pending

OpenAI · 2025

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

cs.CV · 2026-04-09 · unverdicted · novelty 8.0

The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.

Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.

In Line with Context: Repository-Level Code Generation via Context Inlining

cs.SE · 2026-01-01 · unverdicted · novelty 7.0

InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.

Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems

cs.SE · 2025-11-02 · unverdicted · novelty 7.0

Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.

citing papers explorer

Showing 4 of 4 citing papers.

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection cs.CV · 2026-04-09 · unverdicted · none · ref 33
The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.
Training Computer Use Agents to Assess the Usability of Graphical User Interfaces cs.CL · 2026-04-28 · unverdicted · none · ref 53
uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.
In Line with Context: Repository-Level Code Generation via Context Inlining cs.SE · 2026-01-01 · unverdicted · none · ref 44
InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.
Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems cs.SE · 2025-11-02 · unverdicted · none · ref 42
Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer