Title resolution pending

OpenAI · 2024

8 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

background 2 · baseline 1 · method 1

citation-polarity summary

background 3 · baseline 1

representative citing papers

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

cs.SD · 2026-04-22 · unverdicted · novelty 7.0

ONOTE is a multi-format benchmark that applies a deterministic pipeline to expose a disconnect between perceptual accuracy and music-theoretic comprehension in leading omnimodal AI models.

LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO

cs.DB · 2026-04-08 · unverdicted · novelty 7.0

LASER generates complex slow-query training data with MCTS and aligns small models via SQL-GRPO to deliver efficient, low-cost SQL rewriting that outperforms rules and large models.

An Iterative Test-and-Repair Framework for Competitive Code Generation

cs.SE · 2026-04-07 · unverdicted · novelty 7.0

FixAudit improves LLM code generation on competitive programming benchmarks by training a shared model for iterative code-aware test generation and repair, achieving 35%+ gains in Pass@1 over baselines on the same 7B model.

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

DocShield presents a new agentic reasoning framework using Cross-Cues-aware Chain of Thought to detect, localize, and explain text-centric forgeries in documents, with reported F1 gains of 41.4% over specialized methods and 23.4% over GPT-4o on T-IC13.

Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation

cs.AI · 2026-02-12 · unverdicted · novelty 6.0

MLLMs show a large gap in spatial mathematical reasoning compared to humans, and a new 10,000-problem dataset helps narrow it through training.

Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

cs.AI · 2025-02-25 · unverdicted · novelty 6.0

An LLM pipeline generates knowledge components for coding problems, enabling KCGen-KT to outperform existing KT methods and human-written KCs on student response prediction across two datasets.

Agentless: Demystifying LLM-based Software Engineering Agents

cs.SE · 2024-07-01 · conditional · novelty 6.0

Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with a 32% success rate at a cost of $0.70.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

citing papers explorer

Showing 8 of 8 citing papers.

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
cs.SD · 2026-04-22 · unverdicted · none · ref 37

LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO
cs.DB · 2026-04-08 · unverdicted · none · ref 28

An Iterative Test-and-Repair Framework for Competitive Code Generation
cs.SE · 2026-04-07 · unverdicted · none · ref 40

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning
cs.CV · 2026-04-03 · unverdicted · none · ref 26

Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation
cs.AI · 2026-02-12 · unverdicted · none · ref 22

Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems
cs.AI · 2025-02-25 · unverdicted · none · ref 27

Agentless: Demystifying LLM-based Software Engineering Agents
cs.SE · 2024-07-01 · conditional · none · ref 75

A Survey on Large Language Models for Code Generation
cs.CL · 2024-06-01 · unverdicted · none · ref 201
