Canonical reference

Title resolution pending

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al

Canonical reference. 80% of citing Pith papers cite this work as background.

19 Pith papers citing it

Background 80% of classified citations

browse 19 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 4 baseline 1

citation-polarity summary

background 4 baseline 1

representative citing papers

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

cs.CV · 2026-04-09 · unverdicted · novelty 8.0

The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention

cs.AI · 2026-05-15 · unverdicted · novelty 7.0 · 2 refs

X-SYNTH synthesizes enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.

SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

cs.CR · 2026-05-01 · unverdicted · novelty 7.0

SRTJ is a training-free jailbreak method that evolves hierarchical attack rules using iterative verifier feedback and ASP-based constraint-aware composition to achieve stable high success rates on HarmBench across multiple LLMs.

"What Are You Really Trying to Do?": Co-Creating Life Goals from Everyday Computer Use

cs.HC · 2026-05-01 · unverdicted · novelty 7.0

A co-creation process for inferring and refining personal strivings from computer activity logs yields more representative goals and higher user agency than baselines in a 14-person week-long study.

Discovering Agentic Safety Specifications from 1-Bit Danger Signals

cs.AI · 2026-04-25 · unverdicted · novelty 7.0

LLM agents autonomously evolve human-readable safety specifications from sparse 1-bit danger signals, outperforming reward-based reflection that encourages reward hacking.

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Toward Executable Repository-Level Code Generation via Environment Alignment

cs.SE · 2026-04-04 · unverdicted · novelty 7.0

EnvGraph improves executable repository-level code generation by jointly modeling external dependencies and internal references through a dual-layer environment representation and targeted iterative alignment.

optimize_anything: A Universal API for Optimizing any Text Parameter

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.

PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search

cs.DB · 2026-04-29 · unverdicted · novelty 6.0

PiLLar is the first LLM-guided Monte-Carlo Tree Search framework for joint schema-value matching on pivot tables, achieving 87.94% average accuracy on a new benchmark PTbench derived from real-world domains.

EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation

cs.DB · 2026-04-17 · unverdicted · novelty 6.0

EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.

When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation

cs.SE · 2026-04-10 · unverdicted · novelty 6.0

LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured docs and by 11% more with reasoning strategies, yet outdated patterns persist.

CHORUS: Effort-Aware Multi-Agent Human-AI Collaboration for Professional Translation

cs.HC · 2026-02-22 · unverdicted · novelty 6.0

CHORUS multi-agent system reduced professional translation time by 33.8% while lowering cognitive effort and raising BLEU/COMET scores in a 30-participant within-subject study.

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

cs.AI · 2025-10-28 · unverdicted · novelty 6.0

MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.

SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition

cs.IR · 2026-04-22 · unverdicted · novelty 5.0

SAKE is an agentic framework for GMNER that uses uncertainty-based self-awareness and reinforcement learning to balance internal knowledge exploitation with adaptive external exploration.

Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.

Spec Kit Agents: Context-Grounded Agentic Workflows

cs.SE · 2026-04-07 · unverdicted · novelty 5.0

A multi-agent SDD framework with phase-level context-grounding hooks improves LLM-judged quality by 0.15 points and SWE-bench Lite Pass@1 by 1.7 percent while preserving near-perfect test compatibility.

SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation

cs.IR · 2025-08-01 · unverdicted · novelty 5.0

SPRINT refines LLM-generated intents for session-based recommendation via a global intent pool, performance validation, selective LLM invocation during training, and a lightweight intent predictor for scalable inference without LLM calls.

CurEvo: Curriculum-Guided Self-Evolution for Video Understanding

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

citing papers explorer

Showing 19 of 19 citing papers.

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection cs.CV · 2026-04-09 · unverdicted · none · ref 25
The first benchmark for AI-generated scientific figure detection shows existing detectors fail in zero-shot transfer, overfit to specific generators, and break under common image corruptions.
X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention cs.AI · 2026-05-15 · unverdicted · none · ref 44 · 2 links
X-SYNTH synthesizes enterprise context from digital human attention using Digital Twin Signatures and seven attention filters, raising true lead rate from 9.5% to 61.9% while cutting false lead rate to 18.8%.
SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking cs.CR · 2026-05-01 · unverdicted · none · ref 25
SRTJ is a training-free jailbreak method that evolves hierarchical attack rules using iterative verifier feedback and ASP-based constraint-aware composition to achieve stable high success rates on HarmBench across multiple LLMs.
"What Are You Really Trying to Do?": Co-Creating Life Goals from Everyday Computer Use cs.HC · 2026-05-01 · unverdicted · none · ref 61
A co-creation process for inferring and refining personal strivings from computer activity logs yields more representative goals and higher user agency than baselines in a 14-person week-long study.
Discovering Agentic Safety Specifications from 1-Bit Danger Signals cs.AI · 2026-04-25 · unverdicted · none · ref 10
LLM agents autonomously evolve human-readable safety specifications from sparse 1-bit danger signals, outperforming reward-based reflection that encourages reward hacking.
SAGE: A Service Agent Graph-guided Evaluation Benchmark cs.AI · 2026-04-10 · unverdicted · none · ref 31
SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.
Toward Executable Repository-Level Code Generation via Environment Alignment cs.SE · 2026-04-04 · unverdicted · none · ref 20
EnvGraph improves executable repository-level code generation by jointly modeling external dependencies and internal references through a dual-layer environment representation and targeted iterative alignment.
optimize_anything: A Universal API for Optimizing any Text Parameter cs.CL · 2026-05-19 · unverdicted · none · ref 16
A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.
PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search cs.DB · 2026-04-29 · unverdicted · none · ref 43
PiLLar is the first LLM-guided Monte-Carlo Tree Search framework for joint schema-value matching on pivot tables, achieving 87.94% average accuracy on a new benchmark PTbench derived from real-world domains.
EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation cs.DB · 2026-04-17 · unverdicted · none · ref 53
EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.
When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation cs.SE · 2026-04-10 · unverdicted · none · ref 33
LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured docs and by 11% more with reasoning strategies, yet outdated patterns persist.
CHORUS: Effort-Aware Multi-Agent Human-AI Collaboration for Professional Translation cs.HC · 2026-02-22 · unverdicted · none · ref 39
CHORUS multi-agent system reduced professional translation time by 33.8% while lowering cognitive effort and raising BLEU/COMET scores in a 30-participant within-subject study.
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction cs.AI · 2025-10-28 · unverdicted · none · ref 17
MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.
SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition cs.IR · 2026-04-22 · unverdicted · none · ref 24
SAKE is an agentic framework for GMNER that uses uncertainty-based self-awareness and reinforcement learning to balance internal knowledge exploitation with adaptive external exploration.
Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images cs.CV · 2026-04-13 · unverdicted · none · ref 22
TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.
Spec Kit Agents: Context-Grounded Agentic Workflows cs.SE · 2026-04-07 · unverdicted · none · ref 21
A multi-agent SDD framework with phase-level context-grounding hooks improves LLM-judged quality by 0.15 points and SWE-bench Lite Pass@1 by 1.7 percent while preserving near-perfect test compatibility.
SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation cs.IR · 2025-08-01 · unverdicted · none · ref 23
SPRINT refines LLM-generated intents for session-based recommendation via a global intent pool, performance validation, selective LLM invocation during training, and a lightweight intent predictor for scalable inference without LLM calls.
CurEvo: Curriculum-Guided Self-Evolution for Video Understanding cs.CV · 2026-04-29 · unverdicted · none · ref 52
CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.
A Survey of Scaling in Large Language Model Reasoning cs.AI · 2025-04-02 · unverdicted · none · ref 129
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer