arXiv preprint arXiv:2507.23361 , year=

· 2025 · arXiv 2507.23361

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Dynamic analysis enhances issue resolution

cs.SE · 2026-03-23 · conditional · novelty 7.0

DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

cs.CL · 2026-02-02 · unverdicted · novelty 7.0

Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

In Line with Context: Repository-Level Code Generation via Context Inlining

cs.SE · 2026-01-01 · unverdicted · novelty 7.0

InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.

SWE-QA: Can Language Models Answer Repository-level Code Questions?

cs.CL · 2025-09-18 · unverdicted · novelty 7.0

SWE-QA creates a new repository-level code QA benchmark with 576 pairs and an agentic LLM framework, showing promise but open challenges for models handling complex codebases.

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.

Step Rejection Fine-Tuning: A Practical Distillation Recipe

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Step Rejection Fine-Tuning masks loss on erroneous steps identified by a critic LLM in unresolved trajectories, raising SWE-bench Verified resolution rate by 3.7% to 32.2% versus 2.4% for trajectory-level rejection.

Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

citing papers explorer

Showing 9 of 9 citing papers.

Dynamic analysis enhances issue resolution cs.SE · 2026-03-23 · conditional · none · ref 4
DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.
LMEB: Long-horizon Memory Embedding Benchmark cs.CL · 2026-03-13 · unverdicted · none · ref 9
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding cs.CL · 2026-02-02 · unverdicted · none · ref 21
Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.
In Line with Context: Repository-Level Code Generation via Context Inlining cs.SE · 2026-01-01 · unverdicted · none · ref 6
InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.
SWE-QA: Can Language Models Answer Repository-level Code Questions? cs.CL · 2025-09-18 · unverdicted · none · ref 5
SWE-QA creates a new repository-level code QA benchmark with 576 pairs and an agentic LLM framework, showing promise but open challenges for models handling complex codebases.
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory cs.LG · 2026-05-14 · unverdicted · none · ref 6
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
Step Rejection Fine-Tuning: A Practical Distillation Recipe cs.LG · 2026-05-11 · unverdicted · none · ref 7
Step Rejection Fine-Tuning masks loss on erroneous steps identified by a critic LLM in unresolved trajectories, raising SWE-bench Verified resolution rate by 3.7% to 32.2% versus 2.4% for trajectory-level rejection.
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints cs.SE · 2026-04-06 · unverdicted · none · ref 11
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 19
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

arXiv preprint arXiv:2507.23361 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer