Under Review

11 Preprint · 2025 · arXiv 2505.01592

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

cs.SE · 2026-04-27 · unverdicted · novelty 4.0

LLM-based SE tools lack stable ground truth and deterministic outputs, making standard evaluation assumptions invalid and requiring new approaches for reliable assessment.

Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents

cs.AI · 2025-10-03

citing papers explorer

Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

cs.SE · 2026-04-27 · unverdicted · none · ref 21

Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents

cs.AI · 2025-10-03 · unreviewed · ref 8