Test-driven development and llm-based code generation

2024 · arXiv 1620.369552

Canonical reference. 100% of citing Pith papers cite this work as background.

16 Pith papers citing it

Background: 100% of classified citations


citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Trustworthy Software Project Generation: A Case Study with an Interactive Theorem Prover

cs.SE · 2026-05-25 · conditional · novelty 7.0

An LLM agent with a Rocq backend automatically builds a verified RISC-V RV32I interpreter (1,859 lines of Rocq, 2,848 lines of extracted C++) that passes 265 tests and a 12-hour fuzzing campaign, while the same agent with a Dafny backend fails.

From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

cs.SE · 2026-05-17 · unverdicted · novelty 7.0

TDDev automates the full TDD loop for generating web applications from requirements, delivering quality gains of 34-48 percentage points and requiring no manual intervention in user studies.

SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

SkillOps maintains LLM skill libraries via Skill Contracts and ecosystem graphs, raising ALFWorld task success to 79.5% as a standalone agent and improving retrieval baselines by up to 2.9 points with near-zero library-time LLM cost.

A Learning Method for Symbolic Systems Using Large Language Models

cs.SE · 2026-05-09 · unverdicted · novelty 7.0

LLM2Ltac mines symbolic tactics from 11,725 Coq theorems using LLMs and integrates them into CoqHammer, improving proof rates by 23.87% on 6,199 theorems from four large verification projects.

Planning to Hammer: Difficulty-Aware Decomposition for Automating Rocq Proofs

cs.SE · 2026-06-16 · unverdicted · novelty 6.0

Quarry improves Rocq proof automation success rates by 7-13% under 10-minute budgets by having an LLM plan proof decompositions and ranking them with a proof-state difficulty model that predicts CoqHammer solvability.

Inferring Code Correctness from Specification

cs.SE · 2026-05-28 · unverdicted · novelty 6.0

TRAILS infers code correctness by aggregating LLM judgments on input-output pairs from category-partitioned specification tests, improving MCC by up to 39% over zero-shot chain-of-thought (CoT) prompting on LiveCodeBench and CoCoClaNeL.

Combined Program Analysis Techniques: A Systematic Mapping Study

cs.SE · 2026-05-19 · unverdicted · novelty 6.0

A systematic mapping study of 248 papers introduces a taxonomy of synergistic effects, inter-analysis workflows, and mapping functions to catalog patterns in combined program analysis techniques.

Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors

cs.PL · 2026-05-18 · unverdicted · novelty 6.0

LORIS detects local reasoning errors in LLM-generated proofs for loop invariants by translating natural-language steps to first-order logic implications and using invalid implications to refine the invariants, achieving 93.1% success on 460 C programs.

Babbling Suppression: Making LLMs Greener One Token at a Time

cs.SE · 2026-04-08 · unverdicted · novelty 6.0

Babbling Suppression stops LLM code generation as soon as the generated code passes its tests, reducing token output and energy consumption by up to 65% across Python and Java benchmarks.

LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs

cs.SE · 2026-06-07 · unverdicted · novelty 5.0

LLM-generated unit tests with retrieval-augmented context detect faults in 69% of real Python bugs versus 17.2% for general-purpose human-written tests, with similar coverage levels.

Extraction and Search in Rocq: Theorems, Definitions and Their Dependencies

cs.SE · 2026-06-03 · conditional · novelty 5.0

TheoremExtr extracts 71,795 theorems with dependencies and 27,481 definitions from 32 Rocq projects and provides a cross-project similarity search website.

Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair

cs.SE · 2026-05-08 · unverdicted · novelty 5.0

Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.

Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

cs.SE · 2025-11-09 · conditional · novelty 5.0

A qualitative interview study with 22 practitioners identifies multi-level benefits, challenges, and mitigation strategies for using LLMs in software development.

An Exploratory Case Study of LLM-Assisted Refactoring and Gameplay Feature Generation in an Endless Runner Game

cs.SE · 2026-06-19 · unverdicted · novelty 4.0

GPT-4o successfully completed all three refactoring tasks but only one of three gameplay feature generation tasks in the studied endless runner game.

Evaluating LLM-Generated ACSL Annotations for Formal Verification

cs.SE · 2026-02-14 · unverdicted · novelty 4.0

Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

cs.SE · 2025-05-26 · unverdicted · novelty 4.0

A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.
