Towards realistic project-level code generation via multi-agent collaboration and semantic architecture modeling
5 Pith papers cite this work.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
citing papers explorer
- Do Papers Tell the Whole Story? A Benchmark and Framework for Uncovering Hidden Implementation Gaps in Bioinformatics
  BioCon is the first benchmark dataset and cross-modal framework for detecting inconsistencies between methodological descriptions in bioinformatics papers and their code implementations.
- Benchmarking Requirement-to-Architecture Generation with Hybrid Evaluation
  The R2ABench benchmark shows that LLMs generate syntactically valid software architectures from requirements but produce structurally fragmented results due to weak relational reasoning.
- CodeTeam: An LLM-Powered Multi-Agent Framework for Repository-Level Code Generation
  CodeTeam is an LLM multi-agent system that improves SketchBLEU by 4.1/2.9 points and achieves top test pass rates (34.6% PE, 42.3% SFT) on repository-level code generation benchmarks via role-specialized planning and implementation stages.
- When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding
  Co-Coder partitions code dependency graphs via community detection to orchestrate multi-agent LLM coding, improving pass rates by up to 14%, achieving wall-clock speedups of up to 2.1x, and cutting API cost by up to 35% on dependency-dense tasks.
- Contract-Coding: Towards Repo-Level Generation via Structured Symbolic Paradigm
  Contract-Coding projects ambiguous intents into formal Language Contracts that serve as a single source of truth, enabling more reliable repo-level code generation and reporting 47% functional success on the Greenfield-5 benchmark.