CodeGemma: Open Code Models Based on Gemma

2024 · arXiv 2406.11409

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

cs.SE · 2026-05-06 · unverdicted · novelty 6.0

SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.

RefineStat: Efficient Exploration for Probabilistic Program Synthesis

cs.LG · 2025-09-01 · unverdicted · novelty 6.0

RefineStat improves small language model performance on probabilistic program synthesis by adding semantic constraint enforcement and diagnostic-aware refinement, producing syntactically and statistically reliable code that often matches larger models.

Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?

cs.SE · 2025-05-15 · conditional · novelty 6.0

LLMs achieve strong initial accuracy on code output prediction but frequently change their answers under semantics-preserving mutations, with drops of up to 70%; human review found flawed reasoning in 10-50% of correct cases.

MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms

cs.SE · 2025-02-10 · unverdicted · novelty 6.0

Frontier LLMs achieve only moderate performance on multi-file unit test generation, with basic executability errors and cascade errors common, but manual and self-error-fixing mechanisms yield measurable gains.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

cs.CL · 2025-07-07 · unverdicted · novelty 4.0

Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

cs.SE · 2024-10-29 · unverdicted · novelty 4.0

Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.

mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

cs.LG · 2026-04-23 · unverdicted · novelty 2.0

Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.

citing papers explorer

Showing 9 of 9 citing papers.

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs cs.SE · 2026-05-06 · unverdicted · none · ref 9
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation cs.SE · 2026-04-20 · unverdicted · none · ref 8
RefineStat: Efficient Exploration for Probabilistic Program Synthesis cs.LG · 2025-09-01 · unverdicted · none · ref 46
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? cs.SE · 2025-05-15 · conditional · none · ref 2
MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms cs.SE · 2025-02-10 · unverdicted · none · ref 25
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 32
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025-07-07 · unverdicted · none · ref 15
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? cs.SE · 2024-10-29 · unverdicted · none · ref 64
mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code cs.LG · 2026-04-23 · unverdicted · none · ref 7