hub

Science , volume=

Competition-level code generation with alphacode , author= · 2022

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

browse 14 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.

Viverra: Text-to-Code with Guarantees

cs.SE · 2026-05-14 · unverdicted · novelty 6.0

Viverra generates C code from text descriptions together with assertions that are verified by model checkers, and a user study with over 400 participants shows the verified assertions improve code comprehension.

PaT: Planning-after-Trial for Efficient Test-Time Code Generation

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.

Programmatic Context Augmentation for LLM-based Symbolic Regression

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

cs.AI · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.

FUSE: Ensembling Verifiers with Zero Labeled Data

stat.ML · 2026-04-20 · unverdicted · novelty 6.0

FUSE ensembles verifiers unsupervisedly by controlling their conditional dependencies to improve spectral ensembling algorithms, matching or exceeding semi-supervised baselines on benchmarks including GPQA Diamond and Humanity's Last Exam.

SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

cs.CL · 2025-09-17 · unverdicted · novelty 6.0

ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

cs.CL · 2024-10-30 · unverdicted · novelty 6.0

OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

Teaching Large Language Models to Self-Debug

cs.CL · 2023-04-11 · unverdicted · novelty 6.0

Self-Debugging teaches LLMs to identify and fix their own code errors through rubber-duck-style natural language explanations and execution feedback, delivering 2-12% gains over baselines on Spider, TransCoder, and MBPP.

Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation

cs.LG · 2026-05-01 · unverdicted · novelty 4.0

Pass-rate rewards in critic-free RL for code generation fail to outperform binary rewards because partial-pass solutions induce conflicting gradient directions that do not consistently favor full correctness.

Evidence of a Cognitive Shift in AI Education: How Students Are Rethinking Human Intelligence?

cs.CY · 2026-04-14 · unverdicted · novelty 4.0

Longitudinal poll data from 471 students in AI courses shows a shift toward preferring human intelligence, reaching 65% in technical courses and 90% in design courses by 2026.

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

cs.SE · 2026-05-13

citing papers explorer

Showing 14 of 14 citing papers.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing cs.CL · 2026-05-13 · unverdicted · none · ref 76
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unverdicted · none · ref 39
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
Viverra: Text-to-Code with Guarantees cs.SE · 2026-05-14 · unverdicted · none · ref 2
Viverra generates C code from text descriptions together with assertions that are verified by model checkers, and a user study with over 400 participants shows the verified assertions improve code comprehension.
PaT: Planning-after-Trial for Efficient Test-Time Code Generation cs.CL · 2026-05-08 · unverdicted · none · ref 20
PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
Programmatic Context Augmentation for LLM-based Symbolic Regression cs.AI · 2026-05-04 · unverdicted · none · ref 33
Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.
The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling cs.AI · 2026-05-04 · unverdicted · none · ref 24 · 2 links
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.
FUSE: Ensembling Verifiers with Zero Labeled Data stat.ML · 2026-04-20 · unverdicted · none · ref 30
FUSE ensembles verifiers unsupervisedly by controlling their conditional dependencies to improve spectral ensembling algorithms, matching or exceeding semi-supervised baselines on benchmarks including GPQA Diamond and Humanity's Last Exam.
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization cs.AI · 2026-04-19 · unverdicted · none · ref 61
SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.
ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution cs.CL · 2025-09-17 · unverdicted · none · ref 26
ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents cs.CL · 2024-10-30 · unverdicted · none · ref 54
OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
Teaching Large Language Models to Self-Debug cs.CL · 2023-04-11 · unverdicted · none · ref 38
Self-Debugging teaches LLMs to identify and fix their own code errors through rubber-duck-style natural language explanations and execution feedback, delivering 2-12% gains over baselines on Spider, TransCoder, and MBPP.
Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation cs.LG · 2026-05-01 · unverdicted · none · ref 1
Pass-rate rewards in critic-free RL for code generation fail to outperform binary rewards because partial-pass solutions induce conflicting gradient directions that do not consistently favor full correctness.
Evidence of a Cognitive Shift in AI Education: How Students Are Rethinking Human Intelligence? cs.CY · 2026-04-14 · unverdicted · none · ref 11
Longitudinal poll data from 471 students in AI courses shows a shift toward preferring human intelligence, reaching 65% in technical courses and 90% in design courses by 2026.
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation cs.SE · 2026-05-13 · unreviewed · ref 15

Science , volume=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer