Title resolution pending

Program Synthesis with Large Language Models , author= · 2021

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

cs.CL · 2023-10-10 · unverdicted · novelty 8.0

SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.

Discovering Reinforcement Learning Interfaces with Large Language Models

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

cs.SE · 2025-02-25 · unverdicted · novelty 7.0

SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.

RouterBench: A Benchmark for Multi-LLM Routing System

cs.LG · 2024-03-18 · unverdicted · novelty 7.0

RouterBench supplies a standardized benchmark, 405k+ inference dataset, theoretical framework, and comparative analysis for multi-LLM routing systems.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents

cs.CR · 2026-05-17 · conditional · novelty 6.0

Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.

Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Sync-R1 applies cooperative RL with Sync-GRPO and Dynamic Group Scaling to achieve superior cross-task personalized reasoning in multimodal models on the new UnifyBench++ dataset.

Parallel Prefix Verification for Speculative Generation

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

The Efficiency Gap in Byte Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

cs.LG · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.

DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

DACA-GRPO adds denoising-aware credit assignment and bias-reduced likelihood estimation to GRPO, delivering consistent gains up to 36.3pp on math, code, constraint, and schema benchmarks for diffusion LLMs.

Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

cs.LG · 2026-05-01

citing papers explorer

Showing 16 of 16 citing papers.

SWE-bench: Can Language Models Resolve Real-World GitHub Issues? cs.CL · 2023-10-10 · unverdicted · none · ref 62
SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
Discovering Reinforcement Learning Interfaces with Large Language Models cs.LG · 2026-05-05 · unverdicted · none · ref 42
LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.
DARE: Diffusion Language Model Activation Reuse for Efficient Inference cs.LG · 2026-05-01 · unverdicted · none · ref 18
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution cs.SE · 2025-02-25 · unverdicted · none · ref 164
SWE-RL uses RL on software evolution data to train LLMs achieving 41% on SWE-bench Verified with generalization to other reasoning tasks.
RouterBench: A Benchmark for Multi-LLM Routing System cs.LG · 2024-03-18 · unverdicted · none · ref 48
RouterBench supplies a standardized benchmark, 405k+ inference dataset, theoretical framework, and comparative analysis for multi-LLM routing systems.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads cs.LG · 2024-01-19 · conditional · none · ref 125
Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents cs.CR · 2026-05-17 · conditional · none · ref 52
Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.
Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning cs.CV · 2026-05-11 · unverdicted · none · ref 40
Sync-R1 applies cooperative RL with Sync-GRPO and Dynamic Group Scaling to achieve superior cross-task personalized reasoning in multimodal models on the new UnifyBench++ dataset.
Parallel Prefix Verification for Speculative Generation cs.AI · 2026-05-05 · unverdicted · none · ref 42
PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.
Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 103
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 192
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 134
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
The Efficiency Gap in Byte Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 40
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training cs.LG · 2026-05-09 · unverdicted · none · ref 83 · 2 links
Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.
DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models cs.LG · 2026-05-08 · unverdicted · none · ref 24
DACA-GRPO adds denoising-aware credit assignment and bias-reduced likelihood estimation to GRPO, delivering consistent gains up to 36.3pp on math, code, constraint, and schema benchmarks for diffusion LLMs.
Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning cs.LG · 2026-05-01 · unreviewed · ref 40

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer