Canonical reference

Yu, and Aiwei Liu

Lingzhe Zhang, Liancheng Fang, Chiming Duan, Minghua He, Leyi Pan, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S Yu, et al · 2025 · arXiv 2508.08712

Canonical reference. 100% of citing Pith papers cite this work as background.

9 Pith papers citing it

Background 100% of classified citations


citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

cs.CE · 2026-05-14 · unverdicted · novelty 7.0

QuantEvolver applies reinforcement fine-tuning to evolve an LLM policy for generating executable alpha factor expressions, yielding higher-quality and more complementary factors than prompt-based baselines on market benchmarks.

GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

cs.CV · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.

Regulating Branch Parallelism in LLM Serving

cs.DC · 2026-05-07 · unverdicted · novelty 7.0

TAPER regulates LLM branch parallelism by admitting extra branches opportunistically when predicted externality fits slack, delivering 1.48-1.77x higher goodput than eager or fixed-cap baselines on Qwen3-32B while keeping over 95% SLO attainment.

Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

cs.SE · 2026-05-06 · unverdicted · novelty 7.0

Introduces the first benchmark for fine-grained failures in reinforcement fine-tuning of LLMs and an automatic management framework that detects, diagnoses, and remediates them.

E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

cs.SE · 2026-04-13 · unverdicted · novelty 7.0

E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

cs.CL · 2025-12-10 · unverdicted · novelty 7.0

d-TreeRPO uses tree rollouts for fine-grained verifiable rewards and time-scheduled self-distillation to reduce probability estimation gaps in diffusion LLMs, delivering substantial gains on Sudoku, Countdown, GSM8K, and Math500 benchmarks.

SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

SpecBound achieves up to 2.33x wall-time speedup in LLM inference via adaptive bounded self-speculation and layer-wise confidence calibration while preserving exact output equivalence.

Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model

cs.AI · 2025-10-20 · unverdicted · novelty 6.0

Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.

citing papers explorer

Showing 9 of 9 citing papers.

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery
cs.CE · 2026-05-14 · unverdicted · none · ref 23

GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
cs.CV · 2026-05-08 · unverdicted · none · ref 25 · 2 links

Regulating Branch Parallelism in LLM Serving
cs.DC · 2026-05-07 · unverdicted · none · ref 12

Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
cs.SE · 2026-05-06 · unverdicted · none · ref 3

E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning
cs.SE · 2026-04-13 · unverdicted · none · ref 55

DMax: Aggressive Parallel Decoding for dLLMs
cs.LG · 2026-04-09 · conditional · none · ref 98 · 2 links

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models
cs.CL · 2025-12-10 · unverdicted · none · ref 6

SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration
cs.CL · 2026-04-14 · unverdicted · none · ref 6

Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
cs.AI · 2025-10-20 · unverdicted · none · ref 25
