arXiv preprint arXiv:2410.23123 , year=

Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar · 2024 · arXiv 2410.23123

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Unsteady Metrics and Benchmarking Cultures of AI Model Builders

cs.AI · 2026-05-13 · accept · novelty 8.0

AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

cs.LG · 2025-07-02 · unverdicted · novelty 7.0

Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.

ActivationReasoning: Logical Reasoning in Latent Activation Spaces

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.

Structured In-context Environment Scaling for Large Language Model Reasoning

cs.CL · 2025-09-27 · conditional · novelty 6.0

SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

cs.LG · 2025-08-27 · conditional · novelty 6.0

GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.

Proximal Supervised Fine-Tuning

cs.LG · 2025-08-25 · unverdicted · novelty 5.0

PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.

Effects of Cross-lingual Evidence in Multilingual Medical Question Answering

cs.CL · 2026-04-22 · unverdicted · novelty 4.0

Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.

Sharpness-Guided Group Relative Policy Optimization via Probability Shaping

cs.LG · 2025-10-29 · unverdicted · novelty 4.0

GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.

citing papers explorer

Showing 9 of 9 citing papers.

Unsteady Metrics and Benchmarking Cultures of AI Model Builders cs.AI · 2026-05-13 · accept · none · ref 64
AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends cs.CR · 2025-08-15 · accept · none · ref 154
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling cs.LG · 2025-07-02 · unverdicted · none · ref 40
Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.
ActivationReasoning: Logical Reasoning in Latent Activation Spaces cs.LG · 2025-10-21 · unverdicted · none · ref 20
ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.
Structured In-context Environment Scaling for Large Language Model Reasoning cs.CL · 2025-09-27 · conditional · none · ref 24
SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs cs.LG · 2025-08-27 · conditional · none · ref 38
GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.
Proximal Supervised Fine-Tuning cs.LG · 2025-08-25 · unverdicted · none · ref 25
PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.
Effects of Cross-lingual Evidence in Multilingual Medical Question Answering cs.CL · 2026-04-22 · unverdicted · none · ref 40
Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.
Sharpness-Guided Group Relative Policy Optimization via Probability Shaping cs.LG · 2025-10-29 · unverdicted · none · ref 36
GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.

arXiv preprint arXiv:2410.23123 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer