AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.
arXiv preprint arXiv:2410.23123
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles: background 1
polarities: background 1
representative citing papers
citing papers explorer
- Unsteady Metrics and Benchmarking Cultures of AI Model Builders
  AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.
- Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
  A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
- Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
  Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.
- ActivationReasoning: Logical Reasoning in Latent Activation Spaces
  ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.
- Structured In-context Environment Scaling for Large Language Model Reasoning
  SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.
- Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
  GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.
- Proximal Supervised Fine-Tuning
  PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.
- Effects of Cross-lingual Evidence in Multilingual Medical Question Answering
  Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.
- Sharpness-Guided Group Relative Policy Optimization via Probability Shaping
  GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.