On memorization of large language models in logical reasoning

Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar · 2024 · arXiv 2410.23123

12 Pith papers cite this work.

representative citing papers

Unsteady Metrics and Benchmarking Cultures of AI Model Builders

cs.AI · 2026-05-13 · accept · novelty 8.0

AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

cs.LG · 2025-07-02 · unverdicted · novelty 7.0

Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.

A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Supervised fine-tuning lets LLMs linearly encode action validity and state predicates, with broader state-space coverage during training improving world-model recovery.

ActivationReasoning: Logical Reasoning in Latent Activation Spaces

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.

Structured In-context Environment Scaling for Large Language Model Reasoning

cs.CL · 2025-09-27 · conditional · novelty 6.0

SIE framework automatically constructs scalable, verifiable reasoning environments from structured data, improving in-domain performance and enabling generalization to out-of-domain math and logic tasks.

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

cs.LG · 2025-08-27 · conditional · novelty 6.0

GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.

On the Generalization Gap in Self-Evolving Language Model Reasoning

cs.CL · 2026-05-31 · unverdicted · novelty 5.0

Closed-loop self-evolution on LLMs improves reasoning on Knights and Knaves tasks but plateaus short of oracle-supervised levels, with multi-turn revision nearly matching it for large models.

Proximal Supervised Fine-Tuning

cs.LG · 2025-08-25 · unverdicted · novelty 5.0

PSFT modifies supervised fine-tuning by incorporating trust-region ideas from RL to constrain policy changes, yielding better out-of-domain generalization in math and human-value tasks without entropy collapse.

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

cs.AI · 2026-05-28 · unverdicted · novelty 4.0

EKSFT masks high-entropy or high-KL tokens in low-data SFT to preserve pre-trained distribution and improve downstream RL performance on math reasoning tasks.

Effects of Cross-lingual Evidence in Multilingual Medical Question Answering

cs.CL · 2026-04-22 · unverdicted · novelty 4.0

Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.

Sharpness-Guided Group Relative Policy Optimization via Probability Shaping

cs.LG · 2025-10-29 · unverdicted · novelty 4.0

GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.
