Self-supervised Analogical Learning using Language Models

Ben Zhou; Dan Roth; Qiang Ning; Sarthak Jain; Shuai Wang; Yassine Benajiba; Yi Zhang

arXiv: 2502.00996 · v1 · pith:XULWHZHQnew · submitted 2025-02-03 · cs.CL

Ben Zhou, Sarthak Jain, Yi Zhang, Qiang Ning, Shuai Wang, Yassine Benajiba, Dan Roth

keywords: models, cases, reasoning, they, language, learning, analogical, data

Large language models have been shown to suffer from reasoning inconsistency issues. That is, they fail more in situations unfamiliar to the training data, even though exact or very similar reasoning paths exist in more common cases that they can successfully solve. Such observations motivate us to propose methods that encourage models to understand the high-level and abstract reasoning processes during training instead of only the final answer. This way, models can transfer the exact solution to similar cases, regardless of their relevance to the pre-training data distribution. In this work, we propose SAL, a self-supervised analogical learning framework. SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions from cases that they know how to solve to other rare cases in which they tend to fail more. We show that the resulting models after SAL learning outperform base language models on a wide range of reasoning benchmarks, such as StrategyQA, GSM8K, and HotpotQA, by 2% to 20%. At the same time, we show that our model is more generalizable and controllable through analytical studies.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning
cs.AI · 2025-12 · unverdicted · novelty 7.0

CORE is a concept-oriented RL method that synthesizes quizzes, injects concept snippets into rollouts, and reinforces conceptual trajectories to close the gap between restating definitions and applying them in math problems.