An analysis for reasoning bias of language models with small initialization

[YZX25] Junjie Yao, Zhongwang Zhang, Zhi-Qin John Xu · arXiv 2502.04375

3 Pith papers cite this work. Polarity classification is still in progress.

representative citing papers

How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability

cs.CL · 2026-01-27 · unverdicted · novelty 7.0

Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic associations.

An overview of condensation phenomenon in deep learning

cs.LG · 2025-04-13 · unverdicted · novelty 2.0

Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into generalization and reasoning.

Understanding LoRA as Knowledge Memory: An Empirical Analysis

cs.LG · 2026-03-01 · unreviewed
