The expression of a tensor or a polyadic as a sum of products
3 Pith papers cite this work. Polarity classification is still indexing.
Citations by year: 2026 (3)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery
  fmxcoders improve cross-layer feature recovery in transformers via factorized weights and layer masking, delivering 10-30 point probing F1 gains, 25-50% lower MSE, doubled functional coherence, and 3-13x more coherent latents than standard crosscoders on GPT2-Small, Pythia, and Gemma2 models.
- A Tensor-Train Framework for Bayesian Inference in High-Dimensional Systems: Applications to MIMO Detection and Channel Decoding
  Tensor-train low-rank structure enables tractable near-optimal Bayesian inference for high-dimensional MIMO detection and soft-decision decoding.
- Low-Rank Adaptation Redux for Large Models
  An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.