3 Pith papers cite this work.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Sense Representations Are Inducible Interfaces
  ACROS induces explicit sense representations in frozen decoder LMs via gated residual addition, enabling competitive zero-shot WSD, lexical steering, and cross-lingual adaptation on SmolLM2-360M while preserving base quality.
- AI as a Tool for Simulation-Based Experiments in Literary Studies
  Proposes AI-driven simulations for literary-historical experiments and reports preliminary text-generation results claiming the first limited in-distribution outputs matching human novels.
- Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures
  This survey organizes intrinsic interpretability approaches for LLMs into five categories (functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction) while discussing challenges and future directions.