Tiered Language Models use a secret key to induce an alternative computation graph over shared weights, enabling private capabilities in the keyed mode while the public mode shows none.
Merging text transformer models from different initializations
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A bidirectional optimization method using parameterized transformations enables near-zero loss barriers for linear mode connectivity in medium-scale language models and small barriers in billion-parameter transformers.
citing papers explorer
-
Toward Open Weight Models Without Risks: Separating Public and Private Capabilities in LLMs
Tiered Language Models use a secret key to induce an alternative computation graph over shared weights, enabling private capabilities in the keyed mode while the public mode shows none.
-
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
A bidirectional optimization method using parameterized transformations enables near-zero loss barriers for linear mode connectivity in medium-scale language models and small barriers in billion-parameter transformers.