2 Pith papers cite this work. Polarity classification is still indexing.
years: 2025 (2)
representative citing papers
Directly fine-tuning the value bias (b_v) in transformer projections outperforms fine-tuning b_q or b_k for downstream performance in low-data regimes across multiple LLM architectures.
citing papers explorer
- TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models
TeRA parametrizes high-rank LLM weight updates via a random Tucker-like tensor network with shared frozen factors and layer-specific scaling vectors, matching high-rank adapter performance at vector-level parameter counts.
- BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes
Directly fine-tuning the value bias (b_v) in transformer projections outperforms fine-tuning b_q or b_k for downstream performance in low-data regimes across multiple LLM architectures.