pith. machine review for the scientific record.

arxiv: 2604.18124 · v1 · submitted 2026-04-20 · 💻 cs.CL · cs.AI

Recognition: unknown

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:02 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords Low-Rank Adaptation · Parameter-Efficient Fine-Tuning · Large Language Models · Singular Value Decomposition · Rank Allocation · Task-Aware Adaptation

The pith

TLoRA cuts trainable parameters while matching performance on language tasks

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TLoRA as a unified framework that jointly sets the initial values and distributes limited training resources across layers before fine-tuning starts. It initializes the LoRA A matrix by performing singular value decomposition on the product of the pre-trained weights and the input activation covariance, which aligns the starting point with directions relevant to the task at hand. After this step the A matrix is held fixed and only the B matrix is updated during training. A sensitivity measure then decides how many ranks and what scaling factors each layer receives, all while respecting a fixed total budget of trainable parameters. Experiments across natural language understanding, commonsense and math reasoning, code generation, and chat tasks show that this produces strong results with far fewer parameters to update than standard approaches. A reader would care because fine-tuning large models is costly in compute and data, and a method that automates good starting points and allocations could make such customization cheaper and more reliable.
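To make the mechanics concrete, the sketch below reconstructs the initialization step as the abstract describes it. It is Pith's minimal reading, not the authors' code: the covariance estimate from a small calibration batch, the use of torch.linalg.svd, the choice of right singular vectors, and the zero-initialization of B are all assumptions.

```python
import torch

def tlora_init(W0: torch.Tensor, calib_acts: torch.Tensor, r: int):
    """Minimal sketch of a TLoRA-style initialization (assumed details).

    W0         : frozen pre-trained weight of one layer, shape (m, n)
    calib_acts : task activations feeding this layer, shape (N, n)
    r          : rank allocated to this layer
    """
    # Input activation covariance C (n x n), estimated on a small task sample.
    C = calib_acts.T @ calib_acts / calib_acts.shape[0]

    # SVD of the product of pre-trained weights and input covariance (W0 C).
    # The top-r right singular vectors are taken as the task-relevant subspace.
    _, _, Vh = torch.linalg.svd(W0 @ C, full_matrices=False)
    A = Vh[:r, :].clone()                            # (r, n), frozen after init
    B = torch.zeros(W0.shape[0], r, dtype=W0.dtype)  # (m, r), trained

    A.requires_grad_(False)  # A stays fixed; only B receives gradient updates
    B.requires_grad_(True)   # zero init keeps the adapter a no-op at step 0
    return A, B
```

Under this layout the adapted layer computes W0·x + B(Ax), so freezing A removes the A-matrix parameters from training and roughly halves the adapter's trainable count relative to standard LoRA at the same rank when the layer is approximately square.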

Core claim

TLoRA is a framework that performs singular value decomposition on the product of pre-trained weights and input activation covariance to initialize the LoRA A matrix with task-relevant subspaces, freezes A while training only B, and employs a sensitivity-based importance metric to adaptively allocate ranks and scaling factors across layers under a fixed parameter budget, leading to excellent performance across natural language understanding, commonsense reasoning, math reasoning, code generation, and chat generation tasks with reduced trainable parameters.

What carries the argument

The SVD-based task-aware initialization of the LoRA A matrix combined with the sensitivity-based metric for adaptive rank and scaling allocation across layers.

If this is right

  • TLoRA reduces the number of trainable parameters compared with standard LoRA while delivering strong results on multiple task types.
  • The method works across natural language understanding, commonsense reasoning, math reasoning, code generation, and chat generation.
  • Freezing the initialized A matrix and training only the B matrix keeps training simpler than some other variants.
  • The joint optimization of initialization and allocation avoids the extra complexity introduced by some existing LoRA extensions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The sensitivity metric could be tested as a general tool for deciding update priorities in other parameter-efficient methods.
  • If the activation covariance step proves robust, small task-specific datasets might suffice to guide efficient adaptation in new domains.
  • Similar decomposition-based initialization might apply to deciding which modules to adapt in full-model fine-tuning.

Load-bearing premise

The assumption that decomposing the product of the model's weights and input patterns will select directions useful for the task, and that the sensitivity measure will rank layer importance accurately without bias or extra searches.
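One way to probe the first half of that premise, in the spirit of the subspace-similarity analysis the figures gesture at, is to compare the frozen SVD-derived directions with the directions an unconstrained LoRA learns. The snippet below is an illustrative check using principal angles, not the paper's protocol; scipy.linalg.subspace_angles and the inputs are assumptions.

```python
import numpy as np
from scipy.linalg import subspace_angles

def row_space_similarity(A_init: np.ndarray, A_learned: np.ndarray) -> float:
    """Mean cosine of the principal angles between the row spaces of two (r, n) matrices.

    A value near 1 would support the premise that the frozen SVD directions
    already span what a freely trained adapter discovers; a value near 0 would
    undercut it.
    """
    # subspace_angles expects column-space bases, so pass the transposes.
    angles = subspace_angles(A_init.T, A_learned.T)
    return float(np.mean(np.cos(angles)))
```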

What would settle it

Experiments that replace the SVD initialization with standard random initialization or switch to uniform rank allocation instead of the sensitivity-based choice, then check whether the reported performance advantage over baseline LoRA disappears on the tested tasks.
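A minimal sketch of that two-factor ablation grid, assuming a hypothetical train_and_eval helper; this is scaffolding to show the comparison being asked for, not the paper's experimental code.

```python
from itertools import product

# The two components under test, each swapped out independently.
INIT_MODES = ["svd_task_aware", "random_gaussian"]   # LoRA A initialization
ALLOC_MODES = ["sensitivity_based", "uniform_rank"]  # per-layer rank budget

def run_ablation(train_and_eval, tasks, param_budget):
    """train_and_eval(init, alloc, task, budget) -> metric is assumed to exist."""
    results = {}
    for init, alloc, task in product(INIT_MODES, ALLOC_MODES, tasks):
        results[(init, alloc, task)] = train_and_eval(init, alloc, task, param_budget)
    # The claim survives only if ("svd_task_aware", "sensitivity_based") beats the
    # other three settings at the same trainable-parameter budget on each task.
    return results
```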

Figures

Figures reproduced from arXiv: 2604.18124 by Jiawei Dang, Liang-Jie Zhang, Weicheng Lin, Yi Zhang.

Figure 1
Figure 1: (Left) Standard LoRA employs random Gaussian initialization for matrix A and initializes B to zero, using a fixed rank r across all modules. (Right) TLoRA utilizes a task-aware initialization strategy where matrix A is initialized using the top-r* singular vectors of the product of pre-trained weights and input covariance (W0C) and is subsequently frozen (indicated by the snowflake). Furthermore, TLoRA ad… view at source ↗
Figure 2
Figure 2: (Left) Training loss curves of TLoRA and baseline with 128 ranks on the MetaMathQA dataset. (Right) Training loss curves from the ablation study with different settings on the MetaMathQA dataset. Notably, despite utilizing significantly fewer trainable parameters, TLoRA consistently achieves superior or highly competitive performance across various inference and generative benc… view at source ↗
Figure 3
Figure 3: Subspace Similarity (ϕ). Finally, we map the optimization result back to the original parameter space via an inverse transformation to recover the LoRA projection matrix A. Recalling the transformation definition Ã = A C^{1/2}, by right-multiplying both sides of the equation by C^{-1/2}, we derive the theoretical optimal solution for A as: A = V_r^T C^{-1/2} ∈ R^{r×n} (Eq. 16), where V_r is the matrix composed of the… view at source ↗
Figure 5
Figure 5: Gradient norm and training loss curves during the initial phase. The results provide compelling evidence: firstly, the theoretical solution exhibits a propensity for gradient explosion, with gradient norms exceeding those of TLoRA by several orders of magnitude. Secondly, this instability hinders effective model learning, resulting in a failure to converge. In contrast, TLoRA maintains stab… view at source ↗
Figure 6
Figure 6: Subspace similarity analysis between the… view at source ↗
Figure 7
Figure 7: The data on the graph shows the difference… view at source ↗
read the original abstract

Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning method for large language models, with its effectiveness largely influenced by the allocation of ranks and scaling factors, as well as initialization. Existing LoRA variants typically address only one of these factors, often at the cost of increased training complexity or reduced practical efficiency. In this work, we present Task-aware Low-Rank Adaptation (TLoRA), a unified framework that jointly optimizes initialization and resource allocation at the outset of training. TLoRA introduces a data-driven initialization strategy that aligns the LoRA $A$ matrix with task-relevant subspaces by performing singular value decomposition on the product of pre-trained weights and input activation covariance. After this, the $A$ matrix is frozen, and only the $B$ matrix is trained. Furthermore, TLoRA employs a sensitivity-based importance metric to adaptively allocate ranks and scaling factors across layers under a fixed parameter budget. We conduct extensive experiments that demonstrate TLoRA consistently performs excellently across various tasks, including natural language understanding, commonsense reasoning, math reasoning, code generation, and chat generation, while significantly reducing the number of trainable parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TLoRA, a unified LoRA variant that (1) initializes the low-rank A matrix via SVD on the product of pretrained weights and input activation covariance, then freezes A while training only B, and (2) uses a sensitivity-based metric to adaptively allocate per-layer ranks and scaling factors under a fixed parameter budget. It claims this yields consistently strong results across NLU, commonsense reasoning, math reasoning, code generation, and chat tasks while using fewer trainable parameters than standard LoRA.

Significance. If the subspace-alignment and bias-free selection assumptions hold and the reported gains are reproducible, TLoRA would provide a practical, low-overhead way to improve both initialization and resource allocation in parameter-efficient fine-tuning. The joint treatment of init and allocation under a single framework is a conceptual strength, but the absence of quantitative results, ablations, or statistical details in the provided description makes it impossible to gauge whether the gains exceed those of existing adaptive LoRA methods.

major comments (3)
  1. [§3] §3 (method): The claim that SVD on W·(activation covariance) produces an A matrix whose frozen directions are 'task-relevant' is presented as a direct algorithmic step with no accompanying analysis, visualization, or ablation showing that these directions capture task-specific features better than random or standard LoRA initialization; without such evidence the freezing of A after SVD remains an unverified modeling assumption.
  2. [§4] §4 (experiments): The abstract and experimental claims assert 'consistent excellence' and 'significant' parameter reduction across five task categories, yet supply no numerical results, baseline comparisons (e.g., LoRA, DoRA, AdaLoRA), number of runs, or statistical tests; this absence is load-bearing because the central contribution is empirical superiority under a fixed budget.
  3. [§3.3] §3.3 (sensitivity metric): The sensitivity-based importance score used for adaptive rank/scaling allocation is described only at a high level; if the metric is gradient- or activation-derived, its correlation with true task importance versus layer depth/width is not analyzed, leaving open the possibility that the reported gains are an artifact of the allocation heuristic rather than the unified framework.
minor comments (2)
  1. [§3.1] Notation for the activation covariance matrix and the exact SVD procedure should be formalized with equations rather than prose to allow reproduction.
  2. [§3] The paper should clarify the computational overhead of the one-time SVD and sensitivity computation relative to standard LoRA training.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify important areas where additional evidence and clarity will strengthen the manuscript. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [§3] §3 (method): The claim that SVD on W·(activation covariance) produces an A matrix whose frozen directions are 'task-relevant' is presented as a direct algorithmic step with no accompanying analysis, visualization, or ablation showing that these directions capture task-specific features better than random or standard LoRA initialization; without such evidence the freezing of A after SVD remains an unverified modeling assumption.

    Authors: We agree that the manuscript would be strengthened by explicit evidence supporting the task-relevance of the SVD-derived directions. In the revised version we will add to §3 an analysis subsection containing (i) visualizations of the leading singular vectors from the task-data SVD versus random initialization, (ii) quantitative ablations comparing downstream performance when A is frozen after SVD versus when it is randomly initialized or jointly trained, and (iii) a brief discussion of how the captured subspaces align with task-specific activation patterns. These additions will provide direct empirical grounding for the modeling choice. revision: yes

  2. Referee: [§4] §4 (experiments): The abstract and experimental claims assert 'consistent excellence' and 'significant' parameter reduction across five task categories, yet supply no numerical results, baseline comparisons (e.g., LoRA, DoRA, AdaLoRA), number of runs, or statistical tests; this absence is load-bearing because the central contribution is empirical superiority under a fixed budget.

    Authors: We acknowledge that the current presentation of results does not sufficiently detail the numerical evidence or statistical support. We will revise §4 to include full performance tables with exact metrics and trainable-parameter counts for TLoRA and the requested baselines (LoRA, DoRA, AdaLoRA) across all five task categories, report the number of runs (with standard deviations), and add statistical significance tests (paired t-tests) comparing TLoRA against each baseline under the fixed parameter budget. This will make the empirical claims fully verifiable. revision: yes

  3. Referee: [§3.3] §3.3 (sensitivity metric): The sensitivity-based importance score used for adaptive rank/scaling allocation is described only at a high level; if the metric is gradient- or activation-derived, its correlation with true task importance versus layer depth/width is not analyzed, leaving open the possibility that the reported gains are an artifact of the allocation heuristic rather than the unified framework.

    Authors: The sensitivity score is computed from the average gradient norm of the task loss with respect to each layer’s weights on a small calibration set of task data. We will expand §3.3 with the exact mathematical definition, a correlation analysis between the scores and both task performance and layer depth/width, and ablation experiments that compare sensitivity-based allocation against uniform and alternative heuristics while keeping the SVD initialization fixed. These additions will clarify that the observed gains arise from the joint framework rather than the allocation rule in isolation. revision: yes
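Taking that description at face value, a rough sketch of the allocation step would look like the following; the gradient-norm averaging and the proportional rounding rule are Pith's assumptions about how the rebuttal's definition could be realized, not the authors' algorithm.

```python
import torch

def layer_sensitivities(model, loss_fn, calibration_loader, layer_params):
    """Average gradient norm of the task loss per layer on a small calibration set."""
    scores = {name: 0.0 for name in layer_params}
    steps = 0
    for batch in calibration_loader:
        model.zero_grad()
        loss = loss_fn(model, batch)   # task loss on this calibration batch
        loss.backward()
        for name, p in layer_params.items():
            if p.grad is not None:
                scores[name] += p.grad.norm().item()
        steps += 1
    return {name: s / max(steps, 1) for name, s in scores.items()}

def allocate_ranks(scores, total_rank_budget, min_rank=1):
    """Split a fixed total rank budget across layers in proportion to sensitivity."""
    total = sum(scores.values()) or 1.0
    # Proportional rounding; an exact-budget variant would redistribute the remainder.
    return {name: max(min_rank, round(total_rank_budget * s / total))
            for name, s in scores.items()}
```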

Circularity Check

0 steps flagged

No significant circularity: TLoRA is a direct algorithmic recipe

full rationale

The paper presents TLoRA as an explicit two-part procedure—SVD initialization of the frozen A matrix from the product of pretrained weights and input activation covariance, followed by sensitivity-based adaptive allocation of ranks and scaling factors under a fixed budget. No equations, predictions, or uniqueness claims are shown to reduce to fitted constants, self-referential definitions, or load-bearing self-citations. The method is self-contained as a heuristic construction whose validity is assessed via downstream experiments rather than internal derivation loops.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on the unstated premise that the top singular vectors of W * Cov(x) capture task-relevant directions and that loss sensitivity to rank is a reliable proxy for importance; both are domain assumptions without independent verification in the abstract.

axioms (2)
  • domain assumption SVD on pre-trained weights times input activation covariance yields task-aligned subspaces for A initialization
    Invoked in the initialization strategy paragraph of the abstract
  • domain assumption Sensitivity-based metric correctly ranks layer importance for rank allocation
    Invoked in the adaptive allocation paragraph of the abstract

pith-pipeline@v0.9.0 · 5509 in / 1413 out tokens · 53715 ms · 2026-05-10T05:02:20.029487+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

83 extracted references · 52 canonical work pages · 24 internal anchors
