pith. machine review for the scientific record.

arxiv: 2305.14314 · v1 · submitted 2023-05-23 · 💻 cs.LG


QLoRA: Efficient Finetuning of Quantized LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 13:24 UTC · model grok-4.3

classification 💻 cs.LG
keywords QLoRA · quantized fine-tuning · LoRA adapters · 4-bit NormalFloat · large language models · efficient training · instruction tuning · chatbot evaluation

The pith

QLoRA enables full-performance fine-tuning of 65B language models on a single 48GB GPU by freezing a 4-bit quantized base and training only low-rank adapters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces QLoRA to cut memory requirements for adapting large pretrained language models so that even 65-billion-parameter models fit on a single 48 GB GPU. It freezes the base weights after converting them to a 4-bit NormalFloat format, then routes all gradient updates through small low-rank adapter matrices instead of touching the original parameters. This combination, plus double quantization of the scaling constants and paged memory management for the optimizer states, matches the task performance of standard 16-bit fine-tuning. The authors demonstrate the approach by training over one thousand models and show that their best Guanaco family reaches 99.3 percent of ChatGPT's score on the Vicuna benchmark after only 24 hours on a single GPU. They also release code and models so others can replicate the results across different base architectures and instruction datasets.

Core claim

QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). With 4-bit NormalFloat quantization, double quantization of the quantization constants, and paged optimizers, the method preserves full 16-bit finetuning task performance while using far less memory.

What carries the argument

The QLoRA pipeline: a 4-bit NormalFloat-quantized frozen base model whose gradients are routed exclusively into trainable low-rank adapter matrices, supported by double quantization and paged optimizers to control memory spikes.
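A minimal sketch of that pipeline as it is commonly reproduced with the Hugging Face stack (transformers, peft, bitsandbytes); the model id, LoRA rank/alpha, learning rate, and target modules below are illustrative assumptions, not the paper's exact configuration, and library behavior can vary across versions.

```python
# Minimal sketch of the QLoRA recipe with the Hugging Face stack
# (transformers + peft + bitsandbytes). Model id, rank/alpha, learning rate,
# and target modules are illustrative placeholders, not the paper's exact setup.
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # placeholder base model

# Frozen base in 4-bit NormalFloat, with double quantization of the constants.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # freeze base, cast norms, enable input grads

# Low-rank adapters are the only trainable parameters; rank and alpha are free parameters.
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a tiny fraction of total weights

# Paged optimizer: Adam states live in paged (unified) memory to absorb spikes.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = bnb.optim.PagedAdamW8bit(trainable, lr=2e-4)
```

In this setup gradients flow through the dequantized 4-bit base weights during the backward pass, but only the adapter matrices receive updates.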

If this is right

  • A 65B-parameter model becomes fine-tunable on a single 48 GB GPU without loss of task accuracy.
  • Small high-quality instruction datasets produce state-of-the-art chatbot performance even when the base model is smaller than prior leaders.
  • Open models can reach 99.3% of closed-model benchmark scores after 24 hours of single-GPU training.
  • GPT-4-based automatic evaluations serve as a practical and inexpensive substitute for human chatbot judgments.
  • Current public chatbot benchmarks contain systematic gaps that make them unreliable for ranking model quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same quantization-plus-adapter pattern could be applied to even larger base models if the gradient flow through the adapters remains stable.
  • Task-specific adaptation may be largely separable from the general knowledge stored in the base weights, allowing repeated low-cost updates without retraining the entire model.
  • Combining QLoRA with other memory-saving techniques could open multi-task or continual-learning regimes on hardware that previously could hold only one model copy.

Load-bearing premise

The 4-bit NormalFloat representation of the frozen base weights must retain enough information and gradient signal that low-rank adapters can recover the full task performance of 16-bit fine-tuning.

What would settle it

A controlled experiment that fine-tunes the identical base model and dataset once with standard 16-bit precision and once with QLoRA, then measures whether the QLoRA version falls short by more than a few percent on the same evaluation suite.
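One hedged way to score such a head-to-head run once both arms have been evaluated: a paired bootstrap confidence interval on the per-example score gap. The score arrays below are synthetic placeholders standing in for real evaluation-suite outputs, and the helper is an editorial illustration, not something from the paper.

```python
# Paired bootstrap on the gap between 16-bit and QLoRA fine-tuning scores.
# The score arrays are synthetic placeholders for real per-example eval results.
import numpy as np

rng = np.random.default_rng(0)
scores_fp16 = rng.uniform(0.5, 1.0, size=500)                 # placeholder: 16-bit arm
scores_qlora = scores_fp16 - rng.normal(0.0, 0.02, size=500)  # placeholder: QLoRA arm

def bootstrap_gap(a, b, n_boot=10_000, seed=1):
    """Mean of (a - b) with a 95% paired-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    diffs = a - b
    resampled = np.array([
        diffs[rng.integers(0, len(diffs), len(diffs))].mean() for _ in range(n_boot)
    ])
    return diffs.mean(), np.percentile(resampled, [2.5, 97.5])

gap, (low, high) = bootstrap_gap(scores_fp16, scores_qlora)
print(f"mean gap (16-bit minus QLoRA): {gap:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
# If the whole interval sits well below a few percent of the score scale,
# the QLoRA arm has not measurably fallen short on this suite.
```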

read the original abstract

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces QLoRA, a memory-efficient fine-tuning method for large language models. It freezes a pretrained LLM quantized to 4 bits using a new NormalFloat (NF4) datatype, injects LoRA adapters for task-specific updates, and adds double quantization of the quantization constants plus paged optimizers to handle memory spikes. The authors report that this enables fine-tuning of a 65B-parameter model on a single 48 GB GPU while preserving full 16-bit task performance. Their best models (Guanaco) reach 99.3% of ChatGPT performance on the Vicuna benchmark after 24 hours of single-GPU training on high-quality instruction data, and they release code, CUDA kernels, and over 1,000 fine-tuned models with extensive ablations across model scales, datasets, and evaluation protocols.

Significance. If the core performance claims hold under rigorous controls, the work has high practical significance: it substantially lowers the hardware barrier for adapting LLMs at the 30B–65B scale, enabling broader experimentation by researchers without multi-GPU clusters. The release of all models, training code, and 4-bit kernels is a clear strength. The scale of the empirical study (>1,000 models) and the dual human/GPT-4 evaluation analysis also contribute useful data on instruction-following and chatbot assessment.

major comments (2)
  1. [§3.2 and §4.1] The central claim that 4-bit NF4 quantization of the frozen base model preserves full 16-bit task performance for LoRA adaptation rests on the unverified assumption that quantization error does not systematically distort gradient directions or norms for the adapters. The manuscript provides no direct diagnostic (e.g., gradient cosine similarity, norm ratios, or loss-landscape curvature comparisons) between QLoRA and 16-bit back-propagation on identical forward passes; the performance equivalence is inferred only from downstream benchmark scores. (An illustrative sketch of such a diagnostic follows these major comments.)
  2. [Table 2 and §5.1] While QLoRA is shown to match full 16-bit fine-tuning on smaller models (7B–13B), the paper does not isolate the contribution of NF4 versus the choice of instruction dataset or LoRA hyperparameters. A controlled ablation that holds data and rank fixed while varying only the base-model precision would be required to substantiate the “parameter-free” recovery claim.
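An editorial illustration of the diagnostic requested in major comment 1, not an experiment from the paper: push one batch through a bf16 copy and an NF4-quantized copy of the same base model with identically seeded LoRA adapters, then compare adapter gradients by cosine similarity and norm ratio. The model id and LoRA settings are placeholder assumptions, a CUDA device is assumed because bitsandbytes 4-bit weights require one, and peft/bitsandbytes behavior can vary across versions.

```python
# Compare LoRA adapter gradients under a bf16 vs. an NF4-quantized base model.
# Placeholder small model; assumes a CUDA device for bitsandbytes 4-bit weights.
import torch
from torch.nn.functional import cosine_similarity
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "facebook/opt-125m"  # placeholder; any causal LM supported by bitsandbytes
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")

def build(quantized: bool):
    kwargs = dict(torch_dtype=torch.bfloat16, device_map="auto")
    if quantized:
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16)
    base = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    torch.manual_seed(0)  # identical lora_A initialization in both copies
    return get_peft_model(base, lora_cfg)

tok = AutoTokenizer.from_pretrained(model_id)
batch = tok("QLoRA routes gradients through a frozen 4-bit base.", return_tensors="pt")

def adapter_grads(model):
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in batch.items()}
    model.zero_grad()
    model(**inputs, labels=inputs["input_ids"]).loss.backward()
    # lora_A gradients are exactly zero at init (lora_B starts at zero),
    # so the informative comparison is over lora_B gradients.
    return {n: p.grad.detach().float().flatten()
            for n, p in model.named_parameters()
            if "lora_B" in n and p.grad is not None}

g16 = adapter_grads(build(quantized=False))
g4 = adapter_grads(build(quantized=True))
for name in g16:
    cos = cosine_similarity(g16[name], g4[name], dim=0).item()
    ratio = (g4[name].norm() / g16[name].norm()).item()
    print(f"{name}: cosine={cos:.3f} norm_ratio={ratio:.3f}")
```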
minor comments (3)
  1. [Abstract] “optimziers” is a typographical error.
  2. [§4.3] The description of paged optimizers would benefit from a short pseudocode or memory-timeline figure to clarify how page swapping interacts with the Adam optimizer states.
  3. [Figure 3] Axis labels and legend text are too small for print; consider increasing font size or splitting into two panels.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments. We address each major point below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [§3.2 and §4.1] The central claim that 4-bit NF4 quantization of the frozen base model preserves full 16-bit task performance for LoRA adaptation rests on the unverified assumption that quantization error does not systematically distort gradient directions or norms for the adapters. The manuscript provides no direct diagnostic (e.g., gradient cosine similarity, norm ratios, or loss-landscape curvature comparisons) between QLoRA and 16-bit back-propagation on identical forward passes; the performance equivalence is inferred only from downstream benchmark scores.

    Authors: We agree that the manuscript relies on downstream task performance rather than direct gradient diagnostics to support equivalence. While any systematic distortion in gradients would be expected to degrade final task metrics (which we do not observe across MMLU, Vicuna, and other benchmarks for models up to 65B), we acknowledge that explicit comparisons would strengthen the argument. In the revision we will add a short discussion in §4.1 referencing the observed gradient norm stability from our internal checks on smaller models and note the absence of full side-by-side diagnostics as a limitation. revision: partial

  2. Referee: [Table 2 and §5.1] While QLoRA is shown to match full 16-bit fine-tuning on smaller models (7B–13B), the paper does not isolate the contribution of NF4 versus the choice of instruction dataset or LoRA hyperparameters. A controlled ablation that holds data and rank fixed while varying only the base-model precision would be required to substantiate the “parameter-free” recovery claim.

    Authors: Table 2 already reports 4-bit versus 16-bit results for the 7B and 13B models under identical LoRA rank, dataset, and hyperparameter settings, showing near-identical performance. To make the isolation of quantization more explicit, we will add a dedicated controlled ablation in the revised §5.1 (and update Table 2) that fixes the instruction dataset, LoRA rank, and all other hyperparameters while varying only base-model precision (NF4 4-bit vs. 16-bit). revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method validated on external benchmarks

full rationale

The paper presents QLoRA as an engineering combination of 4-bit NF4 quantization, double quantization, paged optimizers, and LoRA adapters. All performance claims (Guanaco reaching 99.3% of ChatGPT on Vicuna) are measured against external benchmarks and prior models rather than derived from internal fitted parameters or self-referential equations. NF4 is motivated by information-theoretic optimality for normal distributions but its task performance is demonstrated empirically across >1000 models on multiple datasets and scales; no derivation chain reduces the central preservation-of-performance claim to a tautology or self-citation. The work is self-contained against external evaluation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that a frozen 4-bit quantized base model supplies adequate gradients for LoRA adapters to match full-precision fine-tuning; the new NF4 datatype is introduced without external validation beyond the paper's own experiments.

free parameters (1)
  • LoRA rank and alpha
    Hyperparameters controlling adapter capacity; their specific values are chosen per experiment but not enumerated in the abstract.
axioms (1)
  • domain assumption: 4-bit quantization of pretrained weights preserves enough representational capacity for downstream adaptation via adapters
    Invoked when claiming that frozen 4-bit models plus LoRA recover full 16-bit performance.
invented entities (1)
  • NormalFloat (NF4) 4-bit datatype (no independent evidence)
    purpose: Information-theoretically optimal representation for normally distributed weights
    New datatype proposed in the paper to improve 4-bit quantization fidelity; a simplified sketch of the quantile idea behind it follows this ledger.
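An editorial sketch of that quantile idea, not the paper's exact construction: place 16 code points at equal-probability-mass quantiles of a standard normal and rescale them to [-1, 1]. The real NF4 uses an asymmetric split so that zero is exactly representable; this simplified version only shows why the levels cluster where normally distributed weights concentrate.

```python
# Simplified quantile-based 4-bit levels in the spirit of NF4 (not the exact recipe).
import numpy as np
from scipy.stats import norm

def normalfloat_levels(bits: int = 4) -> np.ndarray:
    n = 2 ** bits
    offset = 0.5 / n  # keep probabilities away from 0 and 1 (infinite quantiles)
    probs = np.linspace(offset, 1 - offset, n)
    levels = norm.ppf(probs)
    return levels / np.abs(levels).max()  # rescale into [-1, 1]

levels = normalfloat_levels(4)
print(np.round(levels, 4))  # 16 levels, denser near 0 than a uniform int4 grid

def quantize_block(w: np.ndarray, levels: np.ndarray):
    """Blockwise absmax scaling, then snap each weight to the nearest level."""
    absmax = np.abs(w).max()
    idx = np.abs(w[:, None] / absmax - levels[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), absmax

w = np.random.default_rng(0).normal(size=64).astype(np.float32)
idx, absmax = quantize_block(w, levels)
reconstruction = levels[idx] * absmax
print(float(np.mean((w - reconstruction) ** 2)))  # per-block quantization error
```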

pith-pipeline@v0.9.0 · 5676 in / 1474 out tokens · 48729 ms · 2026-05-11T13:24:33.099632+00:00 · methodology


Forward citations

Cited by 50 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Standardized Re-evaluation of Conversational Recommender Systems on the ReDial Dataset

    cs.IR 2026-05 accept novelty 7.0

    Standardized re-evaluation of CRS methods on ReDial finds that nearly half of reported accuracy stems from repetition shortcuts absent in novelty-focused tests, performance tracks LLM capacity more than architecture, ...

  2. DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

    cs.AI 2026-05 unverdicted novelty 7.0

    DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deploy...

  3. A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

    cs.CL 2026-05 unverdicted novelty 7.0

    Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

  4. Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

    cs.RO 2026-04 unverdicted novelty 7.0

    VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...

  5. Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

    cs.CV 2026-04 unverdicted novelty 7.0

    Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

  6. CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

    cs.CV 2026-04 unverdicted novelty 7.0

    CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.

  7. AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

    cs.CV 2026-04 unverdicted novelty 7.0

    AtlasOCR delivers the first open-source Darija OCR by fine-tuning Qwen2.5-VL 3B, achieving state-of-the-art results on custom and existing benchmarks for both Darija and Arabic.

  8. KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

    cs.RO 2026-04 unverdicted novelty 7.0

    KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

  9. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

    cs.LG 2024-01 conditional novelty 7.0

    Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

  10. Self-Rewarding Language Models

    cs.CL 2024-01 conditional novelty 7.0

    Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

  11. MinT: Managed Infrastructure for Training and Serving Millions of LLMs

    cs.LG 2026-05 unverdicted novelty 6.0

    MinT enables efficient management of million-scale LoRA-adapted LLM policies over shared 1T-parameter base models by moving only small adapters through training and serving pipelines.

  12. Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

    cs.CL 2026-05 unverdicted novelty 6.0

    Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.

  13. Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    cs.LG 2026-05 unverdicted novelty 6.0

    Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

  14. State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

    cs.LG 2026-04 unverdicted novelty 6.0

    SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched...

  15. Diversity in Large Language Models under Supervised Fine-Tuning

    cs.LG 2026-04 unverdicted novelty 6.0

    TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.

  16. Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

    cs.SE 2026-04 unverdicted novelty 6.0

    Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.

  17. Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

    cs.AI 2026-04 unverdicted novelty 6.0

    A separable expert architecture uses base models, LoRA adapters, and deletable per-user proxies to enable privacy-preserving personalization and deterministic unlearning in LLMs.

  18. Pioneer Agent: Continual Improvement of Small Language Models in Production

    cs.AI 2026-04 unverdicted novelty 6.0

    Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on ...

  19. Sensitivity-Positional Co-Localization in GQA Transformers

    cs.CL 2026-04 unverdicted novelty 6.0

    In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU,...

  20. The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

    cs.CR 2026-04 unverdicted novelty 6.0

    ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.

  21. ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

    cs.DC 2026-04 unverdicted novelty 6.0

    ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.

  22. Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

    cs.NE 2026-04 unverdicted novelty 6.0

    CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.

  23. Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

    cs.LG 2026-04 conditional novelty 6.0

    Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.

  24. An Explainable Vision-Language Model Framework with Adaptive PID-Tversky Loss for Lumbar Spinal Stenosis Diagnosis

    cs.CV 2026-04 unverdicted novelty 6.0

    A VLM framework with spatial patch cross-attention and adaptive PID-Tversky loss reports 90.69% classification accuracy, 0.9512 Dice score, and 92.80 CIDEr for LSS diagnosis plus automated report generation.

  25. LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?

    cs.CL 2026-03 unverdicted novelty 6.0

    LiFT instruction fine-tunes LLMs with a temporal curriculum to improve in-context learning on longitudinal NLP tasks, yielding gains on out-of-distribution data and rare change events across multiple model sizes.

  26. AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

    cs.CL 2023-10 conditional novelty 6.0

    AutoDAN automatically generates semantically meaningful jailbreak prompts for aligned LLMs via a hierarchical genetic algorithm, achieving higher attack success, cross-model transferability, and universality than base...

  27. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

    cs.CL 2023-09 conditional novelty 6.0

    Bootstrapping math questions via rewriting creates MetaMathQA; fine-tuning LLaMA-2 on it yields 66.4% on GSM8K for 7B and 82.3% for 70B, beating prior same-size models by large margins.

  28. Baseline Defenses for Adversarial Attacks Against Aligned Language Models

    cs.LG 2023-09 conditional novelty 6.0

    Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

  29. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    cs.CL 2023-06 accept novelty 6.0

    GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.

  30. Fine-Tuning Models for Automated Code Review Feedback

    cs.SE 2026-05 conditional novelty 5.0

    PEFT fine-tuning of Code Llama yields feedback on student Java bugs that students judge equal to ChatGPT and better than prompt engineering, using BLEU/ROUGE/BERTScore plus human ratings.

  31. Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

    cs.CV 2026-05 unverdicted novelty 5.0

    A fine-tuned large language-vision model achieves 98% accuracy on visual question answering for military vehicle identification in SAR imagery from an extended MSTAR benchmark.

  32. AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

    cs.LG 2026-05 unverdicted novelty 5.0

    AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

  33. Diversity in Large Language Models under Supervised Fine-Tuning

    cs.LG 2026-04 unverdicted novelty 5.0

    Supervised fine-tuning narrows LLM generative diversity through neglect of low-frequency patterns and knowledge forgetting, but the TOFU loss mitigates this effect across models and benchmarks.

  34. ChipLingo: A Systematic Training Framework for Large Language Models in EDA

    cs.LG 2026-04 unverdicted novelty 5.0

    ChipLingo trains LLMs on EDA data via corpus construction, domain-adaptive pretraining, and RAG scenario alignment, reaching 59.7% accuracy with an 8B model and 70.02% with a 32B model on a new internal EDA benchmark.

  35. A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models

    cs.LG 2026-04 unverdicted novelty 5.0

    KL divergence provides a superior forward-only metric for identifying quantization-sensitive parts in SSM-Transformer hybrids, outperforming MSE and SQNR and supporting practical mixed-precision deployment on edge devices.

  36. Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer

    cs.CL 2026-04 unverdicted novelty 5.0

    BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.

  37. NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System

    cs.CL 2026-04 unverdicted novelty 5.0

    NyayaMind combines RAG retrieval with domain-specific LLMs to generate transparent, structured legal reasoning and judgment predictions for Indian court cases.

  38. PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

    cs.CL 2026-04 unverdicted novelty 5.0

    PassiveQA trains models via supervised finetuning to decide Answer, Ask, or Abstain using structured information-state representations and knowledge-graph context, yielding better abstention and lower hallucination on...

  39. VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

    cs.CL 2026-05 unverdicted novelty 4.0

    VectraYX-Nano is a 42M-parameter Spanish cybersecurity LLM trained with curriculum learning and native MCP tool use, achieving 0.78 conversational gate and improved tool selection with denser data.

  40. LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language

    cs.CL 2026-05 conditional novelty 4.0

    Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.

  41. OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis

    cs.CR 2026-04 unverdicted novelty 4.0

    LoRA fine-tuning of TinyLlama-1.1B on 450 SOC examples produces 68% threat classification accuracy and 58% severity accuracy on 50 held-out logs, with full code, weights, and data released.

  42. Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support

    cs.AI 2026-04 unverdicted novelty 4.0

    A cross-platform mobile application deploys an ensemble of quantized open-source LLMs for fully local, DSM-5-aligned psychiatric decision support with claimed accuracy comparable to prior cloud versions.

  43. FLeX: Fourier-based Low-rank EXpansion for multilingual transfer

    cs.LG 2026-04 unverdicted novelty 4.0

    LoRA fine-tuning of Code Llama with Fourier regularization raises Java pass@1 from 34.2% to 42.1% while using a small high-quality dataset.

  44. Information Extraction from Electricity Invoices with General-Purpose Large Language Models

    cs.CL 2026-04 unverdicted novelty 4.0

    Few-shot prompting lifts F1 scores above 96 percent on electricity-invoice extraction for Gemini 1.5 Pro and Mistral-small, while hyperparameter changes produce only marginal gains.

  45. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

    cs.LG 2024-03 accept novelty 4.0

    A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

  46. SLM Finetuning for Natural Language to Domain Specific Code Generation in Production

    cs.LG 2026-04 unverdicted novelty 3.0

    Fine-tuned small language models outperform larger models in natural language to domain-specific code generation with improved performance, latency, and the ability to adapt to customer-specific scenarios without losi...

  47. Large Language Models: A Survey

    cs.CL 2024-02 accept novelty 3.0

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

  48. FUTURAL: A Metasearch Platform for Empowering Rural Areas with Smart Solutions

    cs.IR 2026-04 unverdicted novelty 2.0

    FUTURAL's metasearch MVP uses LLMs to enable natural language queries over smart solutions data to support rural development.

  49. The nextAI Solution to the NeurIPS 2023 LLM Efficiency Challenge

    cs.LG 2026-04 unverdicted novelty 2.0

    A competition entry achieved efficient fine-tuning of LLaMa2 70B on one GPU in 24 hours with competitive QA benchmark performance.

  50. QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment

    cs.CL 2026-03 unverdicted novelty 2.0

    Two-stage QLoRA fine-tuning of Qwen3-4B plus retrieval ensemble achieves 32.87 overall score on clinical QA and 67.16 micro-F1 on evidence alignment, highlighting that 20 training cases are insufficient.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · cited by 49 Pith papers · 23 internal anchors

  1. [1]

    S. An, Y . Li, Z. Lin, Q. Liu, B. Chen, Q. Fu, W. Chen, N. Zheng, and J.-G. Lou. Input-tuning: Adapting unfamiliar inputs to frozen pretrained models. arXiv preprint arXiv:2203.03131, 2022

  2. [2]

    A General Language Assistant as a Laboratory for Alignment

    A. Askell, Y . Bai, A. Chen, D. Drain, D. Ganguli, T. Henighan, A. Jones, N. Joseph, B. Mann, N. DasSarma, et al. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861, 2021

  3. [3]

    S. H. Bach, V . Sanh, Z.-X. Yong, A. Webson, C. Raffel, N. V . Nayak, A. Sharma, T. Kim, M. S. Bari, T. Fevry, et al. Promptsource: An integrated development environment and repository for natural language prompts. arXiv preprint arXiv:2202.01279, 2022

  4. [4]

    Y . Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022

  5. [5]

    Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073, 2022

  6. [6]

    E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

  7. [7]

    Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

    S. Biderman, H. Schoelkopf, Q. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. arXiv preprint arXiv:2304.01373, 2023

  8. [8]

    On the Opportunities and Risks of Foundation Models

    R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021

  9. [9]

    T. Chen, B. Xu, C. Zhang, and C. Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016

  10. [10]

    Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

    W.-L. Chiang, Z. Li, Z. Lin, Y . Sheng, Z. Wu, H. Zhang, L. Zheng, S. Zhuang, Y . Zhuang, J. E. Gonzalez, I. Stoica, and E. P. Xing. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023. URL https://lmsys.org/blog/2023-03-30-vicuna/

  11. [11]

    P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei. Deep reinforcement learning from human preferences. Advances in neural information processing systems , 30, 2017

  12. [12]

    H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang, M. Dehghani, S. Brahma, et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416, 2022

  13. [13]

    The Case for 4-bit Precision: k-bit Inference Scaling Laws

    T. Dettmers and L. Zettlemoyer. The case for 4-bit precision: k-bit inference scaling laws.arXiv preprint arXiv:2212.09720, 2022

  14. [14]

    LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    T. Dettmers, M. Lewis, Y . Belkada, and L. Zettlemoyer. LLM.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, 2022

  15. [15]

    8-bit Optimizers via Block-wise Quantization

    T. Dettmers, M. Lewis, S. Shleifer, and L. Zettlemoyer. 8-bit optimizers via block-wise quantization. 9th International Conference on Learning Representations, ICLR, 2022

  16. [16]

    A. E. Elo. The proposed uscf rating system. its development, theory, and applications. Chess Life, 22(8):242–247, 1967

  17. [17]

    A. E. Elo. The rating of chessplayers, past and present. Arco Pub., 1978

  18. [18]

    GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

    E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022

  19. [19]

    GPTScore: Evaluate as You Desire

    J. Fu, S.-K. Ng, Z. Jiang, and P. Liu. Gptscore: Evaluate as you desire. arXiv preprint arXiv:2302.04166, 2023

  20. [20]

    X. Geng, A. Gudibande, H. Liu, E. Wallace, P. Abbeel, S. Levine, and D. Song. Koala: A dialogue model for academic research. Blog post, April 2023. URLhttps://bair.berkeley. edu/blog/2023/04/03/koala/

  21. [21]

    Improving alignment of dialogue agents via targeted human judgements

    A. Glaese, N. McAleese, M. Trębacz, J. Aslanides, V. Firoiu, T. Ewalds, M. Rauh, L. Weidinger, M. Chadwick, P. Thacker, et al. Improving alignment of dialogue agents via targeted human judgements. arXiv preprint arXiv:2209.14375, 2022

  22. [22]

    Annotation Artifacts in Natural Language Inference Data

    S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. R. Bowman, and N. A. Smith. Annotation artifacts in natural language inference data. arXiv preprint arXiv:1803.02324, 2018

  23. [23]

    Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

    J. Henderson, S. Ruder, et al. Compacter: Efficient low-rank hypercomplex adapter layers. In Advances in Neural Information Processing Systems, 2021

  24. [24]

    Measuring Massive Multitask Language Understanding

    D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. In International Conference on Learning Representations, 2020

  25. [25]

    The Curious Case of Neural Text Degeneration

    A. Holtzman, J. Buys, L. Du, M. Forbes, and Y . Choi. The curious case of neural text degeneration. In International Conference on Learning Representations, 2020

  26. [26]

    Unnatural instructions: Tuning language models with (almost) no human labor

    O. Honovich, T. Scialom, O. Levy, and T. Schick. Unnatural instructions: Tuning language models with (almost) no human labor. arXiv preprint arXiv:2212.09689, 2022

  27. [27]

    Parameter-Efficient Transfer Learning for NLP

    N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019

  28. [28]

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021

  29. [29]

    S. Iyer, X. V . Lin, R. Pasunuru, T. Mihaylov, D. Simig, P. Yu, K. Shuster, T. Wang, Q. Liu, P. S. Koura, et al. Opt-iml: Scaling language model instruction meta learning through the lens of generalization. arXiv preprint arXiv:2212.12017, 2022

  30. [30]

    Longform: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction

    A. Köksal, T. Schick, A. Korhonen, and H. Schütze. Longform: Optimizing instruction tuning for long text generation with corpus extraction. arXiv preprint arXiv:2304.08460, 2023

  31. [31]

    A. Köpf, Y . Kilcher, D. von Rütte, S. Anagnostidis, Z.-R. Tam, K. Stevens, A. Barhoum, N. M. Duc, O. Stanley, R. Nagyfi, et al. Openassistant conversations–democratizing large language model alignment. arXiv preprint arXiv:2304.07327, 2023

  32. [32]

    Open-instruction-generalist dataset

    LAION. Open-instruction-generalist dataset. https://github.com/LAION-AI/ Open-Instruction-Generalist, 2023

  33. [33]

    The Power of Scale for Parameter-Efficient Prompt Tuning

    B. Lester, R. Al-Rfou, and N. Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021

  34. [34]

    X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021

  35. [35]

    Holistic Evaluation of Language Models

    P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y . Zhang, D. Narayanan, Y . Wu, A. Kumar, et al. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110, 2022

  36. [36]

    T. Liao, R. Taori, I. D. Raji, and L. Schmidt. Are we learning yet? a meta review of evaluation failures across machine learning. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

  37. [37]

    H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950–1965, 2022

  38. [38]

    Y . Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V . Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019

  39. [39]

    The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

    S. Longpre, L. Hou, T. Vu, A. Webson, H. W. Chung, Y . Tay, D. Zhou, Q. V . Le, B. Zoph, J. Wei, et al. The flan collection: Designing data and methods for effective instruction tuning. arXiv preprint arXiv:2301.13688, 2023

  40. [40]

    S. Min, M. Lewis, L. Zettlemoyer, and H. Hajishirzi. Metaicl: Learning to learn in context. arXiv preprint arXiv:2110.15943, 2021

  41. [41]

    Evaluating Theory of Mind in Question Answering

    A. Nematzadeh, K. Burns, E. Grant, A. Gopnik, and T. Griffiths. Evaluating theory of mind in question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2392–2400, 2018

  42. [42]

    Gpt-4 technical report

    OpenAI. Gpt-4 technical report. arXiv, 2023

  43. [43]

    Training Language Models to Follow Instructions with Human Feedback

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022

  44. [44]

    G. Park, B. Park, S. J. Kwon, B. Kim, Y . Lee, and D. Lee. nuqmm: Quantized matmul for efficient inference of large-scale generative language models. arXiv preprint arXiv:2206.09557, 2022

  45. [45]

    B. Peng, C. Li, P. He, M. Galley, and J. Gao. Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277, 2023

  46. [46]

    Hypothesis Only Baselines in Natural Language Inference

    A. Poliak, J. Naradowsky, A. Haldar, R. Rudinger, and B. Van Durme. Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191, 2018

  47. [47]

    R. Pope, S. Douglas, A. Chowdhery, J. Devlin, J. Bradbury, A. Levskaya, J. Heek, K. Xiao, S. Agrawal, and J. Dean. Efficiently scaling transformer inference. arXiv preprint arXiv:2211.05102, 2022

  48. [48]

    Learning how to ask: Querying lms with mixtures of soft prompts

    G. Qin and J. Eisner. Learning how to ask: Querying lms with mixtures of soft prompts. arXiv preprint arXiv:2104.06599, 2021

  49. [49]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y . Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(1), jan 2020. ISSN 1532-4435

  50. [50]

    V . Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. L. Scao, A. Raja, et al. Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207, 2021

  51. [51]

    M. Sap, R. LeBras, D. Fried, and Y . Choi. Neural theory-of-mind? on the limits of social intelligence in large lms. arXiv preprint arXiv:2210.13312, 2022

  52. [52]

    T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ilić, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, et al. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022

  53. [53]

    An Analysis of Variance Test for Normality

    S. Shaphiro and M. Wilk. An analysis of variance test for normality.Biometrika, 52(3):591–611, 1965

  54. [54]

    Y.-L. Sung, V. Nair, and C. A. Raffel. Training neural networks with fixed sparse masks. Advances in Neural Information Processing Systems, 34:24193–24205, 2021

  55. [55]

    Stanford Alpaca: An Instruction-Following LLaMA Model

    R. Taori, I. Gulrajani, T. Zhang, Y . Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/ stanford_alpaca, 2023

  56. [56]

    LaMDA: Language Models for Dialog Applications

    R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha, H.-T. Cheng, A. Jin, T. Bos, L. Baker, Y . Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022

  57. [57]

    LLaMA: Open and Efficient Foundation Language Models

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  58. [58]

    A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018

  59. [59]

    Y . Wang, Y . Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi. Self-instruct: Aligning language model with self generated instructions. arXiv preprint arXiv:2212.10560, 2022

  60. [60]

    Y . Wang, S. Mishra, P. Alipoormolabashi, Y . Kordi, A. Mirzaei, A. Arunkumar, A. Ashok, A. S. Dhanasekaran, A. Naik, D. Stap, et al. Super-naturalinstructions:generalization via declarative instructions on 1600+ tasks. In EMNLP, 2022

  61. [61]

    Y . Wang, S. Mishra, P. Alipoormolabashi, Y . Kordi, A. Mirzaei, A. Naik, A. Ashok, A. S. Dhanasekaran, A. Arunkumar, D. Stap, et al. Super-naturalinstructions: Generalization via declarative instructions on 1600+ nlp tasks. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5085–5109, 2022

  62. [62]

    J. Wei, M. Bosma, V . Y . Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V . Le. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021

  63. [63]

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. H. Chi, Q. V . Le, D. Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 2022

  64. [64]

    T. Wolf, L. Debut, V . Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019

  65. [65]

    Stable and Low-Precision Training for Large-Scale Vision-Language Models

    M. Wortsman, T. Dettmers, L. Zettlemoyer, A. Morcos, A. Farhadi, and L. Schmidt. Stable and low-precision training for large-scale vision-language models. arXiv preprint arXiv:2304.13013, 2023

  66. [66]

    G. Xiao, J. Lin, M. Seznec, J. Demouth, and S. Han. Smoothquant: Accurate and efficient post-training quantization for large language models. arXiv preprint arXiv:2211.10438, 2022

  67. [67]

    T. Xie, C. H. Wu, P. Shi, R. Zhong, T. Scholak, M. Yasunaga, C.-S. Wu, M. Zhong, P. Yin, S. I. Wang, et al. Unifiedskg: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. arXiv preprint arXiv:2201.05966, 2022

  68. [68]

    Z. Yang, P. Qi, S. Zhang, Y . Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, 2018

  69. [69]

    Z. Yao, R. Y . Aminabadi, M. Zhang, X. Wu, C. Li, and Y . He. Zeroquant: Efficient and affordable post-training quantization for large-scale transformers. arXiv preprint arXiv:2206.01861, 2022

  70. [70]

    E. B. Zaken, S. Ravfogel, and Y . Goldberg. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199, 2021

  71. [71]

    A. Zeng, X. Liu, Z. Du, Z. Wang, H. Lai, M. Ding, Z. Yang, Y. Xu, W. Zheng, X. Xia, et al. Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414, 2022

  72. [72]

    OPT: Open Pre-trained Transformer Language Models

    S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V . Lin, et al. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

  73. [73]

    Adapting Language Models for Zero-Shot Learning by Meta-tuning on Dataset and Prompt Collections

    R. Zhong, K. Lee, Z. Zhang, and D. Klein. Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. arXiv preprint arXiv:2104.04670, 2021
    R. Zhong, K. Lee, Z. Zhang, and D. Klein. Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. arXiv preprint arXiv:2104.04670, 2021. 21 A QLoRA vs Standard Finetuning Experimental Setup Details A.1 Hyperparameters for QL ORA We do a hyperparameter search for LoRA over the following variables: LoRA dropout { 0....