hub Mixed citations

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer · 2023 · cs.LG · arXiv 2305.14314

Mixed citation behavior. Most common role is background (69%).

87 Pith papers citing it

Background 69% of classified citations

open full Pith review browse 87 citing papers arXiv PDF

abstract

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 19 method 6 other 1

citation-polarity summary

background 18 use method 6 unclear 2

claims ledger

abstract We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save m

co-cited works

representative citing papers

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Universal and Transferable Adversarial Attacks on Aligned Language Models

cs.CL · 2023-07-27 · accept · novelty 8.0

Gradient and greedy search over token suffixes produces universal, transferable adversarial prompts that elicit objectionable outputs from aligned models including black-box commercial systems.

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

The Regularizing Power of Language-Training Deepfake Detectors

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Autoregressive transformers exhibit measurable cognitive fatigue during extended generation, quantified by the Fatigue Index that predicts degradation (AUROC 0.95) and repetition (rho 0.94).

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · novelty 7.0

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.

Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

PCM uses success-failure action variance to probabilistically select and mask chunks for gradient updates in GRPO, matching standard success rates with 2.38x wall-clock speedup and 60% lower memory on LIBERO benchmarks.

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deployment issue.

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

AtlasOCR delivers the first open-source Darija OCR by fine-tuning Qwen2.5-VL 3B, achieving state-of-the-art results on custom and existing benchmarks for both Darija and Arabic.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation

cs.IR · 2026-02-16 · unverdicted · novelty 7.0

ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.

CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.

Democratising Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling

cs.CV · 2025-12-19 · conditional · novelty 7.0

A new open pipeline and dataset enable training of a vision-language model for whole-slide pathology VQA that outperforms MedGemma on tissue identification, neoplasm detection, and differential diagnosis.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

Self-Rewarding Language Models

cs.CL · 2024-01-18 · conditional · novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

The Shannon Scaling Law treats LLM training as noisy-channel transmission and predicts U-shaped performance degradation when signal-to-noise ratio falls below a threshold, outperforming monotonic scaling laws on Pythia and OLMo2 data.

What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Controlled study shows mixed training curricula improve aggregate F1 on memory QA benchmarks while out-of-domain data transfers targeted skills like temporal reasoning, with per-question-type effects exceeding aggregate differences.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

citing papers explorer

Showing 37 of 87 citing papers.

Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition cs.CV · 2026-05-11 · unverdicted · none · ref 16 · internal anchor
A fine-tuned large language-vision model achieves 98% accuracy on visual question answering for military vehicle identification in SAR imagery from an extended MSTAR benchmark.
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments cs.LG · 2026-05-01 · unverdicted · none · ref 27 · internal anchor
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.
ChipLingo: A Systematic Training Framework for Large Language Models in EDA cs.LG · 2026-04-30 · unverdicted · none · ref 17 · internal anchor
ChipLingo trains LLMs on EDA data via corpus construction, domain-adaptive pretraining, and RAG scenario alignment, reaching 59.7% accuracy with an 8B model and 70.02% with a 32B model on a new internal EDA benchmark.
A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models cs.LG · 2026-04-15 · unverdicted · none · ref 4 · internal anchor
KL divergence provides a superior forward-only metric for identifying quantization-sensitive parts in SSM-Transformer hybrids, outperforming MSE and SQNR and supporting practical mixed-precision deployment on edge devices.
Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer cs.CL · 2026-04-13 · unverdicted · none · ref 13 · internal anchor
BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.
NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System cs.CL · 2026-04-10 · unverdicted · none · ref 3 · internal anchor
NyayaMind combines RAG retrieval with domain-specific LLMs to generate transparent, structured legal reasoning and judgment predictions for Indian court cases.
PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning cs.CL · 2026-04-06 · unverdicted · none · ref 30 · internal anchor
PassiveQA trains models via supervised finetuning to decide Answer, Ask, or Abstain using structured information-state representations and knowledge-graph context, yielding better abstention and lower hallucination on QA datasets.
VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering cs.IR · 2026-01-16 · unverdicted · none · ref 50 · internal anchor
VerifAI is an open-source biomedical QA system that decomposes generated answers into claims and verifies them with a fine-tuned NLI engine to reduce hallucinations and provide traceable citations.
TrustLLM: Trustworthiness in Large Language Models cs.CL · 2024-01-10 · unverdicted · none · ref 40 · internal anchor
TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning cs.CL · 2023-08-07 · unverdicted · none · ref 19 · internal anchor
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.
ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law cs.CL · 2026-05-28 · unverdicted · none · ref 3 · internal anchor
A new source-grounded QA dataset for U.S. immigration law is built from official documents and used to fine-tune a 3B model, yielding a 27% mean score improvement over the base model on a held-out sample.
Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology cs.CL · 2026-05-28 · unverdicted · none · ref 5 · internal anchor
Fine-tuning on historical cosmology data reshapes language model explanatory frameworks, leading to stance changes as a secondary effect from regime redistribution.
Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System cs.CR · 2026-05-19 · conditional · none · ref 11 · internal anchor
A fine-tuned local Qwen 3.5 27B model achieves 95% category-level accuracy on security document classification, outperforming commercial models on both internal and external test sets while keeping processing local.
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use cs.CL · 2026-05-13 · unverdicted · none · ref 16 · 3 links · internal anchor
Trains a 42M-parameter Spanish cybersecurity LLM from scratch with curriculum phases and achieves 0.23 tool-selection accuracy after SFT mixture rebalancing to 1:21 tool-use ratio.
LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language cs.CL · 2026-05-09 · conditional · none · ref 5 · internal anchor
Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.
HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support cs.LG · 2026-05-08 · unverdicted · none · ref 36 · internal anchor
HPC-LLM fine-tunes Llama 3.1 8B via QLoRA on 9k-24k HPC examples and adds dense retrieval to deliver practical support for job scheduling, MPI, and GPU workflows, approaching the performance of larger general models at lower memory and latency cost.
OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis cs.CR · 2026-04-29 · unverdicted · none · ref 5 · internal anchor
LoRA fine-tuning of TinyLlama-1.1B on 450 SOC examples produces 68% threat classification accuracy and 58% severity accuracy on 50 held-out logs, with full code, weights, and data released.
Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support cs.AI · 2026-04-20 · unverdicted · none · ref 13 · internal anchor
A cross-platform mobile application deploys an ensemble of quantized open-source LLMs for fully local, DSM-5-aligned psychiatric decision support with claimed accuracy comparable to prior cloud versions.
FLeX: Fourier-based Low-rank EXpansion for multilingual transfer cs.LG · 2026-04-06 · unverdicted · none · ref 6 · internal anchor
LoRA fine-tuning of Code Llama with Fourier regularization raises Java pass@1 from 34.2% to 42.1% while using a small high-quality dataset.
Information Extraction from Electricity Invoices with General-Purpose Large Language Models cs.CL · 2026-04-01 · unverdicted · none · ref 20 · internal anchor
Few-shot prompting lifts F1 scores above 96 percent on electricity-invoice extraction for Gemini 1.5 Pro and Mistral-small, while hyperparameter changes produce only marginal gains.
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis cs.SE · 2026-03-18 · unverdicted · none · ref 25 · 2 links · internal anchor
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
Multi-Model Synthetic Training for Mission-Critical Small Language Models cs.CL · 2025-09-16 · unverdicted · none · ref 13 · internal anchor
Fine-tunes Qwen2.5-7B on 21,543 synthetic maritime Q&A pairs generated from 3.2B AIS records by GPT-4o and o3-mini, reaching 75% accuracy at 261x lower inference cost than larger models.
Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies cs.CL · 2025-08-24 · unverdicted · none · ref 40 · internal anchor
Systematic comparison of nine text-only and three multimodal LLMs using in-context learning, reasoning prompts, fine-tuning, and multimodal fusion on DementiaBank speech data finds class-centroid demonstrations and token-level fine-tuning most effective, with adapted open models matching or beating
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey cs.LG · 2024-03-21 · accept · none · ref 124 · internal anchor
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
SLM Finetuning for Natural Language to Domain Specific Code Generation in Production cs.LG · 2026-04-10 · unverdicted · none · ref 11 · internal anchor
Fine-tuned small language models outperform larger models in natural language to domain-specific code generation with improved performance, latency, and the ability to adapt to customer-specific scenarios without losing general capabilities.
LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems cs.LG · 2026-01-20 · unverdicted · none · ref 43 · internal anchor
A survey taxonomy of LLMs identifies three scaling crises and six efficiency paradigms while tracing the shift from generation to tool-using agents.
Towards EnergyGPT: A Large Language Model Specialized for the Energy Sector cs.CL · 2025-09-08 · unverdicted · none · ref 30 · internal anchor
Fine-tuned LLaMA 3.1-8B variants for the energy sector outperform the base model on domain QA benchmarks, with LoRA delivering similar gains at lower training cost.
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation cs.CL · 2025-04-02 · unverdicted · none · ref 114 · internal anchor
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 63 · internal anchor
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
FUTURAL: A Metasearch Platform for Empowering Rural Areas with Smart Solutions cs.IR · 2026-04-26 · unverdicted · none · ref 2 · internal anchor
FUTURAL's metasearch MVP uses LLMs to enable natural language queries over smart solutions data to support rural development.
The nextAI Solution to the NeurIPS 2023 LLM Efficiency Challenge cs.LG · 2026-04-10 · unverdicted · none · ref 10 · internal anchor
A competition entry achieved efficient fine-tuning of LLaMa2 70B on one GPU in 24 hours with competitive QA benchmark performance.
QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment cs.CL · 2026-03-26 · unverdicted · none · ref 10 · internal anchor
Two-stage QLoRA fine-tuning of Qwen3-4B plus retrieval ensemble achieves 32.87 overall score on clinical QA and 67.16 micro-F1 on evidence alignment, highlighting that 20 training cases are insufficient.
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices cs.DC · 2025-03-11 · unverdicted · none · ref 173 · internal anchor
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.
A Comprehensive Overview of Large Language Models cs.CL · 2023-07-12 · unverdicted · none · ref 261 · internal anchor
A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.
MinT: Managed Infrastructure for Training and Serving Millions of LLMs cs.LG · 2026-05-13 · unreviewed · ref 4 · internal anchor
CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation with Non-linearity Retained at Inference cs.LG · 2026-02-26 · unreviewed · ref 12 · internal anchor
Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches cs.CL · 2025-12-14 · unreviewed · ref 9 · internal anchor

QLoRA: Efficient Finetuning of Quantized LLMs

hub tools

citation-role summary

citation-polarity summary

claims ledger

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer