
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Xiang Lisa Li, Percy Liang · 2021 · cs.CL · arXiv 2101.00190

Mixed citation behavior. Most common role is background (56%).

78 Pith papers citing it

Background 56% of classified citations


abstract

Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training.
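The mechanism described in the abstract (frozen model, trainable prefix that later tokens attend to) can be illustrated with a toy single-head attention step. This is a minimal sketch, not the authors' implementation: it assumes the prefix is realized as extra key/value vectors prepended to those of the real tokens, and the names `attend`, `prefix_keys`, `prefix_values`, and the sizes `d_model`, `prefix_len`, `seq_len` are hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


random.seed(0)
d_model = 4      # toy hidden size (assumption)
prefix_len = 2   # number of "virtual tokens" (assumption)
seq_len = 3      # number of real tokens (assumption)


def rand_vec():
    return [random.uniform(-1.0, 1.0) for _ in range(d_model)]


# Frozen side: keys/values the pretrained model would compute for real tokens.
token_keys = [rand_vec() for _ in range(seq_len)]
token_values = [rand_vec() for _ in range(seq_len)]

# Trainable side: the prefix's keys and values, the only parameters
# an optimizer would update in this sketch.
prefix_keys = [rand_vec() for _ in range(prefix_len)]
prefix_values = [rand_vec() for _ in range(prefix_len)]

# A real token's query attends over the prefix first, then the real tokens.
query = rand_vec()
out, weights = attend(query, prefix_keys + token_keys, prefix_values + token_values)

prefix_mass = sum(weights[:prefix_len])   # share of attention spent on the prefix
trainable = 2 * prefix_len * d_model      # keys + values of the prefix
print(f"attention mass on prefix: {prefix_mass:.3f}")
print(f"trainable prefix parameters in this layer: {trainable}")
```

Because the prefix only adds entries to the attention context, the model's own weights never change; one small prefix per task can be stored and swapped in instead of a full model copy.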


citation-role summary

background 9 · method 5 · baseline 2 · dataset 1 · other 1

citation-polarity summary

background 10 · use method 4 · baseline 2 · unclear 1 · use dataset 1

representative citing papers

Parameter-Efficient Fine-Tuning with Learnable Rank

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LR-LoRA learns per-layer adapter ranks during training and reports outperforming fixed-rank LoRA and other PEFT baselines on language understanding and commonsense reasoning tasks.

PrAda: Few-Shot Visual Adaptation for Text-Prompted Segmentation

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

PrAda adapts text-prompted segmentation models in a few-shot setting by learning and fusing class-specific prototypes from fine-grained and high-level features, yielding significant gains on semantic, instance, and panoptic segmentation across five benchmarks.

Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.

A Hormone-inspired Emotion Layer for Transformer language models (HELT)

cs.NE · 2026-04-13 · unverdicted · novelty 7.0

HormoneT5 augments T5 with a hormone-inspired block that predicts six continuous emotion values and uses them to modulate responses, reporting over 85% per-hormone accuracy and human preference for emotional quality.

Graph Topology Information Enhanced Heterogeneous Graph Representation Learning

cs.LG · 2026-04-07 · unverdicted · novelty 7.0

ToGRL learns high-quality graph structures from raw heterogeneous graphs via a two-stage topology extraction process and prompt tuning, outperforming prior methods on five datasets.

Exploring Cross-Modal Flows for Few-Shot Learning

cs.CV · 2025-10-16 · unverdicted · novelty 7.0

FMA introduces flow matching for multi-step cross-modal feature alignment in few-shot learning, using fixed coupling, noise augmentation, and early-stopping to outperform one-step PEFT methods.

Multimodal Policy Internalization for Conversational Agents

cs.CL · 2025-10-10 · unverdicted · novelty 7.0

The paper defines the MPI task and proposes TriMPI, a three-stage training pipeline of continual pretraining, supervised finetuning, and policy-aware reinforcement learning that internalizes multimodal policies into model parameters for improved adherence without prompts at inference.

Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

cs.CV · 2025-08-06 · unverdicted · novelty 7.0

The paper offers a comprehensive survey and proposes a new taxonomy for continual learning strategies in VLMs and MLLMs to combat catastrophic forgetting beyond traditional methods.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Efficient Memory Management for Large Language Model Serving with PagedAttention

cs.LG · 2023-09-12 · conditional · novelty 7.0

PagedAttention achieves near-zero waste in LLM key-value cache memory and enables 2-4x higher serving throughput than prior systems.

Large Language Models as Optimizers

cs.LG · 2023-09-07 · unverdicted · novelty 7.0

Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.

Steering Language Models With Activation Engineering

cs.CL · 2023-08-20 · unverdicted · novelty 7.0

Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.

QLoRA: Efficient Finetuning of Quantized LLMs

cs.LG · 2023-05-23 · conditional · novelty 7.0

QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

cs.CL · 2021-04-18 · conditional · novelty 7.0

Presents the NATURAL INSTRUCTIONS meta-dataset and shows generative pre-trained language models achieve 19% better generalization to unseen tasks when using task instructions.

5% > 100%: Flatness Preference is All You Need for Multimodal Parameter-Efficient Fine-Tuning

cs.CV · 2026-06-09 · unverdicted · novelty 6.0

Flatness Preference Optimization (FlatPO) improves multimodal PEFT generalization by flattening a small set of sharp dimensions that dominate performance.

End-to-End Context Compression at Scale

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

LCLMs are scaled 0.6B-encoder 4B-decoder compressors pre-trained on over 350B tokens that improve the Pareto frontier for general-task performance, compression speed, and peak memory in long-context language model inference.

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

cs.CR · 2026-06-08 · unverdicted · novelty 6.0

Introduces MM-Privacy dataset and evaluations showing MLLMs leak sensitive data from images in various tasks, highlighting task inconsistency effects.

Latent Diffusion Pretraining for Crystal Property Prediction

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.

V-LynX: Token Interface Alignment for Video+X LLMs

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

V-LynX integrates novel modalities into frozen Video LLMs by aligning to an internalized continuous token manifold using unpaired unimodal data and attention/statistical matching.

ACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editing

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

ACE-LoRA introduces adaptive orthogonal decoupling and rank-invariant compression for continual image editing in diffusion models, plus the CIE-Bench benchmark.

Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Octopus introduces history-free gradient orthogonalization in a two-stage finetuning framework to achieve state-of-the-art continual learning results for multimodal LLMs on the UCIT benchmark.

citing papers explorer

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · none · ref 200 · internal anchor

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Efficient Memory Management for Large Language Model Serving with PagedAttention

cs.LG · 2023-09-12 · conditional · none · ref 29 · internal anchor

PagedAttention achieves near-zero waste in LLM key-value cache memory and enables 2-4x higher serving throughput than prior systems.

Large Language Models as Optimizers

cs.LG · 2023-09-07 · unverdicted · none · ref 16 · internal anchor

Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.

Steering Language Models With Activation Engineering

cs.CL · 2023-08-20 · unverdicted · none · ref 97 · internal anchor

Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.

QLoRA: Efficient Finetuning of Quantized LLMs

cs.LG · 2023-05-23 · conditional · none · ref 34 · internal anchor

QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · none · ref 177 · internal anchor

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Towards Expert-Level Medical Question Answering with Large Language Models

cs.CL · 2023-05-16 · unverdicted · none · ref 68 · internal anchor

Med-PaLM 2 achieves 86.5% accuracy on MedQA and approaches or exceeds prior state-of-the-art on other medical QA benchmarks while receiving higher physician preference ratings than human answers on consumer questions.

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

cs.AI · 2023-03-31 · conditional · none · ref 66 · internal anchor

CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

Improved Baselines with Visual Instruction Tuning

cs.CV · 2023-10-05 · conditional · none · ref 33 · internal anchor

Simple changes to LLaVA using CLIP-ViT-L-336px, an MLP connector, and academic VQA data yield state-of-the-art results on 11 benchmarks with only 1.2M public examples and one-day training on 8 A100 GPUs.

A Comprehensive Overview of Large Language Models

cs.CL · 2023-07-12 · unverdicted · none · ref 41 · internal anchor

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.
