{"work":{"id":"23ddcedc-909d-4b18-b28a-943a50a8cef7","openalex_id":null,"doi":null,"arxiv_id":"2101.00190","raw_key":null,"title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation","authors":null,"authors_text":"Xiang Lisa Li, Percy Liang","year":2021,"venue":"cs.CL","abstract":"Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were \"virtual tokens\". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training.","external_url":"https://arxiv.org/abs/2101.00190","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T07:43:14.083036+00:00","pith_arxiv_id":"2101.00190","created_at":"2026-05-09T05:01:40.814550+00:00","updated_at":"2026-06-29T07:43:14.083036+00:00","title_quality_ok":true,"display_title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation","render_title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation"},"hub":{"state":{"work_id":"23ddcedc-909d-4b18-b28a-943a50a8cef7","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external 
citations","pith_inbound_count":70,"external_cited_by_count":null,"distinct_field_count":13,"first_pith_cited_at":"2021-04-18T08:44:56+00:00","last_pith_cited_at":"2026-06-23T17:21:03+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T15:19:00.474858+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":9},{"context_role":"method","n":5},{"context_role":"baseline","n":2},{"context_role":"dataset","n":1},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":10},{"context_polarity":"use_method","n":4},{"context_polarity":"baseline","n":2},{"context_polarity":"unclear","n":1},{"context_polarity":"use_dataset","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T18:20:00.283336+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"The Power of Scale for Parameter-Efficient Prompt Tuning","work_id":"1056ba8e-7b3f-4811-be8e-9a3ed9269acb","shared_citers":20},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":11},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":10},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":8},{"title":"Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models","work_id":"8505729c-c88b-43bd-8e2c-2c94644ca438","shared_citers":8},{"title":"LLaMA: Open and Efficient Foundation Language 
Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":8},{"title":"Finetuned Language Models Are Zero-Shot Learners","work_id":"7ed6cdaa-ed67-4db4-aceb-b7e1b0e6e7c4","shared_citers":7},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":7},{"title":"Towards a unified view of parameter-efficient transfer learning","work_id":"e758988d-4ae2-44ff-94fc-4417c242b1e4","shared_citers":7},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"OPT: Open Pre-trained Transformer Language Models","work_id":"d7ff3b21-1fff-4cf4-952a-4714e3ef2307","shared_citers":6},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":5},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":5},{"title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","work_id":"1bb6fb0c-482d-43cf-94a8-ed18f72a5563","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks","work_id":"ceb357a3-8497-47d1-ae04-4e844ec59db5","shared_citers":5},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":5},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":5},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":5},{"title":"Training Verifiers to Solve Math Word 
Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":5},{"title":"AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning","work_id":"6fa49657-348b-42dd-b870-8758c71af878","shared_citers":4},{"title":"Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models","work_id":"400697bd-9533-4483-b9c4-68ce3f4ebfd0","shared_citers":4},{"title":"Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context","work_id":"80e3e977-f1bb-4c83-8d0c-1ab0a0c5c3f1","shared_citers":4}],"time_series":[{"n":1,"year":2021},{"n":2,"year":2022},{"n":5,"year":2023},{"n":3,"year":2024},{"n":1,"year":2025},{"n":23,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T18:19:51.582898+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T18:19:37.628380+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation","claims":[{"claim_text":"Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). 
Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were \"virtual tokens\". We ","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Prefix-Tuning: Optimizing Continuous Prompts for Generation because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T18:19:51.587842+00:00"}},"summary":{"title":"Prefix-Tuning: Optimizing Continuous Prompts for Generation","claims":[{"claim_text":"Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were \"virtual tokens\". 
We ","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Prefix-Tuning: Optimizing Continuous Prompts for Generation because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"The Power of Scale for Parameter-Efficient Prompt Tuning","work_id":"1056ba8e-7b3f-4811-be8e-9a3ed9269acb","shared_citers":20},{"title":"LoRA: Low-Rank Adaptation of Large Language Models","work_id":"0426219a-789e-4964-adc8-a04538510818","shared_citers":11},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":10},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":8},{"title":"Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models","work_id":"8505729c-c88b-43bd-8e2c-2c94644ca438","shared_citers":8},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":8},{"title":"Finetuned Language Models Are Zero-Shot Learners","work_id":"7ed6cdaa-ed67-4db4-aceb-b7e1b0e6e7c4","shared_citers":7},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":7},{"title":"Towards a unified view of parameter-efficient transfer learning","work_id":"e758988d-4ae2-44ff-94fc-4417c242b1e4","shared_citers":7},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"OPT: Open Pre-trained Transformer Language Models","work_id":"d7ff3b21-1fff-4cf4-952a-4714e3ef2307","shared_citers":6},{"title":"Decoupled Weight Decay 
Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":5},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":5},{"title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding","work_id":"1bb6fb0c-482d-43cf-94a8-ed18f72a5563","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks","work_id":"ceb357a3-8497-47d1-ae04-4e844ec59db5","shared_citers":5},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":5},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":5},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":5},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":5},{"title":"AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning","work_id":"6fa49657-348b-42dd-b870-8758c71af878","shared_citers":4},{"title":"Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models","work_id":"400697bd-9533-4483-b9c4-68ce3f4ebfd0","shared_citers":4},{"title":"Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context","work_id":"80e3e977-f1bb-4c83-8d0c-1ab0a0c5c3f1","shared_citers":4}],"time_series":[{"n":1,"year":2021},{"n":2,"year":2022},{"n":5,"year":2023},{"n":3,"year":2024},{"n":1,"year":2025},{"n":23,"year":2026}],"dependency_candidates":[]},"authors":[]}}