Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
hub Mixed citations
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Mixed citation behavior. Most common role is background (33%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.
PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
Any2Any transfers pretrained humanoid whole-body tracking policies to new embodiments with 1% of original training cost via kinematic alignment and parameter-efficient fine-tuning.
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
Autoregressive generation modeled as a Markov process over tokens allows new knowledge to be incorporated by extending the state space with a token-to-dictionary mapping whose sample complexity is linear in the number of mapped existing tokens, realized via embedding tuning that induces zero forget.
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
VisPrompt improves prompt learning robustness under label noise by injecting instance-level visual semantics via attention and adaptive modulation while freezing the VLM backbone.
BrainROI achieves leading cross-subject brain-captioning results on NSD by combining multi-atlas soft-ROI fusion with interpretable prompt optimization.
PEFT-Bench is a standardized end-to-end benchmark for 7 PEFT methods across 27 NLP datasets on autoregressive LLMs, accompanied by the PSCP metric that penalizes based on trainable parameters, inference speed, and training memory.
PEP-FedPT achieves generalization and personalization in federated ViT prompt tuning via adaptive mixing of class-specific prompts weighted by global class prototypes and client priors, without per-client trainable parameters.
SUPT assigns prompt features at the subgraph level to enable universal prompt tuning for any GNN pre-training strategy and outperforms fine-tuning in 42 of 45 full-shot and 41 of 45 few-shot graph experiments with average gains of 2.5% and 6.6%.
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
DIVE proposes a dimensionality-reduction adapter using self-limiting gradients and implicit view ensembles that outperforms prior adapters on all six BEIR datasets at every tested compression ratio.
citing papers explorer
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
-
LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
-
Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.
-
Steer Like the LLM: Activation Steering that Mimics Prompting
PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.
-
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
-
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
-
OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
-
The Power of Scale for Parameter-Efficient Prompt Tuning
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
-
Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking
Any2Any transfers pretrained humanoid whole-body tracking policies to new embodiments with 1% of original training cost via kinematic alignment and parameter-efficient fine-tuning.
-
How Many Different Outputs Can a Transformer Generate?
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
-
PreFT: Prefill-only finetuning for efficient inference
Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.
-
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
-
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
-
Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping
Autoregressive generation modeled as a Markov process over tokens allows new knowledge to be incorporated by extending the state space with a token-to-dictionary mapping whose sample complexity is linear in the number of mapped existing tokens, realized via embedding tuning that induces zero forget.
-
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
-
Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
VisPrompt improves prompt learning robustness under label noise by injecting instance-level visual semantics via attention and adaptive modulation while freezing the VLM backbone.
-
Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion
BrainROI achieves leading cross-subject brain-captioning results on NSD by combining multi-atlas soft-ROI fusion with interpretable prompt optimization.
-
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
PEFT-Bench is a standardized end-to-end benchmark for 7 PEFT methods across 27 NLP datasets on autoregressive LLMs, accompanied by the PSCP metric that penalizes based on trainable parameters, inference speed, and training memory.
-
Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
PEP-FedPT achieves generalization and personalization in federated ViT prompt tuning via adaptive mixing of class-specific prompts weighted by global class prototypes and client priors, without per-client trainable parameters.
-
Subgraph-level Universal Prompt Tuning
SUPT assigns prompt features at the subgraph level to enable universal prompt tuning for any GNN pre-training strategy and outperforms fine-tuning in 42 of 45 full-shot and 41 of 45 few-shot graph experiments with average gains of 2.5% and 6.6%.
-
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
-
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
-
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
-
DIVE: Embedding Compression via Self-Limiting Gradient Updates
DIVE proposes a dimensionality-reduction adapter using self-limiting gradients and implicit view ensembles that outperforms prior adapters on all six BEIR datasets at every tested compression ratio.
-
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
-
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.
-
Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression
LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing training costs.
-
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.
-
Search-R3: Unifying Reasoning and Embedding in Large Language Models
Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
AdaFRUGAL automates FRUGAL's static hyperparameters with linear decay on subspace ratio and loss-aware update frequency, delivering competitive accuracy with lower memory and faster training on C4, VietVault, and GLUE.
- Autoregressive Visual Generation Needs a Prologue