DART: Open-Domain Structured Data Record to Text Generation
3 Pith papers cite this work.
citing papers explorer
- TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading
  TrimCaching introduces parameter-sharing edge caching for AI models, formulates it as a submodular maximization problem with submodular constraints, provides approximation algorithms for special and general cases, and shows improved cache hit ratios in simulations.
- LoRA: Low-Rank Adaptation of Large Language Models
  Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
  Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.