LoRA: Low-Rank Adaptation of Large Language Models

International Conference on Learning Representations

13 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

DP-Muon adapts matrix-orthogonalized momentum optimization to differential privacy via per-matrix clipping and noise addition, with proofs of inherited privacy and optimization guarantees plus a bias-corrected version that improves private fine-tuning utility.

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

UB-SMoE balances expert utilization in heterogeneous federated SMoE fine-tuning via Dynamic Modulated Routing and Universal Pseudo-Gradient, delivering up to 45% compute reduction and 8.7x performance gains for low-resource clients over prior LoRA-rank methods.

Continual Fine-Tuning of Large Language Models via Program Memory

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

ProCL organizes LoRA adapters into input-conditioned program memory slots that combine with a distributed adapter to improve retention and reduce forgetting in continual LLM fine-tuning.

Relative Kinetic Utility for Reasoning-Aware Structural Pruning in Large Language Models

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

RKU is a curvature-aware structural pruning framework that improves LLM reasoning accuracy at 40% sparsity, reaching 13.34% accuracy on GSM8K while outperforming baselines and better preserving out-of-distribution representations.

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

cs.LG · 2026-05-07 · conditional · novelty 6.0

Orth-Dion uses QR factorization on the right factor instead of column normalization to eliminate the geometric mismatch in low-rank approximations of spectral optimizers such as Muon, achieving an O(sqrt(L_r/T)) convergence rate under non-Euclidean smoothness.

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

cs.AI · 2024-08-01 · conditional · novelty 6.0

An empirical analysis shows that scaling inference compute through strategies such as tree search can be more efficient than scaling model parameters, with a 7B model paired with a novel search method outperforming a 34B model.

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

cs.HC · 2024-01-17 · unverdicted · novelty 6.0

SeeClick improves visual GUI agents via GUI grounding pre-training on automatically curated data and introduces the ScreenSpot benchmark, with results indicating that stronger grounding boosts downstream task performance.

Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

cs.CL · 2026-05-21 · unverdicted · novelty 5.0

Fine-tuning LLMs on structured tasks inspired by maladaptive behaviors produces stable, context-general shifts in next-token distributions and response tendencies consistent with altered behavioral priors.

CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models

cs.RO · 2026-05-07 · unverdicted · novelty 5.0

CKT-WAM transfers teacher WAM knowledge to students via compressed text-embedding contexts using LQCA and adapters, reaching 86.1% success on LIBERO-Plus with 1.17% trainable parameters and 83.3% success on real-world tasks.

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

cs.RO · 2026-05-07 · unverdicted · novelty 5.0

VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

cs.CL · 2026-04-23 · unverdicted · novelty 5.0

GiVA uses gradients to initialize vector adapters so that they match LoRA performance at one-eighth of LoRA's rank while retaining extreme parameter efficiency.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

Detecting Language Model Attacks with Perplexity

cs.CL · 2023-08-27 · unverdicted · novelty 5.0

Jailbreak prompts with adversarial suffixes have high GPT-2 perplexity, and a LightGBM classifier trained on perplexity and prompt length detects most of these attacks.

citing papers explorer

DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum cs.LG · 2026-05-13 · unverdicted · none · ref 22
UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models cs.LG · 2026-05-15 · unverdicted · none · ref 1
Continual Fine-Tuning of Large Language Models via Program Memory cs.LG · 2026-05-13 · unverdicted · none · ref 9
Relative Kinetic Utility for Reasoning-Aware Structural Pruning in Large Language Models cs.LG · 2026-05-09 · unverdicted · none · ref 20
Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization cs.LG · 2026-05-07 · conditional · none · ref 12
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models cs.AI · 2024-08-01 · conditional · none · ref 294
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents cs.HC · 2024-01-17 · unverdicted · none · ref 52
Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning cs.CL · 2026-05-21 · unverdicted · none · ref 7
CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models cs.RO · 2026-05-07 · unverdicted · none · ref 38
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts cs.RO · 2026-05-07 · unverdicted · none · ref 47
GiVA: Gradient-Informed Bases for Vector-Based Adaptation cs.CL · 2026-04-23 · unverdicted · none · ref 4
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 27
Detecting Language Model Attacks with Perplexity cs.CL · 2023-08-27 · unverdicted · none · ref 69