In International Conference on Learning Representations

Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen · 2024 · arXiv 2410.10989

9 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

Decoupling prefix source from token-level KL direction in autoregressive sequence KL yields four objectives unifying SFT, DAgger, offline RL and OPD, with KL mixing and entropy-gated curriculum improving math reasoning accuracy and shortening responses.

BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

BOSCH decomposes attention-head selection for short-context hybridization into layer probing, adaptive ratio assignment, and grouped binary optimization, yielding better efficiency-performance tradeoffs than static or layer-wise baselines.

Bayesian Preference Learning for Test-Time Steerable Reward Models

cs.LG · 2026-02-09 · unverdicted · novelty 7.0

ICRM casts reward modeling as amortized variational inference over a latent preference probability with a Beta prior, enabling test-time adaptation to unseen preferences and improving benchmark performance.

Faster and Memory-Efficient Training of Sequential Recommendation Models for Large Catalogs

cs.IR · 2025-08-13 · accept · novelty 6.0

CCE- is a Triton kernel implementation of cross-entropy loss with negative sampling that reduces memory by more than 10x and accelerates training by up to 2x for large-catalog sequential recommenders.

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

cs.CL · 2025-02-24 · unverdicted · novelty 6.0

LongSpec achieves up to 3.26x speedup over Flash Attention baselines on long-context datasets via memory-efficient drafting and verification techniques.

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

q-bio.GN · 2025-09-13 · conditional · novelty 5.0

Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.

NVILA: Efficient Frontier Visual Language Models

cs.CV · 2024-12-05 · unverdicted · novelty 5.0

NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

cs.CL · 2026-05-15 · 2 refs

