Animation2Code benchmark with 1,069 videos tests VLMs on generating animation code, showing persistent failures in temporal consistency despite good visual matches.
hub
Liger Kernel: Efficient Triton Kernels for
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.
Decoupling prefix source from token-level KL direction in autoregressive sequence KL yields four objectives unifying SFT, DAgger, offline RL and OPD, with KL mixing and entropy-gated curriculum improving math reasoning accuracy and shortening responses.
GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.
BOSCH decomposes attention-head selection for short-context hybridization into layer probing, adaptive ratio assignment, and grouped binary optimization, yielding better efficiency-performance tradeoffs than static or layer-wise baselines.
ICRM casts reward modeling as amortized variational inference over a latent preference probability with a Beta prior, enabling test-time adaptation to unseen preferences and improving benchmark performance.
A new 30k-instance semantic segmentation dataset plus block distillation with sink tokens, dropout, and weighted loss lets block-attention models reach near full-attention performance on long texts.
CCE- is a Triton kernel implementation of cross-entropy loss with negative sampling that reduces memory by more than 10x and accelerates training by up to 2x for large-catalog sequential recommenders.
LongSpec achieves up to 3.26x speedup over Flash Attention baselines on long-context datasets via memory-efficient drafting and verification techniques.
Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.
NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.
citing papers explorer
-
NVILA: Efficient Frontier Visual Language Models
NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.