{"total":11,"items":[{"citing_arxiv_id":"2606.28593","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Animation2Code: Evaluating Temporal Visual Reasoning in Video-to-Code Generation","primary_cat":"cs.CV","submitted_at":"2026-06-26T20:38:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Animation2Code benchmark with 1,069 videos tests VLMs on generating animation code, showing persistent failures in temporal consistency despite good visual matches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19269","ref_index":6,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs","primary_cat":"cs.LG","submitted_at":"2026-05-19T02:30:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16826","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-16T06:05:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Decoupling prefix source from token-level KL direction in autoregressive sequence KL yields four objectives unifying SFT, DAgger, offline RL and OPD, with KL mixing and entropy-gated curriculum improving math reasoning accuracy and shortening 
responses.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15913","ref_index":2,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation","primary_cat":"cs.CL","submitted_at":"2026-05-15T12:51:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new 30k-instance semantic segmentation dataset plus block distillation with sink tokens, dropout, and weighted loss lets block-attention models reach near full-attention performance on long texts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14790","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation","primary_cat":"cs.CL","submitted_at":"2026-05-14T12:57:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05942","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs","primary_cat":"cs.CL","submitted_at":"2026-04-07T14:38:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BOSCH decomposes attention-head selection for short-context hybridization into layer probing, adaptive ratio assignment, and 
grouped binary optimization, yielding better efficiency-performance tradeoffs than static or layer-wise baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.08819","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bayesian Preference Learning for Test-Time Steerable Reward Models","primary_cat":"cs.LG","submitted_at":"2026-02-09T15:55:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ICRM casts reward modeling as amortized variational inference over a latent preference probability with a Beta prior, enabling test-time adaptation to unseen preferences and improving benchmark performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.12266","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models","primary_cat":"q-bio.GN","submitted_at":"2025-09-13T03:31:55+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.09682","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Faster and Memory-Efficient Training of Sequential Recommendation Models for Large Catalogs","primary_cat":"cs.IR","submitted_at":"2025-08-13T15:03:38+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CCE- is a Triton kernel implementation of 
cross-entropy loss with negative sampling that reduces memory by more than 10x and accelerates training by up to 2x for large-catalog sequential recommenders.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.17421","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification","primary_cat":"cs.CL","submitted_at":"2025-02-24T18:53:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LongSpec achieves up to 3.26x speedup over Flash Attention baselines on long-context datasets via memory-efficient drafting and verification techniques.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.04468","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NVILA: Efficient Frontier Visual Language Models","primary_cat":"cs.CV","submitted_at":"2024-12-05T18:59:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"underutilized and can benefit greatly from increasing the batch size. As shown in Table 4, applying FP8 to both weights and activations allows NVILA to increase the batch size from 4 to 16, resulting in a 2×speedup. When gradient checkpointing is enabled, quantizing activations becomes less essential. 
Instead, we integrate the cross-entropy kernel from Liger [35] to reduce peak memory usage due to Qwen's large vocabulary size. In this case, FP8 training can still provide a 1.2×speedup compared to BF16 training. 2.3. Efficient Fine-Tuning Once a foundation VLM is trained, domain-specific fine-tuning is needed to adapt the model for specialized tasks or domains. While fine-tuning effectively improves domain-specific vocabulary and concepts,"}],"limit":50,"offset":0}