{"total":14,"items":[{"citing_arxiv_id":"2605.23868","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Vision Transformers Need Better Token Interaction","primary_cat":"cs.CV","submitted_at":"2026-05-22T17:25:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Replacing softmax attention with entmax-1.5 in DINOv1 ViT-S/16 improves semantic segmentation mIoU on three benchmarks while keeping ImageNet linear-probing accuracy unchanged.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21800","ref_index":37,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation","primary_cat":"cs.LG","submitted_at":"2026-05-20T22:58:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20922","ref_index":5,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Winfree Oscillatory Neural Network","primary_cat":"cs.LG","submitted_at":"2026-05-20T09:08:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WONN is a new oscillatory neural network based on generalized Winfree dynamics that scales competitively to ImageNet-1K and reaches 80.1% accuracy on Maze-hard with 1% of prior model parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14315","ref_index":10,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention","primary_cat":"cs.CV","submitted_at":"2026-05-14T03:24:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TurboVGGT uses adaptive sparse global attention with varying sparsity levels across frames and layers plus frame attention to enable faster multi-view 3D reconstruction while keeping competitive quality versus prior state-of-the-art methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13688","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MedCore: Boundary-Preserving Medical Core Pruning for MedSAM","primary_cat":"cs.CV","submitted_at":"2026-05-13T15:42:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12276","ref_index":11,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities","primary_cat":"cs.AI","submitted_at":"2026-05-12T15:37:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NARA introduces a unified self-supervised method for learning relational, context-dependent representations of heterogeneous vector geoentities that improves performance on building classification, traffic prediction, and POI recommendation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"A unified mechanism that captures both metric and topological relations is needed to properly contextualize geoentities during representation learning. Existing SSL methods for vector geospatial data are tailored to individual geoentities or geometry types. Previous work typically develops isolated frameworks, e.g., for learning representations of point-of-interest (POI) [11, 12, 13], road networks [14, 15], or regional polygons [ 16], each with task-specific architectures. Consequently, these methods lack the flexibility to jointly represent heterogeneous entities within a single representation space. Recent efforts to incorporate multiple types are often constrained to fixed data or application assumptions, failing to generalize across"},{"citing_arxiv_id":"2605.11683","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-12T07:42:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DORA uses an online RL agent to adaptively merge tokens in Vision Transformers, reporting better accuracy-efficiency trade-offs than static baselines on ImageNet and OOD sets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06274","ref_index":7,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy","primary_cat":"cs.LG","submitted_at":"2026-05-07T13:49:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hierarchy-Aware Cross-Entropy improves image classification by incorporating class hierarchies into the loss through prediction aggregation and ancestral label smoothing, achieving mean accuracy gains of 4.66% in end-to-end training and 2.18% in linear probing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04496","ref_index":12,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"The Indra Representation Hypothesis for Multimodal Alignment","primary_cat":"cs.CV","submitted_at":"2026-04-06T07:46:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04175","ref_index":58,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Uncertainty-Aware Foundation Models for Clinical Data","primary_cat":"cs.LG","submitted_at":"2026-04-05T16:44:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised objectives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16410","ref_index":2,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP","primary_cat":"cs.LG","submitted_at":"2026-04-01T06:35:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Matched learning-rate experiments show LoRA retains substantially higher zero-shot transfer (45% vs 11% on EuroSAT, 58% vs 9% on Pets) than Full FT in CLIP adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.03563","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"State Space Models for Bioacoustics: A Comparative Evaluation with Transformers","primary_cat":"cs.SD","submitted_at":"2025-12-03T08:37:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"BioMamba matches Transformer performance on bioacoustics tasks while using significantly less VRAM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.15572","ref_index":11,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers","primary_cat":"cs.CV","submitted_at":"2025-11-19T16:03:21+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.02369","ref_index":8,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Sharpness-Aware Minimization with Z-Score Gradient Filtering","primary_cat":"cs.LG","submitted_at":"2025-05-05T05:13:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Z-Score Filtered SAM retains only high absolute Z-score gradient components per layer during the ascent step and reports higher test accuracy than standard SAM on CIFAR and Tiny-ImageNet benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}