{"total":21,"items":[{"citing_arxiv_id":"2605.23410","ref_index":28,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"What Linear Probes Miss: Multi-View Probing for Weight-Space Learning","primary_cat":"cs.LG","submitted_at":"2026-05-22T09:18:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MVProbe is a multi-perspective probing framework for weight-space learning that combines first-order and Gram-based views and outperforms ProbeX on the Model Jungle benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22432","ref_index":53,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"AMUSE: Anytime Muon with Stable Gradient Evaluation","primary_cat":"cs.LG","submitted_at":"2026-05-21T12:55:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21882","ref_index":48,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception","primary_cat":"cs.CV","submitted_at":"2026-05-21T01:43:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual 
reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21070","ref_index":147,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Towards Understanding Self-Pretraining for Sequence Classification","primary_cat":"cs.LG","submitted_at":"2026-05-20T11:56:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20581","ref_index":67,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"TriForces: Augmenting Atomistic GNNs for Transferable Representations","primary_cat":"cs.LG","submitted_at":"2026-05-20T00:38:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TriForces adds a model-agnostic three-stream architecture plus self-supervised objectives to atomistic GNNs, improving transfer performance on MatBench, QM9, and limited-data OMat24 without DFT labels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17837","ref_index":118,"ref_count":2,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Temporal Aware Pruning for Efficient Diffusion-based Video Generation","primary_cat":"cs.CV","submitted_at":"2026-05-18T04:18:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and 
coherence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16783","ref_index":99,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Pretrain-to-alignment learning paradigm to improve geophysical AI applicability under scarce field labels and synthetic-to-field gaps: A case study of relative geologic time estimation in global shelf-edge clinothems","primary_cat":"physics.geo-ph","submitted_at":"2026-05-16T03:34:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces a staged pretrain-to-alignment workflow for geophysical AI that improves relative geologic time estimation across global field surveys despite limited labels and domain gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14654","ref_index":8,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging","primary_cat":"cs.CV","submitted_at":"2026-05-14T10:10:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classification tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"t denote the feature representations of h∈H and g∈H in modality i and t respectively. The pseudo correspondence between different instances is built based on mutual nearest neighborhood. 
Specifically, we define the correspondence $C^{h\\to g}_{i\\to t}$ under the distance metric $d(\\cdot,\\cdot)$ as follows: $C^{h\\to g}_{i\\to t}(z^h_{i,a}) = Z^g_{t,a'} \\iff a = \\arg\\min_o d(z^h_{i,o}, Z^g_{t,a'}) \\wedge a' = \\arg\\min_o d(z^h_{i,a}, Z^g_{t,o})$. (8) Then, we can build a set of solid indices according to the condition of the correspondence: $V^{h,g}_{i,t} := \\{(a, a') \\mid a = \\arg\\min_o d(z^h_{i,o}, Z^g_{t,a'}),\\ a' = \\arg\\min_o d(z^h_{i,a}, Z^g_{t,o})\\}$. (9) Given the a-th patch token feature $z^h_{i,a} \\in Z^h_i$, similar to Equation 5, we can define the triplet set for inter-instance loss $\\mathrm{Tri}^{\\mathrm{inter}}_{i\\to t,\\,a,\\,h\\to g} :=$"},{"citing_arxiv_id":"2605.14075","ref_index":9,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity","primary_cat":"cs.LG","submitted_at":"2026-05-13T19:51:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11870","ref_index":38,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Information theoretic underpinning of self-supervised learning by clustering","primary_cat":"cs.LG","submitted_at":"2026-05-12T09:50:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16384","ref_index":50,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Mutual Enhancement Between 
Global Tokens and Patch Tokens: From Theory to Practice","primary_cat":"cs.CV","submitted_at":"2026-05-11T10:51:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09146","ref_index":87,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Beyond Thinking: Imagining in 360$^\\circ$ for Humanoid Visual Search","primary_cat":"cs.CV","submitted_at":"2026-05-09T20:10:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08935","ref_index":15,"ref_count":4,"confidence":0.35,"is_internal_anchor":false,"paper_title":"PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting","primary_cat":"cs.AI","submitted_at":"2026-05-09T13:12:33+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"wise multiplication, and MLPC is a channel-wise MLP. Differentiable Semi-Lagrangian Advection Block. The DSL-Block introduces an inductive bias for advection. For an input $F_{in}$, it first extracts features $F_{feat} = F_{AGB}(F_{in})$ using an AGB trunk. 
A 3×3 convolution, $C_{flow}$, then operates on $F_{feat}$ to estimate a displacement flow field $u$: $u = u_{\\max} \\cdot \\tanh(C_{flow}(F_{feat}))$ (15) where $u_{\\max}$ is a predefined hyperparameter for the maximum displacement. We employ a semi-Lagrangian approach, performing backward tracing by subtracting the flow field from a normalized base grid $G_{base}$ to compute a sampling grid: $G_{sample} = G_{base} - u$ (16) Using a differentiable bilinear interpolation operator $W$, we warp the feature map $F_{feat}$ according to the sampling grid,"},{"citing_arxiv_id":"2605.08832","ref_index":6,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Inpainting physics: self-supervised learning for context-driven fluid simulation","primary_cat":"cs.LG","submitted_at":"2026-05-09T09:37:34+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"ground-truth velocity $v_i \\in \\mathbb{R}^3$. Normalised MSE (nMSE). We report the sample-wise normalised error: $\\mathrm{nMSE}(\\hat{v}, v) = \\frac{\\sum_i \\|\\hat{v}_i - v_i\\|_2^2}{\\sum_i \\|v_i\\|_2^2}$ (5) Cosine similarity. To measure directional agreement independently of magnitude, we report the mean cosine similarity between predicted and ground-truth velocity vectors: $\\mathrm{CosSim}(\\hat{v}, v) = \\frac{1}{N} \\sum_{i=1}^{N} \\frac{\\hat{v}_i \\cdot v_i}{\\|\\hat{v}_i\\|_2 \\|v_i\\|_2}$. (6) We exclude the lowest 10% of $v_i$ in each sample. Normalised divergence. For an incompressible flow, the continuity equation requires $\\nabla \\cdot v = 0$. 
We estimate the velocity Jacobian $J_i \\in \\mathbb{R}^{3 \\times 3}$ at each point from its k nearest neighbours by local least squares on the first-order Taylor expansion $v(x_j) - v(x_i) \\approx J_i^{\\top}(x_j - x_i)$, and report the divergence residual normalised by the average Jacobian Frobenius norm so that the score is dimensionless and"},{"citing_arxiv_id":"2605.05646","ref_index":15,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality","primary_cat":"cs.CV","submitted_at":"2026-05-07T03:53:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MUSE decouples reconstruction and semantic learning in visual tokenization via topological orthogonality, yielding SOTA generation quality and improved semantic performance over its teacher model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01466","ref_index":25,"ref_count":2,"confidence":0.35,"is_internal_anchor":false,"paper_title":"SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion","primary_cat":"cs.CV","submitted_at":"2026-05-02T14:34:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SplAttN uses Gaussian soft splatting and attention to avoid sparse projection collapse in point cloud completion, achieving SOTA results and demonstrating genuine visual cue reliance on KITTI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00130","ref_index":8,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information 
Maximization","primary_cat":"cs.LG","submitted_at":"2026-04-30T18:33:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.14803","ref_index":20,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations","primary_cat":"cs.CV","submitted_at":"2024-12-19T12:48:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.04840","ref_index":86,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models","primary_cat":"cs.CV","submitted_at":"2024-08-09T03:25:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2311.10122","ref_index":111,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"Video-LLaVA: Learning United Visual Representation by Alignment Before Projection","primary_cat":"cs.CV","submitted_at":"2023-11-16T10:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Video-LLaVA creates a unified visual representation for images and videos via pre-projection alignment, enabling mutual enhancement from joint training and strong results on image and video benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.16828","ref_index":157,"ref_count":1,"confidence":0.35,"is_internal_anchor":false,"paper_title":"TD-MPC2: Scalable, Robust World Models for Continuous Control","primary_cat":"cs.LG","submitted_at":"2023-10-25T17:57:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}