{"total":138,"items":[{"citing_arxiv_id":"2605.23446","ref_index":24,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Weisfeiler-Leman Is Incomplete on Simple Spectrum Graphs, so Canonicalize Them","primary_cat":"cs.LG","submitted_at":"2026-05-22T10:01:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"k-WL is incomplete on simple spectrum graphs; PRiSM is the first provably complete canonicalization for their eigendecompositions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23346","ref_index":80,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-22T08:06:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CDM amortizes SMC inference for reward-tilted discrete diffusion by training a parameterized twist function on contrastive samples with closed-form kernels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23258","ref_index":26,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"A Simple Plug-in for Improving Eviction-Based KV Cache Compression","primary_cat":"cs.LG","submitted_at":"2026-05-22T06:00:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"VECTOR augments eviction-based KV cache compression with three-way token routing that combines importance scoring and offline regression-based reconstructability estimation to improve quality at high compression ratios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23156","ref_index":12,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Any-Dimensional Invariant Universality","primary_cat":"cs.LG","submitted_at":"2026-05-22T02:07:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"A systematic approach maps any-dimensional invariant functions to a unique function on an infinite-dimensional limit space admitting a topology with compact sets where universality holds, with examples of non-universal architectures and fixes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23138","ref_index":100,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning","primary_cat":"quant-ph","submitted_at":"2026-05-22T01:24:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22715","ref_index":67,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild","primary_cat":"cs.CV","submitted_at":"2026-05-21T16:52:10+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22472","ref_index":7,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Winner-Take-All bottlenecks enforce disentangled symbolic representations in multi-task learning","primary_cat":"cs.LG","submitted_at":"2026-05-21T13:33:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WTA bottlenecks enforce highly symbolic, disentangled categorical representations of latent factors under defined conditions in multi-task DNNs, shown via theorem and experiments on two datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21714","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking","primary_cat":"cs.CV","submitted_at":"2026-05-20T20:19:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AVI-HT adaptively fuses vision and IMU data via attention to cut 3D hand keypoint error by 16.1% (24.2% wrist-aligned) on a new 100K+ sample DexGloveHOI dataset in occluded hand-object scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21272","ref_index":94,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset","primary_cat":"cs.CV","submitted_at":"2026-05-20T15:04:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Data deduplicationAn often under-addressed issue in large multimodal corpora is the prevalence of duplicate or near-duplicate samples, which skew the data distribution and induce memorization [43, 52], a particularly pressing concern for diffusion models [85, 8]. MONET addresses this issue by using a combination of deduplication methods, such as perceptual hashing [94] and Self-Supervised Copy Detection (SSCD) [66], to remove near-duplicate images from the dataset. 3 Dataset construction In this section, we detail the construction of the MONET dataset. Starting from heterogeneous open sources totaling 2.9B raw image-text pairs, we apply successive stages of pre-filtering, safety filtering, deduplication, and domain-based filtering, followed by multi-VLM re-captioning and synthetic-data"},{"citing_arxiv_id":"2605.21160","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-05-20T13:27:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FISolver trains a compact LLM on backward-generated (differential equation, first integral) pairs and uses guided reinforcement learning to outperform larger models and Mathematica on first-integral benchmarks at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20856","ref_index":42,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation","primary_cat":"cs.RO","submitted_at":"2026-05-20T07:45:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20730","ref_index":43,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning","primary_cat":"cs.CL","submitted_at":"2026-05-20T05:26:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20624","ref_index":43,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-05-20T02:16:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20134","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning","primary_cat":"cs.LG","submitted_at":"2026-05-19T17:18:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TrajTok learns multi-resolution hexagonal spatial tokens from GPS data and pretrains a factorized transformer with ST-RoPE and masked modeling to yield frozen encoders that outperform task-specific methods on similarity, classification, and travel-time tasks in the Porto dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20005","ref_index":64,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates","primary_cat":"cs.LG","submitted_at":"2026-05-19T15:36:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19619","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models","primary_cat":"cs.LG","submitted_at":"2026-05-19T09:56:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19376","ref_index":39,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Generative Recursive Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-19T05:20:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18753","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention","primary_cat":"cs.CL","submitted_at":"2026-05-18T17:59:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DashAttention introduces differentiable adaptive sparse hierarchical attention via α-entmax block selection, achieving full-attention accuracy at 75% sparsity with improved Pareto performance over NSA and InfLLMv2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18735","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"PIXLRelight: Controllable Relighting via Intrinsic Conditioning","primary_cat":"cs.CV","submitted_at":"2026-05-18T17:55:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A transformer-based neural renderer that transfers arbitrary PBR lighting to single images via shared intrinsic conditioning extracted from both multi-illumination photos and path-traced coarse 3D renders.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18932","ref_index":24,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation","primary_cat":"cs.LG","submitted_at":"2026-05-18T15:00:29+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17866","ref_index":42,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data","primary_cat":"cs.LG","submitted_at":"2026-05-18T05:19:13+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17311","ref_index":39,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection","primary_cat":"cs.CV","submitted_at":"2026-05-17T08:02:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SpecSem-Net integrates Fourier-based spectral filtering with semantic-guided gated merging to detect AI-generated videos, reporting 87.25% accuracy on a new benchmark of five commercial generators and 95.59% on public datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16147","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Registers Matter for Pixel-Space Diffusion Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-15T16:27:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15923","ref_index":14,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction","primary_cat":"cs.CV","submitted_at":"2026-05-15T13:06:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Invaria trains point cloud encoders with next-resolution prediction to learn scale and density invariant features, yielding higher mIoU on ScanNet under lower resolution and scaled objects while using a smaller model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15752","ref_index":19,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model","primary_cat":"astro-ph.IM","submitted_at":"2026-05-15T09:13:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15676","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Dynamic Chunking for Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-15T06:56:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15488","ref_index":100,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference","primary_cat":"cs.LG","submitted_at":"2026-05-15T00:13:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"selection, training, and validation require substantial domain and methodological expertise. This work aims to design a survival estimator that(i) avoids rigid simplifying assumptions;(ii) adapts to the effective complexity of the observed data; and(iii) enables efficient inference without extensive training or hyperparameter tuning. To do so, we build on prior-data fitted networks (PFNs) [66]: transformer-based models [100] that learn in-context approximations of posterior predictive distributions using synthetic tasks. Rather than fitting a new survival model for each dataset, SurvivalPFN shifts computation to an offline prior-data pretraining stage. At inference time, an observed right-censored dataset is provided as context, and a single forward pass returns posterior survival distributions for new individuals."},{"citing_arxiv_id":"2605.16423","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization","primary_cat":"cs.CV","submitted_at":"2026-05-14T14:55:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14458","ref_index":26,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"OmniDrop: Layer-wise Token Pruning for Omni-modal LLMs via Query-Guidance","primary_cat":"cs.AI","submitted_at":"2026-05-14T06:54:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmniDrop is a training-free layer-wise token pruning framework for omni-modal LLMs that uses query guidance and temporal diversity to reduce prefill latency by up to 40% and memory by 14.7% while improving benchmark scores by up to 3.58 points.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14333","ref_index":44,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation","primary_cat":"cs.CV","submitted_at":"2026-05-14T03:57:25+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17620","ref_index":91,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SynVA: A Modular Toolkit for Vessel Generation and Aneurysm Editing","primary_cat":"cs.CV","submitted_at":"2026-05-13T16:00:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SynVA toolkit generates realistic vascular meshes and anatomically plausible aneurysms, releasing 50,000 labeled samples for medical vision tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13638","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem","primary_cat":"quant-ph","submitted_at":"2026-05-13T15:04:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12836","ref_index":30,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Discrete Stochastic Localization for Non-autoregressive Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:12:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12624","ref_index":43,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving","primary_cat":"cs.RO","submitted_at":"2026-05-12T18:09:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MindVLA-U1 is the first unified streaming VLA architecture that surpasses human drivers on WOD-E2E planning metrics while matching VA latency and preserving language interfaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12343","ref_index":22,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale","primary_cat":"cs.LG","submitted_at":"2026-05-12T16:20:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"operators extend this idea to operator learning on irregular domains, using graph kernels, point-cloud representations, signed-distance functions, or learned mappings between irregular and regular domains [18]. More recently, transformer architectures have advanced PDE modeling; Transolver [19, 20] handles complex geometries via physics-aware attention mechanisms [21], whereas HAMLET [22] tackles parametric problems using graph attention. These approaches are important steps toward geometry-aware and parametric learned simulation. However, they still primarily follow a global learning paradigm: the model is trained to map from a full geometry and its associated physical inputs to a full solution field. Thus, generalization to geometries, topologies, domain sizes, or boundary-condition configurations far outside the training"},{"citing_arxiv_id":"2605.12241","ref_index":25,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study","primary_cat":"eess.SP","submitted_at":"2026-05-12T15:10:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Contrastive predictive coding pretraining combined with structured state space models yields the strongest ECG foundation models, with continued gains from scaling data to 11 million samples.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"The SSL objectives underlying these models are largely inherited from speech (CPC [22], HuBERT [23]) and vision (I-JEPA [24]) adapted to physiological time series without systematic evaluation of which objectives or backbone archi- tectures are best suited for ECG data. Backbone choices have similarly followed trends from other domains, with transformers [ 25] dominating despite structured state space models [ 26] showing superior performance on long sequences in supervised ECG settings [27, 28]. Scaling lawsScaling laws relating model performance to pretraining dataset size have been studied extensively in language [ 29] and vision [ 30], typically revealing power-law improvements with increasing data."},{"citing_arxiv_id":"2605.12163","ref_index":42,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model","primary_cat":"cs.CV","submitted_at":"2026-05-12T14:13:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SCOLAR fixes information gain collapse in latent visual reasoning by generating independent auxiliary visual tokens via a detransformer, extending acceptable CoT length over 30x and delivering +14.12% gains on reasoning benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12049","ref_index":6,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons","primary_cat":"cs.LG","submitted_at":"2026-05-12T12:29:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Recurrent networks built from tunable expressive neurons reveal scaling laws with an optimal parameter split that shifts toward higher per-neuron complexity at larger scales.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Recurrent networks and modular architecturessplit computation across interacting stateful com- ponents.Computational neurosciencetypically asks how feedback and population dynamics in networks of simple spike or rate-based neurons supports computation [ 26-30].Deep learningis typically focused on performance and found specialized modules [31-33, 8] and parallel modular blocks like mixture of experts and multi-head setups [6, 34] to be beneficial, recently also structured state-spaces [35, 7]. These approaches compartmentalize computation yet have large-dimensional outputs that act akin to isolated brain circuits rather than a single neuron. Closest-in-spirit to our work are Continuous Thought Machines [36] which also employ expressive neural units, yet encode infor- mation in neuron-external synchronization patterns."},{"citing_arxiv_id":"2605.16392","ref_index":42,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Bridging the Modality Bottleneck in Pathology MIL through Virtual Molecular Staining","primary_cat":"q-bio.QM","submitted_at":"2026-05-12T06:49:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MIST augments MIL projection layers with cross-modal gene-expression prototypes derived from spatial transcriptomics, yielding consistent gains on survival, subtyping, and biomarker tasks across 23 endpoints and 8 aggregators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11300","ref_index":57,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Can Graphs Help Vision SSMs See Better?","primary_cat":"cs.CV","submitted_at":"2026-05-11T22:40:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"1 Introduction The design of visual backbones has repeatedly been shaped by how an image is represented before it is processed. Convolutional networks preserve the image as a regular grid [28, 26, 47, 50, 19, 21, 51, 43, 37, 65, 59, 8, 34, 74] while Vision Transformers recast it as a sequence of patches with self-attention for long-range interaction [ 57, 10, 54, 36, 61, 62, 3, 9, 73, 17, 67, 77, 56, 5, 2, 63]; all-MLP and graph-based variants further show this representation choice is broader than convolution or attention alone [53, 55, 32, 16]. More recently, structured state space models (SSMs) have emerged as efficient sequence mixers with long-range modeling and linear scaling [15, 14, 48, 11, 42, 13, 6, 27], raising"},{"citing_arxiv_id":"2605.11093","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Enabling Performant and Flexible Model-Internal Observability for LLM Inference","primary_cat":"cs.LG","submitted_at":"2026-05-11T18:01:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DMI-Lib delivers 0.4-6.8% overhead for offline batch LLM inference and ~6% for moderate online serving while exposing rich internal signals across backends, cutting latency overhead 2-15x versus prior observability baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"monitoring capabilities. 2 2.1 Transformer Model and Inference Infrastructure Modern LLMs are built on the Transformer architecture. In a standard dense, decode-only Trans- former, the main components include an embedding layer, a stack of Transformer layers (each com- prising self-attention and MLP modules), and residual connections with normalization layers [41]. Recent models extend this dense design with architectural variants such as Mixture-of-Experts (MoE) layers [16, 35] and modified attention or sequence-processing blocks [ 5, 13], further increasing architectural diversity. Model architectures are commonly implemented in PyTorch [31], which represents neural network components as nn.Module objects."},{"citing_arxiv_id":"2605.10364","ref_index":36,"ref_count":3,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DeepL\\'evy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series","primary_cat":"cs.LG","submitted_at":"2026-05-11T11:08:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeepLévy learns mixtures of Lévy stable distributions for heavy-tailed time series forecasting by minimizing discrepancies between empirical and parametric characteristic functions, outperforming prior methods on tail risk metrics under extreme volatility.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"architecture consists of an encoder-decoder structure that enables multi-horizon forecasting. Encoder.The encoder maps the raw historical observations to a context-aware latent representation c∈R d via a sequence encoderE θ: c=E θ(x1:T ),(6) where d is the hidden dimension, and θ⊂Θ denotes the encoder parameters. The encoder Eθ can be parameterized by architectures such as Transformers [36], or Long Short-Term Memory (LSTM) networks [11]. 4 Autoregressive Decoder.To capture dependencies across prediction horizons, we employ an autoregressive decoder that generates horizon-specific hidden states: h(h) =D ϕ(c,h (h−1),ˆyT+h−1 ), h= 1, . . . , H,(7) where h(0) =c , ˆyT is initialized as the last observation xT , and ˆyT+h−1 denotes the autoregressive"},{"citing_arxiv_id":"2605.10198","ref_index":59,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-05-11T08:46:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"2 Cross-Attention Cross attention is a fundamental building block of many state-of-the-art T2I diffusion models, as it incorporates text conditioning into the image generation process. Specifically, a text encoder is fed with a text prompt and generates a text embedding which is integrated into the image generation process through a Query-Key-Value (QKV) structure [59]. Given a text embedding ci, the Keys and Values are obtained as ki =W kci and vi =W vci, respectively, and are responsible for projecting text embeddings. The cross-attention output is computed as O ∝softmax(q ikT i )vi,(1) where qi represents the visual features, and A ∝softmax(q ikT i ) is the attention map aligning relevant text and image regions."},{"citing_arxiv_id":"2605.10115","ref_index":15,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Generating Symmetric Materials using Latent Flow Matching","primary_cat":"cs.LG","submitted_at":"2026-05-11T07:32:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SymADiT generates stable symmetric materials by enforcing Wyckoff-position and space-group constraints inside a latent generative model built on the prior ADiT architecture.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"The central idea behind our representation is to restrict the model to predicting only the degrees of freedom (DOF) not constrained by the space group. By encoding symmetry directly through Wyckoff positions, the symmetry properties are enforced at the representation level rather than learned implicitly by the network. This allows the use of a standard Transformer architecture [15] while reducing the number of tokens relative to ADiT's symmetry-agnostic representation. Due to the nature of ADiT (which encodes both materials and molecules), the generative modelling is performed in latent space. Following their design, we first train an autoencoder (AE) to encode crystals using our symmetry-aware representation, and subsequently perform generative modelling in"},{"citing_arxiv_id":"2605.09992","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Attention Drift: What Autoregressive Speculative Decoding Models Learn","primary_cat":"cs.LG","submitted_at":"2026-05-11T05:08:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Drafter models in speculative decoding suffer progressive attention drift caused by monotonically growing hidden-state magnitudes along the residual path; post-norm plus per-state RMSNorm reduces this drift and improves acceptance length up to 2x on perturbed templates and 1.18x on long-context data","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09981","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation","primary_cat":"q-bio.BM","submitted_at":"2026-05-11T04:49:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Yeti is a compact tokenizer for protein structures that delivers strong codebook use, token diversity, and reconstruction while enabling from-scratch multimodal generation of plausible sequences and structures with 10x fewer parameters than ESM3.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"culminating in alate commitmentphenomenon where global protein topology consolidates only in the final decoding steps. 3 Methods Tokenization.The high-level overview ofYeti's architecture is shown inFigure 1. Given a clean protein structure with mean-centered coordinates x, the encoder processes the input to generate latent embeddings Z=E(x)∈R L×D, where L denotes the protein length. The encoder consists of a stack of Transformer [23] blocks utilizing multi-head attention with rotary positional encoding [24]. The latent embeddings are further projected into a quantization space with dimensionality D= log 2 K, where K denotes the codebook size. Each embedding vector z∈R D is then passed through the Lookup-Free Quantizer (LFQ) [15] q. The LFQ latent space is defined as the Cartesian"},{"citing_arxiv_id":"2605.18791","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SpecX: A Large-Scale Benchmark for Multi-Modal Spectroscopy and Cross-Paradigm Evaluation","primary_cat":"eess.IV","submitted_at":"2026-05-11T04:12:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SpecX is a new large-scale multi-modal spectroscopy benchmark with tiered datasets that supports unified evaluation across specialized models and MLLMs, showing specialized models excel at signal-level tasks while MLLMs are stronger in high-level reasoning but weaker in precise spectral grounding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09742","ref_index":16,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TIDES: Implicit Time-Awareness in Selective State Space Models","primary_cat":"cs.LG","submitted_at":"2026-05-10T20:34:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Neural rough differential equations for long time series. InInternational Conference on Machine Learning, pages 7829-7838. PMLR, 2021. [15] Benjamin Walker, Andrew D McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, and Terry Lyons. Log neu- ral controlled differential equations: The lie brackets make a difference.arXiv preprint arXiv:2402.18512, 2024. [16] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017. [17] Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. Gru-ode-bayes: Continuous modeling of sporadically-observed time series.Advances in neural information processing systems, 32, 2019."},{"citing_arxiv_id":"2605.09685","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Learning Unified Representations of Normalcy for Time Series Anomaly Detection","primary_cat":"cs.LG","submitted_at":"2026-05-10T18:12:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"U²AD learns unified normal data representations via score-based generative modeling and a novel time-dependent score network to outperform prior methods in accuracy and early anomaly detection for multivariate time series.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"At each layer k, the model processes the input through two parallel branches to extract contextual characteristics before computing the final score and after the final layerK, the network outputs the final score estimate,s θ(x(t), t). Global Contextual Characteristics ( ψ:R N×N ):The first branch, global context pathway, (Figure 3) utilizes a standard multi-head self-attention mechanism, as seen in transformers [ 23]. The attention matrix at the k-th layer, ψk :R N×N , captures the long-range, global dependencies as it explicitly encodes the pairwise influence between every point in the sequence. We define the global characteristics as the attention weights themselves: ψk(Q,K,V) =Softmax( QKT √dmodel ) instead of merely being an intermediate step, we directly leverage, ψk as a map of"},{"citing_arxiv_id":"2605.09204","ref_index":29,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces","primary_cat":"cs.LG","submitted_at":"2026-05-09T22:46:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LBI enables tractable parallel backpropagation by reducing inter-region adjoint computation to low-dimensional r x r Jacobians while preserving exact gradients under a bounded-interface model.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"parameter gradients independently (Corollary 2.9). The three-phase structure exposes opportunities for overlapped execution, with a candidate streaming schedule given in Appendix C (Algorithm 2). 4 Experiments Experimental Setup.We evaluate bounded-interface backpropagation across four model backends: Mamba-2 [9], Mamba-3 SISO [ 20] (MIMO is excluded, see Appendix D.2), Transformer [ 29], and Hybrid (3 × Mamba-3 SISO + 1 × Transformer). All models are trained on 20.48M tokens 6 Algorithm 1Bounded-Interface Backpropagation Require: Region transition maps {Rk}K−1 k=0 , forward caches {Ck}K−1 k=0 , local parameters {θk}K−1 k=0 , terminal interface adjoint¯mK Ensure:Interface adjoints{¯m k}K k=0 and parameter gradients{∇ θk L}K−1 k=0"}],"limit":50,"offset":0}