{"total":29,"items":[{"citing_arxiv_id":"2606.00535","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation","primary_cat":"cs.LG","submitted_at":"2026-05-30T05:05:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07604","ref_index":74,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Contribution Weights: A Geometrical Analysis of Self-Attention Transformers","primary_cat":"cs.LG","submitted_at":"2026-05-29T09:40:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27970","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations","primary_cat":"cs.AI","submitted_at":"2026-05-27T05:04:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Perceptual geometry for color, pitch, emotion and taste emerges transiently in intermediate layers of transformer LLMs despite purely 
textual training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23033","ref_index":40,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Uncovering the Latent Potential of Deep Intermediate Representations","primary_cat":"cs.LG","submitted_at":"2026-05-21T20:58:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17084","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scale Determines Whether Language Models Organize Representation Geometry for Prediction","primary_cat":"cs.LG","submitted_at":"2026-05-16T17:01:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14738","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TAPIOCA: Why Task- Aware Pruning Improves OOD model 
Capability","primary_cat":"cs.LG","submitted_at":"2026-05-14T12:01:05+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12765","ref_index":37,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Inference-Time Machine Unlearning via Gated Activation Redirection","primary_cat":"cs.LG","submitted_at":"2026-05-12T21:26:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GUARD-IT performs machine unlearning in LLMs via input-dependent activation steering at inference time, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12714","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-12T20:22:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11856","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal 
LLMs","primary_cat":"cs.CV","submitted_at":"2026-05-12T09:40:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UniVLR unifies textual and visual reasoning in multimodal LLMs by compressing reasoning traces and auxiliary images into visual latent tokens for direct inference without interleaved text CoT.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"alignment and autoregressive stability, outperforming both head-less and over-parameterized GLU designs. Moreover, aligning latent tokens to middle-layer hidden states yields better performance than aligning them to final-layer hidden states, likely because middle layers preserve richer visual and spatial information. This finding aligns with previous research ([29][30][31][32]) suggesting that the middle layers of MLLMs primarily encode visual information. 4 Related Work Thinking with Images. Recent studies have explored thinking with images, where visual information is no longer treated as a passive input but as an active reasoning workspace. 
Representative methods equip MLLMs with visual operations such as cropping, zooming, grounding, or frame"},{"citing_arxiv_id":"2605.11808","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Mitigating Action-Relation Hallucinations in LVLMs via Relation-aware Visual Enhancement","primary_cat":"cs.CV","submitted_at":"2026-05-12T09:03:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new attention-enhancement method using ARS scores and RVE reduces action-relation hallucinations in LVLMs while generalizing to spatial and object hallucinations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11739","ref_index":78,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation","primary_cat":"cs.CL","submitted_at":"2026-05-12T08:19:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09430","ref_index":13,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation","primary_cat":"cs.CV","submitted_at":"2026-05-10T09:07:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FlashAR accelerates autoregressive image generation up to 22.9x by post-training a pre-trained raster-scan model with a complementary vertical head and dynamic fusion for 
two-way next-token prediction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05668","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Large Vision-Language Models Get Lost in Attention","primary_cat":"cs.AI","submitted_at":"2026-05-07T04:45:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00226","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions","primary_cat":"cs.CL","submitted_at":"2026-04-30T21:04:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sampled payoff matrices, private cards, etc.). Internal probes are trained on hidden states extracted from the intermediate layers of the LLM while it is processing the final token of the prompt prior to action generation. The choice of middle layers is motivated by prior work, which found that these layers contain most of the high-level semantic information [34, 35, 36]. 
We use the hidden states from the last token position because in the decoder-only Transformer architecture [37, 38], information relevant for next-token prediction (action selection) must be represented in this final state to causally influence the next-token probabilities. We split the data into disjoint subsets for the probe's training,"},{"citing_arxiv_id":"2604.27169","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Semantic Structure of Feature Space in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-29T20:17:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM hidden states encode semantic features whose geometric relations, including axis projections, cosine similarities, low-dimensional subspaces, and steering spillovers, closely mirror human psychological associations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18519","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LLM Safety From Within: Detecting Harmful Content with Internal Representations","primary_cat":"cs.AI","submitted_at":"2026-04-20T17:17:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16902","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language 
Models","primary_cat":"cs.AI","submitted_at":"2026-04-18T08:25:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16879","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Adaptive Forensic Feature Refinement via Intrinsic Importance Perception","primary_cat":"cs.CV","submitted_at":"2026-04-18T07:07:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harming generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14838","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models","primary_cat":"cs.AI","submitted_at":"2026-04-16T10:16:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Intermediate layers in single-cell foundation models encode optimal representations for biological tasks, outperforming final layers in a task- and context-dependent manner.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10448","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Instruction Data Selection via Answer 
Divergence","primary_cat":"cs.CL","submitted_at":"2026-04-12T04:11:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ADG selects 10K instruction examples by scoring the geometric divergence of multiple high-temperature model outputs in embedding space, outperforming prior selectors on reasoning, knowledge, and coding benchmarks across two model backbones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09425","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Do Vision Language Models Need to Process Image Tokens?","primary_cat":"cs.CV","submitted_at":"2026-04-10T15:38:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Visual representations in VLMs converge quickly to stable low-complexity forms while text continues evolving, with task-dependent needs for sustained image token access.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06377","ref_index":58,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment","primary_cat":"cs.LG","submitted_at":"2026-04-07T19:02:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"source and target; and (3)Latent-space transfer[42]: This strategy involves 
intervening on internal representations to steer the Target model towards the desired output. In this work, we focus on the latent space, where capabilities are encoded as shifts in internal activations. Existing methods typically construct steering directions from labeled contrastive examples (positive vs. negative [58; 42]) using a single Source model and apply them at inference time to similar prompts. Furthermore, these methods are largely focused on alignment and surface-level behavioral control (e.g., safety, toxicity, bias, and stylistic shaping [35; 53; 16; 11; 52]) rather than advanced capabilities such as reasoning. We address these limitations by proposing Unlock, a training-free and label-free framework for"},{"citing_arxiv_id":"2603.12451","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Overcoming the Modality Gap in Context-Aided Forecasting","primary_cat":"cs.LG","submitted_at":"2026-03-12T21:05:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A semi-synthetic augmentation creates the CAF-7M dataset and demonstrates that improved context data enables multimodal models to outperform unimodal baselines in context-aided forecasting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.07475","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs","primary_cat":"cs.CL","submitted_at":"2026-03-08T05:31:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% 
performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.21750","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Words to Amino Acids: Does the Curse of Depth Persist?","primary_cat":"cs.LG","submitted_at":"2026-02-25T10:06:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.21619","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency","primary_cat":"cs.LG","submitted_at":"2026-01-29T12:22:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Parallel thinking in LLMs suffers from overscaling where fixed global budgets waste samples; LanBo predicts per-sample budgets from latent states to raise utilization without hurting accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.03233","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LTX-2: Efficient Joint Audio-Visual Foundation Model","primary_cat":"cs.CV","submitted_at":"2026-01-06T18:24:41+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LTX-2 generates high-quality synchronized audiovisual content from text prompts via an asymmetric 14B-video / 5B-audio dual-stream transformer with cross-attention and modality-aware 
guidance.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"[25] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding, 2022. URL https://arxiv.org/abs/2205.11487. [26] Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, and Ravid Shwartz-Ziv. Layer by layer: Uncovering hidden representations in language models. arXiv preprint arXiv:2502.02013, 2025. [27] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al."},{"citing_arxiv_id":"2511.06516","ref_index":53,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations","primary_cat":"cs.CL","submitted_at":"2025-11-09T19:58:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TAQ estimates per-layer importance from hidden representations and output sensitivity on task calibration data to allocate mixed precision in a training-free PTQ setting, outperforming task-agnostic baselines on accuracy-memory ratio across benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.05387","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Generalization Ridge: Information Flow in Natural Language Generation","primary_cat":"cs.CL","submitted_at":"2025-07-07T18:18:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"InfoRidge reveals a 
non-monotonic pattern in which predictive mutual information between hidden states and outputs peaks in intermediate layers before declining in final layers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}