{"total":22,"items":[{"citing_arxiv_id":"2607.00958","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LeNEPA: No-Augmentation Next-Latent Prediction for Time-Series Representation Learning","primary_cat":"cs.LG","submitted_at":"2026-07-01T13:56:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LeNEPA proposes a no-augmentation next-latent prediction recipe that maintains frozen-probe performance across ECG and synthetic diagnostic time-series datasets under fixed-recipe conditions where a tuned JEPA baseline degrades.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.29723","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ScaleAware-JEPA: Latent Representation for Discovery in Multiscale Physical Fields","primary_cat":"cs.LG","submitted_at":"2026-06-29T03:00:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ScaleAware-JEPA combines Constrained Diffusion Decomposition with a scale-tied JEPA objective to learn label-free latent coordinates that recover coherent morphology in multiscale fields such as MHD turbulence and interstellar gas.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20781","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Action Models: A Survey","primary_cat":"cs.RO","submitted_at":"2026-06-18T17:05:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the 
future.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"(st+k+1,(h k, at+k, c))at rollout stepk,action-conditioned rollout, (st+1:t+H , c)followed byq ψ(at:t+H−1 |s t+1:t+H , c),post-prediction head. (29) Action-conditioned rollout treats the JEPA predictor as a world model and recovers actions through planning over imagined consequences. A post-prediction head predicts the full substrate window first and then decodes action from it with a separate expertqψ. I-JEPA [3] introduces the architecture on images, predicting the latent representations of large masked target blocks from a spatially distributed context block. A-JEPA [35] carries the same recipe to audio spectrograms under a curriculum time-frequency masking strategy, and MC-JEPA [6] couples the prediction objective with optical-flow learning so that one shared encoder serves both motion and content."},{"citing_arxiv_id":"2606.16009","ref_index":88,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging the Usability Gap: Lessons from Interpreting Studies for Machine Interpreting Design","primary_cat":"cs.CL","submitted_at":"2026-06-14T20:41:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Machine interpreting should shift from fidelity metrics to three design priorities—agency, grounding, and experience—drawn from interpreting studies to close the usability gap with human-mediated communication.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09646","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Do Video Foundation Models Understand Intuitive Physics? 
A Layerwise Probing Analysis","primary_cat":"cs.CV","submitted_at":"2026-06-08T15:40:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Video foundation models encode intuitive physics knowledge that is strongest in V-JEPA at intermediate-to-late layers and depends on pretraining type and probe design.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01443","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures","primary_cat":"cs.LG","submitted_at":"2026-05-31T20:26:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UR-JEPA applies uniform rectifiability regularization via a smoothed Carleson square function to JEPA training, producing embeddings with 4-5 order PCA spectral drop at dimension 20-25 and lower seed variance than Gaussian regularization on Inet10, Galaxy10, and EuroSAT.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30724","ref_index":267,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Research progress on quantum neural networks and quantum machine learning","primary_cat":"quant-ph","submitted_at":"2026-05-29T01:39:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Survey summarizing performance metrics of fully connected QNNs, quantum CNNs, equivariant QNNs, quantum Hopfield networks, quantum Boltzmann machines, quantum reservoir computing, and composite networks for reinforcement, generative, and transfer 
learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25012","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning from Semantic Dictionaries: Discriminative Codebook Contrastive Learning for Unified Visual Representation and Generation","primary_cat":"cs.CV","submitted_at":"2026-05-24T11:32:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LEASE achieves state-of-the-art unified performance on ImageNet-1K by combining masked token reconstruction and codebook contrast losses in a one-time precomputed discrete token space.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23699","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models","primary_cat":"cs.CV","submitted_at":"2026-05-22T14:51:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21075","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining","primary_cat":"cs.CV","submitted_at":"2026-05-20T12:08:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SpectralEarth-FM is a multisensor hierarchical transformer pretrained on a 40TB co-located HSI-MSI-SAR dataset using a JEPA-style 
objective and reports state-of-the-art results on hyperspectral and standard EO benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19462","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-19T07:13:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Self-supervised pre-training delivers large gains up to 375% on time series anomaly detection and classification but only marginal benefits for forecasting, driven by a precision-invariance trade-off in the learned representations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18714","ref_index":1,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Semantic Generative Tuning for Unified Multimodal Models","primary_cat":"cs.CV","submitted_at":"2026-05-18T17:46:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Semantic Generative Tuning applies segmentation-based generative proxies during post-training to align and improve both understanding and generation in unified multimodal models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16477","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning","primary_cat":"cs.LG","submitted_at":"2026-05-15T16:09:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Creativity is defined as meta-learning where a 
frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15466","ref_index":8,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction","primary_cat":"cs.CV","submitted_at":"2026-05-14T23:10:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IA-JEPA applies motion-centric masking in JEPA to focus on entity interactions, reporting 14.26% causal reasoning accuracy on CLEVRER versus 3.22% for standard baselines plus higher latent entropy and R²=0.43 energy linearization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15394","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Representation Without Reward: A JEPA Audit for LLM Fine-Tuning","primary_cat":"cs.LG","submitted_at":"2026-05-14T20:27:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08078","ref_index":2,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Normalizing Trajectory Models","primary_cat":"cs.CV","submitted_at":"2026-05-08T17:57:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NTM models each 
generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07554","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ProteinJEPA: Latent prediction complements protein language models","primary_cat":"cs.LG","submitted_at":"2026-05-08T10:30:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01694","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent State Design for World Models under Sufficiency Constraints","primary_cat":"cs.AI","submitted_at":"2026-05-03T03:19:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"occupy different compression targets, from observation-faithful reconstruction to task-relevant abstraction. 
Position Training objective Year range Representative methods Reconstruction- heavy Pixel reconstruction or generative modeling 2018-2024 World Models [22], SimPLe [34] Token compression Discrete-token prediction 2022-2025 IRIS [42], GAIA-1 [30], GAIA-2 [50] Representation prediction Embedding-space prediction 2023-2026 I-JEPA [2], V-JEPA 2 [3], V-JEPA 2.1 [44], LeWorldModel [40] Reward / value-shaped Reward and policy-relevant supervision 2019-2021 TPC [46], value-aligned latent planning [28] Value-equivalent Bellman-relevant statistics only 2020-2023 MuZero [52], EfficientZero [66], TD-MPC2 [26] Causal / counterfactual Intervention-sensitive structural variables 2026 Causal-JEPA [45], CausalV AE-WM [14]"},{"citing_arxiv_id":"2605.00412","ref_index":1,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling","primary_cat":"cs.AI","submitted_at":"2026-05-01T05:09:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper introduces Hamiltonian World Models by encoding observations into structured latent phase space and evolving states via Hamiltonian-inspired dynamics for physically meaningful rollouts in embodied AI.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15451","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Weak-to-Strong Knowledge Distillation Accelerates Visual Learning","primary_cat":"cs.CV","submitted_at":"2026-04-16T18:10:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 
1.7-4.8x.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"we use a deliberately weaker frozen teacher and a surpass-based stopping rule to improve first@τ speedup at target thresholds. Vision Pretraining. Foundation-scale vision models increasingly rely on long-horizon pretraining that blends self-supervision, language-image alignment, and promptable segmentation. Recent work such as DINOv2 [33], EVA-02 [9], I-JEPA [1], and InternImage [48] scales masked or predictive objectives to large corpora and yields highly transferable features. Vision-language efforts (SigLIP [54], PaLI-3 [4]) and unified decoders (Florence-2 [50]) broaden this to multi-task interfaces, while Segment Anything [24] shows promptable segmentation at scale. Within this landscape, we position our method as a plug-and-play training recipe"},{"citing_arxiv_id":"2604.07745","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Cartesian Cut in Agentic AI","primary_cat":"cs.AI","submitted_at":"2026-04-09T03:03:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agents use a Cartesian split between learned prediction and engineered control, enabling modularity but creating sensitivity and bottlenecks unlike integrated biological systems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"same agenda toward generalist vision-language-action models that directly map multimodal observations and instructions to closed-loop motor control across many tasks [68, 35, 27]. In parallel, JEPA-style objectives aim to learn hierarchical latent predictors that capture action-relevant structure without committing to full generative reconstruction, offering a complementary route to learned world models for planning [4, 5]. 
Despite rapid progress, these integrated stacks remain data- and engineering-intensive and have therefore not yet matched the deployment ubiquity of tool-mediated Cartesian agents [35]. The hypothesized upside is tighter intervention calibration. If arbitration (when to seek information, backtrack, stop, or hand off), memory updates, and uncertainty surrogates are"},{"citing_arxiv_id":"2602.06912","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs","primary_cat":"cs.CV","submitted_at":"2026-02-06T18:07:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PANC augments Normalized Cut with anchor-augmented token graphs using priors to steer spectral partitions, yielding mIoU gains of 2.3-8.7% over baselines on DUTS-TE, DUT-OMRON, and CrackForest.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}