Canonical reference: https://openreview.net/pdf/9a7e7a9787d14ac8302215f8e4ef959606b78a94.pdf
79% of citing Pith papers cite this work as background.
Citing papers
-
Evaluating the Search Agent in a Parallel World
Mind-ParaWorld creates parallel worlds with atomic facts to evaluate search agents on future scenarios, showing they synthesize evidence well but struggle with collection, coverage, sufficiency judgment, and stopping decisions.
-
Symbolic recovery of PDEs from measurement data
Symbolic rational-function networks recover an admissible PDE from noiseless complete measurements and select the regularization-minimizing parameterization within the architecture.
-
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.
-
DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction
DuFal combines global and local high-frequency Fourier neural operators with cross-attention fusion to recover fine anatomical structures in extremely sparse-view CBCT, outperforming prior methods on LUNA16 and ToothFairy data.
-
High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models
Diffusion models reconstruct high-resolution 3D cardiac ultrasound volumes from heavily undersampled elevation planes and outperform traditional interpolation and supervised deep learning baselines.
-
TextTeacher: What Can Language Teach About Images?
TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.
-
Connectionless Bluetooth LE Channel Sounding via PAwR for Scalable and Energy-Efficient Ranging
A connectionless Bluetooth LE Channel Sounding system via PAwR eliminates connection overhead, cuts energy use by up to 88% under partner switching, and supports 16384 devices per train.
-
Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training
Transcoda achieves state-of-the-art zero-shot OMR with an 18.46% OMR-NED error rate on synthetic scores and 63.97% on historical Polish scans using a 59M model trained in 6 hours via synthetic data, kern normalization, and grammar decoding.
-
Communicating Sound Through Natural Language
Lexical acoustic coding lets LLMs transmit audio waveforms as editable natural-language sentences that another LLM can parse and reconstruct into sound.
-
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
-
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.
-
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
LLM-Codec augments audio codec training with multi-step token prediction and contrastive semantic alignment to improve both waveform reconstruction and autoregressive predictability for speech language models.
-
A Case Study on the Impact of Anonymization Along the RAG Pipeline
Placing anonymization in the RAG pipeline at the dataset level or at the generated answer produces observable differences in the trade-off between privacy protection and response utility.
-
SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
-
Leveraging Artist Catalogs for Cold-Start Music Recommendation
ACARec attends over artist catalogs to generate CF embeddings for new tracks, more than doubling recall and NDCG versus content-only baselines in music recommendation.
-
TADA! Tuning Audio Diffusion Models through Activation Steering
Activation steering at a semantic bottleneck in audio diffusion models achieves state-of-the-art control over musical attributes such as instruments, vocals, and genres.
-
zea: A Toolbox for Cognitive Ultrasound Imaging
zea is a Python toolbox that supplies a modular differentiable pipeline for ultrasound imaging and signal processing, built on Keras 3 to support TensorFlow, PyTorch, and JAX backends.
-
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
SonicMaster is a text-conditioned flow-matching generative model for unified music restoration and mastering, trained on a dataset of simulated degradations across equalization, dynamics, reverb, amplitude, and stereo.
-
Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline
A self-supervised Degradation Estimation Network estimates parameters for physics-informed noise distributions to generate realistic synthetic low-light data, showing gains on noise replication, enhancement, and detection tasks.
-
Articulatory movements influence electromagnetic wave transmission through the vocal tract
Articulatory configurations during vowel production create distinct electromagnetic transmission patterns through the vocal tract, confirmed by qualitative agreement between finite-element simulations and scattering-matrix measurements on two subjects.
-
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
-
AI Models for Depressive Disorder Detection and Diagnosis: A Review
A systematic review of AI for depressive disorder detection that introduces a novel hierarchical taxonomy organized by clinical task, data modality, and model class.
-
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
-
MSDS: Deep Structural Similarity with Multiscale Representation
MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.
-
SatBLIP: Context Understanding and Feature Identification from Satellite Imagery with Vision-Language Learning
SatBLIP fine-tunes a satellite-adapted BLIP model on GPT-4o-generated captions to predict the county-level Social Vulnerability Index (SVI) from satellite tiles and uses SHAP to highlight key features such as roof condition and vegetation.
-
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
-
The Prompt Engineering Report Distilled: Quick Start Guide for Life Sciences
The paper reduces a broad set of prompt engineering techniques to six core approaches and applies them to life sciences use cases while addressing common LLM pitfalls.
-
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
-
Four Decades of Digital Waveguides
Digital waveguide models enable efficient physically accurate sound synthesis and are now being optimized using classical, evolutionary, and neural methods.
-
A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios
A synthesis of diffusion-based simulation-based inference methods that address model misspecification, irregular observations, and missing data in scientific applications.
-
Secure Password Generator Based on Secure Pseudo-Random Number Generator
A MAC-based PRNG for passwords is implemented and shown to meet NIST SP 800-90B entropy and IID criteria.