FiLM: Visual Reasoning with a General Conditioning Layer
Tool reference. 83% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
citing papers explorer
-
X-VC: Zero-shot Streaming Voice Conversion in Codec Space
X-VC achieves zero-shot streaming voice conversion via one-step codec-space conversion with dual-conditioning acoustic converter and role-assignment training on generated paired data.
-
Mechanisms of Misgeneralization in Physical Sequence Modeling
Generative sequence models for physical tasks exhibit physical misgeneralization, in which local prediction errors propagate through physical measurements and distort aggregate distributions over quantities such as distance or energy; a data deviation kernel explains and predicts the shifts and supports a kernel ...
-
Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery
SkyPart achieves state-of-the-art single-pass cross-view geo-localization on SUES-200, University-1652, and DenseUAV by using prototype-based part discovery, altitude-conditioned modulation, and Kendall-weighted loss, with widening gains under weather corruptions.
-
Quantum Injection Pathways for Implicit Graph Neural Networks
Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.
-
Conditional Neural Field based Reduced Order Model for Dynamic Ditching Load Prediction
Conditional neural fields combined with LSTM networks predict aircraft ditching loads accurately across heterogeneous spatial discretizations using fewer parameters than convolutional autoencoders.
-
Generative Modeling of Complex-Valued Brain MRI Data
A cVAE plus flow-matching model generates realistic complex-valued brain MRI that preserves phase coherence above 0.997 and yields synthetic data that trains abnormality classifiers to 0.880 AUROC, beating the 0.842 real-data baseline on fastMRI.
-
NeuVolEx: Implicit Neural Features for Volume Exploration
NeuVolEx extracts robust spatial features from INR training via a structural encoder and multi-task scheme to enable accurate ROI classification with limited supervision and unsupervised viewpoint clustering in volume exploration.
-
PREFAB: PREFerence-based Affective Modeling for Low-Budget Self-Annotation
PREFAB applies preference learning grounded in the peak-end rule to let users annotate only key affective change segments while interpolating the rest, reducing workload and improving confidence in a 25-participant study.
-
CodecSep: Prompt-Driven Universal Sound Separation on Neural Audio Codec Latents
CodecSep performs prompt-driven universal sound separation directly in neural audio codec latents by combining a frozen DAC backbone with a lightweight FiLM-conditioned Transformer masker driven by CLAP embeddings, yielding efficiency gains over AudioSep.
-
Memory-Efficient EDA Denoising via Knowledge Distillation for Wearable IoT Under Severe Motion Artifacts and Underwater Conditions
Knowledge distillation from a hybrid CNN-Transformer teacher to a depth-wise separable CNN student, combined with realistic motion and environmental augmentation, produces a 15x smaller EDA denoiser that cuts underwater reconstruction error from 2.809 to 0.215 MAE and raises downstream CNS-OT AUROC.
-
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Empirical study of DP transfer learning reveals that larger clipping bounds outperform under tight privacy and cumulative DP noise explains batch-size effects better than existing heuristics.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.