Floating-point neural networks with automatic differentiation can represent arbitrary floating-point functions and their gradients under mild conditions.
hub Mixed citations
nature , volume=
Mixed citation behavior. Most common role is background (60%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlinearity regime.
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
A quantum prototype learning scheme encodes class representatives as generative matrix product states and performs classification and clustering via geometric measures in Hilbert space, outperforming classical prototypes on Fashion-MNIST and ECG data.
PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.
Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting in continual learning.
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for constant c<1, with UGC-hardness preventing better polynomial-time solutions in any dimension.
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
CAFE assesses the fit of observational CATE estimates by partitioning RCT data via propensity scores and comparing to experimental group averages, with theory and extensions for confounders.
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.
A graph-spectral importance score based on layer-wise structural distortion between pre- and post-activation neuron graphs identifies removable neurons for iterative pruning without intermediate updates, followed by recovery fine-tuning.
A pre-activation regularizer seeds more affine regions near data in piecewise affine networks, increasing local region count and improving early training performance.
ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
MS-RCGR is a reversible multi-scale chaos game representation that enhances biological sequence classification when used alone or combined with protein language model embeddings.
citing papers explorer
-
Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients
Floating-point neural networks with automatic differentiation can represent arbitrary floating-point functions and their gradients under mild conditions.
-
Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems
Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlinearity regime.
-
Pointwise Generalization in Deep Neural Networks
Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.
-
Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.
-
Convergence of difference inclusions via a diameter criterion
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
-
Geometric Prototype Learning in Quantum Hilbert Space with Matrix Product States
A quantum prototype learning scheme encodes class representatives as generative matrix product states and performs classification and clustering via geometric measures in Hilbert space, outperforming classical prototypes on Fashion-MNIST and ECG data.
-
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.
-
The Propagation Field: A Geometric Substrate Theory of Deep Learning
Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting in continual learning.
-
Learning to Theorize the World from Observation
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
-
Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch
Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for constant c<1, with UGC-hardness preventing better polynomial-time solutions in any dimension.
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
-
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language
Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.
-
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
-
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
-
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
-
Assessing Estimate of CATE from Observational Data via an RCT Study
CAFE assesses the fit of observational CATE estimates by partitioning RCT data via propensity scores and comparing to experimental group averages, with theory and extensions for confounders.
-
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
-
Are Candidate Models Really Needed for Active Learning?
Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.
-
Spectral structural distortion reveals redundant neurons in neural networks
A graph-spectral importance score based on layer-wise structural distortion between pre- and post-activation neuron graphs identifies removable neurons for iterative pruning without intermediate updates, followed by recovery fine-tuning.
-
Region Seeding via Pre-Activation Regularization: A Geometric View of Piecewise Affine Neural Networks
A pre-activation regularizer seeds more affine regions near data in piecewise affine networks, increasing local region count and improving early training performance.
-
ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision
ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.
-
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
-
TabEmb: Joint Semantic-Structure Embedding for Table Annotation
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
-
Multi-Scale Reversible Chaos Game Representation: A Unified Framework for Sequence Classification
MS-RCGR is a reversible multi-scale chaos game representation that enhances biological sequence classification when used alone or combined with protein language model embeddings.
-
Understanding the Prompt Sensitivity
LLMs disperse meaning-preserving prompts internally instead of clustering them, which produces an excessively high upper bound on output log-probability differences via Taylor expansion and Cauchy-Schwarz.
-
Gemma: Open Models Based on Gemini Research and Technology
Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.
-
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
-
Stochastic Optimization and Data Science
The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.
-
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.
- Neural Flow Operators can Approximate any Operator: Abstract Frameworks and Universal Approximations