Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton · 2016 · stat.ML · arXiv 1607.06450

Mixed citation behavior. Most common role is background (57%).

315 Pith papers citing it

Background 57% of classified citations

abstract

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
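The abstract's recipe can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The function names, the `eps` stability constant, the tanh recurrent cell, and the reuse of one gain/bias pair at every time step are all choices made here for the example. `layer_norm` computes the mean and variance over the H summed inputs of each training case. `batch_norm_train` computes them per neuron across the mini-batch, for contrast. `ln_rnn` recomputes the layer statistics at every time step.

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """Normalize each row of `a` (one training case per row) over its H units.

    a: summed inputs, shape (batch, H); gain, bias: per-neuron, shape (H,).
    eps is an assumed numerical-stability constant, not taken from the abstract.
    """
    mu = a.mean(axis=-1, keepdims=True)                    # mean over the layer
    sigma = np.sqrt(a.var(axis=-1, keepdims=True) + eps)   # std over the layer
    return gain * (a - mu) / sigma + bias                  # apply before the non-linearity

def batch_norm_train(a, gain, bias, eps=1e-5):
    """For contrast: statistics per neuron, taken across the mini-batch axis."""
    mu = a.mean(axis=0, keepdims=True)
    sigma = np.sqrt(a.var(axis=0, keepdims=True) + eps)
    return gain * (a - mu) / sigma + bias

def ln_rnn(xs, h0, W_xh, W_hh, gain, bias):
    """Hypothetical tanh RNN with layer norm applied to the summed inputs.

    xs: inputs of shape (T, batch, D). Statistics are recomputed at each step;
    sharing one gain/bias pair across steps is an assumption of this sketch.
    """
    h = h0
    for x_t in xs:
        a_t = x_t @ W_xh + h @ W_hh                # summed inputs at step t
        h = np.tanh(layer_norm(a_t, gain, bias))   # fresh per-step statistics
    return h

rng = np.random.default_rng(0)
H = 8
g, b = np.ones(H), np.zeros(H)                     # initial gain and bias
a = rng.normal(loc=3.0, scale=2.0, size=(4, H))    # 4 training cases
out = layer_norm(a, g, b)

# Normalizing case 0 alone (as at test time) matches normalizing it in a batch.
print(np.allclose(layer_norm(a[:1], g, b), out[:1]))
# Batch norm's output for case 0 depends on which mini-batch it belongs to.
print(np.allclose(batch_norm_train(a[:2], g, b)[0], batch_norm_train(a, g, b)[0]))

xs = rng.normal(size=(5, 4, 3))                    # 5 steps, batch of 4, 3 features
W_xh = rng.normal(scale=0.5, size=(3, H))
W_hh = rng.normal(scale=0.5, size=(H, H))
h_T = ln_rnn(xs, np.zeros((4, H)), W_xh, W_hh, g, b)
```

Because `layer_norm` never reduces over the batch axis, the same function serves at training and test time. This is the property the abstract contrasts with batch normalization, whose output changes with mini-batch composition.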

citation-role summary

background: 43 · method: 23 · baseline: 2 · other: 2

citation-polarity summary

background: 40 · use method: 23 · unclear: 5 · baseline: 2

authors

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

representative citing papers

MinMax Recurrent Neural Cascades

cs.LG · 2026-05-07 · conditional · novelty 8.0 · 2 refs

MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · novelty 8.0

CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

What learning algorithm is in-context learning? Investigations with linear models

cs.LG · 2022-11-28 · accept · novelty 8.0

Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.

Masked Autoencoders Are Scalable Vision Learners

cs.CV · 2021-11-11 · accept · novelty 8.0

Masked autoencoders with asymmetric encoder-decoder and 75% masking ratio enable scalable self-supervised pre-training of vision transformers, achieving 87.8% ImageNet-1K accuracy with ViT-Huge using only unlabeled data.

Decision Transformer: Reinforcement Learning via Sequence Modeling

cs.LG · 2021-06-02 · accept · novelty 8.0

Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

cs.CL · 2020-12-31 · conditional · novelty 8.0

The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.

Reformer: The Efficient Transformer

cs.LG · 2020-01-13 · accept · novelty 8.0

Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.

Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning

quant-ph · 2026-05-22 · unverdicted · novelty 7.0

CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.

Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.

Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Proposes latent analogies and analogy transduction to enable compositional generalization to unseen goal-context pairs in offline GCRL, outperforming trajectory-stitching baselines on manipulation tasks.

Riemannian Networks over Full-Rank Correlation Matrices

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Riemannian networks are introduced for the full-rank correlation matrix manifold by extending MLR, FC, and convolutional layers to five geometries with backpropagation methods for two, showing effectiveness over SPD and Grassmannian baselines.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

hep-ph · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.

Domain Transfer Becomes Identifiable via a Single Alignment

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Domain transfer becomes identifiable from marginals plus one anchor under Jacobian sparsity, enabled by a randomized masked finite-difference regularizer.

Bug or Feature$^2$: Weight Drift, Activation Sparsity and Spikes

cs.LG · 2026-05-17 · accept · novelty 7.0 · 2 refs

The paper proves negative weight drift at initialization under MSE or cross-entropy with asymmetric activations, links it to up to 90% sparsity in GPT-nano, maps the sparsity-accuracy cliff across 79 configurations, and shows clipped ReLU² and GELU² improve validation loss.

FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

FishBack derives a closed-form minimum-distortion steering direction from the pullback Fisher metric of the softmax layer, outperforming Euclidean baselines on GPT-2 verb-morphology tasks with lower off-target KL divergence.

ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

ChangeFlow reformulates remote sensing change detection as latent rectified-flow mask synthesis, reaching 80.4% average F1 across four benchmarks with 1.3-point gain and sampling-based ensembling.

Training-Free Generative Sampling via Moment-Matched Score Smoothing

stat.ML · 2026-05-14 · unverdicted · novelty 7.0

MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.

Trajectory-Agnostic Asteroid Detection in TESS with Deep Learning

astro-ph.EP · 2026-05-12 · unverdicted · novelty 7.0

A W-Net deep learning model detects asteroids in TESS data independently of trajectory by rotating training image cubes and using adaptive normalization for data scaling.

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

quant-ph · 2026-05-12 · unverdicted · novelty 7.0

QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.

Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Temporarily reducing the learning rate on upper-layer query and key projections during early GPT pretraining prevents premature attention specialization and improves model performance.

OpenSGA: Efficient 3D Scene Graph Alignment in the Open World

cs.CV · 2026-05-11 · conditional · novelty 7.0

OpenSGA fuses vision-language, textual, and geometric features via a distance-gated attention encoder and minimum-cost-flow allocator to outperform prior methods on both frame-to-scan and subscan-to-subscan 3D scene graph alignment, backed by a new 700k-sample ScanNet-SG dataset.

Meta-Black-Box Optimization Can Do Search Guidance for Expensive Constrained Multi-Objective Optimization

cs.NE · 2026-05-11 · unverdicted · novelty 7.0

MetaSG-SAEA is a bi-level meta-BBO framework that uses a meta-policy for search guidance via the MM-CCI constraint abstraction and diffusion-based population initialization to outperform baselines on expensive constrained multi-objective optimization problems.

citing papers explorer

Showing 50 of 315 citing papers.

MinMax Recurrent Neural Cascades · cs.LG · 2026-05-07 · conditional · none · ref 38 · 2 links
MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
CanViT: Toward Active-Vision Foundation Models · cs.CV · 2026-03-23 · conditional · none · ref 56
CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces · cs.LG · 2023-12-01 · unverdicted · none · ref 4
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
What learning algorithm is in-context learning? Investigations with linear models · cs.LG · 2022-11-28 · accept · none · ref 3
Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
Masked Autoencoders Are Scalable Vision Learners · cs.CV · 2021-11-11 · accept · none · ref 1
Masked autoencoders with asymmetric encoder-decoder and 75% masking ratio enable scalable self-supervised pre-training of vision transformers, achieving 87.8% ImageNet-1K accuracy with ViT-Huge using only unlabeled data.
Decision Transformer: Reinforcement Learning via Sequence Modeling · cs.LG · 2021-06-02 · accept · none · ref 15
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling · cs.CL · 2020-12-31 · conditional · none · ref 167
The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.
Reformer: The Efficient Transformer · cs.LG · 2020-01-13 · accept · none · ref 4
Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.
Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning · quant-ph · 2026-05-22 · unverdicted · none · ref 3
CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.
Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception · cs.CV · 2026-05-21 · unverdicted · none · ref 49
Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.
Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning · cs.LG · 2026-05-20 · unverdicted · none · ref 1
Proposes latent analogies and analogy transduction to enable compositional generalization to unseen goal-context pairs in offline GCRL, outperforming trajectory-stitching baselines on manipulation tasks.
Riemannian Networks over Full-Rank Correlation Matrices · cs.LG · 2026-05-18 · unverdicted · none · ref 98
Riemannian networks are introduced for the full-rank correlation matrix manifold by extending MLR, FC, and convolutional layers to five geometries with backpropagation methods for two, showing effectiveness over SPD and Grassmannian baselines.
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation · cs.LG · 2026-05-18 · unverdicted · none · ref 86
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms · hep-ph · 2026-05-18 · unverdicted · none · ref 69 · 2 links
Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.
Domain Transfer Becomes Identifiable via a Single Alignment · cs.LG · 2026-05-18 · unverdicted · none · ref 50
Domain transfer becomes identifiable from marginals plus one anchor under Jacobian sparsity, enabled by a randomized masked finite-difference regularizer.
Bug or Feature$^2$: Weight Drift, Activation Sparsity and Spikes · cs.LG · 2026-05-17 · accept · none · ref 15 · 2 links
The paper proves negative weight drift at initialization under MSE or cross-entropy with asymmetric activations, links it to up to 90% sparsity in GPT-nano, maps the sparsity-accuracy cliff across 79 configurations, and shows clipped ReLU² and GELU² improve validation loss.
FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers · cs.LG · 2026-05-17 · unverdicted · none · ref 31
FishBack derives a closed-form minimum-distortion steering direction from the pullback Fisher metric of the softmax layer, outperforming Euclidean baselines on GPT-2 verb-morphology tasks with lower off-target KL divergence.
ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing · cs.CV · 2026-05-14 · unverdicted · none · ref 1
ChangeFlow reformulates remote sensing change detection as latent rectified-flow mask synthesis, reaching 80.4% average F1 across four benchmarks with 1.3-point gain and sampling-based ensembling.
Training-Free Generative Sampling via Moment-Matched Score Smoothing · stat.ML · 2026-05-14 · unverdicted · none · ref 56
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
Trajectory-Agnostic Asteroid Detection in TESS with Deep Learning · astro-ph.EP · 2026-05-12 · unverdicted · none · ref 23
A W-Net deep learning model detects asteroids in TESS data independently of trajectory by rotating training image cubes and using adaptive normalization for data scaling.
QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning · quant-ph · 2026-05-12 · unverdicted · none · ref 3
QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.
Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining · cs.CL · 2026-05-11 · unverdicted · none · ref 2
Temporarily reducing the learning rate on upper-layer query and key projections during early GPT pretraining prevents premature attention specialization and improves model performance.
OpenSGA: Efficient 3D Scene Graph Alignment in the Open World · cs.CV · 2026-05-11 · conditional · none · ref 11
OpenSGA fuses vision-language, textual, and geometric features via a distance-gated attention encoder and minimum-cost-flow allocator to outperform prior methods on both frame-to-scan and subscan-to-subscan 3D scene graph alignment, backed by a new 700k-sample ScanNet-SG dataset.
Meta-Black-Box Optimization Can Do Search Guidance for Expensive Constrained Multi-Objective Optimization · cs.NE · 2026-05-11 · unverdicted · none · ref 53
MetaSG-SAEA is a bi-level meta-BBO framework that uses a meta-policy for search guidance via the MM-CCI constraint abstraction and diffusion-based population initialization to outperform baselines on expensive constrained multi-objective optimization problems.
Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off · cs.CR · 2026-05-09 · unverdicted · none · ref 43
Aligned LLMs exhibit Refusal-Escape Directions (RED) that enable refusal-to-answer transitions via input perturbations; these directions decompose exactly into operator-level sources, creating an inherent safety-utility trade-off when trying to eliminate them.
Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision · cs.CV · 2026-05-08 · unverdicted · none · ref 4
Delta-Adapter extracts a semantic delta from a single image pair via a pre-trained vision encoder and injects it through a Perceiver adapter to enable scalable single-pair supervised editing.
Neural network quantum states in the grand canonical ensemble · quant-ph · 2026-05-08 · unverdicted · none · ref 42
A new neural quantum state ansatz for bosons in the grand canonical ensemble achieves competitive variational energies in 1D and 2D systems and provides access to one-body reduced density matrices.
QuadNorm: Resolution-Robust Normalization for Neural Operators · cs.LG · 2026-05-08 · unverdicted · none · ref 1
QuadNorm uses quadrature-based moments instead of uniform averaging in normalization layers, achieving O(h²) consistency across resolutions and better cross-resolution transfer in neural operators.
GPROF-IR: An Improved Single-Channel Infrared Precipitation Retrieval for Merged Satellite Precipitation Products · physics.ao-ph · 2026-05-08 · unverdicted · none · ref 142
GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.
Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks · cs.LG · 2026-05-08 · unverdicted · none · ref 7
A Max-Cut-specific graph neural network predicts primal- and dual-feasible SDP solutions in linearithmic time, cutting bounding costs in exact branch-and-bound by up to 10.6 times versus a commercial SDP solver while training without any solved SDP labels.
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity · stat.ML · 2026-05-08 · unverdicted · none · ref 6
Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences · cs.LG · 2026-05-06 · unverdicted · none · ref 3
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
iGENE: A Differentiable Flux-Tube Gyrokinetic Code in TensorFlow · physics.plasm-ph · 2026-05-04 · unverdicted · none · ref 72
A fully differentiable TensorFlow gyrokinetic code allows approximate gradients of nonlinear turbulence quantities to be used for outer-loop tasks such as profile prediction despite stochasticity.
Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks · cs.NI · 2026-05-03 · unverdicted · none · ref 28 · 2 links
A graph transformer with RL stabilizations is the first to exceed benchmarks for dynamic RMSA, supporting up to 13% more traffic load on networks up to 143 nodes.
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks · cs.LG · 2026-05-02 · unverdicted · none · ref 197
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
Characterizing the Expressivity of Local Attention in Transformers · cs.CL · 2026-05-01 · conditional · none · ref 1 · 2 links
Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by introducing a second temporal operator in LTL, with global and local attention being expressively complementary.
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning · cs.LG · 2026-05-01 · unverdicted · none · ref 19 · 2 links
ResRL decouples shared semantics between positive and negative responses in LLM reinforcement learning via SVD-based projection residuals, outperforming baselines including NSR by up to 9.4% on math reasoning benchmarks.
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures · cs.SE · 2026-04-30 · unverdicted · none · ref 1
DEFault++ delivers automated hierarchical fault detection, categorization into 12 transformer-specific types, and root-cause diagnosis among 45 mechanisms on a new benchmark of 3,739 mutated instances, with AUROC >0.96 and Macro-F1 0.85, plus improved developer repair accuracy in a user study.
Learning Neural Operator Surrogates for the Black Hole Accretion Code · astro-ph.HE · 2026-04-28 · unverdicted · none · ref 48
Physics-informed Fourier neural operators recover plasmoid formation in sparse SRRMHD vortex data where data-only models fail, and transformer operators approximate AMR jet evolution, marking first reported uses in these relativistic MHD settings.
Pareto Frontier of Neural Quantum States: Scalable, Affordable, and Accurate Convolutional Backflow for Strongly Correlated Lattice Fermions · cond-mat.str-el · 2026-04-28 · unverdicted · none · ref 43
SCALE and ACE are new convolutional backflow architectures for Neural Quantum States that deliver O(N^3) scaling with high accuracy and over 40x speedup on Hubbard and t-J models up to 32x32 lattices.
Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots · cs.RO · 2026-04-28 · unverdicted · none · ref 20
Reference-augmented learning with RNN surrogate and stochastic perturbations cuts average position error by 50.9% for 6-DOF tracking on a three-section TDCR compared to non-augmented baselines.
HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA · cs.CV · 2026-04-26 · unverdicted · none · ref 2
HAC provides a parameter-efficient way to move CLIP into hyperbolic geometry, yielding consistent gains on zero-shot VQA benchmarks without any VQA training data overlap.
A satellite foundation model for improved wealth monitoring · cs.CY · 2026-04-25 · unverdicted · none · ref 51
Tempov is a self-supervised satellite foundation model that predicts wealth levels and decadal changes at high resolution across Africa from Landsat imagery, outperforming baselines even with limited labels and generalizing temporally.
Latent Space Probing for Adult Content Detection in Video Generative Models · cs.CV · 2026-04-25 · unverdicted · none · ref 50
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning · cs.AI · 2026-04-23 · conditional · none · ref 3
Unembedding collapse in transformers prevents distinguishing unseen tokens in symbolic reasoning, but targeted interventions restore generalization.
Linear Image Generation by Synthesizing Exposure Brackets · cs.CV · 2026-04-22 · unverdicted · none · ref 1
The paper introduces a DiT-based flow-matching model that generates linear images by synthesizing text-conditioned exposure brackets to preserve full dynamic range.
Understanding and Enforcing Weight Disentanglement in Task Arithmetic · cs.AI · 2026-04-18 · unverdicted · none · ref 3
Task-Feature Specialization explains weight disentanglement in task arithmetic and leads to orthogonality, which OrthoReg enforces to enhance performance of model composition methods.
Machine learning isotope shifts in molecular energy levels · astro-ph.EP · 2026-04-17 · unverdicted · none · ref 42
Neural network corrects residual errors in isotopologue energy extrapolations for CO2 (MAE reduction in >87% of levels vs Marvel) and transfers patterns to improve CO predictions in >93% of samples.
DEMUX: Boundary-Aware Multi-Scale Traffic Demixing for Multi-Tab Website Fingerprinting · cs.CR · 2026-04-17 · unverdicted · none · ref 34
DEMUX achieves state-of-the-art multi-tab website fingerprinting accuracy by preserving boundary signals, modeling at multiple scales, and associating dispersed traffic fragments with a new three-component architecture.
Data-driven oscillator model for multi-frequency turbulent flows · physics.flu-dyn · 2026-04-13 · unverdicted · none · ref 3
A data-driven framework extracts oscillators from multi-frequency turbulent flow data via autoencoders and models their dynamics with neural networks to enable long-term forecasting, demonstrated on supersonic cavity flow.
