Gomez, Lukasz Kaiser, and Illia Polosukhin
11 Pith papers cite this work.
Citation-role summary: background (1)
Citation-polarity summary: background (1)
Representative citing papers
WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.
Citing papers explorer
-
FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
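For reference, the discrete endpoint FLUID is said to interpolate from is standard scaled dot-product attention (SDPA), softmax(QKᵀ/√d_k)V. The pure-Python sketch below shows only that discrete endpoint; it does not implement Liquid Attention or the sink gate, and the function names and toy matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    # Subtract the row max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q_row in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out

# Toy example: the query matches the first key more closely than the second.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(sdpa(Q, K, V))
```

Because the attention weights in each row sum to 1, every output row is a weighted average of the value rows.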
-
VLAs are Confined yet Capable of Generalizing to Novel Instructions
Averaging and temporally interpolating text latents in VLAs enables 83% success on novel task combinations in the libero-ood benchmark where SOTA models achieve under 15%.
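The summary does not specify the exact averaging or interpolation scheme, so the sketch below assumes the simplest reading: an element-wise mean of two instructions' text latents, and a linear blend from one latent to the other over a fixed number of control steps. The latent vectors `pick` and `place` are hypothetical toy values, not real VLA embeddings.

```python
from typing import List

Vector = List[float]

def average_latents(latents: List[Vector]) -> Vector:
    # Element-wise mean of several latent vectors of equal length.
    n = len(latents)
    return [sum(vals) / n for vals in zip(*latents)]

def interpolate_schedule(a: Vector, b: Vector, steps: int) -> List[Vector]:
    # Linearly blend from a (t = 0) to b (t = 1) over `steps` evenly spaced times.
    schedule: List[Vector] = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        schedule.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return schedule

# Hypothetical text latents for two instructions.
pick = [1.0, 0.0, 2.0]
place = [0.0, 1.0, 4.0]
print(average_latents([pick, place]))  # [0.5, 0.5, 3.0]
for z in interpolate_schedule(pick, place, 3):
    print(z)
```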
-
ChronoVAE-HOPE: Beyond Attention -- A Next-Generation VAE Foundation Model for Specialized Time Series Classification
ChronoVAE-HOPE proposes a VAE foundation model for time series classification that replaces attention with a HOPE Block dual-memory system and uses disentangled trend-seasonal latent representations, pre-trained on Monash and evaluated on UCR datasets.
-
Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization
Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.
-
Language Modeling with Hyperspherical Flows
S-FLM is a hyperspherical latent flow language model that learns velocity fields on the unit sphere to generate token sequences via deterministic ODE integration without materializing one-hot vectors.
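One common way to integrate a deterministic ODE whose state must stay on the unit sphere is projected Euler: project the velocity onto the tangent space at the current point, take a step, then renormalize. The sketch below assumes that scheme; it is not S-FLM's integrator, and `toward_target` is a hypothetical stand-in for a learned velocity field.

```python
import math
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]

def normalize(x: Vector) -> Vector:
    # Rescale x to unit Euclidean norm (a point on the sphere).
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def project_to_tangent(x: Vector, v: Vector) -> Vector:
    # Drop the component of v along unit-norm x, leaving a tangent vector at x.
    radial = sum(a * b for a, b in zip(x, v))
    return [b - radial * a for a, b in zip(x, v)]

def integrate_on_sphere(x0: Vector, velocity: VelocityField,
                        t0: float = 0.0, t1: float = 1.0, steps: int = 100) -> Vector:
    # Projected-Euler integration: step along the tangent velocity,
    # then renormalize so the state stays on the unit sphere.
    x = normalize(x0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        v = project_to_tangent(x, velocity(x, t))
        x = normalize([a + dt * b for a, b in zip(x, v)])
        t += dt
    return x

# Hypothetical velocity field: a constant pull toward a fixed target direction.
target = normalize([0.0, 1.0, 0.0])

def toward_target(x: Vector, t: float) -> Vector:
    return [3.0 * c for c in target]

x_end = integrate_on_sphere([1.0, 0.0, 0.0], toward_target)
print([round(c, 3) for c in x_end])
```

The renormalization step acts as a retraction, so the state never leaves the sphere regardless of the step size.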
-
Functional Subspace, where language models can use vector algebra to solve problems
LLMs form functional subspaces in activation space where in-context learning tasks are solved by vector algebra operations such as addition and subtraction.
-
Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation
An adapted scaling law predicts GPU energy consumption for diffusion model inference with R² > 0.9 within architectures and strong cross-architecture generalization.
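The functional form of the adapted law is not given here, so the sketch below assumes a plain power law, energy = a · compute^b, fitted by least squares in log-log space, and then reports R² on the original scale. The compute and energy numbers are synthetic and are not taken from the paper.

```python
import math
from typing import List, Tuple

def fit_power_law(x: List[float], y: List[float]) -> Tuple[float, float]:
    # Fit y = a * x**b by ordinary least squares on (log x, log y).
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(lx, ly))
    s_xx = sum((p - mean_x) ** 2 for p in lx)
    b = s_xy / s_xx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot, on the original scale.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic illustration only: energy = 2 * compute**0.8 with small multiplicative noise.
compute = [1.0, 2.0, 4.0, 8.0, 16.0]
noise = [1.02, 0.98, 1.01, 0.99, 1.00]
energy = [2.0 * c ** 0.8 * e for c, e in zip(compute, noise)]

a_fit, b_fit = fit_power_law(compute, energy)
predicted = [a_fit * c ** b_fit for c in compute]
r2 = r_squared(energy, predicted)
print(f"a={a_fit:.3f}, b={b_fit:.3f}, R^2={r2:.4f}")
```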
-
Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
JAM aligns frozen vision and language models via joint autoencoders and multimodal Spread Loss, reliably inducing cross-modal alignment across layer depths, objectives, and model scales.
-
KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture
KairosHope proposes a HOPE block with dual-memory (Titans + CMS) and hybrid statistical-deep decision head for TSFM classification, pre-trained via MTSM and InfoNCE on Monash then adapted via LP-FT to UCR, claiming superior results on causal domains.
-
Improving Spatio-Temporal Residual Error Propagation by Mitigating Over-Squashing
Teger is a backbone-agnostic structured uncertainty module that uses discrete Forman curvature for spatial graph rewiring inside a low-rank-plus-diagonal covariance head to mitigate over-squashing and improve residual error propagation in spatio-temporal forecasting.
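As background on the curvature Teger is described as using, the sketch below computes combinatorial Forman curvature on an unweighted, undirected graph: 4 − deg(u) − deg(v), optionally augmented by +3 per triangle containing the edge, a variant used in curvature-based rewiring work. Strongly negative edges tend to be bottlenecks, which is where over-squashing arises. This is not Teger's rewiring rule or covariance head, and the toy graph is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

def forman_curvature(edges: List[Edge], include_triangles: bool = True) -> Dict[Edge, int]:
    # Build an undirected adjacency map from the edge list.
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    curvature: Dict[Edge, int] = {}
    for u, v in edges:
        # Basic combinatorial Forman curvature of an unweighted edge.
        c = 4 - len(adj[u]) - len(adj[v])
        if include_triangles:
            # Augmented variant: each triangle through (u, v) adds 3.
            c += 3 * len(adj[u] & adj[v])
        curvature[(u, v)] = c
    return curvature

# Two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
curv = forman_curvature(edges)
bottleneck = min(curv, key=curv.get)
print(bottleneck, curv[bottleneck])  # (2, 3) -2
```

The bridge edge is the only edge that lies in no triangle, so it receives the most negative curvature and would be the natural candidate for rewiring.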