PyTorch: An imperative style, high-performance deep learning library
16 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Progress measures for grokking via mechanistic interpretability
Grokking arises from gradual amplification of a Fourier-based circuit in the weights followed by removal of memorizing components.
- LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling
LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.
- Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.
- FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
FieryGS integrates LLM-based material reasoning, volumetric combustion simulation, and a unified renderer with 3D Gaussian Splatting to generate physically plausible and user-controllable fire in in-the-wild scenes.
- Modeling the Quantum Photon Statistics in Hybrid Light-Matter Integrated Circuits
A new modeling framework represents pulsed polariton waveguide dynamics as a dissipative bosonic quantum circuit to predict antibunching and sub-Poissonian statistics in single and multimode integrated circuit configurations.
- MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MetaEmbed trains fixed learnable Meta Tokens to produce granularity-organized multi-vector embeddings that support test-time scaling in multimodal retrieval.
- AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
AnyPos automates task-agnostic action collection and inverse-dynamics modeling with arm/end-effector decoupling plus a direction-aware decoder, delivering 51% higher test accuracy and 30-40% better success rates on bimanual tasks.
- A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
ShockCast is a two-phase ML method that predicts adaptive timestep sizes to model high-speed flows with shocks more efficiently than fixed-step approaches.
- Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.
- Siamese Foundation Models for Crystal Structure Prediction
DAO pretrains Siamese diffusion-based models on stable/unstable crystal data to achieve 100% experimental match on Cr6Os2 and 2000x speedup over DFT on real superconductors.
- Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
An analytical post-training method restructures FFNs into MoE by partitioning neurons based on activation patterns and building a router from statistics, achieving 1.17x speedup with minimal resources.
- ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
ConjNorm reframes OOD detection score design as optimizing norm p in an exponential family density model via a Bregman divergence theorem, with a tractable Monte Carlo estimator, claiming SOTA gains on CIFAR-100 and ImageNet-1K.
- SGLang: Efficient Execution of Structured Language Model Programs
SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.
- M²FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices
M²FedAQI is a lightweight multimodal federated framework that fuses visual and tabular data via feature modulation for improved AQI prediction and regression on heterogeneous edge devices.
- CraftGraffiti: Exploring Human Identity with Custom Graffiti Art via Facial-Preserving Diffusion Models
CraftGraffiti applies LoRA-tuned diffusion transformers followed by identity-augmented self-attention and CLIP-guided pose extension to generate graffiti while preserving facial features.
- Zero-Shot Function Encoder-Based Differentiable Predictive Control
A differentiable framework integrates function encoder-based neural ODEs with predictive control to enable zero-shot adaptation of explicit policies across families of nonlinear systems.