hub
Decoupled Weight Decay Regularization
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: method 1
citation-polarity summary
polarities: use method 1
citing papers explorer
- MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings
  MapFormers learn cognitive maps via input-dependent Lie-algebra positional encodings and achieve near-perfect OOD generalization on cognitive tasks where standard transformers fail.
- Mantis: A Foundation Model for Mechanistic Disease Forecasting
  A foundation model trained only on disease simulations achieves top-ranked forecasting accuracy across 16 diseases and beats all CDC COVID-19 hub models on early unseen pandemic data.
- LoRA: Low-Rank Adaptation of Large Language Models
  Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
- Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization
  Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.
- Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning
  Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.
- NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems
  NPN introduces a neural-network-based regularization that promotes reconstructions lying in a low-dimensional projection of the sensing operator's null-space, with claimed theoretical guarantees and improved empirical performance across compressive sensing, deblurring, super-resolution, CT, and MRI.
- WhisperRT -- Turning Whisper into a Causal Streaming Model
  WhisperRT converts Whisper to a causal streaming ASR model via encoder causality, decoder synchronization on partial states, and fine-tuning, achieving better performance than non-fine-tuned streaming methods on sub-300ms chunks with lower complexity.
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Prolonged RL training with KL control and reference policy resetting enables LLMs to develop novel reasoning strategies inaccessible to base models even under extensive sampling.
- Nomic Embed: Training a Reproducible Long Context Text Embedder
  Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.
- Unsupervised Dense Information Retrieval with Contrastive Learning
  Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
- Polyformer: a generative framework for thermodynamic modeling of polymeric molecules
  Polyformer generates sequence- and temperature-dependent conformational ensembles for proteins that agree with molecular dynamics simulations.
- Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
  Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.
- Deep Learning of Solver-Aware Turbulence Closures from Nudged LES Dynamics