Chronicle is the first model jointly pretrained from scratch on text and time series in a unified transformer that matches a comparable language model on NLU tasks and sets new bars for time series classification and multimodal forecasting.
Title resolution pending
12 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
A transformer with prediction-correction and hierarchical super-token merging unifies simulation of six physical dynamics categories on Lagrangian particles and generalizes to unseen conditions.
The paper introduces compositional interpretability as a category-theoretic framework that casts mechanistic explanations as commuting syntactic-semantic mappings optimized under faithfulness and complexity constraints derived from minimum description length.
LC-MAPF uses multi-round local communication between neighboring agents in a pre-trained model to outperform prior learning-based MAPF solvers on diverse unseen scenarios while preserving scalability.
HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.
NightVision recovers LLM hidden dimension to 23% average relative error (9% on MoE) and depth/parameter count to 53% on models >3B parameters using common-set prompting, spectral analysis, and TTFT under single-logit black-box access.
Bilinear autoencoders decompose neural activations into low-rank quadratic forms to discover interpretable multi-dimensional manifolds, improving reconstruction in language models and challenging linear representation assumptions.
InfoLaw models pretraining as information accumulation where quality sets information density and repetition causes scale-dependent diminishing returns, predicting loss with low error on unseen mixtures and larger scales up to 7B models and 425B tokens.
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.
A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.
citing papers explorer
-
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Chronicle is the first model jointly pretrained from scratch on text and time series in a unified transformer that matches a comparable language model on NLU tasks and sets new bars for time series classification and multimodal forecasting.
-
WorldParticle: Unified World Simulation of Lagrangian Particle Dynamics via Transformer
A transformer with prediction-correction and hierarchical super-token merging unifies simulation of six physical dynamics categories on Lagrangian particles and generalizes to unseen conditions.
-
From Mechanistic to Compositional Interpretability
The paper introduces compositional interpretability as a category-theoretic framework that casts mechanistic explanations as commuting syntactic-semantic mappings optimized under faithfulness and complexity constraints derived from minimum description length.
-
Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding
LC-MAPF uses multi-round local communication between neighboring agents in a pre-trained model to outperform prior learning-based MAPF solvers on diverse unseen scenarios while preserving scalability.
-
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.
-
Black-Box Inference of LLM Architectural Properties with Restrictive API Access
NightVision recovers LLM hidden dimension to 23% average relative error (9% on MoE) and depth/parameter count to 53% on models >3B parameters using common-set prompting, spectral analysis, and TTFT under single-logit black-box access.
-
Bilinear autoencoders find interpretable manifolds
Bilinear autoencoders decompose neural activations into low-rank quadratic forms to discover interpretable multi-dimensional manifolds, improving reconstruction in language models and challenging linear representation assumptions.
-
InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition
InfoLaw models pretraining as information accumulation where quality sets information density and repetition causes scale-dependent diminishing returns, predicting loss with low error on unseen mixtures and larger scales up to 7B models and 425B tokens.
-
ST-MoE: Designing Stable and Transferable Sparse Expert Models
ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.
-
Mesh Based Simulations with Spatial and Temporal awareness
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.