Recognition: unknown
Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
Pith reviewed 2026-05-07 10:52 UTC · model grok-4.3
The pith
The Probabilistic Transformer becomes a programmable factor graph for time series by equating the self-attention plus feed-forward block with mean-field variational inference on a conditional random field.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ST-PT lifts the PT equivalence to time series and serves as a shared cornerstone backbone, exposing three exploitable properties: programmable graph topology for prior injection, per-sample programming of the factor matrices for conditional generation, and MFVI iterations as principled latent AR transitions, with CRF-teacher distillation to counter error buildup.
What carries the argument
The equivalence between the Transformer's self-attention plus feed-forward block and mean-field variational inference on a conditional random field, extended to ST-PT for time series with added channel and temporal semantics.
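As a rough picture of that equivalence (generic CRF notation for illustration, not the paper's own symbols): for latent labels $z_1,\dots,z_n$ with unary potentials $\phi_i$ and pairwise potentials $\psi_{ij}$, one mean-field coordinate update takes the form

$$
q^{\text{new}}(z_i) \;\propto\; \exp\!\Big(\log \phi_i(z_i) \;+\; \sum_{j \neq i} \mathbb{E}_{q(z_j)}\big[\log \psi_{ij}(z_i, z_j)\big]\Big),
$$

and the PT reading identifies the neighbour expectation with attention-weighted aggregation and the unary term plus renormalization with the feed-forward block and softmax. The exact parameter tying that makes the two computations coincide is given in the PT paper and is precisely what the ST-PT extension must preserve.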
If this is right
- Injecting symbolic time-series priors via direct graph modifications improves modeling when data is scarce or noisy (a minimal sketch follows this list).
- Conditioning the CRF factor matrices externally on a per-sample basis enables structural rather than feature-based conditional generation.
- Converting latent-space autoregressive transitions to Bayesian posterior updates and distilling from a CRF teacher reduces cumulative errors in multi-step forecasting.
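A minimal sketch of the first bullet (NumPy, with an invented 24-step seasonality prior; the function names are illustrative, not the paper's): a symbolic prior is written directly into the graph topology by masking pairwise factors, which under the PT reading is the same as an additive mask on the attention logits.

```python
import numpy as np

def seasonal_edge_mask(seq_len: int, period: int, window: int = 1) -> np.ndarray:
    """Hypothetical RQ1-style prior: keep pairwise factors only between steps
    that are close in time or exactly one season apart."""
    mask = np.full((seq_len, seq_len), -np.inf)
    for i in range(seq_len):
        for j in range(seq_len):
            if abs(i - j) <= window or abs(i - j) == period:
                mask[i, j] = 0.0  # edge kept: factor / attention logit unchanged
    return mask

def masked_attention_logits(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """An additive -inf mask on attention logits deletes the corresponding
    edges of the factor graph once the softmax is applied."""
    return logits + mask

# Example: 48 hourly steps with a 24-hour seasonality prior.
logits = np.random.randn(48, 48)
pruned = masked_attention_logits(logits, seasonal_edge_mask(48, period=24))
```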
Where Pith is reading between the lines
- Similar adaptations could apply the programmable factor graph approach to other domains like video prediction or natural language generation.
- This framework might support more interpretable models by allowing inspection and editing of the explicit potentials and topology.
- Joint optimization of graph structure alongside the potentials could emerge as a new direction for architecture search in sequence models.
Load-bearing premise
The mathematical equivalence between the modified ST-PT attention blocks and mean-field variational inference on the CRF continues to hold, and the performance improvements arise from exploiting the programmable properties.
What would settle it
Demonstrating that the self-attention computations in ST-PT deviate from the corresponding mean-field updates, or finding that structural graph changes do not yield gains over baseline Transformers in data-scarce time series scenarios.
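The first test is mechanical once both sides are implemented: run one ST-PT block and the explicit mean-field update it is claimed to realize on the same inputs and measure the gap. A sketch of such a check (stpt_block and mfvi_step are placeholders for implementations not reproduced here):

```python
import numpy as np

def max_deviation(stpt_block, mfvi_step, x: np.ndarray, n_trials: int = 32) -> float:
    """Compare one ST-PT attention+FFN block against the explicit mean-field
    update it is claimed to implement, over perturbed copies of the same input.
    A consistently near-zero value supports the equivalence; a large,
    input-dependent gap would falsify it."""
    rng = np.random.default_rng(0)
    worst = 0.0
    for _ in range(n_trials):
        noise = rng.normal(scale=1e-3, size=x.shape)
        a = stpt_block(x + noise)
        b = mfvi_step(x + noise)
        worst = max(worst, float(np.max(np.abs(a - b))))
    return worst
```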
Original abstract
The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the Transformer ceases to be a black-box neural network and becomes a programmable factor graph: graph topology, factor potentials, and the message-passing schedule are all explicit and inspectable primitives that can be engineered. PT was originally developed for natural language and in this report we investigate its potential for time series. We first lift PT into the Spatial-Temporal Probabilistic Transformer (ST-PT) to repair PT's missing channel axis and weak per-step semantics, and adopt ST-PT as a shared cornerstone backbone. We then identify three distinct properties that PT/ST-PT offers as a factor-graph model and derive three Research Questions, one per property, that probe how each property can be exploited in time series: RQ1. The graph topology and potentials are direct programmable primitives. Can this be used to inject symbolic time-series priors into ST-PT through structural graph modifications, especially under data scarcity and noise? RQ2. The CRF's factor matrices are the operator's potentials. Can an external condition program these factor matrices on a per-sample basis, so that conditional generation becomes structural rather than feature-level modulation of a fixed one? RQ3. Each MFVI iteration is a Bayesian posterior update on the factor graph. Can this turn the latent transition of latent-space AutoRegressive (AR) forecasting from an opaque MLP into a principled posterior update, and can a CRF teacher distill its latents into the AR student to counter cumulative error? We give one empirical study per question. Together, these three studies position ST-PT as a programmable framework for time-series modeling.
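To make RQ2's contrast between structural and feature-level conditioning concrete, here is a minimal sketch (PyTorch, with invented names; the low-rank hypernetwork parameterization is an assumption for illustration, not the paper's architecture) in which an external condition generates the pairwise factor matrix itself rather than being appended to the features:

```python
import torch
import torch.nn as nn

class ConditionedFactor(nn.Module):
    """Hypothetical RQ2-style conditioning: a small hypernetwork maps an
    external condition c to a low-rank pairwise factor matrix W(c), so the
    condition reshapes the CRF potentials rather than the features."""
    def __init__(self, d_model: int, d_cond: int, rank: int = 8):
        super().__init__()
        self.to_u = nn.Linear(d_cond, d_model * rank)
        self.to_v = nn.Linear(d_cond, d_model * rank)
        self.rank = rank
        self.d_model = d_model

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        # c: (batch, d_cond) -> per-sample factor matrix (batch, d_model, d_model)
        u = self.to_u(c).view(-1, self.d_model, self.rank)
        v = self.to_v(c).view(-1, self.rank, self.d_model)
        return u @ v  # W(c), used as the pairwise potential in the attention logits
```

Feature-level conditioning would leave the factor matrix fixed and concatenate the condition to the inputs; the sketch instead makes the operator's potentials a function of the condition, in the spirit of hypernetworks.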
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the Probabilistic Transformer (PT) is mathematically equivalent to mean-field variational inference on a conditional random field, turning the architecture into an explicit programmable factor graph. It lifts PT to the Spatial-Temporal Probabilistic Transformer (ST-PT) by adding a channel axis and repairing per-step semantics, then uses this backbone to pose and empirically investigate three research questions: (RQ1) injecting symbolic time-series priors via structural graph modifications under data scarcity; (RQ2) per-sample conditional programming of factor matrices for structural rather than feature-level conditioning; and (RQ3) interpreting latent autoregressive transitions as principled Bayesian posterior updates with CRF-teacher distillation to mitigate cumulative error.
Significance. If the PT-to-MFVI-CRF equivalence is shown to survive the ST-PT modifications and the three studies isolate gains attributable to explicit graph topology, programmable potentials, and posterior-update semantics, the work would supply a concrete bridge between transformer architectures and probabilistic graphical models for time series, enabling more interpretable incorporation of domain priors and conditional generation.
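For RQ3, the posterior-update reading of the latent transition can be pictured as below (a sketch under assumed interfaces; posterior_update, decode, and the loss weighting are placeholders, not the paper's training recipe): each autoregressive step applies an explicit update to the latent, and a distillation term pulls the rolled-out latents toward those of a CRF teacher that sees the full window.

```python
import torch
import torch.nn.functional as F

def rollout_with_distillation(posterior_update, decode, teacher_latents,
                              z0, horizon: int, lam: float = 0.1):
    """Hypothetical RQ3-style loop: roll the latent forward with an explicit
    posterior update and accumulate a distillation penalty toward the CRF
    teacher's latents to damp cumulative error."""
    z, distill = z0, z0.new_zeros(())
    preds = []
    for t in range(horizon):
        z = posterior_update(z)                 # one MFVI / Bayes-style step
        preds.append(decode(z))
        distill = distill + F.mse_loss(z, teacher_latents[t])
    return torch.stack(preds), lam * distill
```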
Major comments (2)
- [§3] The lifting of PT to ST-PT is described as adding a channel axis and repairing per-step semantics, yet the manuscript supplies no re-derivation demonstrating that the modified self-attention and feed-forward blocks continue to implement the identical mean-field variational updates on the underlying CRF factor graph. Because the three RQs and the interpretation of all empirical results rest on inheriting the programmable-factor-graph properties, this omission is load-bearing.
- [Empirical studies] One study per RQ: the manuscript presents the studies as demonstrating exploitation of the factor-graph properties, but provides insufficient detail on baselines, metrics, ablation controls, and quantitative effect sizes that would isolate the contribution of explicit graph topology or posterior-update semantics from ordinary modeling improvements. Without such isolation the studies cannot be read as evidence for the claimed advantages of the equivalence.
Minor comments (1)
- [Abstract] The abstract would be strengthened by a single sentence summarizing the key quantitative outcomes of the three studies (e.g., relative error reductions or statistical significance) so that readers can immediately gauge the practical magnitude of the reported gains.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for strengthening the technical foundations and empirical evidence. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§3] The lifting of PT to ST-PT is described as adding a channel axis and repairing per-step semantics, yet the manuscript supplies no re-derivation demonstrating that the modified self-attention and feed-forward blocks continue to implement the identical mean-field variational updates on the underlying CRF factor graph. Because the three RQs and the interpretation of all empirical results rest on inheriting the programmable-factor-graph properties, this omission is load-bearing.
Authors: We agree that the manuscript would be strengthened by an explicit re-derivation confirming that the ST-PT modifications preserve the MFVI-CRF equivalence. The channel-axis addition and per-step semantic repairs are structural extensions that maintain the same variational update rules in self-attention and feed-forward blocks. In the revised version, we will add a dedicated subsection to §3 with the full re-derivation, explicitly showing that the modified blocks implement identical mean-field updates on the extended CRF factor graph. This will make the inheritance of programmable factor-graph properties transparent and directly support the three RQs. revision: yes
- Referee: [Empirical studies] One study per RQ: the manuscript presents the studies as demonstrating exploitation of the factor-graph properties, but provides insufficient detail on baselines, metrics, ablation controls, and quantitative effect sizes that would isolate the contribution of explicit graph topology or posterior-update semantics from ordinary modeling improvements. Without such isolation the studies cannot be read as evidence for the claimed advantages of the equivalence.
Authors: We concur that the empirical sections require expanded controls to isolate the contributions of the factor-graph properties. In the revision, we will augment each RQ study with: (i) additional baselines including vanilla Transformers, standard probabilistic time-series models, and ablated variants without explicit graph structure; (ii) precise metric definitions and quantitative effect sizes; (iii) targeted ablations disabling graph topology modifications, per-sample factor programming, or CRF-teacher distillation; and (iv) statistical significance testing. These enhancements will more clearly attribute observed gains to the explicit topology, programmable potentials, and posterior-update semantics. revision: yes
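A minimal sketch of what the proposed ablation grid and significance test could look like (axis names and the Wilcoxon signed-rank choice are illustrative assumptions, not the authors' stated protocol):

```python
from itertools import product
from scipy.stats import wilcoxon

# Hypothetical ablation axes mirroring item (iii) of the rebuttal:
AXES = {
    "graph_topology_prior": [True, False],   # RQ1: structural prior on/off
    "per_sample_factors":   [True, False],   # RQ2: conditioned potentials on/off
    "crf_teacher_distill":  [True, False],   # RQ3: distillation on/off
}

def ablation_grid():
    """Enumerate every combination of the ablation switches."""
    keys = list(AXES)
    for values in product(*(AXES[k] for k in keys)):
        yield dict(zip(keys, values))

def paired_significance(errors_full, errors_ablated, alpha=0.05):
    """Paired Wilcoxon signed-rank test over per-series errors; one sensible
    choice of test, not the paper's reported procedure."""
    stat, p = wilcoxon(errors_full, errors_ablated)
    return p < alpha, p
```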
Circularity Check
No significant circularity in derivation chain
Full rationale
The manuscript cites the PT-to-MFVI-CRF equivalence as an established result from prior work and describes lifting PT to ST-PT via explicit modifications (channel axis and per-step semantics repairs) before deriving three RQs from the assumed factor-graph properties. No equation or claim in the abstract or described structure reduces a first-principles result or prediction to its own inputs by construction, nor does any load-bearing step rely on a self-citation chain whose authors overlap with the present paper. The empirical studies are presented as probes of the inherited properties rather than re-derivations or fits that rename inputs as outputs. The chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: the Transformer's self-attention plus feed-forward block equals MFVI on a CRF
Invented entities (1)
- ST-PT (Spatial-Temporal Probabilistic Transformer): no independent evidence