Time2Vec: Learning a Vector Representation of Time
Pith reviewed 2026-05-24 23:02 UTC · model grok-4.3
The pith
Time2Vec replaces raw time inputs with a learned vector that improves performance when added to existing models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that replacing the notion of time with its Time2Vec representation improves the performance of the final model on multiple problems and architectures.
What carries the argument
Time2Vec, a trainable vector representation of time that combines periodic and non-periodic components and can be imported into existing models.
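A minimal sketch of what such a representation could look like, for orientation only. The review text above does not give the formula, so the form used here (one linear component plus k sine components, each with its own frequency and phase) is an assumption, and the names Time2VecSketch and with_time_embedding are hypothetical. The parameters are randomly initialised placeholders; in a real model they would be trained jointly with the host architecture.

```python
import math
import random


class Time2VecSketch:
    """Illustrative time embedding: 1 linear component + k sine components.

    Assumption: this form is not stated in the review text; it is a sketch,
    not the authors' reference implementation.
    """

    def __init__(self, k, seed=0):
        rng = random.Random(seed)
        # Placeholder values standing in for trainable frequencies and phases.
        self.omega = [rng.uniform(-1.0, 1.0) for _ in range(k + 1)]
        self.phi = [rng.uniform(-1.0, 1.0) for _ in range(k + 1)]

    def __call__(self, tau):
        # Index 0: non-periodic (linear) component.
        out = [self.omega[0] * tau + self.phi[0]]
        # Indices 1..k: periodic components.
        out.extend(
            math.sin(w * tau + p)
            for w, p in zip(self.omega[1:], self.phi[1:])
        )
        return out


def with_time_embedding(features, tau, t2v):
    """Replace a scalar time input by concatenating its embedding to the features."""
    return list(features) + t2v(tau)
```

Calling with_time_embedding([1.0, 2.0], 3.0, Time2VecSketch(k=4)) returns a 7-element list: the two original features followed by the 5-dimensional time vector. Concatenation is only one way to feed the vector to a host model; it is chosen here because it requires no change to the model's layers.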
If this is right
- Existing sequence or event models can incorporate Time2Vec without redesigning their layers.
- Performance gains appear on both synchronous and asynchronous event data.
- The representation works as a drop-in replacement for conventional time encodings.
- The same vector can be reused across different downstream tasks once learned.
Where Pith is reading between the lines
- The vector form might allow transfer of temporal knowledge between unrelated domains if the same Time2Vec module is shared.
- If the periodic components are fixed rather than learned, the method could become fully parameter-free for certain periodicities.
- The approach suggests testing whether similar vector encodings help in non-neural models such as decision trees or linear regressors on time-stamped data.
Load-bearing premise
A single learned vector form of time can be used across different models and problems to capture temporal information better than standard time features.
What would settle it
A controlled experiment in which the same models and temporal datasets are run once with raw time inputs and once with Time2Vec inputs. The claim fails if the Time2Vec runs show no consistent gain, or show a loss.
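The bookkeeping for such a paired comparison can be sketched as follows. This is a hedged illustration, not a procedure taken from the paper: the function name paired_comparison is hypothetical, and it assumes each (model, dataset) pair yields one higher-is-better score per condition.

```python
from statistics import mean


def paired_comparison(raw_scores, t2v_scores):
    """Summarise paired runs: one score per (model, dataset) pair per condition.

    Assumes higher scores are better and that both lists are in the same order.
    """
    if not raw_scores or len(raw_scores) != len(t2v_scores):
        raise ValueError("need two non-empty score lists of equal length")
    # Positive difference means the Time2Vec run scored higher.
    diffs = [t - r for r, t in zip(raw_scores, t2v_scores)]
    return {
        "wins": sum(d > 0 for d in diffs),
        "losses": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        "mean_diff": mean(diffs),
    }
```

A summary with roughly as many losses as wins, or a mean difference near zero or below, would count against the core claim. A real study would also repeat runs across seeds and report error bars before drawing that conclusion.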
Original abstract
Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Time2Vec, a model-agnostic, learnable vector representation of time intended to be plugged into existing architectures for tasks involving synchronous or asynchronous events. The central claim is that replacing standard notions of time with this representation improves final model performance across a range of models and problems.
Significance. If the empirical results hold with proper validation, Time2Vec could serve as a lightweight, reusable component for incorporating temporal structure in machine learning pipelines, complementing architecture-specific innovations in time-series and event modeling.
major comments (1)
- Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.
Simulated Author's Rebuttal
We thank the referee for their review. We address the single major comment below.
-
Referee: Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.
Authors: We agree that the abstract states the central claim at a high level without supporting experimental specifics. While abstracts are necessarily concise, the current wording does not adequately convey the scope of the evaluation. In the revised manuscript we will expand the abstract to include brief references to the models tested, the range of problems considered, and the nature of the observed improvements. revision: yes
Circularity Check
No significant circularity
The paper presents Time2Vec as a learnable, model-agnostic vector embedding for time that is substituted into existing architectures, with performance gains demonstrated empirically across multiple models and tasks. No derivation chain, equations, or load-bearing steps are visible in the provided abstract or description that reduce by construction to fitted parameters, self-definitions, or self-citation chains; the representation is defined independently and validated externally rather than tautologically.
Forward citations
Cited by 16 Pith papers
-
TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations
TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial ...
-
NEST: Nested Event Stream Transformer for Sequences of Multisets
NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.
-
Temporal Graph Networks for Deep Learning on Dynamic Graphs
Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as ...
-
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
-
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
AgriPriceBD dataset of 1779 daily prices released; naive persistence outperforms deep models like Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series with statistical validation.
-
A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature
QSurv uses Gauss-Legendre numerical quadrature and time-conditioned low-rank adaptation to enable scalable nonparametric continuous-time survival modeling with theoretical error bounds.
-
EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.
-
TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond
TraXion supplies a unified pre-training approach for multi-entity spatiotemporal event streams that outperforms task-specific baselines on mobility tasks and transfers unchanged to authentication logs and ICU mortalit...
-
To Use AI as Dice of Possibilities with Timing Computation
Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.
-
A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation
A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarm...
-
ARMove: Learning to Predict Human Mobility through Agentic Reasoning
ARMove is a transferable framework for human mobility prediction that combines agentic LLM reasoning, feature management, and large-small model synergy to outperform baselines on several metrics while improving interp...
-
Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models
Fused code-value tokenization improves mortality AUROC from 0.891 to 0.915 and other clinical outcome predictions, while certain temporal encodings like event order match or exceed time tokens with shorter sequences.
-
DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification
DBGL models irregular medical time series via patient-variable bipartite graphs and node-specific temporal decay encoding to avoid artificial alignment and capture decay rates, outperforming baselines on four public datasets.
-
Capture Timing-Attention of Events in Clinical Time Series
LITT aligns individual clinical event sequences on a relative timeline to enable timing-aware attention and better prediction of personalized health trajectories.
-
Transformer-Based Wildlife Species Classification from Daily Movement Trajectories
Transformer models classify seven wildlife species from daily GPS trajectories, outperforming LSTM, CNN, and TCN baselines by 8-22 percentage points in balanced accuracy under region-holdout evaluation.
-
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.