Time2Vec: Learning a Vector Representation of Time

Cathal Smyth; Janahan Ramanan; Jaspreet Sahota; Marcus Brubaker; Pascal Poupart; Rishab Goel; Sanjay Thakur; Sepehr Eghbali; Seyed Mehran Kazemi; Stella Wu

arxiv: 1907.05321 · v1 · pith:K46YEPRKnew · submitted 2019-07-11 · 💻 cs.LG

Pith reviewed 2026-05-24 23:02 UTC · model grok-4.3

keywords time representation · temporal embedding · model-agnostic · event modeling · neural networks · periodic features


The pith

Time2Vec replaces raw time inputs with a learned vector that improves performance when added to existing models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Time2Vec as a model-agnostic vector representation of time. It is designed to be plugged into many different neural architectures without changing their core structure. The authors demonstrate that swapping standard time features for this representation raises accuracy or other performance metrics on a range of temporal tasks. The approach focuses on capturing both periodic and non-periodic aspects of time in a single trainable vector. Because the method is orthogonal to architecture design, it can be applied to models that already handle synchronous or asynchronous events.

Core claim

The paper claims that replacing the notion of time with its Time2Vec representation improves the performance of the final model on multiple problems and architectures.

What carries the argument

Time2Vec, a trainable vector representation of time that combines periodic and non-periodic components and can be imported into existing models.
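
A minimal sketch of such a representation, assuming the form usually given for Time2Vec: index 0 is a linear function of time (the non-periodic part) and indices 1..k are sine functions of time with their own frequencies and phases (the periodic part). The summary above does not spell out the formula, so the function name `time2vec`, the argument names, and the random initialization are illustrative assumptions; in a real model `omega` and `phi` would be trained together with the rest of the network.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Map scalar times to (k+1)-dimensional vectors.

    tau:   array-like of shape (n,) holding time values
    omega: array of shape (k+1,), frequencies (index 0 is the linear slope)
    phi:   array of shape (k+1,), phase shifts (index 0 is the linear offset)
    """
    tau = np.asarray(tau, dtype=float).reshape(-1, 1)  # (n, 1)
    z = tau * omega + phi                              # (n, k+1) via broadcasting
    out = np.sin(z)                                    # periodic components
    out[:, 0] = z[:, 0]                                # index 0 stays linear
    return out

# Assumption: random parameters stand in for learned ones.
rng = np.random.default_rng(0)
k = 4
omega = rng.normal(size=k + 1)
phi = rng.normal(size=k + 1)

vecs = time2vec([0.0, 1.5, 3.0], omega, phi)
print(vecs.shape)  # (3, 5)
```

Keeping one linear component lets the vector still encode how far along the timeline an event is, while the sine components can pick up repeating patterns.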

If this is right

Existing sequence or event models can incorporate Time2Vec without redesigning their layers.
Performance gains appear on both synchronous and asynchronous event data.
The representation works as a drop-in replacement for conventional time encodings.
The same vector can be reused across different downstream tasks once learned.
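
The drop-in idea in the list above can be sketched at the level of model inputs: instead of appending the raw timestamp to each event's features, append its Time2Vec vector, and leave the downstream model untouched. This is only an illustration under assumptions; the helper names `inputs_with_raw_time` and `inputs_with_time2vec`, the shapes, and the random parameters are hypothetical, and the paper's own integration into specific architectures may differ.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Linear component at index 0, sine components at indices 1..k."""
    z = np.asarray(tau, dtype=float).reshape(-1, 1) * omega + phi
    return np.concatenate([z[:, :1], np.sin(z[:, 1:])], axis=1)

def inputs_with_raw_time(x, tau):
    """Baseline: append the scalar time to each event's feature vector."""
    t = np.asarray(tau, dtype=float).reshape(-1, 1)
    return np.concatenate([x, t], axis=1)

def inputs_with_time2vec(x, tau, omega, phi):
    """Replacement: append the (k+1)-dimensional time vector instead."""
    return np.concatenate([x, time2vec(tau, omega, phi)], axis=1)

# Assumption: a toy sequence of T events with d features each.
rng = np.random.default_rng(0)
T, d, k = 6, 3, 16
x = rng.normal(size=(T, d))
tau = np.sort(rng.uniform(0.0, 10.0, size=T))  # irregular event times
omega = rng.normal(size=k + 1)
phi = rng.normal(size=k + 1)

baseline = inputs_with_raw_time(x, tau)
replaced = inputs_with_time2vec(x, tau, omega, phi)
print(baseline.shape, replaced.shape)  # (6, 4) (6, 20)
```

Only the input width changes, which is why the layers that follow need no redesign beyond accepting a wider input.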

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The vector form might allow transfer of temporal knowledge between unrelated domains if the same Time2Vec module is shared.
If the periodic components are fixed rather than learned, the method could become fully parameter-free for certain periodicities.
The approach suggests testing whether similar vector encodings help in non-neural models such as decision trees or linear regressors on time-stamped data.
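
The last two extensions can be tried together in a small sketch: fix the periods by hand so that nothing in the time encoding is learned, then feed the resulting features to an ordinary least-squares regressor. Everything here is an assumption made for illustration: the synthetic signal, the 7-step period, and the helper names `fixed_period_features` and `fit_mse` are not from the paper, and the error is measured on the training points only.

```python
import numpy as np

def fixed_period_features(tau, periods):
    """Bias, linear time, and sin/cos pairs at hand-picked periods.

    A cosine is a sine with phase pi/2, so this matches the sine-based
    form with frequencies and phases held fixed instead of learned.
    """
    tau = np.asarray(tau, dtype=float).reshape(-1, 1)
    omega = 2.0 * np.pi / np.asarray(periods, dtype=float)
    return np.concatenate(
        [np.ones_like(tau), tau, np.sin(tau * omega), np.cos(tau * omega)],
        axis=1,
    )

def fit_mse(design, y):
    """Ordinary least squares fit; returns mean squared error on the fit data."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((design @ coef - y) ** 2))

# Assumption: a noise-free synthetic series with a trend and a 7-step cycle.
t = np.arange(56, dtype=float)
y = 0.05 * t + np.sin(2.0 * np.pi * t / 7.0 + 0.3)

raw_design = np.column_stack([np.ones_like(t), t])        # raw time only
fixed_design = fixed_period_features(t, periods=[7.0])    # fixed periodic features

mse_raw = fit_mse(raw_design, y)
mse_fixed = fit_mse(fixed_design, y)
print(mse_raw, mse_fixed)
```

On this toy series the fixed features can represent the cycle exactly, while raw time alone cannot; whether that carries over to real time-stamped data is the open question the extension raises.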

Load-bearing premise

A single learned vector form of time can be used across different models and problems to capture temporal information better than standard time features.

What would settle it

A controlled experiment in which the same set of models and temporal datasets is run once with raw time inputs and once with Time2Vec inputs; the claim fails if the Time2Vec runs show no consistent gain, or a loss.

Figures

Figures reproduced from arXiv: 1907.05321 by Cathal Smyth, Janahan Ramanan, Jaspreet Sahota, Marcus Brubaker, Pascal Poupart, Rishab Goel, Sanjay Thakur, Sepehr Eghbali, Seyed Mehran Kazemi, Stella Wu.

**Figure 2.** Comparing TLSTM1 and TLSTM3 on Last.FM and CiteULike in terms of Recall@10

**Figure 3.** The models learned for our synthesized dataset explained in Subsection 5.2 before the final ...

**Figure 4.** (a) Initial vs. (b) learned weights and frequencies for our synthesized dataset.

**Figure 5.** An ablation study of several components in Time2Vec. (a) Comparing different activa...

**Figure 6.** Comparing LSTM+T and LSTM+Time2Vec on Event-MNIST.

**Figure 7.** Comparing LSTM+T and LSTM+Time2Vec on Event-MNIST and raw N_TIDIGITS18.

**Figure 8.** Comparing LSTM+T and LSTM+Time2Vec on SOF.

**Figure 9.** Comparing LSTM+T and LSTM+Time2Vec on Last.FM.

**Figure 10.** Comparing LSTM+T and LSTM+Time2Vec on CiteULike.

**Figure 11.** TLSTM1’s performance on Last.FM with and without Time2Vec.

**Figure 12.** TLSTM1’s performance on CiteULike with and without Time2Vec.

**Figure 13.** TLSTM3’s performance on Last.FM with and without Time2Vec.

**Figure 14.** TLSTM3’s performance on CiteULike with and without Time2Vec.

Original abstract

Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Time2Vec is a simple, model-agnostic embedding for time that extends positional ideas to irregular or event data, but the abstract leaves the empirical gains unverified.

The main thing here is a vector representation for time that can be learned and swapped into existing models instead of using raw timestamps or hand-crafted features. It draws from word2vec-style embeddings and sinusoidal positional encodings but treats time as the input to a small network that produces a fixed-dimensional vector. That orthogonality to architecture changes is the useful part; it lets people keep their current RNNs or transformers and just replace the time input.
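
The link to sinusoidal positional encodings can be written out side by side. The block below assumes the usual statement of both definitions, with $k$ periodic components for Time2Vec and model dimension $d$ for the Transformer encoding; the symbols are chosen here for illustration and are not taken from the letter.

```latex
\begin{align*}
\text{Time2Vec:}\quad
  & \mathbf{t2v}(\tau)[i] =
    \begin{cases}
      \omega_0 \tau + \varphi_0, & i = 0,\\
      \sin(\omega_i \tau + \varphi_i), & 1 \le i \le k,
    \end{cases}\\[4pt]
\text{Positional encoding:}\quad
  & PE(pos, 2j) = \sin\!\left(\frac{pos}{10000^{2j/d}}\right),\qquad
    PE(pos, 2j+1) = \cos\!\left(\frac{pos}{10000^{2j/d}}\right).
\end{align*}
```

Setting $\tau = pos$, fixing $\omega = 10000^{-2j/d}$, and fixing $\varphi \in \{0, \pi/2\}$ (because $\cos x = \sin(x + \pi/2)$) turns the periodic components into the positional encoding. Read this way, the positional encoding is a special case with hand-set frequencies and phases, while Time2Vec learns them and accepts real-valued, irregularly spaced times.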

Referee Report

1 major / 0 minor

Summary. The paper introduces Time2Vec, a model-agnostic, learnable vector representation of time intended to be plugged into existing architectures for tasks involving synchronous or asynchronous events. The central claim is that replacing standard notions of time with this representation improves final model performance across a range of models and problems.

Significance. If the empirical results hold with proper validation, Time2Vec could serve as a lightweight, reusable component for incorporating temporal structure in machine learning pipelines, complementing architecture-specific innovations in time-series and event modeling.

major comments (1)

Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below.

Referee: Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.

Authors: We agree that the abstract states the central claim at a high level without supporting experimental specifics. While abstracts are necessarily concise, the current wording does not adequately convey the scope of the evaluation. In the revised manuscript we will expand the abstract to include brief references to the models tested, the range of problems considered, and the nature of the observed improvements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents Time2Vec as a learnable, model-agnostic vector embedding for time that is substituted into existing architectures, with performance gains demonstrated empirically across multiple models and tasks. No derivation chain, equations, or load-bearing steps are visible in the provided abstract or description that reduce by construction to fitted parameters, self-definitions, or self-citation chains; the representation is defined independently and validated externally rather than tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities can be identified from provided text.

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations
cs.CV 2026-05 unverdicted novelty 7.0

TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial ...
NEST: Nested Event Stream Transformer for Sequences of Multisets
cs.LG 2026-01 unverdicted novelty 7.0

NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.
Temporal Graph Networks for Deep Learning on Dynamic Graphs
cs.LG 2020-06 unverdicted novelty 7.0

Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as ...
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
cs.LG 2026-05 unverdicted novelty 6.0

MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
cs.LG 2026-03 accept novelty 6.0

AgriPriceBD dataset of 1779 daily prices released; naive persistence outperforms deep models like Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series with statistical validation.
A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature
stat.ML 2026-05 unverdicted novelty 5.0

QSurv uses Gauss-Legendre numerical quadrature and time-conditioned low-rank adaptation to enable scalable nonparametric continuous-time survival modeling with theoretical error bounds.
EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
cs.IR 2026-05 unverdicted novelty 5.0

EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.
TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond
cs.LG 2026-05 unverdicted novelty 5.0

TraXion supplies a unified pre-training approach for multi-entity spatiotemporal event streams that outperforms task-specific baselines on mobility tasks and transfers unchanged to authentication logs and ICU mortalit...
To Use AI as Dice of Possibilities with Timing Computation
cs.AI 2026-05 unverdicted novelty 5.0

Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.
A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation
cs.CR 2026-04 unverdicted novelty 5.0

A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarm...
ARMove: Learning to Predict Human Mobility through Agentic Reasoning
cs.MA 2026-04 unverdicted novelty 5.0

ARMove is a transferable framework for human mobility prediction that combines agentic LLM reasoning, feature management, and large-small model synergy to outperform baselines on several metrics while improving interp...
Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models
cs.LG 2026-04 unverdicted novelty 5.0

Fused code-value tokenization improves mortality AUROC from 0.891 to 0.915 and other clinical outcome predictions, while certain temporal encodings like event order match or exceed time tokens with shorter sequences.
DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification
cs.LG 2026-04 unverdicted novelty 5.0

DBGL models irregular medical time series via patient-variable bipartite graphs and node-specific temporal decay encoding to avoid artificial alignment and capture decay rates, outperforming baselines on four public datasets.
Capture Timing-Attention of Events in Clinical Time Series
cs.LG 2026-02 unverdicted novelty 5.0

LITT aligns individual clinical event sequences on a relative timeline to enable timing-aware attention and better prediction of personalized health trajectories.
Transformer-Based Wildlife Species Classification from Daily Movement Trajectories
cs.LG 2026-05 unverdicted novelty 4.0

Transformer models classify seven wildlife species from daily GPS trajectories, outperforming LSTM, CNN, and TCN baselines by 8-22 percentage points in balanced accuracy under region-holdout evaluation.
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
eess.SY 2026-04 unverdicted novelty 4.0

The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.


work page 1997

[14] [14]

Recurrent marked temporal point processes: Embedding event history to vector

Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. Recurrent marked temporal point processes: Embedding event history to vector. In ACM SIGKDD, pages 1555–1564. ACM, 2016

work page 2016

[15] [15]

evt_MNIST: A spike based version of traditional MNIST

Mazdak Fatahi, Mahmood Ahmadi, Mahyar Shahsavari, Arash Ahmadi, and Philippe Devienne. evt_mnist: A spike based version of traditional mnist. arXiv preprint arXiv:1604.06751, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[16] [16]

Modeling time series data with deep fourier neural networks

Michael S Gashler and Stephen C Ashmore. Modeling time series data with deep fourier neural networks. Neurocomputing, 188:3–11, 2016

work page 2016

[17] [17]

Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[18] [18]

Recurrent nets that time and count

Felix A Gers and Jürgen Schmidhuber. Recurrent nets that time and count. In IJCNN, volume 3, pages 189–194. IEEE, 2000

work page 2000

[19] [19]

Taming the waves: sine as activation function in deep neural networks

Tuomas Virtanen Giambattista Parascandolo, Heikki Huttunen. Taming the waves: sine as activation function in deep neural networks. 2017

work page 2017

[20] [20]

Neural decomposition of time-series data for effective generalization

Luke B Godfrey and Michael S Gashler. Neural decomposition of time-series data for effective generalization. IEEE transactions on neural networks and learning systems , 29(7):2973–2985, 2018. 9

work page 2018

[21] [21]

Lstm: A search space odyssey

Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. Lstm: A search space odyssey. IEEE transactions on neural networks and learning systems , 28(10):2222–2232, 2017

work page 2017

[22] [22]

node2vec: Scalable feature learning for networks

Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In ACM SIGKDD, pages 855–864, 2016

work page 2016

[23] [23]

Long short-term memory

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997

work page 1997

[24] [24]

State-frequency memory recurrent neural networks

Hao Hu and Guo-Jun Qi. State-frequency memory recurrent neural networks. In International Conference on Machine Learning, pages 1568–1577, 2017

work page 2017

[25] [25]

SimplE embedding for link prediction in knowledge graphs

Seyed Mehran Kazemi and David Poole. SimplE embedding for link prediction in knowledge graphs. In NeurIPS, pages 4289–4300, 2018

work page 2018

[26] [26]

Relational representation learning for dynamic (knowledge) graphs: A survey

Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, and Pascal Poupart. Relational representation learning for dynamic (knowledge) graphs: A survey. arXiv preprint arXiv:1905.11485, 2019

work page arXiv 1905

[27] [27]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[28] [28]

Learning Dynamic Embeddings from Temporal Interactions

Srijan Kumar, Xikun Zhang, and Jure Leskovec. Learning dynamic embedding from temporal interaction networks. arXiv preprint arXiv:1812.02289, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

Retainvis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records

Bum Chul Kwon, Min-Je Choi, Joanne Taery Kim, Edward Choi, Young Bin Kim, Soonwook Kwon, Jimeng Sun, and Jaegul Choo. Retainvis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE transactions on visualization and computer graphics, 25(1):299–309, 2019

work page 2019

[30] [30]

Nonlinear signal processing using neural networks: Prediction and system modelling

Alan Lapedes and Robert Farber. Nonlinear signal processing using neural networks: Prediction and system modelling. Technical report, 1987

work page 1987

[31] [31]

Hawkes Processes

Patrick J Laub, Thomas Taimre, and Philip K Pollett. Hawkes processes. arXiv preprint arXiv:1507.02822, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[32] [32]

Tidigits

R Gary Leonard and George Doddington. Tidigits. Linguistic Data Consortium, Philadelphia , 1993

work page 1993

[33] [33]

Time-Dependent Representation for Neural Event Sequence Prediction

Yang Li, Nan Du, and Samy Bengio. Time-dependent representation for neural event sequence prediction. arXiv preprint arXiv:1708.00065, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[34] [34]

Learning temporal point processes via reinforcement learning

Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song. Learning temporal point processes via reinforcement learning. In NeurIPS, pages 10804–10814, 2018

work page 2018

[35] [35]

Time-dependent representation for neural event sequence prediction

Yang Li, Nan Du, and Samy Bengio. Time-dependent representation for neural event sequence prediction. 2018

work page 2018

[36] [36]

Directly modeling missing data in sequences with rnns: Improved classiﬁcation of clinical time series

Zachary C Lipton, David Kale, and Randall Wetzel. Directly modeling missing data in sequences with rnns: Improved classiﬁcation of clinical time series. In Machine Learning for Healthcare Conference, pages 253–270, 2016

work page 2016

[37] [37]

Multistability of recurrent neural networks with nonmonotonic activation functions and mixed time delays

Peng Liu, Zhigang Zeng, and Jun Wang. Multistability of recurrent neural networks with nonmonotonic activation functions and mixed time delays. IEEE Transactions on Systems, Man, and Cybernetics: Systems , 46(4):512–523, 2016

work page 2016

[38] [38]

Streaming Graph Neural Networks

Yao Ma, Ziyi Guo, Zhaochun Ren, Eric Zhao, Jiliang Tang, and Dawei Yin. Streaming graph neural networks. arXiv preprint arXiv:1810.10627, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[39] [39]

The neural hawkes process: A neurally self-modulating multivariate point process

Hongyuan Mei and Jason M Eisner. The neural hawkes process: A neurally self-modulating multivariate point process. In NeurIPS, pages 6754–6764, 2017

work page 2017

[40] [40]

Distributed repre- sentations of words and phrases and their compositionality

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed repre- sentations of words and phrases and their compositionality. In NeurIPS, 2013. 10

work page 2013

[41] [41]

Fourier neural networks: An approach with sinusoidal activation functions

Luis Mingo, Levon Aslanyan, Juan Castellanos, Miguel Diaz, and Vladimir Riazanov. Fourier neural networks: An approach with sinusoidal activation functions. 2004

work page 2004

[42] [42]

Dynamic bayesian networks: representation, inference and learning

Kevin Patrick Murphy and Stuart Russell. Dynamic bayesian networks: representation, inference and learning. 2002

work page 2002

[43] [43]

Rectiﬁed linear units improve restricted boltzmann machines

Vinod Nair and Geoffrey E Hinton. Rectiﬁed linear units improve restricted boltzmann machines. In ICML, pages 807–814, 2010

work page 2010

[44] [44]

Phased lstm: Accelerating recurrent network training for long or event-based sequences

Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. Phased lstm: Accelerating recurrent network training for long or event-based sequences. In NeurIPS, pages 3882–3890, 2016

work page 2016

[45] [45]

The role of over-parametrization in generalization of neural networks

Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, and Nathan Srebro. The role of over-parametrization in generalization of neural networks. In ICLR, 2019

work page 2019

[46] [46]

A review of relational machine learning for knowledge graphs

Maximilian Nickel, Kevin Murphy, V olker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016

work page 2016

[47] [47]

Automatic differentiation in pytorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017

work page 2017

[48] [48]

Glove: Global vectors for word representation

Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In EMNLP, pages 1532–1543, 2014

work page 2014

[49] [49]

Population size extrapolation in relational probabilistic modelling

David Poole, David Buchman, Seyed Mehran Kazemi, Kristian Kersting, and Sriraam Natarajan. Population size extrapolation in relational probabilistic modelling. In SUM. Springer, 2014

work page 2014

[50] [50]

An introduction to hidden markov models

Lawrence R Rabiner and Biing-Hwang Juang. An introduction to hidden markov models. ieee assp magazine, 3(1):4–16, 1986

work page 1986

[51] [51]

Gaussian processes in machine learning

Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63–71. Springer, 2004

work page 2004

[52] [52]

Neural networks with periodic and monotonic activation functions: a comparative study in classiﬁcation problems

Josep M Sopena, Enrique Romero, and Rene Alquezar. Neural networks with periodic and monotonic activation functions: a comparative study in classiﬁcation problems. 1999

work page 1999

[53] [53]

Dynamic conditional random ﬁelds: Factorized probabilistic models for labeling and segmenting sequence data.Journal of Machine Learning Research, 8(Mar):693–723, 2007

Charles Sutton, Andrew McCallum, and Khashayar Rohanimanesh. Dynamic conditional random ﬁelds: Factorized probabilistic models for labeling and segmenting sequence data.Journal of Machine Learning Research, 8(Mar):693–723, 2007

work page 2007

[54] [54]

Can recurrent neural networks warp time? In International Conference on Learning Representation (ICLR) , 2018

Corentin Tallec and Yann Ollivier. Can recurrent neural networks warp time? In International Conference on Learning Representation (ICLR) , 2018

work page 2018

[55] [55]

Know-evolve: Deep temporal reasoning for dynamic knowledge graphs

Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In ICML, pages 3462–3471, 2017

work page 2017

[56] [56]

Deep reinforcement learning of marked temporal point processes

Utkarsh Upadhyay, Abir De, and Manuel Gomez-Rodriguez. Deep reinforcement learning of marked temporal point processes. In NeurIPS, 2018

work page 2018

[57] [57]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017

work page 2017

[58] [58]

Handwritten digit recognition using multilayer feedforward neural networks with periodic and monotonic activation functions

Kwok-wo Wong, Chi-sing Leung, and Sheng-jiang Chang. Handwritten digit recognition using multilayer feedforward neural networks with periodic and monotonic activation functions. In Pattern Recognition, volume 3, pages 106–109. IEEE, 2002

work page 2002

[59] [59]

Wasserstein learning of deep generative point process models

Shuai Xiao, Mehrdad Farajtabar, Xiaojing Ye, Junchi Yan, Le Song, and Hongyuan Zha. Wasserstein learning of deep generative point process models. In NeurIPS, 2017

work page 2017

[60] [60]

Learning conditional generative models for temporal point processes

Shuai Xiao, Hongteng Xu, Junchi Yan, Mehrdad Farajtabar, Xiaokang Yang, Le Song, and Hongyuan Zha. Learning conditional generative models for temporal point processes. In AAAI, 2018

work page 2018

[61] [61]

What to do next: Modeling user behaviors by time-lstm

Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. What to do next: Modeling user behaviors by time-lstm. In IJCAI, pages 3602–3608, 2017. 11 0 200 400 600 800 1000 Epoch 0.10 0.15 0.20 0.25 0.30 0.35 0.40Accuracy LSTM+T LSTM+Time2Vec(l=16+1) LSTM+Time2Vec(l=32+1) LSTM+Time2Vec(l=64+1) Figure 6: Comparing LSTM+T and LSTM+Time2...

work page 2017