arxiv: 1907.05321 · v1 · pith:K46YEPRKnew · submitted 2019-07-11 · 💻 cs.LG

Time2Vec: Learning a Vector Representation of Time

Pith reviewed 2026-05-24 23:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords time representation · temporal embedding · model-agnostic · event modeling · neural networks · periodic features

The pith

Time2Vec replaces raw time inputs with a learned vector that improves performance when added to existing models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Time2Vec as a model-agnostic vector representation of time. It is designed to be plugged into many different neural architectures without changing their core structure. The authors demonstrate that swapping standard time features for this representation raises accuracy or other performance metrics on a range of temporal tasks. The approach focuses on capturing both periodic and non-periodic aspects of time in a single trainable vector. Because the method is orthogonal to architecture design, it can be applied to models that already handle synchronous or asynchronous events.

Core claim

The paper claims that replacing the notion of time with its Time2Vec representation improves the performance of the final model on multiple problems and architectures.

What carries the argument

Time2Vec, a trainable vector representation of time that combines periodic and non-periodic components and can be imported into existing models.
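
For reference, the definition given in the original paper can be restated as below (it does not appear in the abstract reproduced on this page). Here τ is a scalar time, k + 1 is the output dimension, ω_i and φ_i are learnable frequencies and phase shifts, and F is a periodic activation, for which the paper uses the sine function.

```latex
\mathbf{t2v}(\tau)[i] =
\begin{cases}
\omega_i \tau + \varphi_i, & \text{if } i = 0, \\
\mathcal{F}(\omega_i \tau + \varphi_i), & \text{if } 1 \le i \le k.
\end{cases}
```

The i = 0 element is the non-periodic (linear) component; the remaining k elements are the periodic ones.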

If this is right

  • Existing sequence or event models can incorporate Time2Vec without redesigning their layers.
  • Performance gains appear on both synchronous and asynchronous event data.
  • The representation works as a drop-in replacement for conventional time encodings.
  • The same vector can be reused across different downstream tasks once learned.
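
A minimal sketch of what "drop-in replacement" could look like in practice, assuming a sine activation and one linear element. The helper names (`time2vec`, `build_inputs`), the event data, and the values of `omega` and `phi` are hypothetical placeholders; in a real model the frequencies and phases would be trained together with the downstream network.

```python
import math

def time2vec(tau, omega, phi):
    """Map scalar time tau to a vector: one linear element, then sine elements."""
    linear = omega[0] * tau + phi[0]
    periodic = [math.sin(w * tau + p) for w, p in zip(omega[1:], phi[1:])]
    return [linear] + periodic

def build_inputs(events, encode_time):
    """Append an encoding of each event's time to that event's feature list."""
    return [list(feats) + encode_time(tau) for feats, tau in events]

# Hypothetical events: (feature list, timestamp).
events = [([1.0, 0.0], 0.5), ([0.0, 1.0], 2.0)]

# Placeholder parameters; these would be learned, not hand-set.
omega = [1.0, 2 * math.pi / 7, 2 * math.pi / 24]
phi = [0.0, 0.0, 0.0]

# Baseline: raw time appended as a single scalar feature.
raw_inputs = build_inputs(events, lambda tau: [tau])
# Variant: the same pipeline, with only the time encoder swapped.
t2v_inputs = build_inputs(events, lambda tau: time2vec(tau, omega, phi))

print(len(raw_inputs[0]), len(t2v_inputs[0]))  # 3 5
```

Only the time encoder changes between the two calls; the downstream model sees a wider input vector but is otherwise untouched.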

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The vector form might allow transfer of temporal knowledge between unrelated domains if the same Time2Vec module is shared.
  • If the periodic components are fixed rather than learned, the method could become fully parameter-free for certain periodicities.
  • The approach suggests testing whether similar vector encodings help in non-neural models such as decision trees or linear regressors on time-stamped data.
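
The fixed-frequency idea, fed to a linear model, can be sketched as follows. This illustrates the editorial extension above, not the paper's method: the periods (24 and 168 hours), the function names, and the hand-set weights are all assumptions. Pairing sine with cosine lets a purely linear model express any phase shift, since sin(x + p) = cos(p)·sin(x) + sin(p)·cos(x).

```python
import math

# Assumed periods in hours (daily, weekly); fixed here, not learned.
PERIODS_HOURS = (24.0, 168.0)

def fixed_periodic_features(t_hours, periods=PERIODS_HOURS):
    """Return a [sin, cos] pair per fixed period; no trainable parameters."""
    feats = []
    for period in periods:
        angle = 2.0 * math.pi * t_hours / period
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats

def linear_predict(weights, feats):
    """Plain dot product, standing in for any linear regressor."""
    return sum(w * f for w, f in zip(weights, feats))

# A daily signal with phase shift 0.5 is exactly linear in the fixed features,
# because sin(x + p) = cos(p) * sin(x) + sin(p) * cos(x).
phase = 0.5
weights = [math.cos(phase), math.sin(phase), 0.0, 0.0]

t = 5.0
target = math.sin(2.0 * math.pi * t / 24.0 + phase)
prediction = linear_predict(weights, fixed_periodic_features(t))
print(abs(prediction - target) < 1e-12)  # True
```

The weights are set by hand here only to check the identity; a regressor fitted to data would have to estimate them.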

Load-bearing premise

A single learned vector form of time can be used across different models and problems to capture temporal information better than standard time features.

What would settle it

A controlled experiment in which the same set of models and temporal datasets is run once with raw time inputs and once with Time2Vec inputs, and the latter shows no consistent gain or shows a loss.
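
One way such a paired comparison might be summarised is sketched below, assuming one score per (model, dataset) pair where higher is better. The function name `paired_summary`, the choice of an exact two-sided sign test, and every number in the example are illustrative placeholders; none of them are results from the paper.

```python
from math import comb

def paired_summary(raw_scores, t2v_scores):
    """Compare two runs keyed by (model, dataset); higher score is better."""
    if raw_scores.keys() != t2v_scores.keys():
        raise ValueError("both runs must cover the same (model, dataset) pairs")
    diffs = [t2v_scores[key] - raw_scores[key] for key in raw_scores]
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    n = wins + losses  # ties are ignored by the sign test
    k = min(wins, losses)
    # Exact two-sided sign test under the null of no systematic difference.
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n) if n else 1.0
    return {
        "mean_diff": sum(diffs) / len(diffs),
        "wins": wins,
        "losses": losses,
        "sign_test_p": p,
    }

# Placeholder scores for illustration only (NOT taken from the paper).
raw = {("model_A", "data_1"): 0.70, ("model_A", "data_2"): 0.65,
       ("model_B", "data_1"): 0.72, ("model_B", "data_2"): 0.60}
t2v = {("model_A", "data_1"): 0.74, ("model_A", "data_2"): 0.66,
       ("model_B", "data_1"): 0.71, ("model_B", "data_2"): 0.63}

summary = paired_summary(raw, t2v)
print(summary["wins"], summary["losses"], summary["sign_test_p"])  # 3 1 0.625
```

With only a handful of pairs the sign test has little power, so a real evaluation would need more model and dataset combinations, or repeated seeds, before "no consistent gain" could be concluded.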

Figures

Figures reproduced from arXiv: 1907.05321 by Cathal Smyth, Janahan Ramanan, Jaspreet Sahota, Marcus Brubaker, Pascal Poupart, Rishab Goel, Sanjay Thakur, Sepehr Eghbali, Seyed Mehran Kazemi, Stella Wu.

Figure 1: Comparing LSTM+T and LSTM+Time2Vec on several datasets.
Figure 2: Comparing TLSTM1 and TLSTM3 on Last.FM and CiteULike in terms of Recall@10
Figure 3: The models learned for our synthesized dataset explained in Subsection 5.2 before the final
Figure 4: (a) Initial vs. (b) learned weights and frequencies for our synthesized dataset.
Figure 5: An ablation study of several components in Time2Vec. (a) Comparing different activa
Figure 6: Comparing LSTM+T and LSTM+Time2Vec on Event-MNIST.
Figure 7: Comparing LSTM+T and LSTM+Time2Vec on Event-MNIST and raw N_TIDIGITS18.
Figure 8: Comparing LSTM+T and LSTM+Time2Vec on SOF.
Figure 9: Comparing LSTM+T and LSTM+Time2Vec on Last.FM.
Figure 10: Comparing LSTM+T and LSTM+Time2Vec on CiteULike.
Figure 11: TLSTM1’s performance on Last.FM with and without Time2Vec.
Figure 12: TLSTM1’s performance on CiteULike with and without Time2Vec.
Figure 13: TLSTM3’s performance on Last.FM with and without Time2Vec.
Figure 14: TLSTM3’s performance on CiteULike with and without Time2Vec.

Original abstract

Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces Time2Vec, a model-agnostic, learnable vector representation of time intended to be plugged into existing architectures for tasks involving synchronous or asynchronous events. The central claim is that replacing standard notions of time with this representation improves final model performance across a range of models and problems.

Significance. If the empirical results hold with proper validation, Time2Vec could serve as a lightweight, reusable component for incorporating temporal structure in machine learning pipelines, complementing architecture-specific innovations in time-series and event modeling.

major comments (1)
  1. Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below.

point-by-point responses
  1. Referee: Abstract: the central claim of performance improvement is stated without any experimental details, baselines, datasets, quantitative results, error bars, or implementation specifics, preventing verification of the claim that Time2Vec substitution improves performance.

    Authors: We agree that the abstract states the central claim at a high level without supporting experimental specifics. While abstracts are necessarily concise, the current wording does not adequately convey the scope of the evaluation. In the revised manuscript we will expand the abstract to include brief references to the models tested, the range of problems considered, and the nature of the observed improvements.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents Time2Vec as a learnable, model-agnostic vector embedding for time that is substituted into existing architectures, with performance gains demonstrated empirically across multiple models and tasks. No derivation chain, equations, or load-bearing steps are visible in the provided abstract or description that reduce by construction to fitted parameters, self-definitions, or self-citation chains; the representation is defined independently and validated externally rather than tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations

    cs.CV 2026-05 unverdicted novelty 7.0

    TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial ...

  2. NEST: Nested Event Stream Transformer for Sequences of Multisets

    cs.LG 2026-01 unverdicted novelty 7.0

    NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

  3. Temporal Graph Networks for Deep Learning on Dynamic Graphs

    cs.LG 2020-06 unverdicted novelty 7.0

    Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as ...

  4. MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

    cs.LG 2026-05 unverdicted novelty 6.0

    MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.

  5. A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset

    cs.LG 2026-03 accept novelty 6.0

    AgriPriceBD dataset of 1779 daily prices released; naive persistence outperforms deep models like Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series with statistical validation.

  6. A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

    stat.ML 2026-05 unverdicted novelty 5.0

    QSurv uses Gauss-Legendre numerical quadrature and time-conditioned low-rank adaptation to enable scalable nonparametric continuous-time survival modeling with theoretical error bounds.

  7. EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

    cs.IR 2026-05 unverdicted novelty 5.0

    EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

  8. TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond

    cs.LG 2026-05 unverdicted novelty 5.0

    TraXion supplies a unified pre-training approach for multi-entity spatiotemporal event streams that outperforms task-specific baselines on mobility tasks and transfers unchanged to authentication logs and ICU mortalit...

  9. To Use AI as Dice of Possibilities with Timing Computation

    cs.AI 2026-05 unverdicted novelty 5.0

    Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.

  10. A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

    cs.CR 2026-04 unverdicted novelty 5.0

    A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarm...

  11. ARMove: Learning to Predict Human Mobility through Agentic Reasoning

    cs.MA 2026-04 unverdicted novelty 5.0

    ARMove is a transferable framework for human mobility prediction that combines agentic LLM reasoning, feature management, and large-small model synergy to outperform baselines on several metrics while improving interp...

  12. Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models

    cs.LG 2026-04 unverdicted novelty 5.0

    Fused code-value tokenization improves mortality AUROC from 0.891 to 0.915 and other clinical outcome predictions, while certain temporal encodings like event order match or exceed time tokens with shorter sequences.

  13. DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification

    cs.LG 2026-04 unverdicted novelty 5.0

    DBGL models irregular medical time series via patient-variable bipartite graphs and node-specific temporal decay encoding to avoid artificial alignment and capture decay rates, outperforming baselines on four public datasets.

  14. Capture Timing-Attention of Events in Clinical Time Series

    cs.LG 2026-02 unverdicted novelty 5.0

    LITT aligns individual clinical event sequences on a relative timeline to enable timing-aware attention and better prediction of personalized health trajectories.

  15. Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

    cs.LG 2026-05 unverdicted novelty 4.0

    Transformer models classify seven wildlife species from daily GPS trajectories, outperforming LSTM, CNN, and TCN baselines by 8-22 percentage points in balanced accuracy under region-holdout evaluation.

  16. Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications

    eess.SY 2026-04 unverdicted novelty 4.0

    The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.
