Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information

Hiroyuki Toda; Maya Okawa; Naonori Ueda; Takeshi Kurashima; Tomoharu Iwata; Yusuke Tanaka

arxiv: 1906.08952 · v1 · pith:ASAKX5XHnew · submitted 2019-06-21 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-25 18:59 UTC · model grok-4.3

keywords point processes · spatio-temporal prediction · deep neural networks · mixture models · event prediction · contextual information · intensity function · urban data

The pith

A deep neural network produces the weights for a mixture of kernels that defines the intensity of a spatio-temporal point process, incorporating image and text context while keeping the intensity integral analytically tractable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Deep Mixture Point Processes to predict the timing and location of events such as taxi pick-ups, crimes, and collisions. It defines the intensity function as a mixture of kernels whose weights are generated by a deep neural network that processes heterogeneous contextual inputs including images and text. This design lets the model capture complex nonlinear influences from weather, traffic, and social factors. The mixture structure ensures the integral of the intensity can be computed exactly, which is required for likelihood-based parameter estimation. Existing point process models either omit such rich context or lose tractability when adding it.

Core claim

The central claim is that expressing the intensity of a spatio-temporal point process as a mixture of kernels with mixture weights produced by a deep neural network allows automatic learning of nonlinear effects from high-dimensional contextual data such as images and text, while the mixture formulation keeps analytical integration over the intensity tractable for maximum-likelihood estimation.

What carries the argument

The intensity function expressed as a mixture of kernels whose weights are the output of a deep neural network processing contextual inputs.

If this is right

The model can incorporate high-dimensional image and text data without sacrificing the ability to compute the likelihood exactly.
Parameters are estimated by standard maximum-likelihood methods because the mixture keeps the intensity integral tractable.
Predictive performance improves on urban event data compared with methods that cannot use the same contextual inputs.
The approach applies directly to tasks such as urban planning and transportation optimization that rely on accurate spatio-temporal forecasts.
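The likelihood referred to in the points above is, for an inhomogeneous Poisson point process, the sum of log-intensities at the observed events minus the integral of the intensity over the observation window. The sketch below illustrates that objective for a one-dimensional toy case. The Gaussian kernel, the fixed centers, the hand-picked weights (standing in for neural-network outputs), and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math

def kernel(t, u, sigma):
    """Unnormalized Gaussian bump centred at u (an assumed kernel choice)."""
    return math.exp(-((t - u) ** 2) / (2.0 * sigma ** 2))

def kernel_integral(u, sigma, t0, t1):
    """Closed-form integral of kernel(., u, sigma) over [t0, t1] via erf."""
    s = sigma * math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((t1 - u) / s) - math.erf((t0 - u) / s)
    )

def log_likelihood(events, centers, weights, sigma, t0, t1):
    """Poisson-process log-likelihood: sum of log-intensities minus the integral."""
    log_term = sum(
        math.log(sum(w * kernel(t, u, sigma) for w, u in zip(weights, centers)))
        for t in events
    )
    # The integral is a weighted sum of per-kernel integrals, so no quadrature is needed.
    integral = sum(
        w * kernel_integral(u, sigma, t0, t1) for w, u in zip(weights, centers)
    )
    return log_term - integral

# Toy data: events cluster near t = 1 and t = 4 on the window [0, 5].
events = [0.8, 1.1, 1.3, 3.9, 4.2]
centers = [1.0, 4.0]

# Hypothetical weights; in a DMPP-style model these would come from the network.
good = log_likelihood(events, centers, [3.0, 2.0], 0.5, 0.0, 5.0)
bad = log_likelihood(events, centers, [0.05, 0.05], 0.5, 0.0, 5.0)
print(good > bad)
```

Weights that place a realistic amount of mass near the observed events score higher than weights that are far too small, which is the signal a gradient-based optimizer would follow.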

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same mixture construction could be tested on other event domains such as earthquakes or disease outbreaks if suitable image or text context is available.
Mixture models may offer a general route to keep deep point processes tractable when the intensity must integrate in closed form.
Performance may depend on the specific choice of kernel family, suggesting controlled experiments that vary only the kernels while holding the neural weighting fixed.

Load-bearing premise

The intensity can be written as a mixture of kernels with neural-network weights such that the integral of the intensity stays analytically computable.

What would settle it

A real-world spatio-temporal dataset on which DMPP yields no higher log-likelihood or predictive accuracy than standard point process baselines that ignore image and text context, or on which the required intensity integral has no closed form.

Original abstract

Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic. In this paper, we propose DMPP (Deep Mixture Point Processes), a point process model for predicting spatio-temporal events with the use of rich contextual information; a key advance is its incorporation of the heterogeneous and high-dimensional context available in image and text data. Specifically, we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network. This formulation allows us to automatically learn the complex nonlinear effects of the contextual factors on event occurrence. At the same time, this formulation makes analytical integration over the intensity, which is required for point process estimation, tractable. We use real-world data sets from different domains to demonstrate that DMPP has better predictive performance than existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DMPP shows how to let a deep neural network set the mixture weights on kernels, so that high-dimensional context enters the intensity while the integral stays closed-form.

The useful piece is the construction: intensity as a sum of kernels with weights from a neural net that ingests images or text. Since the weights do not depend on the space-time coordinates, the integral over the intensity reduces to a weighted sum of the individual kernel integrals and remains analytic. That removes the usual barrier to likelihood-based fitting when context is rich and heterogeneous. The paper demonstrates this on taxi, crime, and collision data and reports better predictive scores than the baselines it tests against. The math for tractability holds up by design, and the experiments use real urban datasets rather than synthetic ones. The kernels themselves are conventional, so the contribution lives in the weighting mechanism rather than in new kernel families. A minor limitation is that the results could be more sensitive to the choice of base kernels or the number of mixture components than the current tables show; a short ablation on those choices would have strengthened the case. The work is aimed at people who already use point processes for city-scale event data and need a route to include side information without losing closed-form likelihoods. It is coherent on its own terms and the central claim is supported by the construction and the reported experiments. I would send it to peer review.
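The tractability argument in the letter can be checked numerically. The sketch below builds a two-dimensional mixture intensity, computes its integral over a box as a weighted sum of closed-form kernel integrals, and compares the result with a brute-force midpoint rule. The isotropic Gaussian kernel, the kernel centers, the bandwidth, and the fixed weights (stand-ins for network outputs) are all assumptions chosen for illustration and are not taken from the paper.

```python
import math

def gaussian_kernel(x, u, sigma):
    """Unnormalized isotropic Gaussian kernel k(x, u)."""
    sq_dist = sum((xi - ui) ** 2 for xi, ui in zip(x, u))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_box_integral(u, sigma, lo, hi):
    """Closed-form integral of k(., u) over the box [lo, hi]; one erf pair per axis."""
    scale = sigma * math.sqrt(2.0)
    total = 1.0
    for ui, a, b in zip(u, lo, hi):
        total *= sigma * math.sqrt(math.pi / 2.0) * (
            math.erf((b - ui) / scale) - math.erf((a - ui) / scale)
        )
    return total

def intensity(x, centers, weights, sigma):
    """Mixture intensity: weights do not depend on x, only the kernels do."""
    return sum(w * gaussian_kernel(x, u, sigma) for w, u in zip(weights, centers))

def intensity_integral(centers, weights, sigma, lo, hi):
    """Exact integral: weighted sum of the individual kernel integrals."""
    return sum(
        w * kernel_box_integral(u, sigma, lo, hi) for w, u in zip(weights, centers)
    )

def midpoint_integral(centers, weights, sigma, lo, hi, n=200):
    """Brute-force 2-D midpoint rule, used only as a sanity check."""
    hx = (hi[0] - lo[0]) / n
    hy = (hi[1] - lo[1]) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = (lo[0] + (i + 0.5) * hx, lo[1] + (j + 0.5) * hy)
            total += intensity(x, centers, weights, sigma)
    return total * hx * hy

centers = [(0.2, 0.3), (0.7, 0.6), (0.5, 0.9)]   # hypothetical kernel centers
weights = [1.5, 0.4, 2.0]                        # hypothetical network outputs
sigma = 0.15
lo, hi = (0.0, 0.0), (1.0, 1.0)

exact = intensity_integral(centers, weights, sigma, lo, hi)
approx = midpoint_integral(centers, weights, sigma, lo, hi)
print(abs(exact - approx) < 1e-3)
```

The closed-form route costs one pair of `erf` evaluations per kernel and axis, whereas the grid check evaluates the intensity tens of thousands of times; that gap is what makes likelihood-based training practical when the weights change at every optimization step.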

Referee Report

0 major / 3 minor

Summary. The manuscript proposes Deep Mixture Point Processes (DMPP) for spatio-temporal event prediction. The intensity function is defined as a mixture of kernels whose weights are outputs of a deep neural network that ingests high-dimensional contextual inputs (images and text). The authors claim that this construction captures complex nonlinear effects of context while keeping the integral of the intensity analytically tractable, enabling exact likelihood-based estimation, and that DMPP outperforms existing point-process baselines on real-world datasets from multiple domains.

Significance. If the empirical results and the tractability claim hold under the full experimental protocol, the work supplies a concrete mechanism for folding rich, heterogeneous context into continuous-space point processes without sacrificing closed-form integration. This addresses a practical limitation of many existing spatio-temporal models and could be directly useful in urban analytics and transportation applications.

minor comments (3)

The abstract asserts superior predictive performance but supplies no quantitative metrics, error bars, dataset sizes, or baseline names; the results section should include these details with explicit comparison tables.
The precise functional form of the mixture kernels and the DNN architecture (number of layers, activation functions, output parameterization of the weights) are not stated in the provided abstract; these should be given explicitly, preferably with an equation block, so that the tractability argument can be verified by inspection.
No mention is made of how the model handles the non-negativity constraint on mixture weights or of any regularization used to prevent degenerate mixtures; this implementation detail should be clarified.
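On the last comment, one common way to keep mixture weights non-negative is to pass the raw network outputs through a softplus. The sketch below shows that option only as an illustration; the reviewed text does not say which mechanism the authors use, and the function names and sample values here are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z)); never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def to_weights(raw_outputs):
    """Map unconstrained network outputs to non-negative mixture weights."""
    return [softplus(z) for z in raw_outputs]

raw = [-30.0, -1.0, 0.0, 2.5, 40.0]   # hypothetical final-layer outputs
weights = to_weights(raw)
print(all(w >= 0.0 for w in weights))
```

Because the intensity is a weighted sum of non-negative kernels, non-negative weights are sufficient to keep the intensity itself non-negative everywhere.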

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the significance of incorporating rich contextual information into spatio-temporal point processes, and recommendation for minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; tractability follows directly from mixture construction

The paper defines the intensity as a mixture of kernels whose weights are outputs of a DNN taking external context (images/text) as input. Because the weights are independent of the integration variables (space-time), the integral of the intensity is exactly a weighted sum of the per-kernel integrals. This property is a direct algebraic consequence of the stated functional form and requires no fitted parameters, self-referential definitions, or load-bearing self-citations. No equations in the provided text equate a derived quantity to its own inputs by construction, and the central modeling choice is presented as an explicit design decision rather than a prediction or theorem derived from prior results. The derivation chain is therefore self-contained.
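The algebraic step described above can be written out explicitly. The notation below is an assumption made for illustration: the intensity is taken to be a finite mixture over J representative points u_j, with weights w_j produced by the network f from the contextual features Z, and any quadrature factors are absorbed into w_j.

```latex
\lambda(x \mid \mathcal{D}) = \sum_{j=1}^{J} w_j \, k(x, u_j),
\qquad w_j = f\bigl(u_j, Z(u_j; \mathcal{D}); \theta\bigr) \ge 0,

\int_{\mathcal{X}} \lambda(x \mid \mathcal{D}) \, dx
  = \int_{\mathcal{X}} \sum_{j=1}^{J} w_j \, k(x, u_j) \, dx
  = \sum_{j=1}^{J} w_j \int_{\mathcal{X}} k(x, u_j) \, dx .
```

The second equality uses only linearity of the integral and the fact that w_j does not depend on x. The result is closed-form whenever each kernel integral over the domain is, as is the case for Gaussian kernels over a box.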

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Ledger constructed from abstract only; no explicit free parameters, invented entities, or additional axioms are stated beyond the core modeling choice.

axioms (1)

domain assumption: The intensity function can be expressed as a mixture of kernels whose weights are produced by a deep neural network from contextual data, preserving analytical tractability of the integral.
This premise is required for both the claimed context incorporation and the tractable estimation highlighted in the abstract.

pith-pipeline@v0.9.0 · 5766 in / 1219 out tokens · 28593 ms · 2026-05-25T18:59:03.627302+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network... this formulation makes analytical integration over the intensity... tractable"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "the intensity as a function of contextual features: λ(x|D) = ∫ f(u, Z(u; D); θ) k(x, u) du"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
