pith.

arxiv: 1906.08952 · v1 · pith:ASAKX5XH · submitted 2019-06-21 · 📊 stat.ML · cs.LG

Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information

Pith reviewed 2026-05-25 18:59 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords point processes · spatio-temporal prediction · deep neural networks · mixture models · event prediction · contextual information · intensity function · urban data

The pith

A deep neural network produces the weights for a mixture of kernels that defines the intensity of a spatio-temporal point process, incorporating image and text context while keeping the intensity integral analytically tractable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Deep Mixture Point Processes to predict the timing and location of events such as taxi pick-ups, crimes, and collisions. It defines the intensity function as a mixture of kernels whose weights are generated by a deep neural network that processes heterogeneous contextual inputs including images and text. This design lets the model capture complex nonlinear influences from weather, traffic, and social factors. The mixture structure ensures the integral of the intensity can be computed exactly, which is required for likelihood-based parameter estimation. Existing point process models either omit such rich context or lose tractability when adding it.

Core claim

The central claim is that expressing the intensity of a spatio-temporal point process as a mixture of kernels with mixture weights produced by a deep neural network allows automatic learning of nonlinear effects from high-dimensional contextual data such as images and text, while the mixture formulation keeps analytical integration over the intensity tractable for maximum-likelihood estimation.

What carries the argument

The intensity function expressed as a mixture of kernels whose weights are the output of a deep neural network processing contextual inputs.
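Written out in our own notation, the construction and the reason it stays tractable look roughly like this. The symbols (J fixed kernel centres (t_j, s_j), a context vector z_j attached to each centre, a network f_θ producing the weights) are assumptions for illustration, not the paper's exact equations.

```latex
\[
\lambda(t, \mathbf{s} \mid \mathcal{X}) = \sum_{j=1}^{J} w_j \, k\big((t, \mathbf{s}), (t_j, \mathbf{s}_j)\big),
\qquad w_j = f_\theta(\mathbf{z}_j) \ge 0,
\]
\[
\int_{\mathcal{T}} \int_{\mathcal{S}} \lambda(t, \mathbf{s} \mid \mathcal{X}) \, d\mathbf{s} \, dt
= \sum_{j=1}^{J} w_j \int_{\mathcal{T}} \int_{\mathcal{S}} k\big((t, \mathbf{s}), (t_j, \mathbf{s}_j)\big) \, d\mathbf{s} \, dt,
\]
\[
\log L(\theta) = \sum_{i=1}^{N} \log \lambda(t_i, \mathbf{s}_i \mid \mathcal{X})
- \int_{\mathcal{T}} \int_{\mathcal{S}} \lambda(t, \mathbf{s} \mid \mathcal{X}) \, d\mathbf{s} \, dt.
\]
```

The second line uses only the fact that each w_j is constant in (t, s). The third is the usual point-process log-likelihood for N observed events on T × S, whose integral term is therefore a finite sum of per-kernel integrals, closed-form whenever the kernel's integral over the observation window is.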

If this is right

  • The model can incorporate high-dimensional image and text data without sacrificing the ability to compute the likelihood exactly.
  • Parameters are estimated by standard maximum-likelihood methods because the mixture keeps the intensity integral tractable.
  • Predictive performance improves on urban event data compared with methods that cannot use the same contextual inputs.
  • The approach applies directly to tasks such as urban planning and transportation optimization that rely on accurate spatio-temporal forecasts.
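The second point can be made concrete. The following is a minimal sketch, assuming Gaussian kernels with a diagonal bandwidth over (t, x, y), fixed kernel centres, and weights already produced by some network; all names are hypothetical. Because the integral is closed-form, the log-likelihood is exact and can be maximised with any optimiser.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: an event or kernel centre is (t, x, y);
# a box is one (low, high) interval per dimension.
Point = Tuple[float, float, float]
Box = List[Tuple[float, float]]


def norm_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gauss_pdf(u: float, mu: float, sigma: float) -> float:
    # 1-D Gaussian density with mean mu and standard deviation sigma.
    return math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def intensity(p: Point, centres: Sequence[Point], weights: Sequence[float], bw: Point) -> float:
    # lambda(t, x, y) = sum_j w_j * k_j(t, x, y), with k_j a product of 1-D Gaussians.
    total = 0.0
    for c, w in zip(centres, weights):
        k = 1.0
        for u, mu, sigma in zip(p, c, bw):
            k *= gauss_pdf(u, mu, sigma)
        total += w * k
    return total


def integrated_intensity(box: Box, centres: Sequence[Point],
                         weights: Sequence[float], bw: Point) -> float:
    # Closed form: each kernel's mass in the box factorises over t, x, y,
    # and the weights (constant in t, x, y) pull out of the integral.
    total = 0.0
    for c, w in zip(centres, weights):
        mass = 1.0
        for (lo, hi), mu, sigma in zip(box, c, bw):
            mass *= norm_cdf((hi - mu) / sigma) - norm_cdf((lo - mu) / sigma)
        total += w * mass
    return total


def log_likelihood(events: Sequence[Point], box: Box, centres: Sequence[Point],
                   weights: Sequence[float], bw: Point) -> float:
    # Point-process log-likelihood: sum of log-intensities at events minus the integral.
    ll = sum(math.log(intensity(e, centres, weights, bw)) for e in events)
    return ll - integrated_intensity(box, centres, weights, bw)
```

No numerical quadrature appears anywhere in the likelihood; the cost is linear in the number of events times the number of kernels.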

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same mixture construction could be tested on other event domains such as earthquakes or disease outbreaks if suitable image or text context is available.
  • Mixture models may offer a general route to keep deep point processes tractable when the intensity must integrate in closed form.
  • Performance may depend on the specific choice of kernel family, suggesting controlled experiments that vary only the kernels while holding the neural weighting fixed.
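The controlled experiment in the last point could hold the weights fixed and swap only each kernel's interval-mass function. Below is a hedged 1-D sketch. The Gaussian and Epanechnikov kernels are illustrative choices of ours (the abstract does not name the kernel family), and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# mass(a, b, mu, h): integral over [a, b] of a kernel centred at mu with bandwidth h.
MassFn = Callable[[float, float, float, float], float]


def gaussian_mass(a: float, b: float, mu: float, h: float) -> float:
    # Difference of standard normal CDFs, via the error function.
    def cdf(z: float) -> float:
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf((b - mu) / h) - cdf((a - mu) / h)


def epanechnikov_mass(a: float, b: float, mu: float, h: float) -> float:
    # K(u) = 0.75 * (1 - u^2) on [-1, 1]; its CDF is 0.5 + 0.75 * (u - u^3 / 3),
    # clamped to 0 below -1 and 1 above 1 (compact support).
    def cdf(u: float) -> float:
        u = max(-1.0, min(1.0, u))
        return 0.5 + 0.75 * (u - u ** 3 / 3.0)
    return cdf((b - mu) / h) - cdf((a - mu) / h)


def integrated_intensity(a: float, b: float, centres: Sequence[float],
                         weights: Sequence[float], h: float, mass: MassFn) -> float:
    # The weights are shared across kernel families; only the per-kernel mass changes.
    return sum(w * mass(a, b, mu, h) for mu, w in zip(centres, weights))


# Same (fixed) weights, two kernel families, one observation window [0, 1].
centres, weights = [0.3, 0.7], [3.0, 1.0]
for name, fn in [("gaussian", gaussian_mass), ("epanechnikov", epanechnikov_mass)]:
    print(name, integrated_intensity(0.0, 1.0, centres, weights, 0.1, fn))
```

Any kernel family with a closed-form CDF slots in the same way, which is what keeps such an ablation cheap.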

Load-bearing premise

The intensity can be written as a mixture of kernels with neural-network weights such that the integral of the intensity stays analytically computable.

What would settle it

A real-world spatio-temporal dataset on which DMPP yields no higher log-likelihood or predictive accuracy than standard point process baselines that ignore image and text context, or on which the required intensity integral has no closed form.
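To make this test concrete, the simplest context-free baseline is a homogeneous Poisson process. In this hedged sketch (the function names are ours, and a constant-rate process is only one of several possible baselines), the rate is fitted on a training window and scored on a held-out window. A context-aware model would need a higher held-out log-likelihood on the same window.

```python
import math


def fit_constant_rate(n_train: int, volume_train: float) -> float:
    # MLE of a homogeneous Poisson rate: n * log(c) - c * V is maximised at c = n / V.
    # volume_train is the measure of the training window (duration times area).
    # Assumes n_train > 0, otherwise the fitted rate is 0 and its log is undefined.
    return n_train / volume_train


def heldout_loglik(rate: float, n_test: int, volume_test: float) -> float:
    # Log-likelihood of n_test events in a test window of measure volume_test
    # under the context-free intensity lambda(t, s) = rate.
    return n_test * math.log(rate) - rate * volume_test


rate = fit_constant_rate(100, 50.0)
print(heldout_loglik(rate, 30, 10.0))
```

Dividing the result by the number of test events gives a per-event figure that is easier to compare across datasets of different sizes.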

Original abstract

Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic. In this paper, we propose DMPP (Deep Mixture Point Processes), a point process model for predicting spatio-temporal events with the use of rich contextual information; a key advance is its incorporation of the heterogeneous and high-dimensional context available in image and text data. Specifically, we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network. This formulation allows us to automatically learn the complex nonlinear effects of the contextual factors on event occurrence. At the same time, this formulation makes analytical integration over the intensity, which is required for point process estimation, tractable. We use real-world data sets from different domains to demonstrate that DMPP has better predictive performance than existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes Deep Mixture Point Processes (DMPP) for spatio-temporal event prediction. The intensity function is defined as a mixture of kernels whose weights are outputs of a deep neural network that ingests high-dimensional contextual inputs (images and text). The authors claim that this construction captures complex nonlinear effects of context while keeping the integral of the intensity analytically tractable, enabling exact likelihood-based estimation, and that DMPP outperforms existing point-process baselines on real-world datasets from multiple domains.

Significance. If the empirical results and the tractability claim hold under the full experimental protocol, the work supplies a concrete mechanism for folding rich, heterogeneous context into continuous-space point processes without sacrificing closed-form integration. This addresses a practical limitation of many existing spatio-temporal models and could be directly useful in urban analytics and transportation applications.

minor comments (3)
  1. The abstract asserts superior predictive performance but supplies neither quantitative metrics, error bars, dataset sizes, nor baseline names; the results section should include these details with explicit comparison tables.
  2. The precise functional form of the mixture kernels and the DNN architecture (number of layers, activation functions, output parameterization of the weights) are not stated in the provided abstract; these should be given explicitly, preferably with an equation block, so that the tractability argument can be verified by inspection.
  3. No mention is made of how the model handles the non-negativity constraint on mixture weights or of any regularization used to prevent degenerate mixtures; this implementation detail should be clarified.
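Comment 3 asks how non-negativity is enforced. One common option, shown here as a hedged NumPy sketch and not as the paper's documented choice, is to pass the network's last layer through a softplus, with an L2 penalty as one possible regulariser. The two-layer MLP, its sizes, and the parameter names are hypothetical; in the setting described above, z would hold image and text features rather than random vectors.

```python
import numpy as np


def softplus(x: np.ndarray) -> np.ndarray:
    # Numerically stable log(1 + exp(x)); the result is never negative.
    return np.logaddexp(0.0, x)


def mixture_weights(z: np.ndarray, params: dict) -> np.ndarray:
    # Map per-kernel context features z (shape J x D) to J non-negative weights.
    h = np.tanh(z @ params["W1"] + params["b1"])              # hidden layer, J x H
    return softplus(h @ params["W2"] + params["b2"]).ravel()  # weights, shape (J,)


def l2_penalty(params: dict, lam: float) -> float:
    # Weight decay on all parameters; one possible guard against degenerate fits.
    return lam * sum(float(np.sum(v ** 2)) for v in params.values())


rng = np.random.default_rng(0)
J, D, H = 5, 8, 16  # kernels, context dimension, hidden units (arbitrary)
params = {"W1": rng.normal(size=(D, H)), "b1": np.zeros(H),
          "W2": rng.normal(size=(H, 1)), "b2": np.zeros(1)}
w = mixture_weights(rng.normal(size=(J, D)), params)
print(w)
```

An exponential output would also guarantee positivity; softplus grows only linearly for large inputs, which tends to keep the intensity from exploding early in training.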

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the significance of incorporating rich contextual information into spatio-temporal point processes, and recommendation for minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; tractability follows directly from the mixture construction

full rationale

The paper defines the intensity as a mixture of kernels whose weights are outputs of a DNN taking external context (images/text) as input. Because the weights are independent of the integration variables (space-time), the integral of the intensity is exactly a weighted sum of the per-kernel integrals. This property is a direct algebraic consequence of the stated functional form and requires no fitted parameters, self-referential definitions, or load-bearing self-citations. No equations in the provided text equate a derived quantity to its own inputs by construction, and the central modeling choice is presented as an explicit design decision rather than a prediction or theorem derived from prior results. The derivation chain is therefore self-contained.
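The algebraic point can be checked numerically. In this hedged 1-D sketch with Gaussian kernels in time only (our assumption; the weights are stand-ins for network outputs), the weighted sum of per-kernel integrals is compared against a midpoint-rule integral of the full intensity.

```python
import math


def norm_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gauss_pdf(t: float, mu: float, sigma: float) -> float:
    # 1-D Gaussian kernel centred at mu.
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def analytic_integral(a, b, centres, weights, sigma):
    # Weighted sum of per-kernel integrals; valid because the weights do not depend on t.
    return sum(w * (norm_cdf((b - mu) / sigma) - norm_cdf((a - mu) / sigma))
               for mu, w in zip(centres, weights))


def numeric_integral(a, b, centres, weights, sigma, n=20000):
    # Midpoint rule applied to the full intensity, for comparison only.
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        total += sum(w * gauss_pdf(t, mu, sigma) for mu, w in zip(centres, weights))
    return total * dt


centres, weights, sigma = [0.2, 0.5, 0.9], [4.0, 1.5, 2.5], 0.1
print(analytic_integral(0.0, 1.0, centres, weights, sigma),
      numeric_integral(0.0, 1.0, centres, weights, sigma))
```

If the weights were instead allowed to vary with t, the first function would no longer be valid and only the numerical route would remain.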

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger constructed from the abstract only; no explicit free parameters, invented entities, or additional axioms are stated beyond the core modeling choice.

axioms (1)
  • domain assumption The intensity function can be expressed as a mixture of kernels whose weights are produced by a deep neural network from contextual data, preserving analytical tractability of the integral.
    This premise is required for both the claimed context incorporation and the tractable estimation highlighted in the abstract.

pith-pipeline@v0.9.0 · 5766 in / 1219 out tokens · 28593 ms · 2026-05-25T18:59:03.627302+00:00 · methodology
