Latent Block-Diffusion Temporal Point Processes: A Semi-Autoregressive Framework for Asynchronous Event Sequence Generation
Pith reviewed 2026-06-26 00:33 UTC · model grok-4.3
The pith
LBDTPP generates variable-length event sequences by autoregressing over latent blocks while diffusing inside them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LBDTPP defines an autoregressive probability distribution over event blocks in latent space and performs Gaussian diffusion within each block, enabling sequential generation of blocks with simultaneous event sampling inside each block; Wasserstein error bounds are derived to show that this block-wise approach reduces error accumulation compared with event-wise autoregressive generation under local approximation and prefix-stability assumptions.
What carries the argument
The latent block diffusion mechanism, which autoregresses over blocks in latent space while diffusing events inside each block.
If this is right
- Block-wise generation reduces error accumulation relative to event-wise autoregressive generation.
- The model produces variable-length sequences while enabling parallel sampling within blocks.
- Latent-space operation supports high-quality generation comparable to diffusion models.
- Performance gains appear on six benchmark datasets for both unconditional and conditional tasks.
- Generation quality trades off against chosen block size.
Where Pith is reading between the lines
- Similar block structures might apply to other sequence models where error accumulation limits long-horizon generation.
- The identified quality-block-size trade-off could guide design choices in related diffusion or autoregressive hybrids.
- Prefix-stability assumptions may be examined in forecasting settings outside temporal point processes.
Load-bearing premise
The Wasserstein error bounds hold under suitable local approximation and prefix-stability assumptions.
What would settle it
A controlled experiment that measures accumulated generation error over increasingly long sequences for block-wise versus fully event-wise autoregressive sampling and checks whether the measured growth matches the predicted reduction from the bounds.
Figures
read the original abstract
Modeling and sampling from the underlying distribution of asynchronous event sequences are crucial in various real-world applications, including social networks, medical diagnosis, and financial transactions. Existing autoregressive methods suffer from error accumulation during multi-step generation, while non-autoregressive diffusion methods are typically limited to fixed-length output sequences. In this paper, we propose Latent Block-Diffusion Temporal Point Processes (LBDTPP), a novel semi-autoregressive TPP framework that introduces a latent block diffusion mechanism for high-quality and variable-length event sequence generation. The core idea is to define an autoregressive probability distribution over event blocks in latent space and perform Gaussian diffusion within each block. By sequentially generating blocks while simultaneously sampling events in each block, LBDTPP preserves the length flexibility of autoregressive TPPs and inherits the parallel high-quality generation capability of diffusion models. Theoretically, we derive Wasserstein error bounds showing that, under suitable local approximation and prefix-stability assumptions, block-wise generation can reduce error accumulation compared with event-wise autoregressive generation. Extensive experiments on six real-world benchmark datasets demonstrate that LBDTPP outperforms state-of-the-art TPP baselines in both unconditional and conditional generation tasks. Further empirical analyses verify the benefits of latent-space diffusion and block-wise generation, and reveal the trade-off between generation quality and block size. Our code is available at https://github.com/Zh-Shuai/LBDTPP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Latent Block-Diffusion Temporal Point Processes (LBDTPP), a semi-autoregressive TPP framework that defines an autoregressive distribution over latent event blocks and applies Gaussian diffusion within each block. This is claimed to combine the variable-length flexibility of autoregressive TPPs with the parallel sampling quality of diffusion models. The central theoretical result is a derivation of Wasserstein error bounds showing that, under local approximation and prefix-stability assumptions, block-wise generation reduces error accumulation relative to event-wise autoregression. Experiments on six real-world datasets are reported to show outperformance over SOTA baselines in unconditional and conditional generation, with additional analyses of latent-space diffusion benefits and block-size trade-offs.
Significance. If the Wasserstein bounds hold under the stated assumptions and the empirical gains are reproducible, the work would offer a principled semi-autoregressive alternative for long asynchronous sequences, addressing a known limitation of pure autoregressive TPPs while retaining length flexibility. Code release aids verification.
major comments (2)
- [Abstract / theoretical contribution] Abstract / theoretical contribution paragraph: the Wasserstein error bounds are derived conditional on 'suitable local approximation and prefix-stability assumptions,' yet neither assumption is defined, nor is their scope or validity demonstrated for typical TPP intensity functions or the learned latent blocks. This makes the central claim that block-wise generation reduces error accumulation rest on unverified premises whose failure would invalidate the stated advantage.
- [Experiments] Experiments section (as summarized in abstract): superiority is asserted on six datasets without any reported metrics, baseline names, or controls visible in the provided description, preventing assessment of whether the empirical results actually support the framework's claimed benefits over autoregressive and diffusion baselines.
minor comments (1)
- The abstract mentions a trade-off between generation quality and block size; a more quantitative characterization of this trade-off (e.g., via a dedicated figure or table) would strengthen the empirical analysis.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions.
read point-by-point responses
-
Referee: [Abstract / theoretical contribution] Abstract / theoretical contribution paragraph: the Wasserstein error bounds are derived conditional on 'suitable local approximation and prefix-stability assumptions,' yet neither assumption is defined, nor is their scope or validity demonstrated for typical TPP intensity functions or the learned latent blocks. This makes the central claim that block-wise generation reduces error accumulation rest on unverified premises whose failure would invalidate the stated advantage.
Authors: The assumptions are formally defined in Section 3.2 of the manuscript: local approximation requires that the latent block model approximates the conditional intensity over bounded time intervals, while prefix-stability requires that the distribution over subsequent blocks remains Lipschitz-continuous in the generated prefix under the learned latent dynamics. The Wasserstein bounds are derived directly from these. We agree the abstract omits the definitions and will revise it to include concise definitions of both assumptions. The derivation itself does not include explicit verification of the assumptions on real intensity functions; we will add a short discussion paragraph in Section 5 addressing applicability to common TPPs such as Hawkes processes. revision: partial
-
Referee: [Experiments] Experiments section (as summarized in abstract): superiority is asserted on six datasets without any reported metrics, baseline names, or controls visible in the provided description, preventing assessment of whether the empirical results actually support the framework's claimed benefits over autoregressive and diffusion baselines.
Authors: The abstract is intentionally high-level. Section 4 and the associated tables provide the requested details: quantitative results (negative log-likelihood, event-type prediction accuracy, and sequence-level Wasserstein distance) on the six datasets, with explicit comparisons against RMTPP, Neural Hawkes, CTMC, and diffusion baselines, plus ablation controls for block size and latent versus data-space diffusion. We will revise the abstract to name the primary baselines and state that full metrics appear in Section 4. revision: partial
Circularity Check
Wasserstein error bounds derived from explicit assumptions; no reduction to inputs by construction
full rationale
The paper's theoretical contribution consists of deriving Wasserstein error bounds that establish reduced error accumulation for block-wise generation relative to event-wise autoregression, conditional on the explicitly stated premises of local approximation and prefix-stability. These premises function as independent assumptions rather than quantities defined in terms of the bound itself or obtained via fitting. No equations, mechanisms, or claims in the provided text reduce by construction to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The latent block diffusion framework is introduced as an architectural novelty without re-expressing prior results. The derivation chain is therefore self-contained against external benchmarks and does not match any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption local approximation assumption
- domain assumption prefix-stability assumption
invented entities (1)
-
Latent block diffusion mechanism
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Hawkes processes with stochastic exogenous effects for continuous-time interaction modelling,
X. Fan, Y . Li, L. Chen, B. Li, and S. A. Sisson, “Hawkes processes with stochastic exogenous effects for continuous-time interaction modelling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 1848–1861, 2022
2022
-
[2]
Easydgl: Encode, train and interpret for continuous-time dynamic graph learning,
C. Chen, H. Geng, N. Yang, X. Yang, and J. Yan, “Easydgl: Encode, train and interpret for continuous-time dynamic graph learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10 845–10 862, 2024
2024
-
[3]
Learning network-structured dependence from non-stationary multivariate point process data,
M. Gao, C. Zhang, and J. Zhou, “Learning network-structured dependence from non-stationary multivariate point process data,”IEEE Transactions on Information Theory, vol. 70, no. 8, pp. 5935–5968, 2024
2024
-
[4]
Neural temporal point processes for modelling electronic health records,
J. Enguehard, D. Busbridge, A. Bozson, C. Woodcock, and N. Hammerla, “Neural temporal point processes for modelling electronic health records,” inMachine Learning for Health. PMLR, 2020, pp. 85–113
2020
-
[5]
Event outlier detection in continuous time,
S. Liu and M. Hauskrecht, “Event outlier detection in continuous time,” inInternational Conference on Machine Learning, 2021
2021
-
[6]
Detecting anomalous event sequences with temporal point processes,
O. Shchur, A. C. Turkmen, T. Januschowski, J. Gasthaus, and S. G ¨unnemann, “Detecting anomalous event sequences with temporal point processes,”Advances in Neural Information Processing Systems, 2021
2021
-
[7]
Language models can improve event prediction by few-shot abductive reasoning,
X. Shi, S. Xue, K. Wang, F. Zhou, J. Zhang, J. Zhou, C. Tan, and H. Mei, “Language models can improve event prediction by few-shot abductive reasoning,”Advances in Neural Information Processing Systems, 2023
2023
-
[8]
Interacting diffusion processes for event sequence forecasting,
M. Zeng, F. Regol, and M. Coates, “Interacting diffusion processes for event sequence forecasting,” inInternational Conference on Machine Learning, 2024
2024
-
[9]
Danmakutpp- bench: A multi-modal benchmark for temporal point process modeling and understanding,
Y . Jiang, J. Li, Y . Liu, D. Yang, F. Zhou, and Q. Kong, “Danmakutpp- bench: A multi-modal benchmark for temporal point process modeling and understanding,”Advances in Neural Information Processing Systems, 2025
2025
-
[10]
On lewis’ simulation method for point processes,
Y . Ogata, “On lewis’ simulation method for point processes,”IEEE Transactions on Information Theory, 1981
1981
-
[11]
Add and thin: Diffusion for temporal point processes,
D. L ¨udke, M. Bilo ˇs, O. Shchur, M. Lienen, and S. G ¨unnemann, “Add and thin: Diffusion for temporal point processes,”Advances in Neural Information Processing Systems, 2023
2023
-
[12]
Unlocking point processes through point set diffusion,
D. L ¨udke, E. R. Ravent ´os, M. Kollovieh, and S. G ¨unnemann, “Unlocking point processes through point set diffusion,” inInternational Conference on Learning Representations, 2025
2025
-
[13]
Recurrent marked temporal point processes: Embedding event history to vector,
N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, and L. Song, “Recurrent marked temporal point processes: Embedding event history to vector,” inProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016
2016
-
[14]
Transformer hawkes process,
S. Zuo, H. Jiang, Z. Li, T. Zhao, and H. Zha, “Transformer hawkes process,” inInternational Conference on Machine Learning, 2020
2020
-
[15]
Easytpp: Towards open benchmarking the temporal point processes,
S. Xue, X. Shi, Z. Chu, Y . Wang, F. Zhou, H. Hao, C. Jiang, C. Pan, Y . Xu, J. Y . Zhanget al., “Easytpp: Towards open benchmarking the temporal point processes,”International Conference on Learning Representations, 2024
2024
-
[16]
D. J. Daley, D. Vere-Joneset al.,An introduction to the theory of point processes: volume I: elementary theory and methods. Springer, 2003
2003
-
[17]
D. L. Snyder and M. I. Miller,Random point processes in time and space. Springer Science & Business Media, 2012
2012
-
[18]
Point processes for unsuper- vised line network extraction in remote sensing,
C. Lacoste, X. Descombes, and J. Zerubia, “Point processes for unsuper- vised line network extraction in remote sensing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1568–1579, 2005
2005
-
[19]
A marked point process of rectangles and segments for automatic analysis of digital elevation mod- els,
M. Ortner, X. Descombes, and J. Zerubia, “A marked point process of rectangles and segments for automatic analysis of digital elevation mod- els,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 1, pp. 105–119, 2008
2008
-
[20]
J. F. C. Kingman,Poisson processes. Clarendon Press, 1992, vol. 3
1992
-
[21]
A self-correcting point process,
V . Isham and M. Westcott, “A self-correcting point process,”Stochastic processes and their applications, vol. 8, no. 3, pp. 335–347, 1979
1979
-
[22]
Spectra of some self-exciting and mutually exciting point processes,
A. G. Hawkes, “Spectra of some self-exciting and mutually exciting point processes,”Biometrika, vol. 58, no. 1, pp. 83–90, 1971
1971
-
[23]
Self-attentive hawkes process,
Q. Zhang, A. Lipani, O. Kirnap, and E. Yilmaz, “Self-attentive hawkes process,” inInternational Conference on Machine Learning, 2020
2020
-
[24]
Transformer embeddings of irregularly spaced events and their participants,
C. Yang, H. Mei, and J. Eisner, “Transformer embeddings of irregularly spaced events and their participants,” inInternational Conference on Learning Representations, 2022
2022
-
[25]
Eventflow: Forecasting continuous-time event data with flow matching,
G. Kerrigan, K. Nelson, and P. Smyth, “Eventflow: Forecasting continuous-time event data with flow matching,”arXiv e-prints, pp. arXiv–2410, 2024
2024
-
[26]
Hypro: A hybridly normalized probabilistic model for long-horizon prediction of event sequences,
S. Xue, X. Shi, J. Zhang, and H. Mei, “Hypro: A hybridly normalized probabilistic model for long-horizon prediction of event sequences,” Advances in Neural Information Processing Systems, 2022
2022
-
[27]
Deep unsupervised learning using nonequilibrium thermodynamics,
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning, 2015
2015
-
[28]
Denoising diffusion probabilistic models,
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, 2020
2020
-
[29]
Denoising diffusion implicit models,
J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” inInternational Conference on Learning Representations, 2021
2021
-
[30]
Score-based generative modeling through stochastic differential equations,
Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” inInternational Conference on Learning Representations, 2021
2021
-
[31]
High-resolution image synthesis with latent diffusion models,
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020 16
2022
-
[32]
Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,
C. Lu, Y . Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,”Advances in Neural Information Processing Systems, 2022
2022
-
[33]
All are worth words: A vit backbone for diffusion models,
F. Bao, S. Nie, K. Xue, Y . Cao, C. Li, H. Su, and J. Zhu, “All are worth words: A vit backbone for diffusion models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2023
-
[34]
Diffusion-lm improves controllable text generation,
X. Li, J. Thickstun, I. Gulrajani, P. S. Liang, and T. B. Hashimoto, “Diffusion-lm improves controllable text generation,”Advances in Neural Information Processing Systems, 2022
2022
-
[35]
Large language diffusion models,
S. Nie, F. Zhu, Z. You, X. Zhang, J. Ou, J. Hu, J. Zhou, Y . Lin, J.-R. Wen, and C. Li, “Large language diffusion models,”Advances in Neural Information Processing Systems, 2025
2025
-
[36]
Diffusion llms can do faster-than-ar inference via discrete diffusion forcing,
X. Wang, C. Xu, Y . Jin, J. Jin, H. Zhang, and Z. Deng, “Diffusion llms can do faster-than-ar inference via discrete diffusion forcing,”International Conference on Learning Representations, 2026
2026
-
[37]
Non-autoregressive diffusion-based temporal point processes for continuous-time long-term event prediction,
W.-T. Zhou, Z. Kang, L. Tian, J. Zhang, and Y . Liu, “Non-autoregressive diffusion-based temporal point processes for continuous-time long-term event prediction,”Expert Systems with Applications, 2025
2025
-
[38]
Block diffusion: Interpolating between autoregressive and diffusion language models,
M. Arriola, A. Gokaslan, J. T. Chiu, Z. Yang, Z. Qi, J. Han, S. S. Sahoo, and V . Kuleshov, “Block diffusion: Interpolating between autoregressive and diffusion language models,” inInternational Conference on Learning Representations, 2025
2025
-
[39]
Extracting geometric structures in images with delaunay point processes,
J.-D. Favreau, F. Lafarge, A. Bousseau, and A. Auvolat, “Extracting geometric structures in images with delaunay point processes,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 837–850, 2019
2019
-
[40]
An extensive survey with empirical studies on deep temporal point process,
H. Lin, C. Tan, L. Wu, Z. Liu, Z. Gao, and S. Z. Li, “An extensive survey with empirical studies on deep temporal point process,”IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 4, pp. 1599–1619, 2024
2024
-
[41]
Advances in temporal point processes: Bayesian, neural, and llm approaches,
F. Zhou, Q. Kong, J. Qiao, C. Wan, Y . Zhang, and R. Cai, “Advances in temporal point processes: Bayesian, neural, and llm approaches,” Transactions on Machine Learning Research, 2026
2026
-
[42]
Coevolve: A joint point process model for information diffusion and network evolution,
M. Farajtabar, Y . Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song, “Coevolve: A joint point process model for information diffusion and network evolution,”Journal of Machine Learning Research, vol. 18, no. 41, pp. 1–49, 2017
2017
-
[43]
Modeling the intensity function of point process via recurrent neural networks,
S. Xiao, J. Yan, X. Yang, H. Zha, and S. Chu, “Modeling the intensity function of point process via recurrent neural networks,” inProceedings of the AAAI Conference on Artificial Intelligence, 2017
2017
-
[44]
The neural hawkes process: A neurally self- modulating multivariate point process,
H. Mei and J. M. Eisner, “The neural hawkes process: A neurally self- modulating multivariate point process,”Advances in Neural Information Processing Systems, 2017
2017
-
[45]
Neural jump-diffusion temporal point processes,
S. Zhang, C. Zhou, Y . A. Liu, P. Zhang, X. Lin, and Z.-M. Ma, “Neural jump-diffusion temporal point processes,” inInternational Conference on Machine Learning, 2024
2024
-
[46]
Intensity-free learning of temporal point processes,
O. Shchur, M. Bilo ˇs, and S. G ¨unnemann, “Intensity-free learning of temporal point processes,” inInternational Conference on Learning Representations, 2020
2020
-
[47]
Multiple hypothesis testing for anomaly detection in multi-type event sequences,
S. Zhang, C. Zhou, P. Zhang, Y . Liu, Z. Li, and H. Chen, “Multiple hypothesis testing for anomaly detection in multi-type event sequences,” in2023 IEEE International Conference on Data Mining, 2023
2023
-
[48]
Decomposable transformer point processes,
A. Panos, “Decomposable transformer point processes,”Advances in Neural Information Processing Systems, 2024
2024
-
[49]
Conformal anomaly detection in event sequences,
S. Zhang, C. Zhou, Y . Liu, P. Zhang, X. Lin, and S. Pan, “Conformal anomaly detection in event sequences,” inInternational Conference on Machine Learning, 2025
2025
-
[50]
Spatio-temporal diffusion point processes,
Y . Yuan, J. Ding, C. Shao, D. Jin, and Y . Li, “Spatio-temporal diffusion point processes,” inProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023
2023
-
[51]
Edit-based flow matching for temporal point processes,
D. L ¨udke, M. Lienen, M. Kollovieh, and S. G ¨unnemann, “Edit-based flow matching for temporal point processes,” inInternational Conference on Learning Representations, 2026
2026
-
[52]
Flow matching for generative modeling,
Y . Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,” inInternational Conference on Learning Representations, 2023
2023
-
[53]
Edit flows: Variable length discrete flow matching with sequence-level edit operations,
M. Havasi, B. Karrer, I. Gat, and R. T. Chen, “Edit flows: Variable length discrete flow matching with sequence-level edit operations,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[54]
Ssd-lm: Semi-autoregressive simplex-based diffusion language model for text generation and modular control,
X. Han, S. Kumar, and Y . Tsvetkov, “Ssd-lm: Semi-autoregressive simplex-based diffusion language model for text generation and modular control,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2023
-
[55]
Encoder- decoder diffusion language models for efficient training and inference,
M. Arriola, Y . Schiff, H. Phung, A. Gokaslan, and V . Kuleshov, “Encoder- decoder diffusion language models for efficient training and inference,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[56]
Argmax flows and multinomial diffusion: Learning categorical distributions,
E. Hoogeboom, D. Nielsen, P. Jaini, P. Forr ´e, and M. Welling, “Argmax flows and multinomial diffusion: Learning categorical distributions,” Advances in Neural Information Processing Systems, 2021
2021
-
[57]
Structured denoising diffusion models in discrete state-spaces,
J. Austin, D. D. Johnson, J. Ho, D. Tarlow, and R. Van Den Berg, “Structured denoising diffusion models in discrete state-spaces,”Advances in Neural Information Processing Systems, 2021
2021
-
[58]
Next block prediction: Video genera- tion via semi-autoregressive modeling,
S. Ren, S. Ma, X. Sun, and F. Wei, “Next block prediction: Video genera- tion via semi-autoregressive modeling,”arXiv preprint arXiv:2502.07737, 2025
arXiv 2025
-
[59]
Sana-video: Efficient video generation with block linear diffusion transformer,
J. Chen, Y . Zhao, J. Yu, R. Chu, J. Chen, S. Yang, X. Wang, Y . Pan, D. Zhou, H. Linget al., “Sana-video: Efficient video generation with block linear diffusion transformer,”arXiv preprint arXiv:2509.24695, 2025
arXiv 2025
-
[60]
Blockvid: Block diffusion for high-quality and consistent minute-long video generation,
Z. Zhang, S. Chang, Y . He, Y . Han, J. Tang, F. Wang, and B. Zhuang, “Blockvid: Block diffusion for high-quality and consistent minute-long video generation,”arXiv preprint arXiv:2511.22973, 2025
Pith/arXiv arXiv 2025
-
[61]
Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding,
C. Wu, H. Zhang, S. Xue, Z. Liu, S. Diao, L. Zhu, P. Luo, S. Han, and E. Xie, “Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding,”International Conference on Learning Representations, 2026
2026
-
[62]
Fast-dllm v2: Efficient block-diffusion llm,
C. Wu, H. Zhang, S. Xue, S. Diao, Y . Fu, Z. Liu, P. Molchanov, P. Luo, S. Han, and E. Xie, “Fast-dllm v2: Efficient block-diffusion llm,”International Conference on Learning Representations, 2026
2026
-
[63]
Lecture notes: Temporal point processes and the conditional intensity function,
J. G. Rasmussen, “Lecture notes: Temporal point processes and the conditional intensity function,”arXiv preprint arXiv:1806.00221, 2018
Pith/arXiv arXiv 2018
-
[64]
Probabilistic querying of continuous-time event sequences,
A. Boyd, Y . Chang, S. Mandt, and P. Smyth, “Probabilistic querying of continuous-time event sequences,” inInternational Conference on Artificial Intelligence and Statistics, 2023
2023
-
[65]
Understanding diffusion models: A unified perspective,
C. Luo, “Understanding diffusion models: A unified perspective,”arXiv preprint arXiv:2208.11970, 2022
arXiv 2022
-
[66]
Attention is all you need,
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in Neural Information Processing Systems, 2017
2017
-
[67]
FOILing NYC’s taxi trip data,
C. Whong, “FOILing NYC’s taxi trip data,” 2014
2014
-
[68]
Learning tree-based deep model for recommender systems,
H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” inProceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
2018
-
[69]
Snap datasets: Stanford large network dataset collection,
J. Leskovec and A. Krevl, “Snap datasets: Stanford large network dataset collection,” 2014
2014
-
[70]
Learning triggering kernels for multi- dimensional hawkes processes,
K. Zhou, H. Zha, and L. Song, “Learning triggering kernels for multi- dimensional hawkes processes,” inInternational Conference on Machine Learning, 2013
2013
-
[71]
Predicting dynamic embedding trajectory in temporal interaction networks,
S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” inProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
2019
-
[72]
On the predictive accuracy of neural temporal point process models for continuous-time event data,
T. Bosser and S. B. Taieb, “On the predictive accuracy of neural temporal point process models for continuous-time event data,”Transactions on Machine Learning Research, 2023
2023
-
[73]
Justifying recommendations using distantly- labeled reviews and fine-grained aspects,
J. Ni, J. Li, and J. McAuley, “Justifying recommendations using distantly- labeled reviews and fine-grained aspects,” inProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
2019
-
[74]
Deep continuous-time state-space models for marked event sequences,
Y . Chang, A. J. Boyd, C. Xiao, T. Kass-Hout, P. Bhatia, P. Smyth, and A. Warrington, “Deep continuous-time state-space models for marked event sequences,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[75]
Long horizon forecasting with temporal point processes,
P. Deshpande, K. Marathe, A. De, and S. Sarawagi, “Long horizon forecasting with temporal point processes,” inInternational Conference on Web Search and Data Mining, 2021
2021
-
[76]
Exploring generative neural temporal point process,
H. Lin, L. Wu, G. Zhao, L. Pai, and S. Z. Li, “Exploring generative neural temporal point process,”Transactions on Machine Learning Research, 2022
2022
-
[77]
Imputing missing events in continuous- time event streams,
H. Mei, G. Qin, and J. Eisner, “Imputing missing events in continuous- time event streams,” inInternational Conference on Machine Learning, 2019
2019
-
[78]
Automatic differentiation in pytorch,
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,”NIPS-W, 2017
2017
-
[79]
Adam: A method for stochastic optimization,
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, 2015. JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020 17 SUPPLEMENTARYMATERIAL A. Derivation of NELBO for LBDTPP in Latent Space Proof of Proposition 1. Given the latent event sequence rep- resentation z= (z 1, . . . ,zL...
2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.