pith. sign in

arxiv: 2602.18358 · v3 · submitted 2026-02-20 · 📊 stat.AP · q-fin.ST

Forecasting the Evolving Composition of Inbound Tourism Demand: A Bayesian Compositional Time Series Approach Using Platform Booking Data

Pith reviewed 2026-05-15 20:48 UTC · model grok-4.3

classification 📊 stat.AP q-fin.ST
keywords tourism demand forecastingcompositional time seriesBayesian modelingDirichlet autoregressive moving averageAirbnb datamarket share compositionpandemic impacts
0
0 comments X

The pith

A Bayesian Dirichlet autoregressive moving average model forecasts the evolving shares of tourist origin markets using booking data and outperforms standard methods in key regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Bayesian Dirichlet autoregressive moving average models to forecast how the mix of guest origins in tourism changes over time. It applies these to Airbnb data from 2017 to 2025 across major regions, identifying pandemic-induced shifts in composition. The approach models the data directly on the simplex to ensure forecasts add up to one hundred percent. This matters because accurate predictions of source markets help destinations allocate marketing resources and plan for recovery or crises. The models show particular strength where multiple markets compete for similar shares.

Core claim

The central discovery is that BDARMA models, by using a Dirichlet likelihood and allowing seasonal variation in mean and precision, achieve the lowest forecast errors for EMEA destinations with 27 percent lower error than naive methods, while producing coherent forecasts that respect the unit-sum constraint and capture complex temporal dependencies including structural breaks from the pandemic.

What carries the argument

The BDARMA model, a Bayesian time series approach that models compositional data on the simplex using a Dirichlet distribution for the likelihood and autoregressive moving average dynamics in the parameters.

If this is right

  • Destination stakeholders gain probabilistic forecasts of source market shares for strategic planning.
  • Marketing resource allocation and infrastructure investment can be informed by expected changes in origin markets.
  • The model captures heterogeneous recovery patterns across markets after structural breaks.
  • Greatest accuracy gains occur in markets where origins compete in the five to twenty-five percent share range.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar compositional forecasting could apply to other time-varying share data such as product market shares or election polling.
  • Combining the model with real-time indicators might allow earlier detection of demand shifts.
  • Extending the framework to include external covariates like economic indicators could improve long-term projections.

Load-bearing premise

The Airbnb booking data must accurately represent the overall inbound tourism composition without significant bias, and the BDARMA specification must capture pandemic structural breaks without overfitting.

What would settle it

Collecting new booking data after 2025 and checking whether BDARMA forecasts continue to show lower error than naive and other benchmarks in out-of-sample tests would confirm or refute the performance claims.

Figures

Figures reproduced from arXiv: 2602.18358 by Harrison Katz.

Figure 1
Figure 1. Figure 1: Guest origin market shares by destination region, January 2017–December 2025. Stacked area [PITH_FULL_IMAGE:figures/full_fig_p015_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Market concentration over time by destination region. The Herfindahl-Hirschman Index (HHI) [PITH_FULL_IMAGE:figures/full_fig_p016_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Average autocorrelation of CLR-transformed origin shares by destination region. All regions show [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Seasonal pattern in compositional deviation for EMEA. Boxplots show Aitchison distance from [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Model comparison via leave-one-out cross-validation for EMEA. Points indicate posterior mean [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Forecast accuracy comparison for EMEA destination. BDARMA achieves the lowest mean absolute [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Illustrative BDARMA(2,1) forecast from the first evaluation origin (January 2022) for EMEA, [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Pandemic impact on origin market composition by destination region. Lines show changes in market [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗
read the original abstract

Understanding how the composition of guest origin markets evolves over time is critical for destination marketing organizations, hospitality businesses, and tourism planners. We develop and apply Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the compositional dynamics of guest origin market shares using proprietary Airbnb booking data spanning 2017--2025 across four major destination regions. Our analysis reveals substantial pandemic-induced structural breaks in origin composition, with heterogeneous recovery patterns across markets. In our analysis, the BDARMA framework achieves the lowest forecast error for EMEA and competitive performance across destination regions, outperforming standard benchmarks including na\"ive forecasts, exponential smoothing, and SARIMA on log-ratio transformed data in compositionally complex markets. For EMEA destinations, BDARMA achieves 27% lower forecast error than na\"ive methods ($p < 0.001$), with the greatest gains where multiple origin markets compete in the 5-25% share range. By modeling compositions directly on the simplex with a Dirichlet likelihood and incorporating seasonal variation in both mean and precision parameters, our approach produces coherent forecasts that respect the unit-sum constraint while capturing complex temporal dependencies. The methodology provides destination stakeholders with probabilistic forecasts of source market shares, enabling more informed strategic planning for marketing resource allocation, infrastructure investment, and crisis response.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops and applies Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the time evolution of inbound tourism origin-market shares from proprietary Airbnb booking data (2017–2025) across four destination regions. It reports pandemic-induced structural breaks, heterogeneous recovery patterns, and that the BDARMA specification produces the lowest forecast error for EMEA destinations, achieving a 27% reduction relative to naïve benchmarks (p < 0.001) with largest gains in the 5–25% share range; forecasts are claimed to remain coherent on the simplex by construction.

Significance. If the reported error reductions hold after proper validation of the data source and full disclosure of the model, the work would supply a practical, probabilistically coherent tool for destination marketing organizations to anticipate shifts in source-market composition. The emphasis on seasonal mean and precision parameters and direct simplex modeling addresses a recurring need in tourism analytics where standard time-series methods violate the unit-sum constraint.

major comments (3)
  1. [Data section] Data section: the central claim that Airbnb booking shares constitute a faithful proxy for overall inbound tourism composition is load-bearing for the 27% EMEA error-reduction result, yet no comparison against official statistics (UNWTO, Eurostat, or national arrival counts) is provided; pandemic-era channel shifts could systematically bias shares in the 5–25% range highlighted as the region of greatest improvement.
  2. [Methods] Methods: the abstract and main text supply no explicit BDARMA model equations, prior specifications for the autoregressive/moving-average coefficients, or the functional form of the seasonal precision parameters, rendering the reported forecast-error comparisons unverifiable and preventing assessment of whether the 27% gain is an artifact of the chosen likelihood or cross-validation scheme.
  3. [Results] Results, EMEA panel: the p < 0.001 significance for the 27% error reduction is presented without details on the exact loss function, number of hold-out periods, or correction for multiple comparisons across regions and origin markets, weakening the strength of the outperformance claim relative to the naïve, ETS, and log-ratio SARIMA benchmarks.
minor comments (2)
  1. [Abstract] Abstract: the notation “naïve” appears with an escaped backslash; standardize to “naïve” throughout.
  2. [Figures] Figure captions: ensure all panels are labeled with the exact forecast horizon and region so that the 5–25% share gains can be directly located.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and verifiability of the manuscript. We address each major point below and have revised the paper accordingly.

read point-by-point responses
  1. Referee: [Data section] Data section: the central claim that Airbnb booking shares constitute a faithful proxy for overall inbound tourism composition is load-bearing for the 27% EMEA error-reduction result, yet no comparison against official statistics (UNWTO, Eurostat, or national arrival counts) is provided; pandemic-era channel shifts could systematically bias shares in the 5–25% range highlighted as the region of greatest improvement.

    Authors: We acknowledge that a direct comparison to official statistics would strengthen the proxy claim. Due to the proprietary nature of the Airbnb booking dataset, matched official arrival counts at the required granularity are not available to us. In the revised Data section we have added an explicit limitations paragraph discussing potential channel-shift biases during the pandemic period and emphasizing that the data are best suited for platform-specific compositional forecasting rather than total inbound tourism. We retain the 27% EMEA result as a platform-specific finding while noting the scope limitation. revision: partial

  2. Referee: [Methods] Methods: the abstract and main text supply no explicit BDARMA model equations, prior specifications for the autoregressive/moving-average coefficients, or the functional form of the seasonal precision parameters, rendering the reported forecast-error comparisons unverifiable and preventing assessment of whether the 27% gain is an artifact of the chosen likelihood or cross-validation scheme.

    Authors: We have inserted the full BDARMA specification into the Methods section, including the Dirichlet likelihood, the vector autoregressive-moving-average structure on the logit-transformed mean, the exact prior distributions (independent normal priors on AR/MA coefficients with variance 1, and gamma priors on precision parameters), and the seasonal formulation of the precision parameter as a linear function of monthly indicators. These additions render the model fully reproducible and allow readers to verify that the reported gains are not artifacts of the likelihood or validation design. revision: yes

  3. Referee: [Results] Results, EMEA panel: the p < 0.001 significance for the 27% error reduction is presented without details on the exact loss function, number of hold-out periods, or correction for multiple comparisons across regions and origin markets, weakening the strength of the outperformance claim relative to the naïve, ETS, and log-ratio SARIMA benchmarks.

    Authors: We have expanded the Results section to specify that the loss function is the mean absolute percentage error computed on the simplex, that the evaluation uses a rolling 12-month hold-out window, and that Bonferroni correction was applied across the four destination regions and the origin-market comparisons. The reported p < 0.001 remains significant after correction. We also document the exact benchmark implementations (naïve last-observation, ETS, and log-ratio SARIMA) to facilitate direct replication. revision: yes

standing simulated objections not resolved
  • Direct comparison of Airbnb booking shares against official UNWTO/Eurostat/national arrival statistics, which cannot be performed because the underlying dataset is proprietary.

Circularity Check

0 steps flagged

No significant circularity; forecast gains evaluated out-of-sample against external benchmarks

full rationale

The paper fits BDARMA parameters to training portions of the Airbnb booking series and generates forecasts on held-out periods, then compares mean forecast error to naïve, exponential smoothing, and SARIMA benchmarks. The 27% EMEA improvement (p<0.001) is therefore a genuine out-of-sample metric rather than a quantity defined by the fitted parameters themselves. No self-definitional equations, fitted-input-renamed-as-prediction, or load-bearing self-citations appear in the derivation. Data-source limitations (proprietary platform shares) affect external validity but do not create circularity inside the modeling chain.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on standard Dirichlet likelihood for compositions and typical time-series assumptions; several free parameters are fitted to the booking series.

free parameters (2)
  • autoregressive and moving-average coefficients
    Fitted parameters in the DARMA component that capture temporal dependence in the compositional series.
  • seasonal mean and precision parameters
    Additional parameters allowing seasonal variation in both location and dispersion of the Dirichlet distribution.
axioms (1)
  • domain assumption Dirichlet distribution is a suitable likelihood for compositional time series that must sum to one
    Standard modeling choice in compositional data analysis invoked to enforce the unit-sum constraint.

pith-pipeline@v0.9.0 · 5527 in / 1403 out tokens · 44181 ms · 2026-05-15T20:48:42.902182+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. [1]

    Investor Relations

    Airbnb Q4 2024 shareholder letter. Investor Relations. URL:https://s26.q4cdn.com/656283129/files/doc_financials/2024/q4/Airbnb_ Q4-2024-Shareholder-Letter_Final.pdf. Aitchison, J.,

  2. [2]

    1974.tb00999.x

    The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological) 44, 139–177. doi:10.1111/j.2517-6161. 1982.tb01195.x. Aitchison, J.,

  3. [3]

    Chapman and Hall, London

    The Statistical Analysis of Compositional Data. Chapman and Hall, London. doi:10.1007/978-94-009-4109-0. Assaf, A.G., Li, G., Song, H., Tsionas, M.G.,

  4. [4]

    JournalofTravel Research 58, 383–397

    Modelingandforecastingregionaltourism demandusingtheBayesianglobalvectorautoregressive(BGVAR)model. JournalofTravel Research 58, 383–397. doi:10.1177/0047287518759226. Athanasopoulos, G., Hyndman, R.J., Song, H., Wu, D.C.,

  5. [5]

    Masset, R

    The tourism forecast- ing competition. International Journal of Forecasting 27, 822–844. doi:10.1016/j. ijforecast.2010.04.009. Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A.,

  6. [6]

    Journal of Statistical Software , author =

    Stan: Aprobabilisticprogramminglanguage. Journal of Statistical Software 76, 1–32. doi:10.18637/jss.v076.i01. Diebold, F.X., Mariano, R.S.,

  7. [7]

    Parisa Foroutan and Salim Lahmiri

    Comparing predictive accuracy. Journal of Business & Economic Statistics 13, 253–263. doi:10.1080/07350015.1995.10524599. Divisekera, S.,

  8. [8]

    Annals of Tourism Research 30, 31–49

    A model of demand for international tourism. Annals of Tourism Research 30, 31–49. doi:10.1016/S0160-7383(02)00029-4. Dolnicar, S.,

  9. [9]

    Annals of Tourism Research 75, 248–264

    A review of research into paid online peer-to-peer accommodation: 34 Launching the annals of tourism research curated collection on peer-to-peer accommo- dation. Annals of Tourism Research 75, 248–264. doi:10.1016/j.annals.2019.02.003. Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barcelo-Vidal, C.,

  10. [10]

    Gössling, S., Scott, D., Hall, C.M.,

    doi:10.1023/A:1023818214614. Gössling, S., Scott, D., Hall, C.M.,

  11. [11]

    Journal of Sustainable Tourism 29, 1–20

    Pandemics, tourism and global change: A rapid as- sessment of COVID-19. Journal of Sustainable Tourism 29, 1–20. doi:10.1080/09669582. 2020.1758708. published online 27 Apr

  12. [12]

    Annual Review of Statistics and Its Application 8, 271–299

    Compositional data analysis. Annual Review of Statistics and Its Application 8, 271–299. doi:10.1146/annurev-statistics-042720-124436. Grunwald, G.K., Raftery, A.E., Guttorp, P.,

  13. [13]

    Journal of the Royal Statistical Society: Series B (Methodological) 55, 103–116

    Time series of continuous proportions. Journal of the Royal Statistical Society: Series B (Methodological) 55, 103–116. doi:10. 1111/j.2517-6161.1993.tb01470.x. Guttentag, D.,

  14. [14]

    Current Issues in Tourism 18, 1192–1217

    Airbnb: Disruptive innovation and the rise of an informal tourism accommodation sector. Current Issues in Tourism 18, 1192–1217. doi:10.1080/13683500. 2013.827159. Hall, C.M., Prayag, G., Safonov, A., Coles, T., Gössling, S., Naderi Koupaei, S.,

  15. [15]

    Current Issues in Tourism 25, 3057–3067

    Airbnb and the sharing economy. Current Issues in Tourism 25, 3057–3067. doi:10.1080/ 13683500.2022.2122418. Hu, M., Li, H., Song, H., Li, X., Law, R.,

  16. [17]

    Tourism Economics 25, 469–492

    Tourism forecasting: A review of methodological developments over the last decade. Tourism Economics 25, 469–492. doi:10.1177/1354816618812588. 35 Katz, H.,

  17. [18]

    Information gain-based policy op- timization for multi-turn LLM agents.arXiv preprint arXiv:2510.14967, 2025

    Centered MA Dirichlet ARMA for financial compositions: Theory & empir- ical evidence. URL:https://arxiv.org/abs/2510.18903, doi:10.48550/arXiv.2510. 18903,arXiv:2510.18903. Katz, H., Brusch, K.T., Weiss, R.E.,

  18. [19]

    Airbnb stay lengths during and after the pandemic (2019–2024)

    Slomads rising: Structural shifts in U.S. Airbnb stay lengths during and after the pandemic (2019–2024). Tourism and Hospitality 6,

  19. [20]

    Annals of Tourism Research Empirical Insights 6, 100185

    Leadtimesinflux: AnalyzingAirbnbbookingdynamics during global upheavals (2018–2022). Annals of Tourism Research Empirical Insights 6, 100185. Katz, H., Weiss, R.E.,

  20. [21]

    arXiv preprint arXiv:2507.14132 doi:10

    A Bayesian Dirichlet auto-regressive conditional heteroskedas- ticity model for forecasting currency shares. arXiv preprint arXiv:2507.14132 doi:10. 48550/arXiv.2507.14132. Kynčlová, P., Filzmoser, P., Hron, K.,

  21. [22]

    Journal of Forecasting 34, 303–314

    Modeling compositional time series with vector autoregressive models. Journal of Forecasting 34, 303–314. doi:10.1002/for.2336. Li, G., Song, H., Witt, S.F.,

  22. [23]

    International Journal of Forecasting 22, 57–71

    Time varying parameter and fixed parameter linear AIDS: An application to tourism demand forecasting. International Journal of Forecasting 22, 57–71. doi:10.1016/j.ijforecast.2005.03.006. Li, H., Hu, M., Li, G.,

  23. [24]

    Annals of Tourism Research 83, 102912

    Forecasting tourism demand with multisource big data. Annals of Tourism Research 83, 102912. doi:10.1016/j.annals.2020.102912. 36 Li, X., Law, R., Xie, G., Wang, S.,

  24. [25]

    Tourism Management 83, 104245

    Review of tourism forecasting research with internet data. Tourism Management 83, 104245. doi:10.1016/j.tourman.2020.104245. Liu, X., Liu, A., Chen, J.L., Li, G.,

  25. [27]

    Tourism Management 110, 105164

    Google trends and Baidu index data in tourism demand forecasting: A critical assessment of recent applications. Tourism Management 110, 105164. doi:10.1016/j.tourman.2025.105164. Newey, W.K., West, K.D.,

  26. [28]

    Current Issues in Tourism 23, 811–825

    Regulating Airbnb: How cities deal with perceived negative externalities of short-term rentals. Current Issues in Tourism 23, 811–825. doi:10. 1080/13683500.2018.1504899. Pawlowsky-Glahn, V., Egozcue, J.J., Tolosana-Delgado, R.,

  27. [29]

    Wiley, Chichester

    Modeling and Analysis of Compositional Data. Wiley, Chichester. doi:10.1002/9781119003144. Sainaghi, R., Chica-Olmo, J.,

  28. [30]

    Annals of Tourism Research 96, 103464

    The effects of location before and during COVID-19: Impacts on revenue of Airbnb listings in Milan (Italy). Annals of Tourism Research 96, 103464. doi:10.1016/j.annals.2022.103464. Sigala, M.,

  29. [31]

    Journal of Business Research 117, 312–321

    Tourism and COVID-19: Impacts and implications for advancing and resetting industry and research. Journal of Business Research 117, 312–321. doi:10.1016/ j.jbusres.2020.06.015. Song, H., Li, G.,

  30. [32]

    Tourism Management 29, 203–220

    Tourism demand modelling and forecasting: A review of recent research. Tourism Management 29, 203–220. doi:10.1016/j.tourman.2007.07.016. 37 Song, H., Li, G., Cai, Y.,

  31. [33]

    Annals of Tourism Research 96, 103445

    Tourism forecasting competition in the time of COVID-19: An assessment of ex ante forecasts. Annals of Tourism Research 96, 103445. doi:10.1016/ j.annals.2022.103445. Song, H., Li, G., Witt, S.F., Athanasopoulos, G.,

  32. [34]

    International Journal of Forecasting 27, 855–869

    Forecasting tourist arrivals using time-varying parameter structural time series models. International Journal of Forecasting 27, 855–869. doi:10.1016/j.ijforecast.2010.06.001. Song, H., Liu, A., Li, G., Liu, X.,

  33. [35]

    International Journal of Tourism Research 23, 914–927

    Bayesian bootstrap aggregation for tourism demand forecasting. International Journal of Tourism Research 23, 914–927. doi:10.1002/jtr

  34. [36]

    Annals of Tourism Research 75, 338–362

    A review of research on tourism demand forecasting: Launching the annals of tourism research curated collection on tourism demand forecast- ing. Annals of Tourism Research 75, 338–362. doi:10.1016/j.annals.2018.12.001. Song, H., Qiu, R.T.R., Park, J.,

  35. [37]

    The functional role of cardiac activity in perception and action,

    Progress in tourism demand research: Theory and empirics. Tourism Management 94, 104655. doi:10.1016/j.tourman.2022.104655. Sun, S., Wei, Y., Tsui, K.L., Wang, S.,

  36. [38]

    Tourism Management 70, 1–10

    Forecasting tourist arrivals with machine learn- ing and internet search index. Tourism Management 70, 1–10. doi:10.1016/j.tourman. 2018.07.010. Vehtari, A., Gelman, A., Gabry, J.,

  37. [39]

    Rank-normalization, folding, and localization: An improved R for assessing convergence of MCMC

    Rank- normalization, folding, and localization: An improved ˆRfor assessing convergence of MCMC. Bayesian Analysis 16, 667–718. doi:10.1214/20-BA1221. 38 Witt, S.F., Witt, C.A.,

  38. [40]

    International Journal of Forecasting 11, 447–475

    Forecasting tourism demand: A review of empirical research. International Journal of Forecasting 11, 447–475. doi:10.1016/0169-2070(95)00591-7. Wu, D.C., Li, G., Song, H.,

  39. [41]

    Annals of Tourism Research 39, 667–685

    Economic analysis of tourism consumption dynamics: A time-varying parameter demand system approach. Annals of Tourism Research 39, 667–685. doi:10.1016/j.annals.2011.09.003. Wu, D.C., Song, H., Shen, S.,

  40. [42]

    International Journal of Contemporary Hospitality Management 29, 507–529

    New developments in tourism and hotel demand model- ing and forecasting. International Journal of Contemporary Hospitality Management 29, 507–529. doi:10.1108/IJCHM-05-2015-0249. Wu, J., Li, M., Zhao, E., Sun, S., Wang, S.,

  41. [43]

    Tourism Management 98, 104759

    Can multi-source heterogeneous data improve the forecasting performance of tourist arrivals amid COVID-19? Mixed-data sampling approach. Tourism Management 98, 104759. doi:10.1016/j.tourman.2023. 104759. Zervas, G., Proserpio, D., Byers, J.W.,

  42. [44]

    Journal of Marketing Research 54, 687–705

    The rise of the sharing economy: Estimating the impact of Airbnb on the hotel industry. Journal of Marketing Research 54, 687–705. doi:10.1509/jmr.15.0204. Zheng, T., Chen, R.,

  43. [45]

    Journal of Multivariate Analysis 158, 31–46

    Dirichlet ARMA models for compositional time series. Journal of Multivariate Analysis 158, 31–46. doi:10.1016/j.jmva.2017.03.006. 39