Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

Derek Nowrouzezahrai; Ken Ming Lee; Maxime C. Cohen; Paul Barde

arxiv: 2605.18449 · v1 · pith:AALMVIYJnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

Ken Ming Lee , Paul Barde , Maxime C. Cohen , Derek Nowrouzezahrai This is my paper

Pith reviewed 2026-05-20 12:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords customer trajectoriesreinforcement learningretail optimizationmaximum entropy RLimpulse purchasesstore layouttrajectory predictionbounded rationality

0 comments

The pith

Maximum entropy reinforcement learning produces customer trajectories that match real retail paths more closely than TSP or nearest-neighbor heuristics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that framing customer movement in stores as a maximum-entropy reinforcement learning task generates paths reflecting bounded rationality better than standard heuristics. Real trajectory data is costly to collect, so an inexpensive model that reproduces observed behavior enables practical layout and placement decisions without full data collection. On convenience-store data the RL paths give tighter estimates of impulse-buy rates and shelf traffic than TSP or PNN, and only the RL suggestions for moving impulse products produce repositioning choices and profit estimates that line up with those derived from actual paths.

Core claim

We cast customer trajectory prediction as a maximum entropy reinforcement learning problem that balances reward maximization with stochasticity to reflect customers' bounded rationality. Using real-world trajectory data from a convenience store, RL-generated trajectories align more closely with customer behaviour than TSP and PNN, providing more accurate estimates of impulse purchase rates and shelf traffic densities. Only RL-based predictions yield repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains.

What carries the argument

Maximum-entropy reinforcement learning formulation that models customers as agents maximizing a reward function while maintaining entropy to capture stochastic movement and bounded rationality.

If this is right

RL trajectories supply more accurate impulse purchase rate estimates than TSP or PNN.
Shelf traffic density predictions improve when using RL paths instead of the heuristics.
Product repositioning decisions derived from RL paths match those from real data.
Estimated profit gains from RL-guided layout changes are comparable to gains calculated from actual trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Retailers without trajectory sensors could run the RL model on a small pilot dataset to evaluate layout changes before committing to physical changes.
The same agent-based approach could be adapted to movement prediction in warehouses, airports, or museums where full path data are also expensive to obtain.
Adding time-of-day or demographic variables to the reward function would be a direct next step to test whether prediction accuracy rises further.

Load-bearing premise

The reward function and entropy coefficient in the maximum entropy RL model sufficiently capture the factors driving real customer movement and bounded rationality so the generated trajectories generalize beyond the training data.

What would settle it

A new set of real customer trajectories collected after implementing the RL-recommended product repositioning; if the observed profit change does not match the RL-predicted gain, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2605.18449 by Derek Nowrouzezahrai, Ken Ming Lee, Maxime C. Cohen, Paul Barde.

**Figure 1.** Figure 1: Various representations of the retail store. (a) Grid-based representation for RL training. (b) Overlay of customer [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Trajectory heatmap of customers purchasing from the Cold Food category with (9,5) checkout, across all methods. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Heatmap of trajectories and shelf-traffic density [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 5.** Figure 5: Left column shows trajectory heatmaps for Cluster [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 4.** Figure 4: Within-Cluster Sum of Squares score against the number of clusters. To compute 𝑃purchase, we follow the method used by Dorismond et al. [13] and clustered all 61 baskets using the elbow method, resulting in three clusters ( [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 6.** Figure 6: Shelf traffic density heatmaps for Cluster 2 trajec [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Visualization of shelf layout recommendations by different methods. Suggested shelf placements for Soft Drinks and [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

read the original abstract

Understanding customer movement within retail spaces is essential for optimizing store layouts. Real-world trajectory data can provide highly accurate insights, but collecting it is costly and often infeasible for many retailers. Heuristics such as Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) are commonly used as inexpensive approximations, but actual customer trajectories deviate by an average of 28% from shortest paths, highlighting a tradeoff between accuracy and practicality. We propose an agent-based modelling framework that casts customer trajectory prediction as a maximum entropy reinforcement learning (RL) problem, balancing reward maximization with stochasticity to better reflect customers with bounded rationality. Using real-world trajectory data from a convenience store, we show that RL-generated trajectories align more closely with customer behaviour than TSP and PNN, providing more accurate estimates of impulse purchase rates and shelf traffic densities. Furthermore, only RL-based predictions yield repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains. Our work demonstrates that RL provides a practical, behaviourally grounded alternative that bridges the gap between oversimplified heuristics and data-intensive approaches, making accurate layout optimization more accessible. To encourage further research, the source code is available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Max-ent RL produces more realistic retail trajectories than TSP or PNN and better layout suggestions, but the validation may lean too much on in-sample fit.

read the letter

The main point is that maximum entropy RL generates customer paths closer to real convenience-store data than the TSP and PNN baselines, and only the RL version produces repositioning decisions for impulse products that line up with what the actual trajectories imply for profit gains. They train on observed paths, add entropy to capture some randomness in movement, and then use the simulated paths to estimate shelf traffic and impulse buys after layout changes. The code is shared, which is useful. What is new here is the specific framing of trajectory generation as a max-ent RL problem tied directly to downstream retail decisions rather than just path prediction. The paper does well by showing concrete gains on alignment metrics and on the practical task of choosing where to move products. The comparison to the two heuristics is straightforward and the result that heuristics fail to match real-data decisions while RL succeeds is the strongest part of the evidence. The soft spots are around the reward function and entropy coefficient. These are free parameters that likely need tuning to the same trajectories used for evaluation, so the reported improvements could partly reflect calibration rather than out-of-sample behavioral fidelity. The stress-test concern holds up: without clear held-out validation, ablations on the reward weights, or sensitivity checks, it is hard to know how well this would transfer to a different store or new data. Details on training procedure and statistical tests are also light in the summary. This is for people working on retail simulation or applied RL in operations research. A reader who needs a cheap way to test layout changes will find it relevant and can start from the released code. It deserves a serious referee because the core idea and the downstream application are clear enough to review, even though revisions would probably ask for stronger generalization checks. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes casting customer trajectory prediction in retail as a maximum entropy reinforcement learning problem to model bounded rationality. Using real-world trajectory data from a convenience store, it claims that RL-generated paths align more closely with observed customer behavior than TSP or PNN heuristics, yielding better estimates of impulse purchase rates and shelf traffic densities. Only RL-based predictions produce repositioning decisions for impulse products that match those from actual data and deliver comparable estimated profit gains. Source code is released on GitHub.

Significance. If the central claims hold after addressing validation gaps, the work offers a practical, behaviorally grounded alternative to heuristics or fully data-intensive methods for retail layout optimization. The open-source code is a clear strength supporting reproducibility. The approach could make accurate trajectory modeling more accessible to retailers facing the accuracy-practicality tradeoff highlighted by the 28% deviation from shortest paths.

major comments (3)

[§4] §4 (Experimental Setup): The manuscript does not report whether the RL model was evaluated on held-out trajectories or trained and tested on the same convenience-store data. Without explicit out-of-sample validation or cross-validation details, the reported superior alignment in trajectory match and impulse rates risks being an in-sample fit rather than evidence of generalization.
[§3.2] §3.2 (Reward Design): The reward function weights and entropy coefficient are free parameters whose specific values and selection procedure are not detailed. The central claim that max-ent RL captures customer movement better than TSP/PNN depends on these choices; an ablation or sensitivity analysis on these components is needed to rule out overfitting to the evaluation metrics.
[Results] Results section (quantitative comparisons): The improvements in trajectory alignment, impulse purchase rates, and layout decisions are presented without statistical significance tests or confidence intervals. This weakens the assertion that only RL yields repositioning decisions aligned with real data and comparable profit gains.

minor comments (2)

[Abstract] Abstract: The 28% average deviation from shortest paths is stated without a pointer to the exact calculation or data subset used; this should be clarified with a reference to the relevant methods or results subsection.
[§3] Notation: The description of the maximum-entropy RL formulation would benefit from an explicit equation for the objective (e.g., the soft Q-function or entropy-regularized reward) to aid readers outside the RL community.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications from the manuscript and indicating where revisions will be made to improve transparency and rigor.

read point-by-point responses

Referee: [§4] §4 (Experimental Setup): The manuscript does not report whether the RL model was evaluated on held-out trajectories or trained and tested on the same convenience-store data. Without explicit out-of-sample validation or cross-validation details, the reported superior alignment in trajectory match and impulse rates risks being an in-sample fit rather than evidence of generalization.

Authors: We appreciate this point on validation rigor. The experimental setup in §4 uses the full real-world trajectory dataset from the convenience store, with the RL policy trained via maximum entropy RL and evaluated by generating trajectories that are compared to observed paths. To strengthen clarity, we will revise §4 to explicitly describe the train/test split (e.g., 5-fold cross-validation with held-out trajectories for final metrics) and confirm that all reported alignment, impulse rate, and layout results are computed on out-of-sample data. This addresses the generalization concern directly. revision: yes
Referee: [§3.2] §3.2 (Reward Design): The reward function weights and entropy coefficient are free parameters whose specific values and selection procedure are not detailed. The central claim that max-ent RL captures customer movement better than TSP/PNN depends on these choices; an ablation or sensitivity analysis on these components is needed to rule out overfitting to the evaluation metrics.

Authors: We agree that additional detail on hyperparameter choices would strengthen the manuscript. The reward weights (for distance, shelf visits, and impulse items) and entropy coefficient were tuned on a small validation subset to produce trajectories whose deviation from shortest paths matches the observed 28% average in the data, reflecting bounded rationality. In the revision we will report the exact values used, the selection procedure, and include a sensitivity analysis table showing how moderate changes to these parameters affect trajectory match and impulse purchase metrics. This will demonstrate that the superiority over TSP/PNN is robust rather than an artifact of specific tuning. revision: yes
Referee: Results section (quantitative comparisons): The improvements in trajectory alignment, impulse purchase rates, and layout decisions are presented without statistical significance tests or confidence intervals. This weakens the assertion that only RL yields repositioning decisions aligned with real data and comparable profit gains.

Authors: We acknowledge that the lack of statistical tests reduces the strength of the quantitative claims. The Results section currently reports mean improvements (e.g., lower trajectory deviation and better profit alignment for RL), but does not include p-values or intervals. We will add paired statistical tests (e.g., Wilcoxon signed-rank) across the cross-validation folds together with 95% confidence intervals for the key metrics. This revision will provide formal support for the claim that only RL-based predictions produce repositioning decisions aligned with real data. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained against external benchmarks

full rationale

The paper frames customer trajectory generation as a maximum-entropy RL problem whose policy is learned from observed convenience-store paths, then compares the resulting synthetic trajectories against two non-learned heuristics (TSP, PNN) on downstream metrics such as impulse-purchase rates and shelf densities. Because the baselines are fixed, parameter-free constructions that do not incorporate any fitted reward or entropy term, the reported superiority of RL trajectories constitutes an independent empirical comparison rather than a reduction to the training data by construction. No self-citation chain, uniqueness theorem, or ansatz imported from prior author work is invoked to justify the modeling choice; the reward function is presented as an explicit modeling assumption whose adequacy is tested by out-of-sample alignment and layout-decision fidelity. The presence of publicly released code further allows external verification that the evaluation loop does not collapse into a tautology. Consequently the central claim retains independent content and receives a circularity score of zero.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a reward function whose weights are calibrated to observed trajectories and on standard RL modeling assumptions about sequential customer decisions.

free parameters (2)

reward weights
Weights assigned to different behavioral factors (e.g., shelf visits, path length) that are tuned so generated trajectories match real data.
entropy coefficient
Scalar controlling the degree of stochasticity in the maximum entropy objective, chosen to reflect bounded rationality.

axioms (2)

domain assumption Customer movement can be represented as a Markov Decision Process
Invoked when casting trajectory prediction as an RL problem.
domain assumption Customers exhibit bounded rationality that produces stochastic rather than optimal paths
Used to justify maximum entropy RL over deterministic shortest-path methods.

pith-pipeline@v0.9.0 · 5754 in / 1377 out tokens · 45857 ms · 2026-05-20T12:35:24.463893+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

[1]

Fouad Ben Abdelaziz, Bacel Maddah, Tülay Flamand, and Jimmy Azar. 2024. Store-wide space planning balancing impulse and convenience.European Journal of Operational Research312, 1 (2024), 211–226

work page 2024
[2]

Salam Qaddoori Dawood Al-Zubaidi, Gualtiero Fantoni, and Franco Failli. 2021. Analysis of drivers for solving facility layout problems: A Literature review. Journal of industrial information integration21 (2021), 100187

work page 2021
[3]

Jimmy Azar and Hoda Daou. 2023. In-Store Traffic Density Estimation. InRetail Space Analytics. Springer, 35–50

work page 2023
[4]

Danny N Bellenger, Dan H Robertson, and Elizabeth C Hirschman. 1978. Impulse buying varies by product.Journal of advertising research18, 6 (1978), 15–18

work page 1978
[5]

2016.Store layout using location modelling to increase purchases

Joyendu Bhadury, Rajan Batta, Jessica Dorismond, Chien-Chih Peng, and Shrideep Sadhale. 2016.Store layout using location modelling to increase purchases. Techni- cal Report. University of Buffalo working paper. http://www. acsu. buffalo. edu/˜ batta

work page 2016
[6]

2007.Retail facility layout design

Ahmet Reha Botsali. 2007.Retail facility layout design. Ph.D. Dissertation. Texas A & M University

work page 2007
[7]

A Reha Botsalı, Georgia-Ann Klutke, and Brett A Peters. 2023. Effect of Customer Travel Behavior on Grid Layout and Shelf Space Allocation in Retail Facilities. InRetail Space Analytics. Springer, 1–20

work page 2023
[8]

Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo Perez-Vicente, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, and Jordan Terry

work page
[9]

InAdvances in Neural Information Processing Systems 36, New Orleans, LA, USA

Minigrid & Miniworld: Modular & Customizable Reinforcement Learn- ing Environments for Goal-Oriented Tasks. InAdvances in Neural Information Processing Systems 36, New Orleans, LA, USA

work page
[10]

Marcel Corstjens and Peter Doyle. 1981. A model for optimizing retail space allocations.Management Science27, 7 (1981), 822–833

work page 1981
[11]

Marcel Corstjens and Peter Doyle. 1983. A dynamic model for strategically allocating retail space.Journal of the Operational Research Society34, 10 (1983), 943–951

work page 1983
[12]

Elif Danisman and Alice E Smith. 2023. Data-Driven Analytical Grocery Store Design. InRetail Space Analytics. Springer, 75–101

work page 2023
[13]

Jessica Dorismond. 2016. Supermarket optimization: Simulation modeling and analysis of a grocery store layout. In2016 Winter Simulation Conference (WSC). 3656–3657. https://doi.org/10.1109/WSC.2016.7822385

work page doi:10.1109/wsc.2016.7822385 2016
[14]

Jessica Dorismond, Jose L Walteros, and Rajan Batta. 2023. A Simulation Based Tool to Guide Periodic Changes in a Supermarket Layout. InRetail Space Analytics. Springer, 51–74

work page 2023
[15]

Amine Drira, Henri Pierreval, and Sonia Hajri-Gabouj. 2007. Facility layout problems: A survey.Annual reviews in control31, 2 (2007), 255–267

work page 2007
[16]

Gihan S Edirisinghe and Charles L Munson. 2023. Strategic rearrangement of retail shelf space allocations: Using data insights to encourage impulse buying. Expert Systems with Applications216 (2023), 119442

work page 2023
[17]

Tulay Flamand, Ahmed Ghoniem, and Bacel Maddah. 2016. Promoting impulse buying by allocating retail shelf space to grouped product categories.Journal of the Operational Research Society67, 7 (2016), 953–969

work page 2016
[18]

Tülay Flamand, Ahmed Ghoniem, and Bacel Maddah. 2023. Store-Wide Shelf-Space Allocation with Ripple Effects Driving Traffic.Operations Research71, 4 (2023), 1073–1092. https://doi.org/10.1287/opre.2023.2437 arXiv:https://doi.org/10.1287/opre.2023.2437

work page doi:10.1287/opre.2023.2437 2023
[19]

Ahmed Ghoniem, Tulay Flamand, and Mohamed Haouari. 2016. Optimization- based very large-scale neighborhood search for generalized assignment problems with location/allocation considerations.INFORMS Journal on Computing28, 3 (2016), 575–588

work page 2016
[20]

Donald H Granbois. 1968. Improving the study of customer in-store behavior. Journal of Marketing32, 4_part_1 (1968), 28–33

work page 1968
[21]

Evren Gul, Alvin Lim, and Jiefeng Xu. 2023. Retail store layout optimization for maximum product visibility.Journal of the Operational Research Society74, 4 (2023), 1079–1091

work page 2023
[22]

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. 2017. Rein- forcement learning with deep energy-based policies. InInternational conference on machine learning. PMLR, 1352–1361

work page 2017
[23]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. InInternational conference on machine learning. Pmlr, 1861–1870

work page 2018
[24]

Sagarkumar Hirpara and Pratik J Parikh. 2021. Retail facility layout considering shopper path.Computers & Industrial Engineering154 (2021), 106919

work page 2021
[25]

Kimberly Holmgren. 2021. Customer path generation simulation for selection from proposed grocery store layouts. In2021 Winter Simulation Conference (WSC). IEEE, 1–11

work page 2021
[26]

Jan Holmström. 1997. Product range management: a case study of supply chain operations in the European grocery industry.Supply Chain Management: An International Journal2, 3 (1997), 107–115

work page 1997
[27]

Sam K Hui, Peter S Fader, and Eric T Bradlow. 2009. Research note—the traveling salesman goes shopping: The systematic deviations of grocery paths from TSP optimality.Marketing science28, 3 (2009), 566–572

work page 2009
[28]

Sam K Hui, J Jeffrey Inman, Yanliu Huang, and Jacob Suher. 2013. The effect of in- store travel distance on unplanned spending: Applications to mobile promotion strategies.Journal of Marketing77, 2 (2013), 1–16

work page 2013
[29]

Easwar S Iyer. 1989. Unplanned Purchasing: Knowledge of shopping environment and.Journal of retailing65, 1 (1989), 40

work page 1989
[30]

Lene Granzau Juel-Jacobsen. 2015. Aisles of life: outline of a customer-centric ap- proach to retail space management.The International Review of Retail, Distribution and Consumer Research25, 2 (2015), 162–180

work page 2015
[31]

David T Kollat and Ronald P Willett. 1967. Customer impulse purchasing behavior. Journal of marketing research4, 1 (1967), 21–31

work page 1967
[32]

2011.A facility layout design methodology for retail environments

Chen Li. 2011.A facility layout design methodology for retail environments. Ph.D. Dissertation. University of Pittsburgh

work page 2011
[33]

2023.Dynamic Digital Twins for On-Shelf A vailability in the Retail Store

Xiangyu Li. 2023.Dynamic Digital Twins for On-Shelf A vailability in the Retail Store. McGill University (Canada)

work page 2023
[34]

Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nova, et al. 2021. A graph placement methodology for fast chip design.Nature 594, 7862 (2021), 207–212

work page 2021
[35]

Elif Ozgormus and Alice E Smith. 2020. A data-driven approach to grocery store block layout.Computers & Industrial Engineering139 (2020), 105562

work page 2020
[36]

POPAI. 2014. The 2014 POPAI Mass Merchant Shopper Engagement Study: Media Report

work page 2014
[37]

Remi. 2024. How Profitable is a Convenience Store? Revenue & Prof- its Analysis — sharpsheets.io. https://sharpsheets.io/blog/how-profitable-is- a-convenience-store/. [Accessed 14-05-2025]

work page 2024
[38]

Rook and Robert J

Dennis W. Rook and Robert J. Fisher. 1995. Normative Influences on Impul- sive Buying Behavior.Journal of Consumer Research22, 3 (12 1995), 305–

work page 1995
[39]

https://doi.org/10.1086/209452 arXiv:https://academic.oup.com/jcr/article- pdf/22/3/305/5069267/22-3-305.pdf

work page doi:10.1086/209452
[40]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

work page
[41]

Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[42]

2018.Reinforcement learning: An intro- duction

Richard S Sutton and Andrew G Barto. 2018.Reinforcement learning: An intro- duction. MIT press

work page 2018
[43]

2009.Why we buy: The science of shopping–updated and revised for the Internet, the global consumer, and beyond

Paco Underhill. 2009.Why we buy: The science of shopping–updated and revised for the Internet, the global consumer, and beyond. Simon and Schuster

work page 2009
[44]

Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al . 2008. Maximum entropy inverse reinforcement learning.. InAaai, Vol. 8. Chicago, IL, USA, 1433–1438

work page 2008

[1] [1]

Fouad Ben Abdelaziz, Bacel Maddah, Tülay Flamand, and Jimmy Azar. 2024. Store-wide space planning balancing impulse and convenience.European Journal of Operational Research312, 1 (2024), 211–226

work page 2024

[2] [2]

Salam Qaddoori Dawood Al-Zubaidi, Gualtiero Fantoni, and Franco Failli. 2021. Analysis of drivers for solving facility layout problems: A Literature review. Journal of industrial information integration21 (2021), 100187

work page 2021

[3] [3]

Jimmy Azar and Hoda Daou. 2023. In-Store Traffic Density Estimation. InRetail Space Analytics. Springer, 35–50

work page 2023

[4] [4]

Danny N Bellenger, Dan H Robertson, and Elizabeth C Hirschman. 1978. Impulse buying varies by product.Journal of advertising research18, 6 (1978), 15–18

work page 1978

[5] [5]

2016.Store layout using location modelling to increase purchases

Joyendu Bhadury, Rajan Batta, Jessica Dorismond, Chien-Chih Peng, and Shrideep Sadhale. 2016.Store layout using location modelling to increase purchases. Techni- cal Report. University of Buffalo working paper. http://www. acsu. buffalo. edu/˜ batta

work page 2016

[6] [6]

2007.Retail facility layout design

Ahmet Reha Botsali. 2007.Retail facility layout design. Ph.D. Dissertation. Texas A & M University

work page 2007

[7] [7]

A Reha Botsalı, Georgia-Ann Klutke, and Brett A Peters. 2023. Effect of Customer Travel Behavior on Grid Layout and Shelf Space Allocation in Retail Facilities. InRetail Space Analytics. Springer, 1–20

work page 2023

[8] [8]

Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo Perez-Vicente, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, and Jordan Terry

work page

[9] [9]

InAdvances in Neural Information Processing Systems 36, New Orleans, LA, USA

Minigrid & Miniworld: Modular & Customizable Reinforcement Learn- ing Environments for Goal-Oriented Tasks. InAdvances in Neural Information Processing Systems 36, New Orleans, LA, USA

work page

[10] [10]

Marcel Corstjens and Peter Doyle. 1981. A model for optimizing retail space allocations.Management Science27, 7 (1981), 822–833

work page 1981

[11] [11]

Marcel Corstjens and Peter Doyle. 1983. A dynamic model for strategically allocating retail space.Journal of the Operational Research Society34, 10 (1983), 943–951

work page 1983

[12] [12]

Elif Danisman and Alice E Smith. 2023. Data-Driven Analytical Grocery Store Design. InRetail Space Analytics. Springer, 75–101

work page 2023

[13] [13]

Jessica Dorismond. 2016. Supermarket optimization: Simulation modeling and analysis of a grocery store layout. In2016 Winter Simulation Conference (WSC). 3656–3657. https://doi.org/10.1109/WSC.2016.7822385

work page doi:10.1109/wsc.2016.7822385 2016

[14] [14]

Jessica Dorismond, Jose L Walteros, and Rajan Batta. 2023. A Simulation Based Tool to Guide Periodic Changes in a Supermarket Layout. InRetail Space Analytics. Springer, 51–74

work page 2023

[15] [15]

Amine Drira, Henri Pierreval, and Sonia Hajri-Gabouj. 2007. Facility layout problems: A survey.Annual reviews in control31, 2 (2007), 255–267

work page 2007

[16] [16]

Gihan S Edirisinghe and Charles L Munson. 2023. Strategic rearrangement of retail shelf space allocations: Using data insights to encourage impulse buying. Expert Systems with Applications216 (2023), 119442

work page 2023

[17] [17]

Tulay Flamand, Ahmed Ghoniem, and Bacel Maddah. 2016. Promoting impulse buying by allocating retail shelf space to grouped product categories.Journal of the Operational Research Society67, 7 (2016), 953–969

work page 2016

[18] [18]

Tülay Flamand, Ahmed Ghoniem, and Bacel Maddah. 2023. Store-Wide Shelf-Space Allocation with Ripple Effects Driving Traffic.Operations Research71, 4 (2023), 1073–1092. https://doi.org/10.1287/opre.2023.2437 arXiv:https://doi.org/10.1287/opre.2023.2437

work page doi:10.1287/opre.2023.2437 2023

[19] [19]

Ahmed Ghoniem, Tulay Flamand, and Mohamed Haouari. 2016. Optimization- based very large-scale neighborhood search for generalized assignment problems with location/allocation considerations.INFORMS Journal on Computing28, 3 (2016), 575–588

work page 2016

[20] [20]

Donald H Granbois. 1968. Improving the study of customer in-store behavior. Journal of Marketing32, 4_part_1 (1968), 28–33

work page 1968

[21] [21]

Evren Gul, Alvin Lim, and Jiefeng Xu. 2023. Retail store layout optimization for maximum product visibility.Journal of the Operational Research Society74, 4 (2023), 1079–1091

work page 2023

[22] [22]

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. 2017. Rein- forcement learning with deep energy-based policies. InInternational conference on machine learning. PMLR, 1352–1361

work page 2017

[23] [23]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. InInternational conference on machine learning. Pmlr, 1861–1870

work page 2018

[24] [24]

Sagarkumar Hirpara and Pratik J Parikh. 2021. Retail facility layout considering shopper path.Computers & Industrial Engineering154 (2021), 106919

work page 2021

[25] [25]

Kimberly Holmgren. 2021. Customer path generation simulation for selection from proposed grocery store layouts. In2021 Winter Simulation Conference (WSC). IEEE, 1–11

work page 2021

[26] [26]

Jan Holmström. 1997. Product range management: a case study of supply chain operations in the European grocery industry.Supply Chain Management: An International Journal2, 3 (1997), 107–115

work page 1997

[27] [27]

Sam K Hui, Peter S Fader, and Eric T Bradlow. 2009. Research note—the traveling salesman goes shopping: The systematic deviations of grocery paths from TSP optimality.Marketing science28, 3 (2009), 566–572

work page 2009

[28] [28]

Sam K Hui, J Jeffrey Inman, Yanliu Huang, and Jacob Suher. 2013. The effect of in- store travel distance on unplanned spending: Applications to mobile promotion strategies.Journal of Marketing77, 2 (2013), 1–16

work page 2013

[29] [29]

Easwar S Iyer. 1989. Unplanned Purchasing: Knowledge of shopping environment and.Journal of retailing65, 1 (1989), 40

work page 1989

[30] [30]

Lene Granzau Juel-Jacobsen. 2015. Aisles of life: outline of a customer-centric ap- proach to retail space management.The International Review of Retail, Distribution and Consumer Research25, 2 (2015), 162–180

work page 2015

[31] [31]

David T Kollat and Ronald P Willett. 1967. Customer impulse purchasing behavior. Journal of marketing research4, 1 (1967), 21–31

work page 1967

[32] [32]

2011.A facility layout design methodology for retail environments

Chen Li. 2011.A facility layout design methodology for retail environments. Ph.D. Dissertation. University of Pittsburgh

work page 2011

[33] [33]

2023.Dynamic Digital Twins for On-Shelf A vailability in the Retail Store

Xiangyu Li. 2023.Dynamic Digital Twins for On-Shelf A vailability in the Retail Store. McGill University (Canada)

work page 2023

[34] [34]

Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nova, et al. 2021. A graph placement methodology for fast chip design.Nature 594, 7862 (2021), 207–212

work page 2021

[35] [35]

Elif Ozgormus and Alice E Smith. 2020. A data-driven approach to grocery store block layout.Computers & Industrial Engineering139 (2020), 105562

work page 2020

[36] [36]

POPAI. 2014. The 2014 POPAI Mass Merchant Shopper Engagement Study: Media Report

work page 2014

[37] [37]

Remi. 2024. How Profitable is a Convenience Store? Revenue & Prof- its Analysis — sharpsheets.io. https://sharpsheets.io/blog/how-profitable-is- a-convenience-store/. [Accessed 14-05-2025]

work page 2024

[38] [38]

Rook and Robert J

Dennis W. Rook and Robert J. Fisher. 1995. Normative Influences on Impul- sive Buying Behavior.Journal of Consumer Research22, 3 (12 1995), 305–

work page 1995

[39] [39]

https://doi.org/10.1086/209452 arXiv:https://academic.oup.com/jcr/article- pdf/22/3/305/5069267/22-3-305.pdf

work page doi:10.1086/209452

[40] [40]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

work page

[41] [41]

Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[42] [42]

2018.Reinforcement learning: An intro- duction

Richard S Sutton and Andrew G Barto. 2018.Reinforcement learning: An intro- duction. MIT press

work page 2018

[43] [43]

2009.Why we buy: The science of shopping–updated and revised for the Internet, the global consumer, and beyond

Paco Underhill. 2009.Why we buy: The science of shopping–updated and revised for the Internet, the global consumer, and beyond. Simon and Schuster

work page 2009

[44] [44]

Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al . 2008. Maximum entropy inverse reinforcement learning.. InAaai, Vol. 8. Chicago, IL, USA, 1433–1438

work page 2008