Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning

Amy Loutfi; Antonello Rosato; Denis Kleyko; Kevin Johansson; Marco Angioli

arxiv: 2605.13346 · v1 · pith:GZOKMZK7new · submitted 2026-05-13 · 💻 cs.LG

Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning

Marco Angioli , Kevin Johansson , Antonello Rosato , Amy Loutfi , Denis Kleyko This is my paper

Pith reviewed 2026-05-14 19:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords contextual banditshyperdimensional computingprobabilistic learninglow-precisionresource-constrained deviceson-device learningonline decision making

0 comments

The pith

A probabilistic update rule lets hyperdimensional contextual bandits run on low-precision hardware while outperforming binarized versions and approaching full performance with 3 bits per component.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Contextual bandits solve sequential decision problems under partial feedback, but standard linear algorithms and even hyperdimensional variants demand too much memory and precision for small devices. The paper replaces the accumulation-based learning rule in HD-CB with a probabilistic version that updates only a random subset of vector components according to a time-decaying probability and keeps every component value bounded inside a fixed range. This change removes the need for periodic binarization, prevents overflow, and scales the update cost down with the fraction of components touched. Off-policy tests on standard synthetic benchmarks show the new rule beats binarized HD-CB at identical precision and nearly matches the original high-precision HD-CB once each component uses only three bits. The result matters because it opens a practical path for running adaptive decision agents directly on sensors and edge hardware that cannot afford floating-point arithmetic or large memory.

Core claim

The paper claims that replacing deterministic accumulation in HD-CB with a probabilistic update rule—selecting a random subset of components at each step, applying a time-decaying update probability, and clamping values to a fixed interval [-k, +k]—yields a low-precision algorithm that consistently outperforms its binarized counterpart at the same bit width and approaches the regret performance of full-precision HD-CB with as few as three bits per component, as measured by off-policy evaluation on standardized synthetic contextual bandit benchmarks.

What carries the argument

The probabilistic update rule that selects random vector components for update with a time-decaying probability and bounds their values to a fixed range, replacing accumulation and eliminating periodic binarization.

If this is right

Expected update cost drops in direct proportion to the fraction of components selected at each step.
Memory and energy budgets shrink because low-precision fixed-range values replace high-precision accumulation.
Decision quality improves over binarized HD-CB at every tested bit width on the evaluated benchmarks.
Deployment on strictly constrained hardware becomes feasible without sacrificing the convergence speed of the original HD-CB approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bounded probabilistic update could be tested on other hyperdimensional tasks such as sequence prediction or anomaly detection to see whether the precision savings transfer.
Hardware measurements of actual energy per decision on microcontrollers would quantify gains that synthetic regret curves cannot capture.
Adjusting the decay schedule of the update probability per environment might further reduce regret in non-stationary settings.

Load-bearing premise

The probabilistic rule with random subset selection and time-decaying probability preserves enough learning information to match or exceed binarized HD-CB without introducing bias or instability that would appear only outside synthetic benchmarks.

What would settle it

An on-device experiment in which probabilistic HD-CB produces higher cumulative regret or visible instability than binarized HD-CB at the same precision would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2605.13346 by Amy Loutfi, Antonello Rosato, Denis Kleyko, Kevin Johansson, Marco Angioli.

**Figure 2.** Figure 2: Memory footprint (KiB, log scale) vs. context di [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

read the original abstract

Contextual bandits (CB) are online sequential decision-making problems under partial feedback that underpin many adaptive services. There is a growing demand to deploy CB agents directly on-device, under strict constraints on memory, compute, and energy. However, standard linear CB algorithms are often impractical for resource-constrained devices with their unfavorable scaling in computational and memory costs. Recently, HD-CB, a CB approach based on hyperdimensional computing principles, has been proposed to model and solve CB problems by moving into high-dimensional spaces. HD-CB offers faster convergence, favorable scalability, and improves memory efficiency compared to linear CB algorithms. However, its learning rule is accumulation-based: the values of action vectors grow over time, requiring high precision. While periodic binarization can prevent overflow in low-precision components, it may discard important information about magnitudes and degrade decision quality. This paper introduces probabilistic HD-CB, a low-precision variant that replaces deterministic accumulation with a probabilistic update rule. At each step, only a random subset of vector components is updated, with a time-decaying update probability, and component values are constrained to a predefined range [-k,+k]. This approach enables low-precision components, prevents overflow without periodic binarization, and reduces the expected update cost in proportion to the fraction of updated components. Off-policy evaluation on standardized synthetic CB benchmarks using the Open Bandit Pipeline shows that probabilistic HD-CB consistently outperforms binarized HD-CB at equal precision, while approaching the performance of HD-CB with as few as 3 bits per component.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Probabilistic updates let HD-CB run at low precision without binarization, but off-policy tests leave online stability unproven.

read the letter

The core move here is swapping deterministic accumulation in hyperdimensional contextual bandits for a probabilistic rule: at each step only a random subset of components gets updated, the update probability decays over time, and values stay clamped inside [-k, +k]. This directly targets the overflow and precision problems that force either high-bit storage or periodic binarization in the earlier HD-CB work. The description of how this also lowers expected compute per step is straightforward and practical for edge hardware. The off-policy results on synthetic benchmarks from the Open Bandit Pipeline show the new version beating binarized HD-CB at matched precision and coming close to full-precision HD-CB with only 3 bits per component, which is the sort of concrete deployment gain that matters for IoT devices. The evaluation stays on fixed logs, so it cannot reveal whether the stochastic subset selection and decay create bias that then shifts action choices and the data distribution in true closed-loop interaction. The abstract also gives no variance numbers, significance tests, or exact hyperparameter schedules, which makes the reported gains hard to weigh. This is aimed at researchers who need online learning on memory- and precision-constrained hardware. Anyone already working with hyperdimensional methods or resource-aware bandits would get value from the update rule itself. I would send it to peer review; the idea is clear and the target problem is real, even if the experiments need tightening to address the online case.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes probabilistic HD-CB as a low-precision variant of hyperdimensional computing-based contextual bandits (HD-CB). It replaces deterministic accumulation with a probabilistic update rule that selects a random subset of vector components according to a time-decaying probability, clamps component values to [-k, +k], and thereby avoids overflow and reduces expected update cost. Off-policy evaluation on synthetic benchmarks via the Open Bandit Pipeline is reported to show consistent outperformance over binarized HD-CB at equal precision and performance approaching that of full HD-CB with as few as 3 bits per component.

Significance. If the performance claims are confirmed under closed-loop conditions, the work would offer a practical route to deploying contextual bandits on memory- and energy-constrained devices, extending the applicability of HD-CB while preserving its favorable scaling properties. The reliance on standardized benchmarks is a methodological strength that supports reproducibility.

major comments (2)

[Abstract and evaluation description] Abstract and evaluation description: the reported outperformance rests entirely on off-policy replay of fixed synthetic logs. Because the probabilistic subset selection and decay schedule alter action probabilities, the resulting data distribution in true online interaction may differ systematically from the logged data; no closed-loop experiments are described that would detect such bias or instability.
[Method and experimental sections] Method and experimental sections: the update rule introduces free parameters k and the decay schedule whose values are not reported with sensitivity analysis or default settings; without these details it is impossible to determine whether the claimed gains at 3-bit precision are robust or depend on post-hoc tuning.

minor comments (2)

The abstract states that results are 'consistent' but supplies no information on number of runs, variance, or statistical tests; adding these would strengthen the presentation.
Notation for the time-decaying probability and the precise clamping operation should be given explicitly with an equation in the method section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below with clarifications and planned revisions.

read point-by-point responses

Referee: Abstract and evaluation description: the reported outperformance rests entirely on off-policy replay of fixed synthetic logs. Because the probabilistic subset selection and decay schedule alter action probabilities, the resulting data distribution in true online interaction may differ systematically from the logged data; no closed-loop experiments are described that would detect such bias or instability.

Authors: We acknowledge the limitation of relying solely on off-policy replay via the Open Bandit Pipeline. While this is the standard methodology for reproducible contextual bandit evaluation on synthetic logs and controls for policy-induced biases in the logged data, the probabilistic updates could indeed induce distribution shifts in true closed-loop interaction. We will revise the evaluation description and add an explicit limitations paragraph noting this point and recommending closed-loop validation as future work. No new experiments are added at this stage. revision: partial
Referee: Method and experimental sections: the update rule introduces free parameters k and the decay schedule whose values are not reported with sensitivity analysis or default settings; without these details it is impossible to determine whether the claimed gains at 3-bit precision are robust or depend on post-hoc tuning.

Authors: We apologize for the omission. The default settings used were k=3 (clamping bound) and a linear decay of update probability from 0.5 to 0.05 over the horizon. We will revise the method section to state these values explicitly and add a sensitivity analysis subsection in the experiments, reporting performance for k in {1,2,3,4,5} and alternative decay schedules. This will confirm robustness of the reported 3-bit gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity; probabilistic update rule and benchmark results are independently defined and evaluated

full rationale

The paper introduces probabilistic HD-CB via an explicit new update rule (random subset selection with time-decaying probability and clamping to [-k, +k]) that replaces the accumulation-based rule of prior HD-CB. Performance claims are supported solely by off-policy evaluation on external standardized synthetic benchmarks via the Open Bandit Pipeline, with no equations, parameters, or results reducing by construction to fitted inputs, self-definitions, or load-bearing self-citations. The derivation chain from rule definition to reported outperformance remains self-contained against the cited benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on standard contextual bandit assumptions and hyperdimensional vector operations while introducing a new probabilistic update rule whose parameters (bound k and decay schedule) are design choices without independent derivation.

free parameters (2)

k
Bound on component values [-k, +k] chosen to constrain precision and prevent overflow.
update probability decay schedule
Time-decaying probability for component updates is a tunable design parameter.

axioms (1)

domain assumption Contextual bandit problems can be effectively modeled and solved using hyperdimensional vector representations
Directly extends the HD-CB framework without re-deriving its foundations.

pith-pipeline@v0.9.0 · 5585 in / 1343 out tokens · 43595 ms · 2026-05-14T19:05:43.806442+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

probabilistic update rule... random subset... time-decaying update probability... constrained to [-κ,+κ]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

A contextual-bandit approach to personalized news article recommendation,

L. Liet al., “A contextual-bandit approach to personalized news article recommendation,” inInt. Conf. on World Wide Web, 2010, pp. 661–670

work page 2010
[2]

Artwork personalization at Netflix,

F. Amatet al., “Artwork personalization at Netflix,” inACM Conference on Recommender Systems (RecSys), 2018, p. 487–488

work page 2018
[3]

Multiworld testing decision service: A system for experimentation, learning, and decision-making,

A. Agarwalet al., “Multiworld testing decision service: A system for experimentation, learning, and decision-making,”Whitepaper of Microsoft, pp. 1–40, 2016

work page 2016
[4]

How The New York Times is experimenting with recommendation algorithms,

A. Coenen, “How The New York Times is experimenting with recommendation algorithms,” 2019, accessed: 2026-01-13. [Online]. Available: https://tinyurl.com/536zku2f

work page 2019
[5]

How we boosted app revenue by 10% with real-time personalization,

J. Runge, “How we boosted app revenue by 10% with real-time personalization,” 2020, accessed: 2026-01-13. [Online]. Available: https://tinyurl.com/433eawyb

work page 2020
[6]

Automatic hardware accelerators reconfiguration through LinearUCB algorithms on a RISC-V processor,

M. Angioliet al., “Automatic hardware accelerators reconfiguration through LinearUCB algorithms on a RISC-V processor,” inConf. on PhD Res. in Microelect. and Elect. (PRIME), 2023, pp. 169–172

work page 2023
[7]

Contextual-bandit anomaly detection for IoT data in distributed hierarchical edge computing,

M. V . Ngoet al., “Contextual-bandit anomaly detection for IoT data in distributed hierarchical edge computing,” inIEEE Int. Conf. on Distributed Computing Systems (ICDCS), 2020, pp. 1227–1230

work page 2020
[8]

Edge computing in the dark: Leveraging contextual- combinatorial bandit and coded computing,

C.-S. Yanget al., “Edge computing in the dark: Leveraging contextual- combinatorial bandit and coded computing,”IEEE/ACM Transactions on Networking, vol. 29, no. 3, pp. 1022–1031, 2021

work page 2021
[9]

Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis,

A. Durandet al., “Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis,” inMachine Learning for Healthcare (MLHC), 2018, pp. 67–82

work page 2018
[10]

Efficient implementation of LinearUCB through algo- rithmic improvements and vector computing acceleration for embedded learning systems,

M. Angioliet al., “Efficient implementation of LinearUCB through algo- rithmic improvements and vector computing acceleration for embedded learning systems,”ACM Transactions on Embedded Computing Systems, vol. 24, no. 4, pp. 1–23, 2025

work page 2025
[11]

Hyperdimensional computing: An introduction to comput- ing in distributed representation with high-dimensional random vectors,

P. Kanerva, “Hyperdimensional computing: An introduction to comput- ing in distributed representation with high-dimensional random vectors,” Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009

work page 2009
[12]

A survey on hyperdimensional computing aka vector symbolic architectures, Part I: Models and data transformations,

D. Kleykoet al., “A survey on hyperdimensional computing aka vector symbolic architectures, Part I: Models and data transformations,”ACM Computing Surveys, vol. 55, no. 6, pp. 1–40, 2022

work page 2022
[13]

HD-CB: The first exploration of hyperdimensional computing for contextual bandits problems,

M. Angioliet al., “HD-CB: The first exploration of hyperdimensional computing for contextual bandits problems,”IEEE Open Journal of the Computer Society, vol. 7, pp. 105–116, 2026

work page 2026
[14]

Randomness in neural networks: An overview,

S. Scardapane and D. Wang, “Randomness in neural networks: An overview,”Data Min. Know. Disc., vol. 7, no. 2, pp. 1–18, 2017

work page 2017
[15]

Reducing catastrophic forgetting with associative learn- ing,

Y . Shenet al., “Reducing catastrophic forgetting with associative learn- ing,”Neural Computation, vol. 35, no. 11, pp. 1797–1819, 2023

work page 2023
[16]

HD-CBBIN: A lightweight approach for contextual bandit learning in real-time applications,

M. Angioliet al., “HD-CBBIN: A lightweight approach for contextual bandit learning in real-time applications,” inInt. Joint Conf. on Neur. Netw. (IJCNN), 2025, pp. 1–8

work page 2025
[17]

A binary self-organizing map and its FPGA implemen- tation,

K. Appiahet al., “A binary self-organizing map and its FPGA implemen- tation,” inInt. Joint Conf. on Neur. Netw. (IJCNN), 2009, pp. 164–171

work page 2009
[18]

Integer self-organizing maps for digital hardware,

D. Kleykoet al., “Integer self-organizing maps for digital hardware,” in Int. Joint Conf. on Neur. Netw. (IJCNN), 2019, pp. 1–8

work page 2019
[19]

Survey on applications of multi-armed and contextual bandits,

D. Bouneffoufet al., “Survey on applications of multi-armed and contextual bandits,” inIEEE Congr. Evol. Comput., 2020, p. 1–8

work page 2020
[20]

From ads to interventions: Contextual bandits in mobile health,

A. Tewari and S. A. Murphy, “From ads to interventions: Contextual bandits in mobile health,” inMobile Health: Sensors, Analytic Methods, and Applications, 2017, pp. 495–517

work page 2017
[21]

DroneBandit: Multi-armed contextual bandits for col- laborative edge-to-cloud inference in resource-constrained nanodrones,

G. Chacunet al., “DroneBandit: Multi-armed contextual bandits for col- laborative edge-to-cloud inference in resource-constrained nanodrones,” inGreat Lakes Symposium on VLSI (GLSVLSI), 2024, p. 98–104

work page 2024
[22]

To ask or not to ask: Human-in-the-loop contextual bandits with applications in robot-assisted feeding,

R. Banerjeeet al., “To ask or not to ask: Human-in-the-loop contextual bandits with applications in robot-assisted feeding,” inIEEE Interna- tional Conf. on Robotics and Automation (ICRA), 2025, pp. 1378–1384

work page 2025
[23]

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms,

L. Liet al., “Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms,” inACM Int. Conf. on Web Search and Data Mining (WSDM), 2011, pp. 297–306

work page 2011
[24]

A comparison of vector symbolic architectures,

K. Schlegelet al., “A comparison of vector symbolic architectures,” Artificial Intelligence Review, vol. 55, no. 6, pp. 4523–4555, 2022

work page 2022
[25]

Multiplicative binding, representation operators & anal- ogy,

R. W. Gayler, “Multiplicative binding, representation operators & anal- ogy,” inAdvances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences, 1998, pp. 1–4

work page 1998
[26]

Vector symbolic architectures as a computing frame- work for emerging hardware,

D. Kleykoet al., “Vector symbolic architectures as a computing frame- work for emerging hardware,”Proceedings of the IEEE, vol. 110, no. 10, pp. 1538–1571, 2022

work page 2022
[27]

Sparse binary distributed encoding of scalars,

D. A. Rachkovskijet al., “Sparse binary distributed encoding of scalars,” J. Autom. Inf. Sci., vol. 37, no. 6, pp. 12–23, 2005

work page 2005
[28]

Density encoding enables resource-efficient randomly connected neural networks,

D. Kleykoet al., “Density encoding enables resource-efficient randomly connected neural networks,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3777–3783, 2021

work page 2021
[29]

A family of binary spatter codes,

P. Kanerva, “A family of binary spatter codes,” inInt. Conf. on Artificial Neural Networks (ICANN), 1995, pp. 517–522

work page 1995
[30]

Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation,

Y . Saitoet al., “Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation,” inAdvances in Neural Information Processing Systems (NeurIPS), 2021, pp. 1–14

work page 2021

[1] [1]

A contextual-bandit approach to personalized news article recommendation,

L. Liet al., “A contextual-bandit approach to personalized news article recommendation,” inInt. Conf. on World Wide Web, 2010, pp. 661–670

work page 2010

[2] [2]

Artwork personalization at Netflix,

F. Amatet al., “Artwork personalization at Netflix,” inACM Conference on Recommender Systems (RecSys), 2018, p. 487–488

work page 2018

[3] [3]

Multiworld testing decision service: A system for experimentation, learning, and decision-making,

A. Agarwalet al., “Multiworld testing decision service: A system for experimentation, learning, and decision-making,”Whitepaper of Microsoft, pp. 1–40, 2016

work page 2016

[4] [4]

How The New York Times is experimenting with recommendation algorithms,

A. Coenen, “How The New York Times is experimenting with recommendation algorithms,” 2019, accessed: 2026-01-13. [Online]. Available: https://tinyurl.com/536zku2f

work page 2019

[5] [5]

How we boosted app revenue by 10% with real-time personalization,

J. Runge, “How we boosted app revenue by 10% with real-time personalization,” 2020, accessed: 2026-01-13. [Online]. Available: https://tinyurl.com/433eawyb

work page 2020

[6] [6]

Automatic hardware accelerators reconfiguration through LinearUCB algorithms on a RISC-V processor,

M. Angioliet al., “Automatic hardware accelerators reconfiguration through LinearUCB algorithms on a RISC-V processor,” inConf. on PhD Res. in Microelect. and Elect. (PRIME), 2023, pp. 169–172

work page 2023

[7] [7]

Contextual-bandit anomaly detection for IoT data in distributed hierarchical edge computing,

M. V . Ngoet al., “Contextual-bandit anomaly detection for IoT data in distributed hierarchical edge computing,” inIEEE Int. Conf. on Distributed Computing Systems (ICDCS), 2020, pp. 1227–1230

work page 2020

[8] [8]

Edge computing in the dark: Leveraging contextual- combinatorial bandit and coded computing,

C.-S. Yanget al., “Edge computing in the dark: Leveraging contextual- combinatorial bandit and coded computing,”IEEE/ACM Transactions on Networking, vol. 29, no. 3, pp. 1022–1031, 2021

work page 2021

[9] [9]

Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis,

A. Durandet al., “Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis,” inMachine Learning for Healthcare (MLHC), 2018, pp. 67–82

work page 2018

[10] [10]

Efficient implementation of LinearUCB through algo- rithmic improvements and vector computing acceleration for embedded learning systems,

M. Angioliet al., “Efficient implementation of LinearUCB through algo- rithmic improvements and vector computing acceleration for embedded learning systems,”ACM Transactions on Embedded Computing Systems, vol. 24, no. 4, pp. 1–23, 2025

work page 2025

[11] [11]

Hyperdimensional computing: An introduction to comput- ing in distributed representation with high-dimensional random vectors,

P. Kanerva, “Hyperdimensional computing: An introduction to comput- ing in distributed representation with high-dimensional random vectors,” Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009

work page 2009

[12] [12]

A survey on hyperdimensional computing aka vector symbolic architectures, Part I: Models and data transformations,

D. Kleykoet al., “A survey on hyperdimensional computing aka vector symbolic architectures, Part I: Models and data transformations,”ACM Computing Surveys, vol. 55, no. 6, pp. 1–40, 2022

work page 2022

[13] [13]

HD-CB: The first exploration of hyperdimensional computing for contextual bandits problems,

M. Angioliet al., “HD-CB: The first exploration of hyperdimensional computing for contextual bandits problems,”IEEE Open Journal of the Computer Society, vol. 7, pp. 105–116, 2026

work page 2026

[14] [14]

Randomness in neural networks: An overview,

S. Scardapane and D. Wang, “Randomness in neural networks: An overview,”Data Min. Know. Disc., vol. 7, no. 2, pp. 1–18, 2017

work page 2017

[15] [15]

Reducing catastrophic forgetting with associative learn- ing,

Y . Shenet al., “Reducing catastrophic forgetting with associative learn- ing,”Neural Computation, vol. 35, no. 11, pp. 1797–1819, 2023

work page 2023

[16] [16]

HD-CBBIN: A lightweight approach for contextual bandit learning in real-time applications,

M. Angioliet al., “HD-CBBIN: A lightweight approach for contextual bandit learning in real-time applications,” inInt. Joint Conf. on Neur. Netw. (IJCNN), 2025, pp. 1–8

work page 2025

[17] [17]

A binary self-organizing map and its FPGA implemen- tation,

K. Appiahet al., “A binary self-organizing map and its FPGA implemen- tation,” inInt. Joint Conf. on Neur. Netw. (IJCNN), 2009, pp. 164–171

work page 2009

[18] [18]

Integer self-organizing maps for digital hardware,

D. Kleykoet al., “Integer self-organizing maps for digital hardware,” in Int. Joint Conf. on Neur. Netw. (IJCNN), 2019, pp. 1–8

work page 2019

[19] [19]

Survey on applications of multi-armed and contextual bandits,

D. Bouneffoufet al., “Survey on applications of multi-armed and contextual bandits,” inIEEE Congr. Evol. Comput., 2020, p. 1–8

work page 2020

[20] [20]

From ads to interventions: Contextual bandits in mobile health,

A. Tewari and S. A. Murphy, “From ads to interventions: Contextual bandits in mobile health,” inMobile Health: Sensors, Analytic Methods, and Applications, 2017, pp. 495–517

work page 2017

[21] [21]

DroneBandit: Multi-armed contextual bandits for col- laborative edge-to-cloud inference in resource-constrained nanodrones,

G. Chacunet al., “DroneBandit: Multi-armed contextual bandits for col- laborative edge-to-cloud inference in resource-constrained nanodrones,” inGreat Lakes Symposium on VLSI (GLSVLSI), 2024, p. 98–104

work page 2024

[22] [22]

To ask or not to ask: Human-in-the-loop contextual bandits with applications in robot-assisted feeding,

R. Banerjeeet al., “To ask or not to ask: Human-in-the-loop contextual bandits with applications in robot-assisted feeding,” inIEEE Interna- tional Conf. on Robotics and Automation (ICRA), 2025, pp. 1378–1384

work page 2025

[23] [23]

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms,

L. Liet al., “Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms,” inACM Int. Conf. on Web Search and Data Mining (WSDM), 2011, pp. 297–306

work page 2011

[24] [24]

A comparison of vector symbolic architectures,

K. Schlegelet al., “A comparison of vector symbolic architectures,” Artificial Intelligence Review, vol. 55, no. 6, pp. 4523–4555, 2022

work page 2022

[25] [25]

Multiplicative binding, representation operators & anal- ogy,

R. W. Gayler, “Multiplicative binding, representation operators & anal- ogy,” inAdvances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences, 1998, pp. 1–4

work page 1998

[26] [26]

Vector symbolic architectures as a computing frame- work for emerging hardware,

D. Kleykoet al., “Vector symbolic architectures as a computing frame- work for emerging hardware,”Proceedings of the IEEE, vol. 110, no. 10, pp. 1538–1571, 2022

work page 2022

[27] [27]

Sparse binary distributed encoding of scalars,

D. A. Rachkovskijet al., “Sparse binary distributed encoding of scalars,” J. Autom. Inf. Sci., vol. 37, no. 6, pp. 12–23, 2005

work page 2005

[28] [28]

Density encoding enables resource-efficient randomly connected neural networks,

D. Kleykoet al., “Density encoding enables resource-efficient randomly connected neural networks,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3777–3783, 2021

work page 2021

[29] [29]

A family of binary spatter codes,

P. Kanerva, “A family of binary spatter codes,” inInt. Conf. on Artificial Neural Networks (ICANN), 1995, pp. 517–522

work page 1995

[30] [30]

Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation,

Y . Saitoet al., “Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation,” inAdvances in Neural Information Processing Systems (NeurIPS), 2021, pp. 1–14

work page 2021