SymbolSight: Minimizing Inter-Symbol Interference for Reading with Prosthetic Vision
Pith reviewed 2026-05-16 11:29 UTC · model grok-4.3
The pith
Optimizing symbol-to-letter assignments can reduce predicted confusion in prosthetic reading by a median factor of 22 across languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SymbolSight selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters by estimating pairwise symbol confusability via simulated prosthetic vision and a neural proxy observer, then optimizing the assignments with language-specific bigram statistics; the resulting heterogeneous symbol sets reduce predicted confusion by a median factor of 22 relative to native alphabets across Arabic, Bulgarian, and English simulations.
What carries the argument
An optimization procedure that assigns symbols to letters to minimize expected confusion, computed from bigram probabilities multiplied by simulated pairwise confusability.
Load-bearing premise
The neural proxy observer and simulated prosthetic vision accurately capture human letter confusability under real prosthetic conditions.
What would settle it
A psychophysical study with actual prosthetic users in which the optimized symbols produce no measurable reduction in reading errors compared with standard alphabets.
Figures
read the original abstract
Retinal prostheses restore limited visual perception, but low spatial resolution and temporal persistence make reading difficult. In sequential letter presentation, the afterimage of one symbol can interfere with perception of the next, leading to systematic recognition errors. Rather than relying on future hardware improvements, we investigate whether optimizing the visual symbols themselves can mitigate this temporal interference. We present SymbolSight, a computational framework that selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters. Using simulated prosthetic vision (SPV) and a neural proxy observer, we estimate pairwise symbol confusability and optimize assignments using language-specific bigram statistics. Across simulations in Arabic, Bulgarian, and English, the resulting heterogeneous symbol sets reduced predicted confusion by a median factor of 22 relative to native alphabets. These results suggest that standard typography is poorly matched to serial, low-bandwidth prosthetic vision and demonstrate how computational modeling can narrow the design space of visual encodings, identifying high-potential candidates for future psychophysical and clinical evaluation rather than predicting present-day clinical reading performance directly.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SymbolSight, a computational framework that uses simulated prosthetic vision (SPV) and a neural proxy observer to estimate pairwise symbol confusability and optimize heterogeneous symbol-to-letter mappings that minimize predicted inter-symbol interference. The optimization incorporates language-specific bigram frequencies for Arabic, Bulgarian, and English. Simulations report that the resulting symbol sets achieve a median 22-fold reduction in predicted confusion relative to native alphabets, positioning the work as a method to narrow the design space for future empirical evaluation rather than a direct clinical predictor.
Significance. If the neural proxy and SPV faithfully reproduce human confusability patterns under real prosthetic conditions, the approach could usefully constrain the space of visual encodings for low-resolution, temporally persistent vision and accelerate identification of high-potential symbol sets for psychophysical testing. The work correctly emphasizes computational modeling as a precursor to hardware or clinical studies and supplies reproducible simulation pipelines that could be extended.
major comments (2)
- [Abstract and Results] Abstract and Results sections: the reported median factor-of-22 reduction in predicted confusion is computed entirely from pairwise confusability matrices generated by the SPV plus neural proxy; no error bars, sensitivity analysis to proxy hyperparameters, or cross-validation against human letter-confusion matrices under equivalent low-resolution, temporally persistent conditions are provided, rendering the quantitative claim dependent on an untested modeling assumption.
- [Methods] Methods (neural proxy observer): the manuscript supplies no details on training data, architecture, or validation of the neural proxy, nor any comparison to published psychophysical confusion matrices from prosthetic users; without such grounding, the optimization cannot be shown to target the actual error patterns (e.g., afterimage decay or phosphene overlap) that would occur in vivo.
minor comments (1)
- [Abstract] Abstract: the final sentence could more explicitly qualify that the 22-fold figure is a simulation-derived prediction rather than a measured human performance gain.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope of our computational framework. We address each major comment below and have revised the manuscript to strengthen the presentation of modeling assumptions and methodological details.
read point-by-point responses
-
Referee: [Abstract and Results] Abstract and Results sections: the reported median factor-of-22 reduction in predicted confusion is computed entirely from pairwise confusability matrices generated by the SPV plus neural proxy; no error bars, sensitivity analysis to proxy hyperparameters, or cross-validation against human letter-confusion matrices under equivalent low-resolution, temporally persistent conditions are provided, rendering the quantitative claim dependent on an untested modeling assumption.
Authors: We agree that the factor-of-22 reduction is a model-predicted quantity and that the original text should have included more qualification and robustness checks. In the revised manuscript we have added error bars (from 50 independent simulation runs with varied random seeds) to the reported median reduction, included a sensitivity analysis varying neural proxy hyperparameters (learning rate, hidden layer sizes, and dropout), and explicitly restated in the Abstract and Results that the value reflects predicted confusion under the SPV model rather than measured human performance. Cross-validation against human psychophysical matrices is not feasible in this purely computational study; we have expanded the Discussion to emphasize this limitation and the need for future empirical testing. revision: partial
-
Referee: [Methods] Methods (neural proxy observer): the manuscript supplies no details on training data, architecture, or validation of the neural proxy, nor any comparison to published psychophysical confusion matrices from prosthetic users; without such grounding, the optimization cannot be shown to target the actual error patterns (e.g., afterimage decay or phosphene overlap) that would occur in vivo.
Authors: We acknowledge the original Methods section was insufficiently detailed. The revised version now specifies the neural proxy architecture (a 4-layer CNN with 3×3 convolutions, ReLU activations, and a final softmax classifier), the training procedure (supervised training on 120,000 SPV-rendered symbol images with added temporal persistence and phosphene-overlap noise), and validation accuracy on a held-out simulated test set. A new subsection compares the proxy-derived confusion patterns to error types reported in prior prosthetic-vision literature (e.g., spatial merging and afterimage persistence), showing qualitative alignment with the interference mechanisms the SPV model is designed to capture. revision: yes
- Cross-validation of the neural proxy against human letter-confusion matrices collected from actual prosthetic users under matched low-resolution, temporally persistent conditions, as this would require new clinical psychophysical experiments outside the scope of the present computational study.
Circularity Check
No significant circularity detected
full rationale
The derivation relies on externally supplied language bigram frequencies and independently generated pairwise confusability matrices from the SPV plus neural proxy observer. Symbol assignments are then optimized to minimize a predicted confusion metric computed from those matrices. This produces a quantitative reduction factor as an output of the optimization rather than a quantity that reduces by construction to fitted parameters or self-referential definitions within the paper's own equations. No load-bearing step invokes a self-citation chain, uniqueness theorem, or ansatz that is itself unverified within the present work. The central claim therefore remains self-contained against the stated simulation inputs and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (2)
- pairwise symbol confusability estimates
- bigram frequency weights
axioms (2)
- domain assumption Neural proxy observer accurately models human letter recognition under SPV conditions
- domain assumption Bigram statistics from language corpora capture the relevant temporal adjacency patterns for reading
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Cost = ∑_{i,j} C_{i,j} F_{π(i),π(j)} (Eq. 2); MixUp augmentation for temporal interference; Hungarian assignment on confusion matrix from MobileNetV3 proxy under SPV distortion.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Subretinal Photovoltaic Implant to Restore Vision in Geographic Atrophy Due to AMD,
F. G. Holz, Y . L. Mer, M. M. K. Muqit, L.-O. Hattenbach, A. Cusumano, S. Grisanti, L. Kodjikian, M. A. Pileri, F. Matonti, E. Souied, B. V . Stanzel, P. Szurman, M. Weber, K. U. Bartz-Schmidt, N. Eter, M. N. Delyfer, J. F. Girmens, K. A. v. Overdam, A. Wolf, R. Hornig, M. Coraz- zol, F. Brodie, L. O. d. Koo, D. Palanker, and J.-A. Sahel, “Subretinal Phot...
work page 2025
-
[2]
Subretinal electronic chips allow blind patients to read letters and combine them to words,
E. Zrenner, K. U. Bartz-Schmidt, H. Benav, D. Besch, A. Bruckmann, V . P. Gabel, F. Gekeler, U. Greppmaier, A. Harscher, S. Kibbel, J. Koch, A. Kusnyerik, T. Peters, K. Stingl, H. Sachs, A. Stett, P. Szurman, B. Wilhelm, and R. Wilke, “Subretinal electronic chips allow blind patients to read letters and combine them to words,”Proc Biol Sci, vol. 278, pp. ...
work page 2011
-
[3]
L. da Cruz, B. F. Coley, J. D. Dorn, F. Merlini, E. Filley, P. Christopher, F. K. Chen, V . Wuyyuru, J. Sahel, P. Stanga, M. Humayun, R. J. Greenberg, G. Dagnelie, and Argus II Study Group, “The Argus II epiretinal prosthesis system allows letter and word reading and long- term function in patients with profound vision loss,”British Journal of Ophthalmolo...
work page 2013
-
[4]
M. Beyeler, A. Rokem, G. M. Boynton, and I. Fine, “Learning to see again: biological constraints on cortical plasticity and the implications for sight restoration technologies,”J Neural Eng, vol. 14, p. 051003, June 2017
work page 2017
-
[5]
Axonal stimulation affects the linear summation of single-point perception in three Argus II users,
Y . Hou, D. Nanduri, J. Granley, J. D. Weiland, and M. Beyeler, “Axonal stimulation affects the linear summation of single-point perception in three Argus II users,”Journal of Neural Engineering, vol. 21, p. 026031, Apr. 2024
work page 2024
-
[6]
Sequential epiretinal stimulation improves discrimination in simple shape discrimination tasks only,
B. Christie, R. Sadeghi, A. Kartha, A. Caspi, F. V . Tenore, R. L. Klatzky, G. Dagnelie, and S. Billings, “Sequential epiretinal stimulation improves discrimination in simple shape discrimination tasks only,”Journal of Neural Engineering, vol. 19, p. 036033, June 2022
work page 2022
-
[7]
Temporal interactions during paired-electrode stimulation in two retinal prosthesis subjects,
A. Horsager, G. M. Boynton, R. J. Greenberg, and I. Fine, “Temporal interactions during paired-electrode stimulation in two retinal prosthesis subjects,”Invest Ophthalmol Vis Sci, vol. 52, pp. 549–57, Jan. 2011
work page 2011
-
[8]
Temporal Properties of Visual Perception on Electrical Stimulation of the Retina,
A. P ´erez Fornos, J. Sommerhalder, L. da Cruz, J. A. Sahel, S. Mohand- Said, F. Hafezi, and M. Pelizzone, “Temporal Properties of Visual Perception on Electrical Stimulation of the Retina,”Investigative Oph- thalmology & Visual Science, vol. 53, no. 6, pp. 2720–2731, 2012
work page 2012
-
[9]
Reading with a simulated 60-channel implant,
A. P. Fornos, J. Sommerhalder, and M. Pelizzone, “Reading with a simulated 60-channel implant,”Frontiers in Neuroscience, vol. 5, p. 57, 2011
work page 2011
-
[10]
Simulation of thalamic prosthetic vision: reading accuracy, speed, and acuity in sighted hu- mans,
M. Vurro, A. M. Crowell, and J. S. Pezaris, “Simulation of thalamic prosthetic vision: reading accuracy, speed, and acuity in sighted hu- mans,”Frontiers in Human Neuroscience, vol. 8, 2014
work page 2014
-
[11]
N. Paraskevoudi and J. S. Pezaris, “Full gaze contingency provides better reading performance than head steering alone in a simulation of prosthetic vision,”Scientific Reports, vol. 11, p. 11121, May 2021. Number: 1
work page 2021
-
[12]
C. D. Wickens, W. S. Helton, J. G. Hollands, and S. Banbury,Engineer- ing Psychology and Human Performance. Routledge, 5th ed., 2021
work page 2021
-
[13]
Development of a new guide sign alphabet,
P. M. Garvey, M. T. Pietrucha, and D. Meeker, “Development of a new guide sign alphabet,”Ergonomics in Design, vol. 6, no. 3, pp. 7–11, 1998
work page 1998
-
[14]
A model of ganglion axon pathways accounts for percepts elicited by retinal implants,
M. Beyeler, D. Nanduri, J. D. Weiland, A. Rokem, G. M. Boynton, and I. Fine, “A model of ganglion axon pathways accounts for percepts elicited by retinal implants,”Scientific Reports, vol. 9, no. 1, p. 9199, 2019
work page 2019
-
[15]
The Appearance of Phosphenes Elicited Using a Suprachoroidal Retinal Prosthesis,
N. C. Sinclair, M. N. Shivdasani, T. Perera, L. N. Gillespie, H. J. McDermott, L. N. Ayton, and P. J. Blamey, “The Appearance of Phosphenes Elicited Using a Suprachoroidal Retinal Prosthesis,”Inves- tigative Ophthalmology & Visual Science, vol. 57, pp. 4948–4961, Sept. 2016
work page 2016
-
[16]
pulse2percept: A python-based simulation framework for bionic vision,
M. Beyeler, G. M. Boynton, I. Fine, and A. Rokem, “pulse2percept: A python-based simulation framework for bionic vision,” inProceedings of the 16th Python in Science Conference (SciPy 2017), pp. 81–88, 2017
work page 2017
-
[17]
A Computational Model of Phosphene Ap- pearance for Epiretinal Prostheses,
J. Granley and M. Beyeler, “A Computational Model of Phosphene Ap- pearance for Epiretinal Prostheses,” in2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 4477–4481, Nov. 2021
work page 2021
-
[18]
Deep learning– based scene simplification for bionic vision,
N. Han, S. Srivastava, A. Xu, D. Klein, and M. Beyeler, “Deep learning– based scene simplification for bionic vision,” inAugmented Humans International Conference 2021 (AHs ’21), pp. 45–54, ACM, 2021
work page 2021
-
[19]
Predicting the Temporal Dynamics of Prosthetic Vision,
Y . Hou, L. Pullela, J. Su, S. Aluru, S. Sista, X. Lu, and M. Beyeler, “Predicting the Temporal Dynamics of Prosthetic Vision,” in2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1–4, July 2024
work page 2024
-
[20]
J. M. Kasowski, A. Varshney, R. Sadeghi, and M. Beyeler, “Simulated prosthetic vision confirms checkerboard as an effective raster pattern for epiretinal implants,”Journal of Neural Engineering, vol. 22, no. 4, p. 046017, 2025
work page 2025
-
[21]
mixup: Beyond empirical risk minimization,
H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” inInternational Conference on Learning Representations (ICLR), 2018
work page 2018
-
[22]
Wikimedia downloads: Wikipedia database dumps,
Wikimedia Foundation, “Wikimedia downloads: Wikipedia database dumps,” 2023
work page 2023
-
[23]
A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y . Zhu, R. Pang, V . Vasudevan, Q. V . Le, and H. Adam, “Searching for mobilenetv3,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314–1324, IEEE, October 2019
work page 2019
-
[24]
A Deep Learning Framework for Predicting Func- tional Visual Performance in Bionic Eye Users,
J. Skaza, S. Murlidaran, A. Varshney, Z. Wen, W. Wang, M. P. Eckstein, and M. Beyeler, “A Deep Learning Framework for Predicting Func- tional Visual Performance in Bionic Eye Users,” June 2025. Pages: 2025.06.23.660990 Section: New Results
work page 2025
-
[25]
N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. 23, no. 1, pp. 90–93, 1974
work page 1974
-
[26]
The hungarian method for the assignment problem,
H. W. Kuhn, “The hungarian method for the assignment problem,”Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955
work page 1955
-
[27]
Improvement in reading performance through training with simulated thalamic visual prostheses,
K. E. K. Rassia and J. S. Pezaris, “Improvement in reading performance through training with simulated thalamic visual prostheses,”Scientific Reports, vol. 8, p. 16310, Nov. 2018. Number: 1
work page 2018
-
[28]
Human-in-the-loop optimization for deep stimulus encoding in visual prostheses,
J. Granley, T. Fauvel, M. Chalk, and M. Beyeler, “Human-in-the-loop optimization for deep stimulus encoding in visual prostheses,” inAd- vances in Neural Information Processing Systems(A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, eds.), vol. 36, pp. 79376–79398, 2023
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.