
arxiv: 1907.06286 · v1 · pith:KTFJQVSCnew · submitted 2019-07-14 · 🧬 q-bio.NC · cs.CV · cs.LG · cs.SD · eess.AS

Autoencoding sensory substitution

Pith reviewed 2026-05-24 21:22 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.CV · cs.LG · cs.SD · eess.AS
keywords: sensory substitution, autoencoders, deep learning, image-to-sound, visual impairment, auditory perception, rehabilitation

The pith

Deep recurrent autoencoders convert images to short sounds, enabling above-chance performance on visual tasks after hours of training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops deep recurrent autoencoders to perform image-to-sound conversion for sensory substitution. It shortens the substituting audio signals and incorporates computational hearing models to make visual features more perceptually distinct to the auditory system. In two case studies, the author trained on hand posture discrimination while blindfolded for five days and on reaching movements toward objects on a table; the abstract reports above-chance accuracy in both cases after a few hours of training.

Core claim

By training deep recurrent autoencoders for image-to-sound conversion while constraining the visual space and integrating computational hearing models, the authors demonstrated above-chance-level accuracy in hand posture discrimination and reaching movements after only a few hours of training.

What carries the argument

Deep recurrent autoencoders that map constrained visual inputs to shortened audio signals optimized for perceptual discernibility by the human auditory system.

Load-bearing premise

The audio signals produced by the trained autoencoders map to perceptually distinct auditory components that the human auditory system and brain can rapidly adapt to for visual tasks.

What would settle it

A controlled test in which participants trained with the autoencoder signals fail to reach above-chance accuracy on the hand posture or reaching tasks.

Figures

Figures reproduced from arXiv: 1907.06286 by Lauri Parkkonen, Viktor Tóth.

Figure 1: Relations between quantities of information theory. Reprinted from [51].
Figure 2: Receptive fields of ganglion and simple cells. Adapted from [62]. Comparison of the visual cortex of sighted and blind individuals poses difficulty, due to the recent lack, or complete absence of visual experience in the case of late and congenitally blind, respectively. Cross-modal plasticity is then driven by this lack of input to establish functional connections with other sensory brain regions, which i…
Figure 3: Two-dimensional Gabor filter. Biologically-inspired edge detection models incorporate the early hierarchy of the visual system, including retinal ganglion cells, LGN and V1 simple cells. Azzopardi and others [70] argues that the Gabor filter and other convolutional approaches ignore the functionality of LGN neurons and fail to emulate simple cell properties, such as cross-orientation suppression, response …
Figure 4: The output of three edge detection algorithms and the input image. Sobel and …
Figure 5: Fletcher-Munson equal-loudness contours shown in blue, the latest ISO 226:2003 …
Figure 6: Azimuth and elevation address the interaural polar coordinate system as latitude …
Figure 7: An acoustic goniometer from World War I. It served as a hearing extension, …
Figure 8: A disentangled example of an explicit SS conversion function, which we preliminary …
Figure 9: Illustration of an implicit conversion method, trained on the classi…
Figure 10: SS devices spread along the axes of substitution delay and visual space abstraction.
Figure 11: Depiction of an autoencoder. Autoencoder models are trained to reconstruct …
Figure 12: VAE learned manifold of the MNIST dataset. A VAE with a two-dimensional …
Figure 13: The architecture of the DRAW model [35], unfolded for two recurrent iterations.
Figure 14: In V2A SS, visual information is conveyed through sound to aid the blind.
Figure 15: Unrolled architecture of AEV2A. Compared to the DRAW model (Figure …
Figure 16: Images from our hand posture dataset and their corresponding contour repre…
Figure 17: ITD and ILD by sound location azimuth. ITD calculated according to the …
Figure 18: A random generated soundscape of three soundstreams with 25% overlap and 4 …
Figure 19: Simple fitted model of sound localization error by azimuth. The amount of noise imposed on $\tilde{\theta}_m$ is set to be proportional to the localization error
Figure 20: Probability density of Gaussian distributions of the applied binaural noising at …
Figure 21: The minimalistic apparatus used to synthesize the dataset of hand postures.
Figure 22: Example images and corresponding contour representations from the table …
Figure 23: The author wearing the blindfold. The mask provided total visual abstinence.
Figure 24: Experimental setup of the reaching movement case study. A Google Cardboard …
Figure 25: Examples of posterior collapse. AEV2A reconstructed images of the hand …
Figure 26: The effect of sequence length on the reconstruction loss computed on the test set. For each grid (a) and V1 (b) writer attention model, the sequence length is indicated by the coloring above. Note that the offsets of the Y axes are different
Figure 27: Reconstruction loss of the best performing models, comparing hearing (…
Figure 28: Relationship between decoded visual and corresponding audio features. The …
Figure 29: FM vectors across soundscapes of the beer can and gear audio representations.
Figure 30: Example distributions of ground azimuth values (a, b) and modulation intensity …
Original abstract

Tens of millions of people live blind, and their number is ever increasing. Visual-to-auditory sensory substitution (SS) encompasses a family of cheap, generic solutions to assist the visually impaired by conveying visual information through sound. The required SS training is lengthy: months of effort is necessary to reach a practical level of adaptation. There are two reasons for the tedious training process: the elongated substituting audio signal, and the disregard for the compressive characteristics of the human hearing system. To overcome these obstacles, we developed a novel class of SS methods, by training deep recurrent autoencoders for image-to-sound conversion. We successfully trained deep learning models on different datasets to execute visual-to-auditory stimulus conversion. By constraining the visual space, we demonstrated the viability of shortened substituting audio signals, while proposing mechanisms, such as the integration of computational hearing models, to optimally convey visual features in the substituting stimulus as perceptually discernible auditory components. We tested our approach in two separate cases. In the first experiment, the author went blindfolded for 5 days, while performing SS training on hand posture discrimination. The second experiment assessed the accuracy of reaching movements towards objects on a table. In both test cases, above-chance-level accuracy was attained after a few hours of training. Our novel SS architecture broadens the horizon of rehabilitation methods engineered for the visually impaired. Further improvements on the proposed model shall yield hastened rehabilitation of the blind and a wider adaptation of SS devices as a consequence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes training deep recurrent autoencoders for visual-to-auditory sensory substitution, with visual-space constraints and integration of computational hearing models, to produce shortened audio signals whose components are perceptually discernible. It reports two experiments (hand-posture discrimination after 5 days of blindfolded training; reaching movements) claiming above-chance accuracy after only a few hours, contrasting with the months typically required.

Significance. If the empirical results prove robust and the assumed perceptual mapping is validated, the work could meaningfully shorten adaptation times for sensory-substitution devices and thereby increase their practical utility. The core idea of using autoencoders to optimize the substitution signal is a coherent direction, but the current manuscript supplies none of the quantitative or mechanistic evidence needed to evaluate that potential.

major comments (2)
  1. [Abstract] The central claim that 'above-chance-level accuracy was attained after a few hours of training' is stated without any reported accuracies, error bars, trial counts, dataset sizes, model hyperparameters, or statistical tests. Because the soundness of the empirical result is the load-bearing element of the paper, this omission prevents evaluation of whether the data support the claim.
  2. [Experiments] In both cases, the manuscript invokes 'perceptually discernible auditory components' produced by the autoencoders (after hearing-model integration) to explain the rapid adaptation, yet provides no implementation details of the hearing models, no psychoacoustic validation of distinctiveness, and no intermediate metrics (e.g., feature separability) that would distinguish this mechanism from task simplicity or subject familiarity.
minor comments (1)
  1. [Abstract] The sentence 'the author went blindfolded for 5 days' is ambiguous; it should specify whether this refers to the first author, a subject, or both.
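
The kind of statistical test the first major comment asks for can be illustrated with an exact one-sided binomial test against chance. This is a minimal sketch, not the paper's analysis: the function name `binomial_tail_p` and all trial counts below are hypothetical placeholders, since the actual counts are not given on this page.

```python
from math import comb


def binomial_tail_p(successes: int, trials: int, chance: float) -> float:
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, chance)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Sum the binomial probability mass from the observed count upward.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical numbers for illustration only (assumptions, not reported results).
hypothetical_correct = 30
hypothetical_trials = 60
hypothetical_chance = 0.25  # e.g. four equally likely response options

p_value = binomial_tail_p(hypothetical_correct, hypothetical_trials, hypothetical_chance)
print(f"one-sided p = {p_value:.3g}")
```

Reporting the trial count, the chance level, and a p-value of this kind alongside each accuracy would let a reader check an above-chance claim directly.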

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and completeness of the reported results.

point-by-point responses
  1. Referee: [Abstract] The central claim that 'above-chance-level accuracy was attained after a few hours of training' is stated without any reported accuracies, error bars, trial counts, dataset sizes, model hyperparameters, or statistical tests. Because the soundness of the empirical result is the load-bearing element of the paper, this omission prevents evaluation of whether the data support the claim.

    Authors: We agree that the abstract should be self-contained. The experiments section reports the accuracies, trial counts, and statistical comparisons supporting above-chance performance, along with dataset sizes and model details in the methods. In revision we will move key quantitative results (accuracies with error bars, trial numbers, and p-values) into the abstract while keeping it concise.
    Revision: yes

  2. Referee: [Experiments] In both cases, the manuscript invokes 'perceptually discernible auditory components' produced by the autoencoders (after hearing-model integration) to explain the rapid adaptation, yet provides no implementation details of the hearing models, no psychoacoustic validation of distinctiveness, and no intermediate metrics (e.g., feature separability) that would distinguish this mechanism from task simplicity or subject familiarity.

    Authors: We will add the implementation details of the hearing-model integration (including equations and parameter choices) to the methods section. The manuscript does not contain separate psychoacoustic validation experiments or feature-separability metrics; the evidence for rapid adaptation rests on the behavioral outcomes. We will revise the discussion to explicitly note this scope limitation and clarify that the proposed mechanism is supported indirectly by the shortened training times rather than by direct perceptual tests.
    Revision: partial
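
One simple form a feature-separability metric could take is a Fisher-style ratio of between-class to within-class variance, computed on a single audio feature for two visual classes. The sketch below is an assumption for illustration, not something the paper or the rebuttal specifies: the function name `fisher_ratio`, the choice of metric, and the feature values are all hypothetical.

```python
from statistics import fmean


def fisher_ratio(class_a, class_b):
    """Squared gap between class means divided by the summed within-class variances."""
    mean_a, mean_b = fmean(class_a), fmean(class_b)
    var_a = fmean([(x - mean_a) ** 2 for x in class_a])
    var_b = fmean([(x - mean_b) ** 2 for x in class_b])
    within = var_a + var_b
    if within == 0:
        # No spread inside either class: perfectly separable unless the means coincide.
        return float("inf") if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / within


# Hypothetical values of one audio feature for two hand postures (made up for illustration).
posture_a = [0.0, 0.0, 1.0, 1.0]
posture_b = [10.0, 10.0, 11.0, 11.0]
print(fisher_ratio(posture_a, posture_b))
```

Larger values mean the two classes overlap less along that feature. A metric like this describes the signal only and would not replace psychoacoustic tests with listeners.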

Circularity Check

0 steps flagged

No circularity: purely empirical neural-network training study

full rationale

The paper reports training deep recurrent autoencoders on image-to-sound conversion tasks, followed by two behavioral experiments (hand posture discrimination and reaching movements) that achieved above-chance accuracy after short training. No equations, derivations, first-principles predictions, or fitted parameters renamed as outputs appear in the abstract or described methods. Claims rest on experimental results rather than any reduction of predictions to inputs by construction. No self-citation chains or ansatzes are invoked as load-bearing steps. This is a standard empirical ML application paper with no mathematical derivation chain to inspect for circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The claim depends on the empirical success of neural-network training whose hyperparameters and data constraints are not specified, plus the untested assumption that the resulting sounds will be rapidly interpretable by human listeners.

free parameters (2)
  • recurrent autoencoder architecture and training hyperparameters
    Layer counts, hidden dimensions, learning rate, and sequence length chosen to produce usable audio outputs.
  • visual space constraint parameters
    Limits placed on input image content or resolution to enable shorter audio signals.
axioms (1)
  • domain assumption: Neural networks trained on image-sound pairs can produce audio encodings that align with human auditory perceptual categories.
    Required for the claim that the substituting stimulus will be 'perceptually discernible auditory components'.



Reference graph

Works this paper leans on

216 extracted references · 216 canonical work pages · 24 internal anchors

  1. [1]

    Vision impairment and blindness,

    WHO, “Vision impairment and blindness,” Oct. 2018. [Online]. Available: http://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment

  2. [2]

    Restoring vision,

    B. Roska and J.-A. Sahel, “Restoring vision,” Nature, vol. 557, no. 7705, pp. 359–367, 2018

  3. [3]

    Rehabilitation of lost functional vision with the Argus II retinal prosthesis,

    M. Markowitz, M. Rankin, M. Mongy, B. E. Patino, J. Manusow, R. G. Devenyi, and S. N. Markowitz, “Rehabilitation of lost functional vision with the Argus II retinal prosthesis,” Canadian Journal of Ophthalmology. Journal Canadien D’ophtalmologie, vol. 53, no. 1, pp. 14–22, Feb. 2018

  4. [4]

    Are Supramodality and Cross-Modal Plasticity the Yin and Yang of Brain Development? From Blindness to Rehabilitation,

    L. Cecchetti, R. Kupers, M. Ptito, P. Pietrini, and E. Ricciardi, “Are Supramodality and Cross-Modal Plasticity the Yin and Yang of Brain Development? From Blindness to Rehabilitation,” Frontiers in Systems Neuroscience, vol. 10, Nov. 2016. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5099160/

  5. [5]

    Enhanced sensory perception in synaesthesia,

    M. J. Banissy, V. Walsh, and J. Ward, “Enhanced sensory perception in synaesthesia,” Experimental Brain Research, vol. 196, no. 4, pp. 565–571, Jul

  6. [6]

    Available: https://doi.org/10.1007/s00221-009-1888-0

    [Online]. Available: https://doi.org/10.1007/s00221-009-1888-0

  7. [7]

    Transforming 3d Coloured Pixels into Musical Instrument Notes for Vision Substitution Applications,

    G. Bologna, B. Deville, T. Pun, and M. Vinckenbosch, “Transforming 3d Coloured Pixels into Musical Instrument Notes for Vision Substitution Applications,” EURASIP Journal on Image and Video Processing, vol. 2007, no. 1, p. 076204, Aug. 2007. [Online]. Available: https://doi.org/10.1155/2007/76204

  8. [8]

    Integration and binding in rehabilitative sensory substitution: Increasing resolution using a new Zooming-in approach,

    G. Buchs, S. Maidenbaum, S. Levy-Tzedek, and A. Amedi, “Integration and binding in rehabilitative sensory substitution: Increasing resolution using a new Zooming-in approach,” Restorative Neurology and Neuroscience, vol. 34, no. 1, pp. 97–105, Jan. 2016. [Online]. Available: https://content.iospress.com/articles/restorative-neurology-and-neuroscience/rnn150592

  9. [9]

    The Vibe: A Versatile Vision-to-Audition Sensory Substitution Device,

    S. Hanneton, M. Auvray, and B. Durette, “The Vibe: A Versatile Vision-to-Audition Sensory Substitution Device,” 2010. [Online]. Available: https://www.hindawi.com/journals/abb/2010/282341/abs/

  10. [10]

    An experimental system for auditory image representations,

    P. B. L. Meijer, “An experimental system for auditory image representations,” IEEE Transactions on Biomedical Engineering, vol. 39, no. 2, pp. 112–121, Feb. 1992

  11. [11]

    Visual experiences in the blind induced by an auditory sensory substitution device,

    J. Ward and P. Meijer, “Visual experiences in the blind induced by an auditory sensory substitution device,” Consciousness and Cognition, vol. 19, no. 1, pp. 492–500, Mar. 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1053810009001718

  12. [12]

    Audio–Vision Substitution for Blind Individuals: Addressing Human Information Processing Capacity Limitations,

    D. J. Brown and M. J. Proulx, “Audio–Vision Substitution for Blind Individuals: Addressing Human Information Processing Capacity Limitations,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 5, pp. 924–931, Aug. 2016

  13. [13]

    Developmental changes in the multisensory temporal binding window persist into adolescence,

    A. Hillock-Dunn and M. T. Wallace, “Developmental changes in the multisensory temporal binding window persist into adolescence,” Developmental science, vol. 15, no. 5, pp. 688–696, Sep. 2012. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4013750/

  14. [14]

    Multisensory perceptual learning and sensory substitution,

    M. J. Proulx, D. J. Brown, A. Pasqualotto, and P. Meijer, “Multisensory perceptual learning and sensory substitution,” Neuroscience & Biobehavioral Reviews, vol. 41, pp. 16–25, Apr. 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0149763412002072

  15. [15]

    R. F. Lyon, Human and Machine Hearing, Apr

  16. [16]

    Available: /core/books/human-and-machine-hearing/3660166B40020EE587D94BB7A309FC12

    [Online]. Available: /core/books/human-and-machine-hearing/3660166B40020EE587D94BB7A309FC12

  17. [17]

    Visual-auditory interaction in speeded classification: Role of stimulus difference,

    E. Ben-Artzi and L. E. Marks, “Visual-auditory interaction in speeded classification: Role of stimulus difference,” Perception & Psychophysics, vol. 57, no. 8, pp. 1151–1162, Nov. 1995. [Online]. Available: https://doi.org/10.3758/BF03208371

  18. [18]

    Effects of some variations in auditory input upon visual choice reaction time,

    I. H. Bernstein and B. A. Edelstein, “Effects of some variations in auditory input upon visual choice reaction time,” Journal of Experimental Psychology, vol. 87, no. 2, pp. 241–247, 1971

  19. [19]

    Crossmodal binding of audio-visual correspondent features,

    K. K. Evans and A. Treisman, “Crossmodal binding of audio-visual correspondent features,” Journal of Vision, vol. 5, no. 8, pp. 874–874, Sep

  20. [20]

    Available: https://jov.arvojournals.org/article.aspx?articleid=2132676

    [Online]. Available: https://jov.arvojournals.org/article.aspx?articleid=2132676

  21. [21]

    Cross-modality matching of brightness and loudness

    J. C. Stevens and L. E. Marks, “Cross-modality matching of brightness and loudness.” Proceedings of the National Academy of Sciences of the United States of America, vol. 54, no. 2, pp. 407–411, Aug. 1965. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC219679/

  22. [22]

    Does bigger mean louder? Crossmodal congruency and memory judgment,

    L. Brunel, “Does bigger mean louder? Crossmodal congruency and memory judgment,” Multisensory Research, vol. 26, no. 0, pp. 67–68, May 2013. [Online]. Available: http://booksandjournals.brillonline.com/content/journals/10.1163/22134808-000s0045

  23. [23]

    Concurrent Encoding of Frequency and Amplitude Modulation in Human Auditory Cortex: MEG Evidence,

    H. Luo, Y. Wang, D. Poeppel, and J. Z. Simon, “Concurrent Encoding of Frequency and Amplitude Modulation in Human Auditory Cortex: MEG Evidence,” Journal of Neurophysiology, vol. 96, no. 5, pp. 2712–2723, Nov

  24. [24]

    Available: https://www.physiology.org/doi/full/10.1152/jn

    [Online]. Available: https://www.physiology.org/doi/full/10.1152/jn.01256.2005

  25. [25]

    Subcortical neural coding mechanisms for auditory temporal processing,

    R. D. Frisina, “Subcortical neural coding mechanisms for auditory temporal processing,” Hearing Research, vol. 158, no. 1, pp. 1–27, Aug

  26. [26]

    Available: http://www.sciencedirect.com/science/article/pii/S0378595501002969

    [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378595501002969

  27. [27]

    Auditory abstraction from spectro-temporal features to coding auditory entities,

    G. Chechik and I. Nelken, “Auditory abstraction from spectro-temporal features to coding auditory entities,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 46, pp. 18 968–18 973, Nov. 2012. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3503225/

  28. [28]

    An anatomical and functional topography of human auditory cortical areas,

    M. Moerel, F. De Martino, and E. Formisano, “An anatomical and functional topography of human auditory cortical areas,” Frontiers in Neuroscience, vol. 8, Jul. 2014. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4114190/

  29. [29]

    A comparison of detection and discrimination of temporal asymmetry in amplitude modulation,

    M. A. Akeroyd and R. D. Patterson, “A comparison of detection and discrimination of temporal asymmetry in amplitude modulation,” The Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 430–439, Jan. 1997. [Online]. Available: http://asa.scitation.org/doi/10.1121/1.417988

  30. [30]

    Encoding of frequency-modulation (FM) rates in human auditory cortex,

    H. Okamoto and R. Kakigi, “Encoding of frequency-modulation (FM) rates in human auditory cortex,” Scientific Reports, vol. 5, p. 18143, Dec. 2015. [Online]. Available: https://www.nature.com/articles/srep18143

  31. [31]

    Interactive coding of visual spatial frequency and auditory amplitude-modulation rate,

    E. Guzman-Martinez, L. Ortega, M. Grabowecky, J. Mossbridge, and S. Suzuki, “Interactive coding of visual spatial frequency and auditory amplitude-modulation rate,” Current Biology, vol. 22, no. 5, pp. 383–388, Mar. 2012. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298604/

  32. [32]

    Designing sensory-substitution devices: Principles, pitfalls and potential,

    A. Kristjánsson, A. Moldoveanu, O. I. Jóhannesson, O. Balan, S. Spagnol, V. V. Valgeirsdóttir, and R. Unnthorsson, “Designing sensory-substitution devices: Principles, pitfalls and potential,” Restorative Neurology and Neuroscience, vol. 34, no. 5, pp. 769–787, 2016

  33. [33]

    The State of the Art of Sensory Substitution,

    M. Auvray and L. R. Harris, “The State of the Art of Sensory Substitution,” Multisensory Research, vol. 27, no. 5-6, pp. 265–269, Nov. 2014. [Online]. Available: http://booksandjournals.brillonline.com/content/journals/10.1163/22134808-00002464

  34. [34]

    The Nature of Consciousness in the Visually Deprived Brain,

    R. Kupers, P. Pietrini, E. Ricciardi, and M. Ptito, “The Nature of Consciousness in the Visually Deprived Brain,” Frontiers in Psychology, vol. 2, 2011. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00019/full

  35. [35]

    Genuine and drug-induced synesthesia: A comparison,

    C. Sinke, J. H. Halpern, M. Zedler, J. Neufeld, H. M. Emrich, and T. Passie, “Genuine and drug-induced synesthesia: A comparison,” Consciousness and Cognition, vol. 21, no. 3, pp. 1419–1434, Sep. 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1053810012000669

  36. [36]

    Homological scaffolds of brain functional networks,

    G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, and F. Vaccarino, “Homological scaffolds of brain functional networks,” Journal of The Royal Society Interface, vol. 11, no. 101, p. 20140873, Dec

  37. [37]

    Available: http://rsif.royalsocietypublishing.org/content/11/101/20140873

    [Online]. Available: http://rsif.royalsocietypublishing.org/content/11/101/20140873

  38. [38]

    Generalized learning of visual-to-auditory substitution in sighted individuals,

    J.-K. Kim and R. J. Zatorre, “Generalized learning of visual-to-auditory substitution in sighted individuals,” Brain Research, vol. 1242, pp. 263–275, Nov. 2008

  39. [39]

    Reading with sounds: sensory substitution selectively activates the visual word form area in the blind,

    E. Striem-Amit, L. Cohen, S. Dehaene, and A. Amedi, “Reading with sounds: sensory substitution selectively activates the visual word form area in the blind,” Neuron, vol. 76, no. 3, pp. 640–652, Nov. 2012

  40. [40]

    ImageNet classification with deep convolutional neural networks

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks.” Curran Associates Inc., Dec. 2012, pp. 1097–1105. [Online]. Available: http://dl.acm.org/citation.cfm?id=2999134.2999257

  41. [41]

    DRAW: A Recurrent Neural Network For Image Generation

    K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra, “DRAW: A Recurrent Neural Network For Image Generation,” arXiv:1502.04623 [cs], Feb. 2015, arXiv: 1502.04623. [Online]. Available: http://arxiv.org/abs/1502.04623

  42. [42]

    Two-tone suppression in the basilar membrane of the cochlea: mechanical basis of auditory-nerve rate suppression,

    M. A. Ruggero, L. Robles, and N. C. Rich, “Two-tone suppression in the basilar membrane of the cochlea: mechanical basis of auditory-nerve rate suppression,” Journal of Neurophysiology, vol. 68, no. 4, pp. 1087–1099, Oct. 1992

  43. [43]

    Spiking neuron models: Single neurons, populations, plasticity

    W. Gerstner and W. M. Kistler, Spiking neuron models: Single neurons, populations, plasticity. Cambridge university press, 2002

  44. [44]

    A Theory of How Columns in the Neocortex Enable Learning the Structure of the World,

    J. Hawkins, S. Ahmad, and Y. Cui, “A Theory of How Columns in the Neocortex Enable Learning the Structure of the World,” Frontiers in Neural Circuits, vol. 11, 2017. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fncir.2017.00081/full

  45. [45]

    Impact of blindness onset on the functional organization and the connectivity of the occipital cortex,

    O. Collignon, G. Dormal, G. Albouy, G. Vandewalle, P. Voss, C. Phillips, and F. Lepore, “Impact of blindness onset on the functional organization and the connectivity of the occipital cortex,” Brain: A Journal of Neurology, vol. 136, no. Pt 9, pp. 2769–2783, Sep. 2013

  46. [46]

    Use of sensory substitution devices as a model system for investigating cross-modal neuroplasticity in humans,

    A. C. Nau, M. C. Murphy, and K. C. Chan, “Use of sensory substitution devices as a model system for investigating cross-modal neuroplasticity in humans,” Neural Regeneration Research, vol. 10, no. 11, pp. 1717–1719, Nov. 2015. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4705765/

  47. [47]

    Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex,

    A. Amedi, W. M. Stern, J. A. Camprodon, F. Bermpohl, L. Merabet, S. Rotman, C. Hemond, P. Meijer, and A. Pascual-Leone, “Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex,” Nature Neuroscience, vol. 10, no. 6, pp. 687–689, Jun. 2007. [Online]. Available: https://www.nature.com/articles/nn1912

  48. [48]

    Cortical plasticity and preserved function in early blindness,

    L. Renier, A. G. De Volder, and J. P. Rauschecker, “Cortical plasticity and preserved function in early blindness,” Neuroscience & Biobehavioral Reviews, vol. 41, pp. 53–63, Apr. 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0149763413000328

  49. [49]

    Central visual pathways,

    R. H. Wurtz and E. R. Kandel, “Central visual pathways,” Principles of Neural Science, 01 2000

  50. [50]

    Functional recruitment of visual cortex for sound encoded object identification in the blind,

    L. B. Merabet, L. Battelli, S. Obretenova, S. Maguire, P. Meijer, and A. Pascual-Leone, “Functional recruitment of visual cortex for sound encoded object identification in the blind,” Neuroreport, vol. 20, no. 2, pp. 132–138, Jan. 2009. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3951767/

  51. [51]

    Functional specialization for auditory–spatial processing in the occipital cortex of congenitally blind humans,

    O. Collignon, G. Vandewalle, P. Voss, G. Albouy, G. Charbonneau, M. Lassonde, and F. Lepore, “Functional specialization for auditory–spatial processing in the occipital cortex of congenitally blind humans,” Proceedings of the National Academy of Sciences, vol. 108, no. 11, pp. 4435–4440, Mar

  52. [52]

    A vailable: http://www.pnas.org/content/108/11/4435

    [Online]. A vailable: http://www.pnas.org/content/108/11/4435

  53. [53]

    How Visual Is the Visual Cortex? Comparing Connectional and Functional Fingerprints between Congenitally Blind and Sighted Individuals,

    X. Wang, M. V. Peelen, Z. Han, C. He, A. Caramazza, and Y. Bi, “How Visual Is the Visual Cortex? Comparing Connectional and Functional Fingerprints between Congenitally Blind and Sighted Individuals,” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, vol. 35, no. 36, pp. 12 545–12 559, Sep. 2015

  54. [54]

    Corticocortical connections mediate primary visual cortex responses to auditory stimulation in the blind,

    C. Klinge, F. Eippert, B. Röder, and C. Büchel, “Corticocortical connections mediate primary visual cortex responses to auditory stimulation in the blind,” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, vol. 30, no. 38, pp. 12 798–12 805, Sep. 2010

  55. [55]

    Maturation of visual callosal connections in visually deprived kittens: a challenging critical period,

    G. M. Innocenti, D. O. Frost, and J. Illes, “Maturation of visual callosal connections in visually deprived kittens: a challenging critical period,” Journal of Neuroscience, vol. 5, no. 2, pp. 255–267, Feb. 1985. [Online]. Available: http://www.jneurosci.org/content/5/2/255

  56. [56]

    Information Coding,

    B. Richmond, “Information Coding,” Science, vol. 294, no. 5551, pp. 2493–2494, Dec. 2001. [Online]. Available: http://science.sciencemag.org/content/294/5551/2493

  57. [57]

    R. Q. Quiroga and S. Panzeri, Principles of neural coding. CRC Press, 2013

  58. [58]

    Principles of Neural Information Theory: Computational Neuroscience and Metabolic Efficiency,

    J. Stone, Principles of Neural Information Theory: Computational Neuroscience and Metabolic Efficiency, Jun. 2018

  59. [59]

    Neural Mechanisms of Selective Visual Attention,

    R. Desimone and J. Duncan, “Neural Mechanisms of Selective Visual Attention,” Annual Review of Neuroscience, vol. 18, no. 1, pp. 193–222, 1995. [Online]. Available: https://doi.org/10.1146/annurev.ne.18.030195.001205

  60. [60]

    Segregation of form, color, movement, and depth: anatomy, physiology, and perception,

    M. Livingstone and D. Hubel, “Segregation of form, color, movement, and depth: anatomy, physiology, and perception,” Science, vol. 240, no. 4853, pp. 740–749, May 1988. [Online]. Available: http://science.sciencemag.org/content/240/4853/740

  61. [61]

    Distributed hierarchical processing in the primate cerebral cortex,

    D. J. Felleman and D. C. Van Essen, “Distributed hierarchical processing in the primate cerebral cortex,” Cerebral Cortex (New York, N.Y.: 1991), vol. 1, no. 1, pp. 1–47, Feb. 1991

  62. [62]

    Uniformity of monkey striate cortex: A parallel relationship between field size, scatter, and magnification factor,

    D. H. Hubel and T. N. Wiesel, “Uniformity of monkey striate cortex: A parallel relationship between field size, scatter, and magnification factor,” Journal of Comparative Neurology, vol. 158, no. 3, pp. 295–305, Dec. 1974. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cne.901580305

  63. [63]

    Complete Pattern of Ocular Dominance Columns in Human Primary Visual Cortex,

    D. L. Adams, L. C. Sincich, and J. C. Horton, “Complete Pattern of Ocular Dominance Columns in Human Primary Visual Cortex,” Journal of Neuroscience, vol. 27, no. 39, pp. 10 391–10 403, Sep. 2007. [Online]. Available: http://www.jneurosci.org/content/27/39/10391

  64. [64]

    Induction of visual orientation modules in auditory cortex,

    J. Sharma, A. Angelucci, and M. Sur, “Induction of visual orientation modules in auditory cortex,” Nature, vol. 404, no. 6780, pp. 841–847, Apr. 2000. [Online]. Available: https://www.nature.com/articles/35009043

  65. [65]

    Separate visual pathways for perception and action,

    M. A. Goodale and A. D. Milner, “Separate visual pathways for perception and action,” Trends in Neurosciences, vol. 15, no. 1, pp. 20–25, Jan. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0166223692903448

  67. [67]

    Object vision and spatial vision: two cortical pathways,

    M. Mishkin, L. G. Ungerleider, and K. A. Macko, “Object vision and spatial vision: two cortical pathways,” Trends in Neurosciences, vol. 6, pp. 414–417, Jan. 1983. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016622368390190X

  68. [68]

    Receptive fields and functional architecture of monkey striate cortex,

    D. H. Hubel and T. N. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” The Journal of Physiology, vol. 195, no. 1, pp. 215–243, Mar. 1968. [Online]. Available: https://physoc.onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.1968.sp008455

  69. [69]

    Stochasticity, spikes and decoding: sufficiency and utility of order statistics,

    B. J. Richmond, “Stochasticity, spikes and decoding: sufficiency and utility of order statistics,” Biological Cybernetics, vol. 100, no. 6, pp. 447–457, Jun. [Online]. Available: https://doi.org/10.1007/s00422-009-0321-x

  71. [71]

    P. O. Hoyer, Probabilistic models of early vision. Helsinki University of Technology, Nov. 2002. [Online]. Available: https://aaltodoc.aalto.fi:443/handle/123456789/2234

  72. [72]

    Functional connectivity of visual cortex in the blind follows retinotopic organization principles,

    E. Striem-Amit, S. Ovadia-Caro, A. Caramazza, D. S. Margulies, A. Villringer, and A. Amedi, “Functional connectivity of visual cortex in the blind follows retinotopic organization principles,” Brain: A Journal of Neurology, vol. 138, no. Pt 6, pp. 1679–1695, Jun. 2015

  73. [73]

    Development and Plasticity of the Primary Visual Cortex,

    J. S. Espinosa and M. Stryker, “Development and Plasticity of the Primary Visual Cortex,” Neuron, vol. 75, no. 2, pp. 230–249, Jul. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0896627312005697

  75. [75]

    Neuroplasticity in the blind and sensory substitution for vision,

    E. Striem-Amit, “Neuroplasticity in the blind and sensory substitution for vision,” Ph.D. dissertation, 2014

  76. [76]

    Finding the Edges (Sobel Operator) - Computerphile,

    Computerphile, “Finding the Edges (Sobel Operator) - Computerphile,” 2015. [Online]. Available: https://www.youtube.com/watch?v=uihBwtPIBxM

  77. [77]

    Canny Edge Detector - Computerphile,

    Computerphile, “Canny Edge Detector - Computerphile,” 2015. [Online]. Available: https://www.youtube.com/watch?v=sRFM5IEqR2w

  78. [78]

    Theory of communication,

    D. Gábor, Theory of communication. London: Institution of Electrical Engineering, 1946, OCLC: 39115995

  79. [79]

    Gabor filter-based edge detection,

    R. Mehrotra, K. R. Namuduri, and N. Ranganathan, “Gabor filter-based edge detection,” Pattern Recognition, vol. 25, no. 12, pp. 1479–1494, Dec. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290121X
