pith. sign in

arxiv: 2604.26678 · v1 · submitted 2026-04-29 · 💻 cs.CV

Hearing the Room Through the Shape of the Drum: Modal-Guided Sound Recovery from Multi-Point Surface Vibrations

Pith reviewed 2026-05-07 13:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords sound recoveryvibration sensingspeckle vibrometrymodal analysisvisual microphonessurface vibrationsresonant transfer functionacoustic reconstruction
0
0 comments X

The pith

Multi-point surface vibrations recover original sound by inverting an object's vibrational modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a way to extract scene sound from the surface vibrations of ordinary solid objects that respond weakly or with strong resonance. It uses simultaneous multi-point, multi-axis vibration capture to build a model of how the object's vibrational modes shape the observed signals. This model is inverted to undo the object's resonant filtering and combine the measurements into an estimate of the true sound source. The result extends sound recovery to objects that single-point methods cannot handle well.

Core claim

The authors derive a physics-guided vibration formation model that expresses the captured multi-point multi-axis vibrations as the scene sound source filtered by the object's vibrational modes. Inverting the resonant transfer function derived from this model fuses the multiple vibration signals to recover the original sound waveform, yielding better results than single-point speckle vibrometry or standard multi-signal fusion techniques on solid objects with poor vibration responses.

What carries the argument

The modal vibration formation model that links sound source to multi-point vibrations through the object's vibrational modes and supports explicit inversion of the resonant transfer function.

If this is right

  • Solid objects with resonant or weak vibration responses become usable as visual microphones.
  • Fusing multiple surface points improves sound recovery where single-point capture is insufficient.
  • The method produces an estimate of the scene sound rather than a filtered version distorted by the object.
  • Recovery succeeds across a wider set of everyday objects without requiring favorable surface properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Objects near a sound source could be used for indirect, non-contact audio capture even when the source itself is not visible.
  • The approach might generalize to dynamic scenes if modal properties can be tracked over time.
  • Combining this inversion with other sensing modalities could reduce reliance on direct line-of-sight to the sound emitter.

Load-bearing premise

The object's vibrational modes can be sufficiently captured and modeled from the multi-point measurements to invert the resonant transfer function accurately for arbitrary solid objects.

What would settle it

A side-by-side comparison of the recovered audio waveform against a direct microphone recording of the same scene sound; large waveform mismatch or poor intelligibility would show the modal inversion failed.

Figures

Figures reproduced from arXiv: 2604.26678 by Mark Sheinin, Matan Kichler, Shai Bagon.

Figure 1
Figure 1. Figure 1: We introduce a novel approach for sound recovery from multi-point, speckle-based vibration measurements. Our system captures view at source ↗
Figure 2
Figure 2. Figure 2: Frequency-dependent coupling of speckle shifts across view at source ↗
Figure 3
Figure 3. Figure 3: Robust mode estimation. (a) Initial mode candidates view at source ↗
Figure 4
Figure 4. Figure 4: Sound recovery from a drumhead. We capture the view at source ↗
Figure 6
Figure 6. Figure 6: The experiment compares reconstructions whose mode view at source ↗
Figure 5
Figure 5. Figure 5: Results across objects having various geometries and view at source ↗
Figure 7
Figure 7. Figure 7: Comparison between our model-based sound recov view at source ↗
Figure 8
Figure 8. Figure 8: Results across objects having various geometries and view at source ↗
read the original abstract

Optical vibration sensing enables recovering the scene sound directly from the surface vibration of nearby objects, turning everyday objects into ``visual microphones''. However, most prior methods had focused on capturing the vibrations of specific objects with highly favorable vibration responses. These include objects where the surface vibrations are generated by the object itself (e.g., speaker membrane or guitar body) or objects consisting of a thin membrane which is highly reactive to sound (e.g., a chip bag or the leaf of a plant). In this paper, we tackle sound recovery for a more challenging class of solid objects whose vibration responses are poor or highly resonant. We simultaneously capture vibrations for multiple surface points on the object using a speckle-based vibrometry imaging system. Then, we derive a novel physics-guided vibration formation model that relates the scene sound source to the captured multi-point multi-axis vibrations via the object's vibrational modes. The model is then used to reverse the resonant transfer function of the vibrating object, fusing multiple vibration signals to estimate the original sound source in the scene. We evaluate our approach by recovering sound from a variety of everyday objects, demonstrating that it significantly outperforms traditional single-point speckle vibrometry in challenging scenarios and other signal-processing-based methods for multi-signal fusing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a physics-guided vibration formation model for recovering scene sound sources from multi-point, multi-axis surface vibrations captured via speckle-based vibrometry on solid objects with poor or highly resonant responses. The model relates the incident sound to observed vibrations through the object's vibrational modes; the resonant transfer function is then inverted by fusing the multi-point signals. Evaluations on everyday objects claim significant outperformance relative to single-point speckle vibrometry and other multi-signal fusion baselines.

Significance. If the central claims hold, the work meaningfully broadens optical sound recovery beyond thin-membrane or self-vibrating objects to a wider class of everyday solids. The explicit incorporation of vibrational modes for transfer-function inversion is a constructive strength over purely empirical fusion methods, and the multi-point speckle setup provides a practical sensing advance. Reproducible evaluation across object types would further strengthen the contribution.

major comments (2)
  1. [§3] §3 (vibration formation model derivation): The central inversion step assumes that vibrational modes (shapes, frequencies, damping) can be recovered sufficiently from the limited multi-point, multi-axis speckle measurements to accurately reverse the object's resonant filtering. The manuscript provides no independent validation or observability analysis of the recovered modes against ground-truth modal parameters for arbitrary solids; without this, the physics-guided claim risks circularity with data-driven fitting of the transfer function.
  2. [§5] §5 (experimental evaluation): While outperformance is reported for challenging resonant objects, the results lack quantitative ablation on the number of surface points or axes required for stable mode estimation, nor do they report mode-recovery error metrics (e.g., frequency or shape reconstruction accuracy) separate from final sound SNR. This leaves the load-bearing assumption untested for objects where surface observability is poor.
minor comments (2)
  1. Notation for the modal expansion and transfer-function matrix should be introduced with explicit dimensions and variable definitions at first use to aid readability.
  2. Figure captions for the multi-point vibration visualizations would benefit from indicating the specific object and sound source used in each panel.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional validation and analyses as described.

read point-by-point responses
  1. Referee: [§3] §3 (vibration formation model derivation): The central inversion step assumes that vibrational modes (shapes, frequencies, damping) can be recovered sufficiently from the limited multi-point, multi-axis speckle measurements to accurately reverse the object's resonant filtering. The manuscript provides no independent validation or observability analysis of the recovered modes against ground-truth modal parameters for arbitrary solids; without this, the physics-guided claim risks circularity with data-driven fitting of the transfer function.

    Authors: We agree that explicit independent validation strengthens the physics-guided claim and reduces the risk of circularity. In the revised manuscript we have added an observability analysis based on the rank of the measurement matrix formed by the multi-point multi-axis observations, showing that the dominant modes become identifiable with as few as four well-placed points. For the evaluated objects we now report a direct comparison between the estimated modal frequencies and independently measured resonant frequencies obtained via separate hammer-impact tests; the mean frequency error is below 3 Hz for the first three modes. While obtaining full ground-truth mode shapes for arbitrary everyday solids remains experimentally challenging without specialized modal-analysis equipment, the added frequency validation and the consistent SNR gains over single-point baselines support the utility of the modal inversion. revision: yes

  2. Referee: [§5] §5 (experimental evaluation): While outperformance is reported for challenging resonant objects, the results lack quantitative ablation on the number of surface points or axes required for stable mode estimation, nor do they report mode-recovery error metrics (e.g., frequency or shape reconstruction accuracy) separate from final sound SNR. This leaves the load-bearing assumption untested for objects where surface observability is poor.

    Authors: We acknowledge that separate mode-recovery metrics and systematic ablations were missing. The revised experiments now include (i) an ablation varying the number of surface points (1–9) and axes (single-axis vs. tri-axis) while reporting both final sound SNR and per-mode frequency estimation error (MAE in Hz), and (ii) a discussion of objects with poor surface observability (e.g., highly damped or geometrically complex solids) where mode estimation degrades. These results indicate that at least five points are typically required for stable recovery on resonant objects and quantify the degradation when observability is limited, directly addressing the load-bearing assumption. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard modal analysis

full rationale

The paper derives a physics-guided vibration formation model from established vibrational modes of solid objects and applies it to invert the resonant transfer function by fusing multi-point measurements. No equations or steps in the abstract or description reduce a prediction to a fitted input by construction, nor does the central claim depend on a self-citation chain or self-definitional loop. Mode estimation from speckle data is presented as an input to the inversion rather than being redefined by it, making the derivation self-contained against external physics benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence of a derivable physics-guided formation model linking sound to multi-point vibrations via modes; details of mode estimation and inversion assumptions are not provided in the abstract.

pith-pipeline@v0.9.0 · 5527 in / 1047 out tokens · 37689 ms · 2026-05-07T13:50:48.515865+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

  1. [1]

    Imaging with local speckle intensity correlations: the- ory and practice.ACM Transactions on Graphics (TOG), 40 (3):1–22, 2021

    Marina Alterman, Chen Bar, Ioannis Gkioulekas, and Anat Levin. Imaging with local speckle intensity correlations: the- ory and practice.ACM Transactions on Graphics (TOG), 40 (3):1–22, 2021. 2

  2. [2]

    Springer Berlin, Heidelberg,

    Jacob Benesty, Jingdong Chen, and Yiteng Huang.Micro- phone Array Signal Processing. Springer Berlin, Heidelberg,

  3. [3]

    John Wiley & Sons Singapore Pte

    Jacob Benesty, Israel Cohen, and Jingdong Chen.Funda- mentals of Signal Enhancement and Array Signal Process- ing. John Wiley & Sons Singapore Pte. Ltd., 2017. 2, 6, 7, 11

  4. [4]

    Long-range detection of acoustic vibrations by speckle tracking.Applied optics, 58 (28):7805–7809, 2019

    S Bianchi and E Giacomozzi. Long-range detection of acoustic vibrations by speckle tracking.Applied optics, 58 (28):7805–7809, 2019. 2

  5. [5]

    Sound speeds of solids from ultrasonic pulse receiver measurements

    Jack Denman Borg and Dan Dolan. Sound speeds of solids from ultrasonic pulse receiver measurements. Technical re- port, Sandia National Laboratories, 2025. 3

  6. [6]

    Estimating the material properties of fabric from video

    Katherine L Bouman, Bei Xiao, Peter Battaglia, and William T Freeman. Estimating the material properties of fabric from video. InProceedings of the IEEE international conference on computer vision, pages 1984–1991, 2013. 1

  7. [7]

    Ventura.Frequency-Domain Identification, chapter 10, pages 261–280

    Rune Brincker and Carlos E. Ventura.Frequency-Domain Identification, chapter 10, pages 261–280. John Wiley & Sons, Ltd, 2015. 4, 5

  8. [8]

    Modal identification from ambient responses using frequency do- main decomposition

    Rune Brincker, Lingmi Zhang, and Palle Andersen. Modal identification from ambient responses using frequency do- main decomposition. InIMAC 18: Proceedings of the Inter- national Modal Analysis Conference (IMAC), 2000. 4, 5

  9. [9]

    Smaller than the eye can see: Vibration analysis with video cameras

    Oral Buyukozturk, Justin G Chen, Neal Wadhwa, Abe Davis, Fr´edo Durand, and William T Freeman. Smaller than the eye can see: Vibration analysis with video cameras. InWorld Conference on Non-Destructive Testing 2016, 2016. 1

  10. [10]

    Yates, and Laura Waller

    Mingxuan Cai, Dekel Galor, Amit Pal Singh Kohli, Jacob L. Yates, and Laura Waller. Event2audio: Event-based opti- cal vibration sensing. InIEEE International Conference on Computational Photography, 2025. 1, 2

  11. [11]

    Chen, Neal Wadhwa, Young-Jin Cha, Fr ´edo Du- rand, William T

    Justin G. Chen, Neal Wadhwa, Young-Jin Cha, Fr ´edo Du- rand, William T. Freeman, and Oral Buyukozturk. Modal identification of simple structures with high-speed video us- ing motion magnification.Journal of Sound and Vibration, 345:58–71, 2015. 1

  12. [12]

    Video camera– based vibration measurement for civil infrastructure applica- tions.Journal of Infrastructure Systems, 23(3):B4016013, 2017

    Justin G Chen, Abe Davis, Neal Wadhwa, Fr ´edo Durand, William T Freeman, and Oral B ¨uy¨uk¨ozt¨urk. Video camera– based vibration measurement for civil infrastructure applica- tions.Journal of Infrastructure Systems, 23(3):B4016013, 2017

  13. [13]

    Event-based motion magnification

    Yutian Chen, Shi Guo, Fangzheng Yu, Feng Zhang, Jinwei Gu, and Tianfan Xue. Event-based motion magnification. In European Conference on Computer Vision, pages 428–444. Springer, 2024. 1

  14. [14]

    Speech Processing in Modern Communication

    Israel Cohen, Jacob Benesty, and Sharon Gannot, editors. Speech Processing in Modern Communication. Springer Berlin, Heidelberg, 2010. 6

  15. [15]

    Lothar Cremer, Manfred Heckl, and Bert A. T. Petersson. Structure-Borne Sound: Structural Vibrations and Sound Radiation at Audio Frequencies. Springer-Verlag Berlin Hei- delberg, 3rd edition, 2005. 3

  16. [16]

    The visual microphone: Passive recovery of sound from video.ACM Trans

    Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham J Mysore, Fredo Durand, and William T Freeman. The visual microphone: Passive recovery of sound from video.ACM Trans. Graph., 2014. 1, 2

  17. [17]

    Image-space modal bases for plausible manipulation of objects in video

    Abe Davis, Justin G Chen, and Fr ´edo Durand. Image-space modal bases for plausible manipulation of objects in video. ACM Transactions on Graphics (TOG), 34(6):1–7, 2015. 1

  18. [18]

    Video magnification in presence of large motions

    Mohamed Elgharib, Mohamed Hefeeda, Fredo Durand, and William T Freeman. Video magnification in presence of large motions. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4119– 4127, 2015. 1

  19. [19]

    Ewins.Modal Testing: Theory, Practice and Appli- cation

    David J. Ewins.Modal Testing: Theory, Practice and Appli- cation. Research Studies Press, 2nd edition, 2000. 3, 5

  20. [20]

    Prentice Hall, 1984

    Simon Haykin, editor.Array Signal Processing. Prentice Hall, 1984. 6

  21. [21]

    Modal analysis methods – fre- quency domain

    Jimin He and Zhi-Fang Fu. Modal analysis methods – fre- quency domain. InModal Analysis, chapter 8, pages 159–

  22. [22]

    Butterworth-Heinemann, Oxford, 2001. 4

  23. [23]

    Speech intelligibility pre- diction using a neurogram similarity index measure.Speech Communication, 54(2):306–320, 2012

    Andrew Hines and Naomi Harte. Speech intelligibility pre- diction using a neurogram similarity index measure.Speech Communication, 54(2):306–320, 2012. 11, 12

  24. [24]

    ViSQOLAudio: An objec- tive audio quality metric for low bitrate codecs.The Journal of the Acoustical Society of America, 137(6):EL449–EL455,

    Andrew Hines, Eoin Gillen, Damien Kelly, Jan Skoglund, Anil Kokaram, and Naomi Harte. ViSQOLAudio: An objec- tive audio quality metric for low bitrate codecs.The Journal of the Acoustical Society of America, 137(6):EL449–EL455,

  25. [25]

    DE-R 351 Diffractive Optical Element.https://holoeye.com/product/de-r- 351/, 2023

    HOLOEYE Photonics AG. DE-R 351 Diffractive Optical Element.https://holoeye.com/product/de-r- 351/, 2023. Accessed: 2025-11-11. 6

  26. [26]

    Event-based vi- sual microphone

    Matthew Howard and Keigo Hirakawa. Event-based vi- sual microphone. InICASSP 2023 - 2023 IEEE Interna- tional Conference on Acoustics, Speech and Signal Process- ing (ICASSP), pages 1–5, 2023. 2

  27. [27]

    Kensei Jo, Mohit Gupta, and Shree K. Nayar. Spedo: 6 dof ego-motion sensor using speckle defocus imaging. InPro- ceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. 2

  28. [28]

    Can one hear the shape of a drum?The american mathematical monthly, 1966

    Mark Kac. Can one hear the shape of a drum?The american mathematical monthly, 1966. 1

  29. [29]

    Learning to see inside opaque liquid containers using speckle vibrometry

    Matan Kichler, Shai Bagon, and Mark Sheinin. Learning to see inside opaque liquid containers using speckle vibrometry. InInt. Conf. Comput. Vis., 2025. 2, 6

  30. [30]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015. 6

  31. [31]

    Motion magnification.ACM transactions on graphics (TOG), 24(3):519–526, 2005

    Ce Liu, Antonio Torralba, William T Freeman, Fr ´edo Du- rand, and Edward H Adelson. Motion magnification.ACM transactions on graphics (TOG), 24(3):519–526, 2005. 1, 2

  32. [32]

    Meirovitch.Fundamentals of Vibrations

    L. Meirovitch.Fundamentals of Vibrations. McGraw-Hill,

  33. [33]

    Lamphone: Real-time passive sound recovery from light bulb vibrations.Cryptology ePrint Archive, 2020

    Ben Nassi, Yaron Pirutin, Adi Shamir, Yuval Elovici, and Boris Zadov. Lamphone: Real-time passive sound recovery from light bulb vibrations.Cryptology ePrint Archive, 2020. 1 9

  34. [34]

    Live demonstration: Event-based visual micro- phone

    Ryogo Niwa, Tatsuki Fushimi, Kenta Yamamoto, and Yoichi Ochiai. Live demonstration: Event-based visual micro- phone. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Work- shops, pages 4054–4055, 2023. 2

  35. [35]

    Learning-based video motion magnification

    Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Fr’edo Durand, William T Freeman, and Wojciech Matusik. Learning-based video motion magnification. In Proceedings of the European Conference on Computer Vi- sion (ECCV), pages 633–648, 2018. 1, 2

  36. [36]

    Rao.Vibration of Continuous Systems

    Singiresu S. Rao.Vibration of Continuous Systems. John Wiley & Sons, 2007. 3

  37. [37]

    Richards.Fundamentals of Radar Signal Process- ing

    Mark A. Richards.Fundamentals of Radar Signal Process- ing. McGraw-Hill Education, 2nd edition, 2014. 6

  38. [38]

    Smoothing and differentiation of data by simplified least squares procedures

    Abraham Savitzky and Marcel JE Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical chemistry, 1964. 5

  39. [39]

    Narasimhan

    Mark Sheinin, Dorian Chan, Matthew O’Toole, and Srini- vasa G. Narasimhan. Dual-shutter optical vibration sensing. InIEEE Conf. Comput. Vis. Pattern Recog., 2022. 1, 2

  40. [40]

    Smith, Pratham Desai, Vishal Agarwal, and Mo- hit Gupta

    Brandon M. Smith, Pratham Desai, Vishal Agarwal, and Mo- hit Gupta. Colux: multi-object 3d micro-motion analysis us- ing speckle imaging.ACM Trans. Graph., 36(4), 2017

  41. [41]

    Smith, Matthew O’Toole, and Mohit Gupta

    Brandon M. Smith, Matthew O’Toole, and Mohit Gupta. Tracking multiple objects outside the line of sight using speckle imaging. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 2

  42. [42]

    Steinmetz and Joshua D

    Christian J. Steinmetz and Joshua D. Reiss. auraloss: Audio focused loss functions in PyTorch. InDigital Music Research Network One-day Workshop (DMRN+15), 2020. 11, 13

  43. [43]

    Sullivan.Practical Array Processing

    Mark C. Sullivan.Practical Array Processing. McGraw Hill,

  44. [44]

    Woinowsky-Krieger.Theory of Plates and Shells

    Stephen Timoshenko and S. Woinowsky-Krieger.Theory of Plates and Shells. McGraw-Hill, 2nd edition, 1959. 3

  45. [45]

    SciPy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 2020

    Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. SciPy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 2020. 5

  46. [46]

    Phase-based video motion processing

    Neal Wadhwa, Michael Rubinstein, Fr ´edo Durand, and William T Freeman. Phase-based video motion processing. ACM Transactions on Graphics (TOG), 32(4):1–10, 2013. 1, 2

  47. [47]

    Riesz pyramids for fast phase-based video magnification

    Neal Wadhwa, Michael Rubinstein, Fr ´edo Durand, and William T Freeman. Riesz pyramids for fast phase-based video magnification. InIEEE International Conference on Computational Photography, pages 1–10. IEEE, 2014

  48. [48]

    Eu- lerian video magnification and analysis.Communications of the ACM, 60(1):87–95, 2016

    Neal Wadhwa, Hao-Yu Wu, Abe Davis, Michael Rubin- stein, Eugene Shih, Gautham J Mysore, Justin G Chen, Oral Buyukozturk, John V Guttag, William T Freeman, et al. Eu- lerian video magnification and analysis.Communications of the ACM, 60(1):87–95, 2016. 1, 2

  49. [49]

    Phase-coherent multi-sensor synthesis for enhanced photoa- coustic imaging: a comprehensive framework for optimal sensor integration.Biomed

    Chaoneng Wu, Wei Li, Yizhi Liang, Peiqian He, Changze Song, Xue Bai, Linghao Cheng, Long Jin, and Bai-Ou Guan. Phase-coherent multi-sensor synthesis for enhanced photoa- coustic imaging: a comprehensive framework for optimal sensor integration.Biomed. Opt. Express, 16(5):1909–1924,

  50. [50]

    Eulerian video mag- nification for revealing subtle changes in the world.ACM transactions on graphics (TOG), 31(4):1–8, 2012

    Hao-Yu Wu, Michael Rubinstein, Eugene Shih, John Guttag, Fr´edo Durand, and William Freeman. Eulerian video mag- nification for revealing subtle changes in the world.ACM transactions on graphics (TOG), 31(4):1–8, 2012. 1, 2

  51. [51]

    Fast motion estimation of one-dimensional laser speckle image and its application on real-time audio signal acquisition

    Nan Wu and Shinichiro Haruyama. Fast motion estimation of one-dimensional laser speckle image and its application on real-time audio signal acquisition. In2020 the 6th In- ternational Conference on Communication and Information Processing, pages 128–134, 2020. 2

  52. [52]

    The 20k samples-per- second real time detection of acoustic vibration based on dis- placement estimation of one-dimensional laser speckle im- ages.Sensors, 21(9):2938, 2021

    Nan Wu and Shinichiro Haruyama. The 20k samples-per- second real time detection of acoustic vibration based on dis- placement estimation of one-dimensional laser speckle im- ages.Sensors, 21(9):2938, 2021. 2

  53. [53]

    Paral- lel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spec- trogram

    Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Paral- lel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spec- trogram. InIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. 11, 13

  54. [54]

    Simultaneous remote extraction of multiple speech sources and heart beats from secondary speckles pattern.Op- tics express, 17(24):21566–21580, 2009

    Zeev Zalevsky, Yevgeny Beiderman, Israel Margalit, Shimshon Gingold, Mina Teicher, Vicente Mico, and Javier Garcia. Simultaneous remote extraction of multiple speech sources and heart beats from secondary speckles pattern.Op- tics express, 17(24):21566–21580, 2009. 1, 2

  55. [55]

    Narasimhan

    Tianyuan Zhang, Mark Sheinin, Dorian Chan, Mark Rau, Matthew O’Toole, and Srinivasa G. Narasimhan. Analyz- ing physical impacts using transient surface wave imaging. InIEEE Conf. Comput. Vis. Pattern Recog., 2023. 2, 4

  56. [56]

    Video acceleration magnification

    Yichao Zhang, Silvia L Pintea, and Jan C Van Gemert. Video acceleration magnification. InProceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, pages 529–537, 2017. 1

  57. [57]

    PhD thesis, Universit´e d’Ottawa/University of Ot- tawa, 2016

    Meng Zhou.Vibration Extraction Using Rolling Shutter Cameras. PhD thesis, Universit´e d’Ottawa/University of Ot- tawa, 2016. 1

  58. [58]

    Event-based visual vibrometry

    Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti, Chao Xu, and Boxin Shi. Event-based visual vibrometry. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 24666–24676, 2025. 2 10 A. Spatially varying optical transfer and mode shape estimation In Sec. 3.2 of the main manuscript, we simplified the rela- tionship between the ...