pith. machine review for the scientific record.

arxiv: 2605.09971 · v1 · submitted 2026-05-11 · 💻 cs.HC · cs.AI

Recognition: 2 Lean theorem links

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:08 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords haptic generation · text-to-vibration · diffusion models · vibrotactile feedback · semantic alignment · global denoising · haptic design · metaverse

The pith

HapticLDM shows that latent diffusion models can generate accurate vibrotactile feedback directly from text descriptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes HapticLDM, the first model to apply latent diffusion to the task of converting natural language into vibration patterns for haptic devices. Previous autoregressive methods struggled to maintain consistency across entire sequences because of their step-by-step processing and limited training data. The new approach pairs a text-processing strategy that emphasizes motion dynamics with a denoising process that keeps temporal variations stable. If the claims hold, designers could produce scenario-fitting haptic effects for virtual worlds and entertainment more quickly and with greater variety. User studies and baseline comparisons report improvements in how well the vibrations match the described meaning and how lifelike they feel.
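The generation recipe described above (sample a noisy latent, then iteratively denoise it toward a vibration signal) can be sketched in toy, stdlib-only form; `denoise_step` below is a hypothetical stand-in for the paper's text-conditioned network, not its actual implementation:

```python
import math
import random

def make_beta_schedule(steps: int, lo: float = 1e-4, hi: float = 0.02):
    """Linear variance schedule, as in standard DDPM setups (illustrative)."""
    return [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]

def add_noise(x0, alpha_bar: float, rng: random.Random):
    """Forward process: corrupt a clean latent x0 to noise level alpha_bar."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
            for v in x0]

def sample(denoise_step, dim: int, steps: int, seed: int = 0):
    """Reverse process: start from Gaussian noise and denoise step by step.

    `denoise_step(x, t)` is a placeholder for a trained, text-conditioned
    denoiser; here it just needs to map a latent to a less noisy latent.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(dim)]
    for t in reversed(range(steps)):
        x = denoise_step(x, t)
    return x
```

The actual model additionally encodes the text prompt and decodes the final latent into a waveform via a VAE decoder (Figure 2), details this sketch omits.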

Core claim

HapticLDM is the first generative model for converting text into vibrotactile feedback that uses latent diffusion models. By curating data with emphasis on dynamic characteristics and applying a global denoising mechanism to ensure coherent temporal variations, it overcomes the sequential limitations of autoregressive models in modeling global dependencies. Extensive tests confirm superior realism, semantic alignment, and usability in simplifying haptic design.

What carries the argument

A global denoising mechanism within the latent diffusion model that regulates coherent and stable variations in the temporal envelope of the generated vibrations.
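The review gives no equations for this mechanism, but one plausible reading is a penalty that discourages abrupt jumps in the amplitude envelope of the generated signal; the function names below are illustrative, not the paper's:

```python
def temporal_envelope(signal, win: int = 8):
    """Rectify and window-average to get a coarse amplitude envelope."""
    env = []
    for i in range(0, len(signal), win):
        chunk = signal[i:i + win]
        env.append(sum(abs(v) for v in chunk) / len(chunk))
    return env

def envelope_smoothness_penalty(signal, win: int = 8):
    """Mean squared difference between adjacent envelope frames.

    A low value means the envelope varies coherently and stably over
    time, which is the property the global denoising mechanism is said
    to regulate.
    """
    env = temporal_envelope(signal, win)
    if len(env) < 2:
        return 0.0
    return sum((a - b) ** 2 for a, b in zip(env, env[1:])) / (len(env) - 1)
```

A steady buzz scores near zero; a signal that jumps between loud and silent windows scores high, so adding such a term to a training loss would push generations toward stable envelopes.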

Load-bearing premise

That emphasizing dynamic characteristics in text processing and applying global denoising adequately resolves the global dependency and data constraint problems that limit autoregressive models.

What would settle it

If a follow-up study with larger participant groups or more diverse vibration scenarios shows no significant difference in perceived realism or semantic match compared to the autoregressive baseline, the superiority claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.09971 by Anran Xu, Cai Chen, Fei Wang, Jiahao Xiong, Lijia Pan, Pinzhi Huang, Tao Wen.

Figure 1. Overview of HapticLDM. A text prompt is provided as input; a corresponding vibration signal is then generated by HapticLDM. [figures/full_fig_p002_1.png]
Figure 2. The overall training framework of HapticLDM. Top (VAE training): the VAE (comprising encoder E and decoder G) and the discriminator D. [figures/full_fig_p006_2.png]
Figure 3. The workflow of the random A/B test for performance comparison between HapticLDM and HapticGen. [figures/full_fig_p007_3.png]
Figure 4. (a) Box plot of the 5-point Likert-scale scoring results for the four data types. [figures/full_fig_p008_4.png]
read the original abstract

Text-to-vibration generation converts natural language into haptic feedback, enabling vibration-effect designers to get scenarios-fitted vibrations more efficiently, which shows great potentials in application fields such as metaverse, games, and film to enrich the user experience in interactive scenarios. The core challenge in this field is how to generate accurate, consistent, and complete vibrations according to textual semantics. Very recent autoregressive (AR) approaches (e.g., HapticGen) exhibit limited capacity in fully capturing global dependencies, owing to the inherent sequential nature of their modeling and prevailing data constraints. In this paper, we proposed HapticLDM, the first text-to-vibration generative model built upon Latent Diffusion Models (LDMs). Firstly, with respect to the data, we introduced a text-processing strategy that emphasizes dynamic characteristics to curate high-quality data pairs for fine-grained dynamic modeling. Secondly, HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope. Furthermore, we conduct extensive evaluations, including A/B testing against the state-of-the-art baseline and a user study involving 30 participants. The results demonstrate that our model enhances realism and semantic alignment. Qualitative feedback further indicates that HapticLDM simplifies the haptic design workflow while generating diverse, subtle, and physically precise vibrations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces HapticLDM as the first latent diffusion model for text-to-vibrotactile generation. It addresses limitations of prior autoregressive models (e.g., HapticGen) in capturing global dependencies under data constraints by proposing a text-processing strategy that emphasizes dynamic characteristics for curating high-quality data pairs and a global denoising mechanism to enforce coherent temporal envelope variations. The work reports A/B testing against baselines and a 30-participant user study demonstrating gains in realism, semantic alignment, workflow simplification, and generation of diverse, subtle, physically precise vibrations.

Significance. If the empirical claims hold, the work has moderate significance as the first diffusion-based approach in text-to-haptic generation, offering a potential alternative to sequential AR models for applications in metaverse, gaming, and film. The user study provides initial evidence of practical utility, but the absence of ablations or isolated metrics for the proposed mechanisms limits attribution of gains and reduces the strength of the central contribution.

major comments (2)
  1. Abstract: The headline claim that the dynamic-characteristic text curation plus global denoising mechanism successfully mitigate AR global-dependency limits under the stated data constraints is not supported by any ablation results, explicit metric for global temporal coherence (e.g., envelope correlation over full sequences), or controlled comparison isolating these components versus the LDM backbone itself.
  2. Evaluation section (A/B testing and user study): The reported improvements in realism and semantic alignment rest on a 30-participant study and A/B tests, yet no details are provided on data splits, exact quantitative metrics, statistical significance tests, or how the study isolates the contribution of the text-processing and denoising additions.
minor comments (1)
  1. Abstract: The baseline is referred to as 'state-of-the-art' before naming HapticGen; moving the specific name earlier would improve readability.
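One concrete form of the "envelope correlation over full sequences" metric the referee asks for would be Pearson correlation between coarse amplitude envelopes of generated and reference signals; this is a hypothetical sketch, not a metric taken from the paper:

```python
import math

def envelope(signal, win: int = 8):
    """Coarse amplitude envelope: mean absolute value per window."""
    return [sum(abs(v) for v in signal[i:i + win]) / len(signal[i:i + win])
            for i in range(0, len(signal), win)]

def pearson(x, y):
    """Pearson correlation; undefined (raises) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def envelope_correlation(generated, reference, win: int = 8):
    """Global temporal coherence proxy: correlation of the two envelopes
    computed over the full sequences, not just local windows."""
    return pearson(envelope(generated, win), envelope(reference, win))
```

A value near 1 would indicate the generated vibration tracks the reference's global loudness contour; a value near 0 or below would support the referee's concern.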

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that additional empirical support and evaluation details will strengthen the paper and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: Abstract: The headline claim that the dynamic-characteristic text curation plus global denoising mechanism successfully mitigate AR global-dependency limits under the stated data constraints is not supported by any ablation results, explicit metric for global temporal coherence (e.g., envelope correlation over full sequences), or controlled comparison isolating these components versus the LDM backbone itself.

    Authors: We acknowledge that the abstract claim would benefit from direct empirical isolation of the proposed components. In the revised manuscript, we will add ablation studies comparing the full HapticLDM against (i) the LDM backbone without dynamic text curation and (ii) without the global denoising mechanism. We will also report an explicit global temporal coherence metric (temporal envelope correlation over full sequences) and controlled comparisons demonstrating how these additions address AR limitations under the given data constraints. revision: yes

  2. Referee: Evaluation section (A/B testing and user study): The reported improvements in realism and semantic alignment rest on a 30-participant study and A/B tests, yet no details are provided on data splits, exact quantitative metrics, statistical significance tests, or how the study isolates the contribution of the text-processing and denoising additions.

    Authors: We will expand the Evaluation section to specify the train/test data splits, the exact quantitative metrics used in A/B testing (including any perceptual or signal-based scores), results of statistical significance tests (e.g., paired t-tests with p-values), and a detailed description of the user study protocol that clarifies how participant tasks and comparisons isolate the contributions of the text-processing strategy and global denoising mechanism. revision: yes
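The paired t-test the authors promise reduces to the standard statistic over per-participant score differences; `paired_t_statistic` is an illustrative stdlib helper (for 30 participants, df = 29, the two-sided 5% critical value is about 2.045):

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (sd(d) / sqrt(n)) for per-participant differences d.

    scores_a and scores_b are paired ratings (e.g., Likert scores for
    HapticLDM and the baseline from the same participant).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(d)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance in differences")
    return mean(d) / (sd / math.sqrt(len(d)))
```

Comparing |t| against the critical value for n − 1 degrees of freedom (or computing a p-value from the t-distribution) would give the significance reporting the referee requests.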

Circularity Check

0 steps flagged

No circularity: the model proposal, data curation, and external evaluation form an independent chain.

full rationale

The paper introduces HapticLDM as a latent diffusion architecture with two stated novelties (dynamic-characteristic text curation and global denoising), trains it on curated pairs, and reports gains via A/B tests and a 30-participant user study against an external baseline (HapticGen). No equations, fitted parameters, or self-citations are shown to reduce the claimed improvements in realism or semantic alignment back to the inputs by construction. The derivation remains self-contained against external benchmarks and does not invoke uniqueness theorems, ansatzes, or renamings that collapse to prior self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that diffusion models can be adapted to vibration signals via latent space and that the custom data curation captures necessary dynamics; no explicit free parameters or invented entities are detailed in the abstract.

pith-pipeline@v0.9.0 · 5547 in / 1216 out tokens · 34778 ms · 2026-05-12T04:08:18.625300+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (echoes)

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope... Unlike autoregressive methods, our approach performs global denoising in latent space, enabling modeling under a broader receptive field.

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean — embed_strictMono_of_one_lt (echoes)

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    AR models lack explicit mechanisms to capture such global temporal structures... generated signals tend to be repetitive or overly uniform.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
