pith. machine review for the scientific record.

arxiv: 2605.09971 · v1 · submitted 2026-05-11 · 💻 cs.HC · cs.AI

Recognition: 2 Lean theorem links

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:08 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords haptic generation · text-to-vibration · diffusion models · vibrotactile feedback · semantic alignment · global denoising · haptic design · metaverse

The pith

HapticLDM shows that latent diffusion models can generate accurate vibrotactile feedback directly from text descriptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes HapticLDM, the first model to apply latent diffusion to the task of converting natural language into vibration patterns for haptic devices. Previous autoregressive methods struggled to maintain consistency across entire sequences because of their step-by-step processing and limited training data. The new approach pairs a text-processing strategy that emphasizes motion dynamics with a denoising process that keeps temporal variations stable. If the claims hold, designers could produce scenario-fitting haptic effects for virtual worlds and entertainment more quickly and with greater variety. User studies and baseline comparisons report improvements in how well the vibrations match the described meaning and how lifelike they feel.
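The generation recipe described above (sample a noisy latent, then iteratively denoise it toward a vibration signal) can be sketched in toy, stdlib-only form; `denoise_step` below is a hypothetical stand-in for the paper's text-conditioned network, not its actual implementation:

```python
import math
import random

def make_beta_schedule(steps: int, lo: float = 1e-4, hi: float = 0.02):
    """Linear variance schedule, as in standard DDPM setups (illustrative)."""
    return [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]

def add_noise(x0, alpha_bar: float, rng: random.Random):
    """Forward process: corrupt a clean latent x0 to noise level alpha_bar."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
            for v in x0]

def sample(denoise_step, dim: int, steps: int, seed: int = 0):
    """Reverse process: start from Gaussian noise and denoise step by step.

    `denoise_step(x, t)` is a placeholder for a trained, text-conditioned
    denoiser; here it just needs to map a latent to a less noisy latent.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(dim)]
    for t in reversed(range(steps)):
        x = denoise_step(x, t)
    return x
```

The actual model additionally encodes the text prompt and decodes the final latent into a waveform via a VAE decoder (Figure 2), details this sketch omits.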

Core claim

HapticLDM is the first generative model for converting text into vibrotactile feedback that uses latent diffusion models. By curating data with emphasis on dynamic characteristics and applying a global denoising mechanism to ensure coherent temporal variations, it overcomes the sequential limitations of autoregressive models in modeling global dependencies. Extensive tests confirm superior realism, semantic alignment, and usability in simplifying haptic design.

What carries the argument

A global denoising mechanism within the latent diffusion model that regulates coherent and stable variations in the temporal envelope of the generated vibrations.
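The review gives no equations for this mechanism, but one plausible reading is a penalty that discourages abrupt jumps in the amplitude envelope of the generated signal; the function names below are illustrative, not the paper's:

```python
def temporal_envelope(signal, win: int = 8):
    """Rectify and window-average to get a coarse amplitude envelope."""
    env = []
    for i in range(0, len(signal), win):
        chunk = signal[i:i + win]
        env.append(sum(abs(v) for v in chunk) / len(chunk))
    return env

def envelope_smoothness_penalty(signal, win: int = 8):
    """Mean squared difference between adjacent envelope frames.

    A low value means the envelope varies coherently and stably over
    time, which is the property the global denoising mechanism is said
    to regulate.
    """
    env = temporal_envelope(signal, win)
    if len(env) < 2:
        return 0.0
    return sum((a - b) ** 2 for a, b in zip(env, env[1:])) / (len(env) - 1)
```

A steady buzz scores near zero; a signal that jumps between loud and silent windows scores high, so adding such a term to a training loss would push generations toward stable envelopes.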

Load-bearing premise

That emphasizing dynamic characteristics in text processing and applying global denoising adequately resolves the global dependency and data constraint problems that limit autoregressive models.

What would settle it

If a follow-up study with larger participant groups or more diverse vibration scenarios shows no significant difference in perceived realism or semantic match compared to the autoregressive baseline, the superiority claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.09971 by Anran Xu, Cai Chen, Fei Wang, Jiahao Xiong, Lijia Pan, Pinzhi Huang, Tao Wen.

Figure 1. Overview of HapticLDM. A text prompt is provided as input; a corresponding vibration signal is then generated by HapticLDM. [figures/full_fig_p002_1.png]
Figure 2. The overall training framework of HapticLDM. Top (VAE training): the VAE (comprising encoder E and decoder G) and the discriminator D. [figures/full_fig_p006_2.png]
Figure 3. The workflow of the random A/B test for performance comparison between HapticLDM and HapticGen. [figures/full_fig_p007_3.png]
Figure 4. (a) Box plot of the 5-point Likert-scale scoring results for the four data types. [figures/full_fig_p008_4.png]
read the original abstract

Text-to-vibration generation converts natural language into haptic feedback, enabling vibration-effect designers to get scenarios-fitted vibrations more efficiently, which shows great potentials in application fields such as metaverse, games, and film to enrich the user experience in interactive scenarios. The core challenge in this field is how to generate accurate, consistent, and complete vibrations according to textual semantics. Very recent autoregressive (AR) approaches (e.g., HapticGen) exhibit limited capacity in fully capturing global dependencies, owing to the inherent sequential nature of their modeling and prevailing data constraints. In this paper, we proposed HapticLDM, the first text-to-vibration generative model built upon Latent Diffusion Models (LDMs). Firstly, with respect to the data, we introduced a text-processing strategy that emphasizes dynamic characteristics to curate high-quality data pairs for fine-grained dynamic modeling. Secondly, HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope. Furthermore, we conduct extensive evaluations, including A/B testing against the state-of-the-art baseline and a user study involving 30 participants. The results demonstrate that our model enhances realism and semantic alignment. Qualitative feedback further indicates that HapticLDM simplifies the haptic design workflow while generating diverse, subtle, and physically precise vibrations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces HapticLDM as the first latent diffusion model for text-to-vibrotactile generation. It addresses limitations of prior autoregressive models (e.g., HapticGen) in capturing global dependencies under data constraints by proposing a text-processing strategy that emphasizes dynamic characteristics for curating high-quality data pairs and a global denoising mechanism to enforce coherent temporal envelope variations. The work reports A/B testing against baselines and a 30-participant user study demonstrating gains in realism, semantic alignment, workflow simplification, and generation of diverse, subtle, physically precise vibrations.

Significance. If the empirical claims hold, the work has moderate significance as the first diffusion-based approach in text-to-haptic generation, offering a potential alternative to sequential AR models for applications in metaverse, gaming, and film. The user study provides initial evidence of practical utility, but the absence of ablations or isolated metrics for the proposed mechanisms limits attribution of gains and reduces the strength of the central contribution.

major comments (2)
  1. Abstract: The headline claim that the dynamic-characteristic text curation plus global denoising mechanism successfully mitigate AR global-dependency limits under the stated data constraints is not supported by any ablation results, explicit metric for global temporal coherence (e.g., envelope correlation over full sequences), or controlled comparison isolating these components versus the LDM backbone itself.
  2. Evaluation section (A/B testing and user study): The reported improvements in realism and semantic alignment rest on a 30-participant study and A/B tests, yet no details are provided on data splits, exact quantitative metrics, statistical significance tests, or how the study isolates the contribution of the text-processing and denoising additions.
minor comments (1)
  1. Abstract: The baseline is referred to as 'state-of-the-art' before naming HapticGen; moving the specific name earlier would improve readability.
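One concrete form of the "envelope correlation over full sequences" metric the referee asks for would be Pearson correlation between coarse amplitude envelopes of generated and reference signals; this is a hypothetical sketch, not a metric taken from the paper:

```python
import math

def envelope(signal, win: int = 8):
    """Coarse amplitude envelope: mean absolute value per window."""
    return [sum(abs(v) for v in signal[i:i + win]) / len(signal[i:i + win])
            for i in range(0, len(signal), win)]

def pearson(x, y):
    """Pearson correlation; undefined (raises) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def envelope_correlation(generated, reference, win: int = 8):
    """Global temporal coherence proxy: correlation of the two envelopes
    computed over the full sequences, not just local windows."""
    return pearson(envelope(generated, win), envelope(reference, win))
```

A value near 1 would indicate the generated vibration tracks the reference's global loudness contour; a value near 0 or below would support the referee's concern.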

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that additional empirical support and evaluation details will strengthen the paper and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: Abstract: The headline claim that the dynamic-characteristic text curation plus global denoising mechanism successfully mitigate AR global-dependency limits under the stated data constraints is not supported by any ablation results, explicit metric for global temporal coherence (e.g., envelope correlation over full sequences), or controlled comparison isolating these components versus the LDM backbone itself.

    Authors: We acknowledge that the abstract claim would benefit from direct empirical isolation of the proposed components. In the revised manuscript, we will add ablation studies comparing the full HapticLDM against (i) the LDM backbone without dynamic text curation and (ii) without the global denoising mechanism. We will also report an explicit global temporal coherence metric (temporal envelope correlation over full sequences) and controlled comparisons demonstrating how these additions address AR limitations under the given data constraints. revision: yes

  2. Referee: Evaluation section (A/B testing and user study): The reported improvements in realism and semantic alignment rest on a 30-participant study and A/B tests, yet no details are provided on data splits, exact quantitative metrics, statistical significance tests, or how the study isolates the contribution of the text-processing and denoising additions.

    Authors: We will expand the Evaluation section to specify the train/test data splits, the exact quantitative metrics used in A/B testing (including any perceptual or signal-based scores), results of statistical significance tests (e.g., paired t-tests with p-values), and a detailed description of the user study protocol that clarifies how participant tasks and comparisons isolate the contributions of the text-processing strategy and global denoising mechanism. revision: yes
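The paired t-test the authors promise reduces to the standard statistic over per-participant score differences; `paired_t_statistic` is an illustrative stdlib helper (for 30 participants, df = 29, the two-sided 5% critical value is about 2.045):

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (sd(d) / sqrt(n)) for per-participant differences d.

    scores_a and scores_b are paired ratings (e.g., Likert scores for
    HapticLDM and the baseline from the same participant).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(d)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance in differences")
    return mean(d) / (sd / math.sqrt(len(d)))
```

Comparing |t| against the critical value for n − 1 degrees of freedom (or computing a p-value from the t-distribution) would give the significance reporting the referee requests.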

Circularity Check

0 steps flagged

No circularity: the model proposal, data curation, and external evaluation form an independent chain.

full rationale

The paper introduces HapticLDM as a latent diffusion architecture with two stated novelties (dynamic-characteristic text curation and global denoising), trains it on curated pairs, and reports gains via A/B tests and a 30-participant user study against an external baseline (HapticGen). No equations, fitted parameters, or self-citations are shown to reduce the claimed improvements in realism or semantic alignment back to the inputs by construction. The derivation remains self-contained against external benchmarks and does not invoke uniqueness theorems, ansatzes, or renamings that collapse to prior self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that diffusion models can be adapted to vibration signals via latent space and that the custom data curation captures necessary dynamics; no explicit free parameters or invented entities are detailed in the abstract.

pith-pipeline@v0.9.0 · 5547 in / 1216 out tokens · 34778 ms · 2026-05-12T04:08:18.625300+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (echoes)

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope... Unlike autoregressive methods, our approach performs global denoising in latent space, enabling modeling under a broader receptive field.

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean — embed_strictMono_of_one_lt (echoes)

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    AR models lack explicit mechanisms to capture such global temporal structures... generated signals tend to be repetitive or overly uniform.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
