MARRS: Masked Autoregressive Unit-based Reaction Synthesis
Pith reviewed 2026-05-22 14:44 UTC · model grok-4.3
The pith
MARRS generates coordinated human reactions by masking tokens and modulating between body and hand units in continuous space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MARRS generates coordinated and fine-grained reaction motions using continuous representations. It starts with a Unit-distinguished Motion Variational AutoEncoder (UD-VAE) that segments the whole body into body and hand units and encodes each independently. Action-Conditioned Fusion (ACF) randomly masks a subset of reactive tokens and extracts body-specific and hand-specific information from the active tokens. Mutual Unit Modulation (MUM) then lets information from one unit adaptively modulate the other. For the diffusion stage, a compact MLP serves as the noise predictor for each unit, and a diffusion loss models the probability distribution of each token.
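The random masking step can be illustrated with a small sketch. This is not the authors' implementation: the function name `mask_reactive_tokens`, the zero vector used in place of a mask embedding, and the ratio of 0.5 are all assumptions made for illustration, since the page does not state the masking ratio or how masked positions are filled.

```python
import numpy as np

def mask_reactive_tokens(tokens, mask_ratio, rng):
    """Randomly hide a fraction of reactive tokens.

    tokens: array of shape (T, D), one continuous latent per time step.
    mask_ratio: fraction of tokens to hide, in [0, 1] (hypothetical value).
    Returns the masked copy and a boolean mask (True = hidden).
    """
    num_tokens = tokens.shape[0]
    num_masked = int(round(mask_ratio * num_tokens))
    hidden = np.zeros(num_tokens, dtype=bool)
    hidden[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    masked = tokens.copy()
    masked[hidden] = 0.0  # assumption: zeros stand in for a learned mask embedding
    return masked, hidden

rng = np.random.default_rng(0)
body_tokens = rng.normal(size=(16, 8))  # toy reactive body-unit latents
masked_body, hidden = mask_reactive_tokens(body_tokens, mask_ratio=0.5, rng=rng)
```

In a masked autoregressive setup the model would be trained to recover the hidden positions from the visible ones and the conditioning action; only the masking itself is shown here.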
What carries the argument
Mutual Unit Modulation (MUM) together with Action-Conditioned Fusion (ACF) operating on independently encoded body and hand units inside a continuous diffusion model.
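To make "one unit adaptively modulates the other" concrete, here is a minimal sketch that assumes a FiLM-style scale-and-shift. The page does not say which modulation form MUM uses, so the function `modulate`, the mean pooling over time, and the weight names are hypothetical stand-ins rather than the paper's design.

```python
import numpy as np

def modulate(target, source, w_scale, w_shift):
    """Scale and shift `target` features using a pooled summary of `source`."""
    summary = source.mean(axis=0)        # (D,) pooled over time
    scale = np.tanh(summary @ w_scale)   # (D,) bounded scale offset
    shift = summary @ w_shift            # (D,)
    return target * (1.0 + scale) + shift

rng = np.random.default_rng(0)
T, D = 16, 8
body = rng.normal(size=(T, D))  # toy body-unit latents
hand = rng.normal(size=(T, D))  # toy hand-unit latents

# Hypothetical projection weights; in a trained model these would be learned.
names = ("b2h_scale", "b2h_shift", "h2b_scale", "h2b_shift")
w = {name: rng.normal(scale=0.1, size=(D, D)) for name in names}

# Mutual modulation: body conditions hand, and hand conditions body.
hand_mod = modulate(hand, body, w["b2h_scale"], w["b2h_shift"])
body_mod = modulate(body, hand, w["h2b_scale"], w["h2b_shift"])
```

With all-zero weights the function returns its target unchanged, which shows that the modulation acts as a learned correction on top of independently encoded units.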
If this is right
- Produces reaction motions without quantization information loss.
- Captures inter-person coordination and fine-grained hand details through unit interaction.
- Achieves superior quantitative and qualitative results over prior VQ-based autoregressive methods.
- Keeps computational cost manageable by limiting the number of units and using compact predictors per unit.
Where Pith is reading between the lines
- The same unit masking and cross-modulation pattern could be applied to generate full multi-person scenes rather than pairwise reactions.
- Continuous representations may permit direct editing or interpolation of reaction motions without decoding to discrete codes first.
- Similar masking-plus-modulation blocks might improve single-person motion forecasting by letting different body parts condition one another.
- The framework could be tested on longer sequences to check whether coordination remains stable over time.
Load-bearing premise
Segmenting the body into independent body and hand units, then applying random masking and mutual modulation, will capture inter-person coordination and fine-grained details without introducing coordination artifacts or requiring prohibitive compute.
What would settle it
A test set of complex two-person interactions where the generated hand positions fail to match the body posture or timing required by the conditioning action sequence.
Original abstract
This work aims at a challenging task: human action-reaction synthesis, i.e., generating human reactions conditioned on the action sequence of another person. Currently, autoregressive modeling approaches with vector quantization (VQ) have achieved remarkable performance in motion generation tasks. However, VQ has inherent disadvantages, including quantization information loss, low codebook utilization, etc. In addition, while dividing the body into separate units can be beneficial, the computational complexity needs to be considered. Also, the importance of mutual perception among units is often neglected. In this work, we propose MARRS, a novel framework designed to generate coordinated and fine-grained reaction motions using continuous representations. Initially, we present the Unit-distinguished Motion Variational AutoEncoder (UD-VAE), which segments the entire body into distinct body and hand units, encoding each independently. Subsequently, we propose Action-Conditioned Fusion (ACF), which involves randomly masking a subset of reactive tokens and extracting specific information about the body and hands from the active tokens. Furthermore, we introduce Mutual Unit Modulation (MUM) to facilitate interaction between body and hand units by using the information from one unit to adaptively modulate the other. Finally, for the diffusion model, we employ a compact MLP as a noise predictor for each distinct body unit and incorporate the diffusion loss to model the probability distribution of each token. Both quantitative and qualitative results demonstrate that our method achieves superior performance. Project page: https://aigc-explorer.github.io/MARRS/.
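The per-token diffusion loss mentioned in the abstract can be sketched as a standard noise-prediction objective. The following is a forward-pass illustration only, under stated assumptions: the two-layer MLP, the linear beta schedule, the 100 steps, and every dimension are hypothetical choices, not values reported for MARRS.

```python
import numpy as np

def mlp_noise_predictor(x_t, t_frac, cond, params):
    """Two-layer MLP that predicts the noise added to one token."""
    h = np.concatenate([x_t, cond, [t_frac]])
    h = np.maximum(0.0, h @ params["w1"] + params["b1"])  # ReLU
    return h @ params["w2"] + params["b2"]

def diffusion_loss(x0, cond, params, alpha_bar, rng):
    """Mean squared error between true and predicted noise at a random step."""
    t = rng.integers(len(alpha_bar))
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = mlp_noise_predictor(x_t, t / len(alpha_bar), cond, params)
    return float(np.mean((eps - eps_hat) ** 2))

rng = np.random.default_rng(0)
D, C, H, steps = 8, 8, 32, 100            # hypothetical sizes
betas = np.linspace(1e-4, 0.02, steps)    # assumption: linear schedule
alpha_bar = np.cumprod(1.0 - betas)
params = {
    "w1": rng.normal(scale=0.1, size=(D + C + 1, H)), "b1": np.zeros(H),
    "w2": rng.normal(scale=0.1, size=(H, D)), "b2": np.zeros(D),
}
x0 = rng.normal(size=D)    # clean continuous token, e.g. one body-unit latent
cond = rng.normal(size=C)  # conditioning vector produced by the backbone
loss = diffusion_loss(x0, cond, params, alpha_bar, rng)
```

Under the abstract's description, one such predictor would exist per unit, so a body token and a hand token would each be scored by their own small MLP.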
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MARRS, a framework for human action-reaction synthesis that encodes body and hand units independently via a Unit-distinguished Motion Variational AutoEncoder (UD-VAE), applies Action-Conditioned Fusion (ACF) with random masking of reactive tokens, uses Mutual Unit Modulation (MUM) for adaptive cross-unit interaction, and employs separate compact MLP diffusion predictors with diffusion loss for each unit. The central claim is that this continuous-representation approach yields superior quantitative and qualitative performance in generating coordinated, fine-grained reactions compared to prior VQ-based autoregressive methods.
Significance. If the empirical claims are substantiated, the work provides a practical alternative to vector-quantization losses in motion synthesis by retaining continuous latents while managing computational cost through unit segmentation and post-encoding modulation. The combination of random masking and mutual modulation offers a lightweight mechanism for inter-person and body-hand coordination that could transfer to related tasks such as two-person interaction generation or fine-motor control in animation.
major comments (2)
- [Abstract] Abstract: the claim that 'both quantitative and qualitative results demonstrate that our method achieves superior performance' is presented without any reported metrics, baselines, error bars, or ablation tables, so the central empirical claim cannot be evaluated from the summary alone and must be verified against the experimental section.
- [Section 3.3] Section 3.3 (MUM description): the adaptive modulation of one unit's features by the other is described as sufficient to recover inter-unit dependencies, yet no explicit joint constraint, synchronization loss, or diagnostic metric (e.g., cross-unit velocity correlation or instantaneous pose-velocity consistency) is introduced; if body-hand coupling is non-factorizable, this post-hoc modulation may only approximate rather than enforce coordination, risking artifacts that FID or MPJPE could under-detect.
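One of the diagnostics named in the second comment, cross-unit velocity correlation, could be computed roughly as follows. This is a sketch of one plausible definition (Pearson correlation between the mean joint speeds of the two units); the report does not define the metric, and the function name and the joint counts in the toy data are assumptions.

```python
import numpy as np

def cross_unit_velocity_correlation(body_joints, hand_joints):
    """Pearson correlation between mean body and mean hand joint speeds.

    body_joints: (T, Jb, 3) joint positions over T frames.
    hand_joints: (T, Jh, 3) joint positions over the same frames.
    """
    body_speed = np.linalg.norm(np.diff(body_joints, axis=0), axis=-1).mean(axis=1)
    hand_speed = np.linalg.norm(np.diff(hand_joints, axis=0), axis=-1).mean(axis=1)
    return float(np.corrcoef(body_speed, hand_speed)[0, 1])

# Toy motion: hands travel along the same path as the body at twice the speed.
rng = np.random.default_rng(0)
step = rng.uniform(0.1, 1.0, size=29)
path = np.concatenate([[0.0], np.cumsum(step)])  # (30,) positions along x
body = np.zeros((30, 22, 3))                     # joint counts are arbitrary
hand = np.zeros((30, 30, 3))
body[:, :, 0] = path[:, None]
hand[:, :, 0] = 2.0 * path[:, None]
corr = cross_unit_velocity_correlation(body, hand)
```

Comparing this value between generated and ground-truth reactions would give a rough signal for body-hand decoupling that pose-error metrics might miss, although a single scalar cannot capture timing offsets between the units.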
minor comments (2)
- [Section 3.2] The masking ratio is listed among free parameters but no sensitivity analysis or default value is stated; a brief ablation or recommended range would clarify reproducibility.
- [Section 3.1] Notation for the continuous latent variables of body versus hand units should be introduced once and used consistently to avoid ambiguity when describing the modulation step.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the potential impact of MARRS. We address each major comment point by point below, indicating whether revisions have been made to the manuscript.
- Referee: [Abstract] Abstract: the claim that 'both quantitative and qualitative results demonstrate that our method achieves superior performance' is presented without any reported metrics, baselines, error bars, or ablation tables, so the central empirical claim cannot be evaluated from the summary alone and must be verified against the experimental section.
  Authors: We agree that the abstract serves as a high-level summary and does not contain specific numerical results. The detailed quantitative evaluation, including metrics such as FID and MPJPE, comparisons against baselines, error bars, and ablation studies, is fully reported in Section 4 with supporting tables and figures. To address the concern, we have revised the abstract to include a concise reference to the observed performance gains on these metrics.
  Revision: yes
- Referee: [Section 3.3] Section 3.3 (MUM description): the adaptive modulation of one unit's features by the other is described as sufficient to recover inter-unit dependencies, yet no explicit joint constraint, synchronization loss, or diagnostic metric (e.g., cross-unit velocity correlation or instantaneous pose-velocity consistency) is introduced; if body-hand coupling is non-factorizable, this post-hoc modulation may only approximate rather than enforce coordination, risking artifacts that FID or MPJPE could under-detect.
  Authors: We appreciate the referee's careful analysis of the MUM module. MUM provides adaptive cross-unit modulation within the continuous latent space to facilitate interaction between body and hand units in a computationally efficient manner, complementing the random masking in ACF. No additional synchronization loss was introduced to avoid increasing model complexity, but the joint diffusion training with compact per-unit predictors encourages coordinated outputs, as evidenced by our quantitative results and qualitative motion visualizations. We have revised the description in Section 3.3 for greater clarity on this mechanism and added a diagnostic analysis of cross-unit velocity correlations in the experiments to better validate coordination.
  Revision: partial
Circularity Check
No circularity: architectural components are constructive extensions without reduction to inputs or self-citations
The paper defines UD-VAE for independent body/hand encoding, ACF for random masking of reactive tokens, MUM for adaptive cross-unit modulation, and per-unit MLP diffusion predictors as sequential novel modules. These are presented as design choices to address VQ limitations and neglected mutual perception, with performance asserted via quantitative/qualitative results rather than any equation that reduces a claimed prediction to a fitted parameter or prior self-result by construction. No load-bearing uniqueness theorems, ansatzes smuggled via citation, or self-definitional loops appear in the derivation chain; the framework remains self-contained against external motion benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- masking ratio
axioms (1)
- Domain assumption: independent encoding of body and hand units preserves all necessary inter-unit dependencies for reaction synthesis.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/DimensionForcing.lean: reality_from_one_distinction (8-tick period forcing D=3). Tag: echoes (this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency).
  Paper passage: "ACF and MUM form the basic model blocks, of which there are N (8 in MARRS-Base)."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J-cost coupling). Tag: echoes.
  Paper passage: "we propose Mutual Unit Modulation (MUM) to facilitate interaction between body and hand units by using the information from one unit to adaptively modulate the other"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.