Beyond Imitation: Generative and Variational Choreography via Machine Learning

Chase Shimmin; Douglas Duhaime; Ilya Vidrin; Mariel Pettee

arXiv: 1907.05297 · v1 · pith:T2467NYLnew · submitted 2019-07-11 · 💻 cs.LG · cs.MM · stat.ML

Mariel Pettee, Chase Shimmin, Douglas Duhaime, Ilya Vidrin

Pith reviewed 2026-05-24 23:07 UTC · model grok-4.3

classification 💻 cs.LG · cs.MM · stat.ML

keywords machine learning · choreography · recurrent neural networks · autoencoders · motion capture · generative models · dance generation · variational models

The pith

Recurrent neural networks and autoencoders trained on 53 three-dimensional motion points generate novel choreography sequences and tunable variations on input sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how recurrent neural networks can produce entirely new sequences of dance movements while autoencoders allow controlled adjustments to existing sequences. Training uses motion data recorded as 53 points in three dimensions at each time step. The authors, working as a team of dancers, physicists, and machine learning researchers, built configurable tools that output these sequences without requiring step-by-step human design. If the approach holds, it supplies a practical method for exploring large numbers of movement possibilities that would otherwise demand extensive manual trial. Readers would find this relevant because it applies generative modeling directly to an art form where originality and variation are central.

Core claim

We have developed several original, configurable machine-learning tools using recurrent neural network and autoencoder architectures trained on a dataset of movements captured as 53 three-dimensional points at each timestep to generate novel sequences of choreography as well as tunable variations on input choreographic sequences.

What carries the argument

Recurrent neural network and autoencoder architectures that process sequences of 53 three-dimensional points representing body positions at successive timesteps.
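As a rough illustration of the representation described above, the sketch below flattens each frame of 53 three-dimensional points into a 159-value vector and slices a recording into fixed-length windows paired with the frame that follows. The window length, the function names, and the random stand-in data are assumptions made for illustration; the abstract does not describe the authors' actual preprocessing.

```python
import numpy as np

N_POINTS = 53   # tracked points per frame, as stated in the paper
N_DIMS = 3      # x, y, z coordinates

def flatten_frames(seq):
    """Reshape (T, 53, 3) motion data into (T, 159) feature vectors."""
    seq = np.asarray(seq, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[1:] != (N_POINTS, N_DIMS):
        raise ValueError(f"expected shape (T, {N_POINTS}, {N_DIMS}), got {seq.shape}")
    return seq.reshape(seq.shape[0], N_POINTS * N_DIMS)

def make_windows(frames, window):
    """Pair each run of `window` consecutive frames with the frame that follows it."""
    n = frames.shape[0] - window
    if n <= 0:
        raise ValueError("sequence is shorter than window + 1 frames")
    inputs = np.stack([frames[i:i + window] for i in range(n)])
    targets = frames[window:]
    return inputs, targets

# Synthetic stand-in for a real motion-capture recording (hypothetical data).
rng = np.random.default_rng(0)
capture = rng.normal(size=(100, N_POINTS, N_DIMS))
frames = flatten_frames(capture)
inputs, targets = make_windows(frames, window=10)
print(inputs.shape, targets.shape)  # (90, 10, 159) (90, 159)
```

Windows of this kind are a common way to frame next-step prediction for a recurrent model; whether the paper trains on overlapping windows is not stated.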

If this is right

Choreographers gain a method to produce starting material that can be refined by hand rather than composed entirely from scratch.
The autoencoder component enables systematic exploration of variations around a given movement phrase without manual redrawing of each frame.
Interactive use of the models supports real-time generation of animation sequences for rehearsal or performance planning.
The same architectures can be retrained on new motion datasets to adapt the tools to different dance styles or body types.
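One way to picture "tunable variations" is to encode a phrase, perturb its latent code by an adjustable amount, and decode the result. The sketch below uses a linear autoencoder fitted by PCA as a deliberately simple stand-in; it is not the authors' architecture, and the latent size, the noise scales, and the random training data are all assumptions.

```python
import numpy as np

def fit_linear_autoencoder(frames, latent_dim):
    """Fit a PCA-style linear encoder/decoder; returns (mean, components)."""
    mean = frames.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def encode(frames, mean, components):
    """Project frames onto the latent directions."""
    return (frames - mean) @ components.T

def decode(latent, mean, components):
    """Map latent codes back to frame space."""
    return latent @ components + mean

def vary(frames, mean, components, scale, rng):
    """Add Gaussian noise of standard deviation `scale` in latent space, then decode."""
    z = encode(frames, mean, components)
    z_noisy = z + rng.normal(scale=scale, size=z.shape)
    return decode(z_noisy, mean, components)

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 159))   # hypothetical flattened 53 x 3 frames
mean, comps = fit_linear_autoencoder(train, latent_dim=16)
phrase = train[:20]
subtle = vary(phrase, mean, comps, scale=0.1, rng=rng)   # stays close to the input
bold = vary(phrase, mean, comps, scale=1.0, rng=rng)     # drifts further away
```

Here `scale` plays the role of the tuning knob: at zero the output is the plain reconstruction, and larger values move the decoded phrase further from it.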

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same point-based representation could support generation tasks in related movement arts such as figure skating or martial arts forms.
Patterns discovered by the models in the 53-point data might highlight structural regularities in human locomotion that are difficult to articulate verbally.
Coupling the generators with motion-capture hardware in a studio could create a feedback loop where live performers influence and respond to machine outputs.
Extending the input representation to include multiple interacting bodies would allow the tools to address group choreography.

Load-bearing premise

Generated sequences will be perceived as coherent, novel, and artistically valuable choreography by human dancers and audiences.

What would settle it

A rating study in which professional dancers and audiences score the generated sequences against human-composed choreography on coherence, novelty, and artistic value. Consistently low scores for the machine outputs would undercut the premise.

Original abstract

Our team of dance artists, physicists, and machine learning researchers has collectively developed several original, configurable machine-learning tools to generate novel sequences of choreography as well as tunable variations on input choreographic sequences. We use recurrent neural network and autoencoder architectures from a training dataset of movements captured as 53 three-dimensional points at each timestep. Sample animations of generated sequences and an interactive version of our model can be found at http://www.beyondimitation.com.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Standard RNN and autoencoder models applied to 3D dance data with no metrics, baselines, or evidence that outputs are novel or coherent.

The main takeaway is that this paper points recurrent networks and autoencoders at 53-point 3D motion capture sequences for choreography generation and variation, yet the abstract supplies zero quantitative support for those claims. The techniques themselves predate the work by years, and nothing here reorganizes the field or demonstrates a clear advance in sequence modeling. What the paper does accomplish is assemble an interdisciplinary group and release sample animations plus an interactive demo at their site. That gives readers a direct look at the outputs without needing to reimplement anything. The soft spots are large and central. No training details, loss functions, reconstruction errors, diversity statistics, or comparisons to baselines appear. The assertion that the models produce novel sequences therefore rests only on the existence of the animations, which cannot distinguish memorization or interpolation from genuine generation. The further claim that variations are tunable in artistically useful ways is likewise untested by any reported measure. This is aimed at researchers working on machine learning for performing arts who want a concrete example in dance. A reader seeking reproducible methods or rigorous evaluation of generative quality will find little to use. I would not bring it to a reading group, would not cite it, and would not send it for peer review in this form because the core claims lack any supporting evidence.

Referee Report

2 major / 0 minor

Summary. The manuscript describes an interdisciplinary effort to apply recurrent neural networks and autoencoders to a dataset of 53 three-dimensional motion-capture points per timestep, with the goal of producing novel choreography sequences and tunable variations on input sequences. Sample animations and an interactive model are referenced via an external website.

Significance. If the generated sequences could be shown to be both novel and coherent beyond simple memorization or interpolation of the training data, the work would provide a practical demonstration of generative models for artistic motion synthesis. The collaboration between artists, physicists, and ML researchers is a positive aspect, but the absence of any quantitative evaluation makes it difficult to assess the technical advance relative to existing motion-generation literature.

major comments (2)

[Abstract] The central claim that the models 'generate novel sequences of choreography' is unsupported; no training procedure, loss functions, optimization details, or quantitative metrics (e.g., reconstruction error, diversity statistics, or comparison to nearest-neighbor baselines) are supplied to demonstrate that outputs differ meaningfully from the training set.
[Abstract] The manuscript provides no description of how the 53-point 3D sequences are preprocessed, encoded, or decoded, nor any architecture diagram, hyperparameter values, or training/validation split; without these the reproducibility of the claimed generative and variational capabilities cannot be assessed.
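The nearest-neighbor comparison requested in the first comment could look roughly like the sketch below: for every generated frame, measure the Euclidean distance to the closest training frame. Distances near zero suggest memorization, while larger ones indicate the output departs from the training set. The function name and the random data are hypothetical, and a real check would likely compare windows of frames rather than single frames.

```python
import numpy as np

def nearest_neighbor_distance(generated, training):
    """For each generated row, Euclidean distance to its closest training row."""
    diffs = generated[:, None, :] - training[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1)

rng = np.random.default_rng(0)
training = rng.normal(size=(500, 159))   # hypothetical flattened training frames
copied = training[:5].copy()             # simulates a model that memorizes
fresh = rng.normal(size=(5, 159))        # simulates genuinely new frames

print(nearest_neighbor_distance(copied, training))  # all zeros
print(nearest_neighbor_distance(fresh, training))   # strictly positive
```

A distance alone does not show that a sequence is good choreography; it only rules out the trivial case of copying.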

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We agree that the manuscript requires additional technical details and quantitative support to substantiate its claims, and we will revise accordingly to improve reproducibility and clarity.

Referee: [Abstract] The central claim that the models 'generate novel sequences of choreography' is unsupported; no training procedure, loss functions, optimization details, or quantitative metrics (e.g., reconstruction error, diversity statistics, or comparison to nearest-neighbor baselines) are supplied to demonstrate that outputs differ meaningfully from the training set.

Authors: We agree that the abstract claim of generating novel sequences requires empirical support. In the revised manuscript we will add a methods section detailing the training procedure, loss functions, and optimization, along with quantitative metrics including reconstruction error on held-out data, diversity statistics across generated samples, and explicit comparisons against nearest-neighbor baselines from the training set to demonstrate that outputs are not simple memorization or interpolation.
revision: yes

Referee: [Abstract] The manuscript provides no description of how the 53-point 3D sequences are preprocessed, encoded, or decoded, nor any architecture diagram, hyperparameter values, or training/validation split; without these the reproducibility of the claimed generative and variational capabilities cannot be assessed.

Authors: We acknowledge that these implementation details are currently absent. The revised manuscript will include a complete description of the 53-point 3D data preprocessing pipeline, network architectures (with diagrams), all hyperparameter values, and the training/validation split used, thereby enabling full reproducibility of the generative and variational results.
revision: yes
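To make the promised held-out evaluation concrete, here is a minimal sketch of a chronological training/validation split together with the error of a trivial reference predictor that repeats the current frame. Any trained model would need to beat that number on the validation portion to show it has learned more than inertia. The 80/20 split, the function names, and the random-walk data are assumptions, not details from the paper.

```python
import numpy as np

def chronological_split(frames, val_fraction=0.2):
    """Split along time so validation frames come strictly after training frames."""
    cut = int(round(frames.shape[0] * (1.0 - val_fraction)))
    return frames[:cut], frames[cut:]

def mse(pred, target):
    """Mean squared error over all frames and coordinates."""
    return float(np.mean((pred - target) ** 2))

def hold_last_frame_mse(frames):
    """Error of the trivial predictor 'next frame equals current frame'."""
    return mse(frames[:-1], frames[1:])

rng = np.random.default_rng(0)
# Hypothetical smooth motion: a random walk over 159 flattened coordinates.
frames = np.cumsum(rng.normal(scale=0.01, size=(1000, 159)), axis=0)
train, val = chronological_split(frames)
print(len(train), len(val))  # 800 200
print(hold_last_frame_mse(val))
```

Splitting by time rather than at random keeps adjacent, nearly identical frames from landing on both sides of the split, which would otherwise flatter the validation score.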

Circularity Check

0 steps flagged

No circularity in derivation; standard ML application to motion data

The paper applies established recurrent neural network and autoencoder architectures to a dataset of 53-point 3D motion capture sequences to produce generated choreography. No equations, parameters, or first-principles derivations are presented that reduce claimed outputs (novel sequences or variations) to re-expressions of the inputs by construction. The approach relies on direct training and inference of known models without self-definitional steps, fitted quantities renamed as predictions, or load-bearing self-citations. The central claims concern the practical generation of sequences from trained models and can be checked against external references such as the training data and standard architectures.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review prevents enumeration of concrete free parameters or axioms; the central claim implicitly assumes that 53-point 3D trajectories are a sufficient representation of dance and that standard sequence models will produce artistically coherent output.

free parameters (1)

RNN and autoencoder architecture hyperparameters
Network depth, hidden size, and training schedule are required for any such model yet are not reported.
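For a sense of what reporting these values would involve, the sketch below collects them in one configuration object and counts the weights they imply. Everything except the input size of 159 (53 points times 3 coordinates) is an assumed placeholder, and the choice of LSTM cells is also an assumption, since the abstract says only "recurrent neural network". The count uses one bias vector per gate; some libraries store two, which raises the total slightly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RNNConfig:
    input_size: int = 159         # 53 points x 3 coordinates, from the paper
    hidden_size: int = 256        # assumed, not reported
    num_layers: int = 2           # assumed, not reported
    learning_rate: float = 1e-3   # assumed, not reported
    epochs: int = 50              # assumed, not reported

    def lstm_parameter_count(self):
        """Weights and biases of stacked LSTM layers, one bias vector per gate."""
        total = 0
        in_size = self.input_size
        for _ in range(self.num_layers):
            # Four gates, each with input weights, recurrent weights, and a bias.
            total += 4 * (self.hidden_size * (in_size + self.hidden_size) + self.hidden_size)
            in_size = self.hidden_size
        return total

config = RNNConfig()
print(config.lstm_parameter_count())  # 951296
```

Publishing a record like this alongside the model would let readers relate model capacity to the size of the motion dataset.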

axioms (1)

domain assumption: 53 three-dimensional points per timestep capture the essential structure of dance movement
The abstract states this representation is used for training without further justification.
