Go witheFlow: Real-time Emotion Driven Audio Effects Modulation

Edmund Dervakos; Giorgos Stamou; Jason Liartis; Spyridon Kantarelis; Vassilis Lyberatos

arXiv: 2510.02171 · v3 · pith:E3UAUPRQnew · submitted 2025-10-02 · 💻 cs.SD · cs.AI · eess.AS

Pith reviewed 2026-05-22 11:47 UTC · model grok-4.3

classification 💻 cs.SD · cs.AI · eess.AS

keywords real-time music performance · emotion recognition · biosignal processing · audio effects modulation · human-machine collaboration · live audio systems

The pith

The witheFlow system automatically modulates audio effects in real time by reading emotional features from biosignals and the audio signal itself.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Music performance links the performer's emotional state to sound in ways machines cannot replicate alone. The paper presents a proof-of-concept system that pulls features from both biosignals and the audio stream, then uses them to adjust effects on the fly during live play. The design runs locally on ordinary hardware and requires no manual recalibration once started. This setup lets the performer focus on expression while the machine handles effect changes that match the detected emotion. A reader would care because the work treats human-machine music making as a shared affective process rather than a one-way control task.

Core claim

The paper introduces the witheFlow system, designed to enhance real-time music performance by automatically modulating audio effects based on features extracted from both biosignals and the audio itself. The system is currently in a proof-of-concept phase, is designed to be lightweight and able to run locally on a laptop, and is open-source given the availability of a compatible Digital Audio Workstation and sensors.

What carries the argument

The witheFlow system, which extracts emotional features from biosignals and audio to drive automatic changes to audio effects.

If this is right

Performers gain the ability to focus on playing without pausing to adjust effects by hand.
The local, lightweight design allows the system to run during ordinary gigs without extra servers.
Open-source release lets other musicians and developers adapt the mapping for different instruments or sensors.
The approach treats music performance as a collaboration in which the machine contributes to affective expression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar emotion-driven modulation could be tested in non-music settings such as live visual mixing or spoken-word performance.
Longer sessions with multiple performers would show whether the feature extraction remains stable when fatigue or changing conditions appear.
Pairing the current mapping with additional sensor types could produce finer-grained effect control without increasing hardware demands.

Load-bearing premise

Emotional features can be reliably extracted from biosignals and audio in a live performance setting and mapped to meaningful audio effect changes without manual intervention or extensive calibration.

What would settle it

A live performance trial in which the automatic effect changes fail to track the performer's intended emotional shifts or the system cannot maintain real-time response on standard laptop hardware.

Figures

Figures reproduced from arXiv: 2510.02171 by Edmund Dervakos, Giorgos Stamou, Jason Liartis, Spyridon Kantarelis, Vassilis Lyberatos.

**Figure 2.** The witheFlow System. … beta power correlates with heightened attention [12], while elevated alpha power indicates greater relaxation [16]. These metrics are defined as: Attention = Beta Power / (Alpha Power + Beta Power), Relaxation = Alpha Power / (Alpha Power + Beta Power). The ECG signal is sampled at 1000 Hz. Stress is estimated using a sliding window approach: every 0.5 seconds (500 samples), RR intervals fr…
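
The two ratios quoted in the Figure 2 excerpt can be illustrated with a minimal Python sketch. It assumes the alpha and beta band powers have already been computed from the EEG signal; how the paper computes them is not stated in this excerpt. The function name `attention_relaxation` and the neutral fallback for zero total power are illustrative assumptions, not part of the witheFlow code.

```python
from typing import Tuple


def attention_relaxation(alpha_power: float, beta_power: float) -> Tuple[float, float]:
    """Return (attention, relaxation) from EEG alpha and beta band power.

    Attention  = beta  / (alpha + beta)
    Relaxation = alpha / (alpha + beta)
    """
    if alpha_power < 0 or beta_power < 0:
        raise ValueError("band power must be non-negative")
    total = alpha_power + beta_power
    if total == 0:
        # Assumption: with no power in either band, report a neutral split
        # instead of dividing by zero. The paper does not specify this case.
        return 0.5, 0.5
    return beta_power / total, alpha_power / total


attention, relaxation = attention_relaxation(alpha_power=1.0, beta_power=3.0)
print(attention, relaxation)  # 0.75 0.25
```

By construction the two values are complementary: they always sum to 1, so either one carries the same information as the other.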

Original abstract

Music performance is a distinctly human activity, intrinsically linked to the performer's ability to convey, evoke, or express emotion. Machines cannot perform music in the human sense; they can produce, reproduce, execute, or synthesize music, but they lack the capacity for affective or emotional experience. As such, music performance is an ideal candidate through which to explore aspects of collaboration between humans and machines. In this paper, we introduce the witheFlow system, designed to enhance real-time music performance by automatically modulating audio effects based on features extracted from both biosignals and the audio itself. The system, currently in a proof-of-concept phase, is designed to be lightweight, able to run locally on a laptop, and is open-source given the availability of a compatible Digital Audio Workstation and sensors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the witheFlow system as a proof-of-concept for real-time music performance enhancement. It automatically modulates audio effects by extracting features from biosignals and the audio signal itself, with the goal of supporting emotional expression through human-machine collaboration. The system is presented as lightweight, locally executable on a laptop, and open-source when paired with a compatible DAW and sensors.

Significance. A validated implementation could advance affective computing applications in live music by providing an automatic, low-overhead link between performer state and sonic processing. The emphasis on local execution and open-source availability would support reproducibility and accessibility if the core mapping is shown to be robust.

major comments (2)

[Abstract] The claim that the system enhances real-time performance by automatically modulating effects based on emotional features is presented without any supporting data, validation results, error analysis, latency measurements, or perceptual evaluations, leaving the central assertion unsupported.
[System description] The assumption that biosignal and audio features can be reliably mapped to musically coherent effect changes in live settings without per-user calibration or manual intervention is stated as a design goal but receives no discussion of robustness to movement artifacts, sensor noise, or real-time constraints.

minor comments (2)

The manuscript would benefit from explicit comparison to prior work on biosignal-driven music systems and emotion recognition in performance contexts to clarify novelty.
Clarify the exact biosignals employed and the feature extraction methods, as these details are essential for assessing feasibility even in a proof-of-concept.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our proof-of-concept manuscript. We have revised the paper to clarify the scope of our claims and to expand discussion of practical challenges in the system description.

Referee: [Abstract] The claim that the system enhances real-time performance by automatically modulating effects based on emotional features is presented without any supporting data, validation results, error analysis, latency measurements, or perceptual evaluations, leaving the central assertion unsupported.

Authors: We agree that the abstract overstates the current evidence. As this is explicitly a proof-of-concept implementation, we have revised the abstract to describe the system as designed to support real-time modulation rather than claiming demonstrated enhancement. We have also added a dedicated limitations section that acknowledges the lack of validation data, error analysis, latency measurements, and perceptual evaluations, and outlines these as priorities for future work.
revision: yes

Referee: [System description] The assumption that biosignal and audio features can be reliably mapped to musically coherent effect changes in live settings without per-user calibration or manual intervention is stated as a design goal but receives no discussion of robustness to movement artifacts, sensor noise, or real-time constraints.

Authors: We accept that the original manuscript did not sufficiently address these robustness issues. The revised system description now includes explicit discussion of movement artifacts, sensor noise, and real-time constraints, describing the lightweight feature extraction choices and noting that the current mapping is a fixed initial implementation without per-user calibration. We have clarified that these aspects represent acknowledged limitations of the proof-of-concept and are targeted for future investigation.
revision: partial

Circularity Check

0 steps flagged

No circularity: descriptive systems introduction with no derivations or fitted predictions

The paper is a proof-of-concept systems description of the witheFlow architecture for modulating audio effects from biosignals and audio features in live performance. Apart from simple definitional ratios such as Attention = Beta Power / (Alpha Power + Beta Power), no parameter fittings, predictions, or derivation chains appear in the abstract or manuscript text. The central claim is simply the introduction of a lightweight, locally runnable, open-source system; this does not reduce to any self-definition, fitted input renamed as prediction, or load-bearing self-citation. Because nothing is fitted or statistically modeled, there is no opportunity for the circularity patterns enumerated in the guidelines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The available text is a high-level system description with no mathematical model, fitted parameters, background axioms, or newly postulated entities.

pith-pipeline@v0.9.0 · 5679 in / 1115 out tokens · 55395 ms · 2026-05-22T11:47:52.763657+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat.induction · unclear
Paper passage: mixing logic ... rulesets encoded in YAML files ... conditions of the form a < x < b where x is one of stress, attention, valence, or arousal

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Attention = Beta Power / (Alpha Power + Beta Power)
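
The rule conditions quoted above (a < x < b, with x one of stress, attention, valence, or arousal) can be illustrated with a short Python sketch. The ruleset is written as the plain Python structure a YAML loader might return. The keys `signal`, `low`, `high`, and `effect`, the effect names, and the function `matching_effects` are all hypothetical; the paper's actual YAML schema and mixing logic are not shown in this excerpt.

```python
from typing import Dict, List

# Hypothetical ruleset: each rule fires when low < x < high for one signal.
# Key names and effect names are illustrative, not taken from the paper.
RULES: List[Dict] = [
    {"signal": "stress", "low": 0.6, "high": 1.0, "effect": "reverb_up"},
    {"signal": "attention", "low": 0.0, "high": 0.3, "effect": "delay_up"},
]

# The four quantities named in the quoted passage.
SIGNALS = {"stress", "attention", "valence", "arousal"}


def matching_effects(rules: List[Dict], state: Dict[str, float]) -> List[str]:
    """Return the effects of every rule whose condition low < x < high holds."""
    effects = []
    for rule in rules:
        name = rule["signal"]
        if name not in SIGNALS:
            raise ValueError(f"unknown signal: {name}")
        x = state[name]
        # Strict inequalities on both sides, as in the quoted form a < x < b.
        if rule["low"] < x < rule["high"]:
            effects.append(rule["effect"])
    return effects


state = {"stress": 0.8, "attention": 0.5, "valence": 0.1, "arousal": 0.4}
print(matching_effects(RULES, state))  # ['reverb_up']
```

Because both bounds are strict, a value sitting exactly on a boundary (for example stress equal to 0.6 here) does not trigger the rule.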

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.