Body Music Project (Completed in 2010) - Technical Deep Dive

The Body Music Project, developed in Barcelona, Catalonia, Spain, and completed in 2010, explored the creation of real-time interactive musical and artistic experiences using the Microsoft Kinect sensor. This document provides a detailed look at the technical aspects of the system.

System Architecture and Data Flow

The system's architecture was designed for low-latency real-time processing, crucial for musical interaction. The data flow can be summarized as follows:

  1. Kinect Data Acquisition: The Kinect sensor provided depth and skeletal tracking data at a frame rate of approximately 30 Hz. The skeletal tracking identified and provided 3D coordinates for up to 20 joints per user; a minimal per-frame representation is sketched after this list.
  2. Motion Data Pre-processing:
    • Noise Filtering: A Kalman filter was implemented to smooth out jitter and noise in the raw skeletal data, ensuring more stable and predictable control signals for the audio engine.
    • Coordinate System Transformation: The Kinect's native coordinate system was transformed into a user-centric coordinate system, with the origin potentially adjustable based on the user's initial position. This facilitated more intuitive mappings between movements and sound parameters. Both pre-processing steps are sketched together after the list.
  3. Motion Feature Extraction and Normalization (Axis Module):
    • Head-Pelvis Ratio Analysis: The "axis" module focused on the dynamic relationship between the head and pelvis, calculating ratios of their X, Y, and Z coordinates. The initial reference pose captured these ratios as $R_0 = (T_x/P_x,\ T_y/P_y,\ T_z/P_z)$, where $T$ and $P$ denote the head and pelvis coordinates and $R_0[i]$ the component for axis $i$.
    • Temporal Normalization: The current ratios $R_t[i]$ at time $t$ were divided by the initial ratios $R_0[i]$ to produce normalized control signals $s[i] = R_t[i] / R_0[i]$, making the system less dependent on the user's absolute position (see the normalization sketch after the list).
  4. Sound Parameter Mapping and Control (Engine and Modules):
    • Sound Engine: Either a custom-built engine or a high-performance audio library (e.g., PortAudio, a SuperCollider client) was used for real-time sound synthesis and playback. The engine supported various synthesis techniques (initially sine waves, later more complex waveforms and sample playback).
    • Dynamics Module (Volume Control): The "dynamics" module implemented a mapping function that considered the velocity and acceleration of specific body parts (e.g., hand movement along the Z-axis) to control the overall volume. A non-linear mapping curve was likely employed to provide a more expressive dynamic range. The rate of change of $s_z$ (the Z-axis ratio) over time, $ds_z/dt$, and potentially its second derivative, $d^2s_z/dt^2$, were key input parameters; a sketch of such a mapping follows the list.
    • Tone Module (Frequency Control): The initial tone was determined by the formula $t = \sin(\sqrt{(\frac{G_{dx}}{G_{sx}})^2+(\frac{R_{dx}}{R_{sx}})^2} \cdot \frac{T_y}{P_y})$, mapping the result to a discrete set of frequencies (sketched below the list). Dynamic frequency modulation was achieved by mapping other motion features (e.g., hand height, arm extension) to pitch bend or frequency multipliers applied to the base tone.
    • Envelope Module (Amplitude Shaping): The "envelope" module controlled the attack, decay, sustain, and release (ADSR) parameters of the generated sounds. Motion features like the speed of a gesture or the extent of a limb movement were likely mapped to these parameters, allowing for dynamic shaping of the sound's temporal characteristics; an ADSR sketch appears after the list.
    • Sonority Module (Timbre Selection): This module managed the loading and switching of different audio samples. The selection of sample banks (European, Arabic, East Asian) could be triggered by specific gestures or through the user interface; bank selection is sketched after the list. The audio engine supported polyphonic playback and sample looping for sustained sounds.
  5. User Interface and Feedback:
    • Symbolic Representation: The display showed abstract "elementary symbols" that corresponded to different active parameter mappings or control modes, providing visual feedback to the user about the system's state.
    • Calibration and Mapping Configuration: The user interface likely included options for calibrating the system to the user's body size and for customizing the mappings between specific movements and sound parameters, enhancing the personalization and expressiveness of the interaction.
  6. Audio Output: The processed audio signals were outputted through standard audio interfaces, supporting various output configurations (stereo, multi-channel).
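
The sketches below illustrate several of the steps above. They are illustrative Python sketches rather than the project's original code; all names, constants, and helper structures are assumptions introduced for clarity. As a shared starting point, a per-frame skeleton snapshot can be modeled as a mapping from joint names to 3D coordinates (the joint names and the SkeletonFrame container are hypothetical; the Kinect SDK exposes equivalent data through its own skeletal-tracking types):

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A 3D joint position in meters, as reported by the skeletal tracker.
Joint = Tuple[float, float, float]

@dataclass
class SkeletonFrame:
    """One skeletal-tracking frame (~30 Hz), with up to 20 joints per user."""
    timestamp: float           # seconds since the session started
    joints: Dict[str, Joint]   # e.g. {"head": (x, y, z), "pelvis": (x, y, z), ...}

# Example frame containing only the joints used by the sketches below.
frame = SkeletonFrame(
    timestamp=0.033,
    joints={
        "head":   (0.02, 1.55, 2.10),
        "pelvis": (0.01, 0.95, 2.12),
    },
)
```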
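
A minimal sketch of the two pre-processing steps, assuming a simple constant-position Kalman filter applied independently to each joint coordinate and a translation-only user-centric transform; the project's actual filter model and tuning are not documented here:

```python
class ScalarKalman:
    """1-D constant-position Kalman filter for smoothing one joint coordinate."""

    def __init__(self, process_var=1e-4, measurement_var=1e-2):
        self.q = process_var      # how far the true position may drift per frame
        self.r = measurement_var  # expected sensor (measurement) noise
        self.x = None             # current filtered estimate
        self.p = 1.0              # variance of the estimate

    def update(self, z):
        if self.x is None:                # initialize on the first measurement
            self.x = z
            return self.x
        self.p += self.q                  # predict: uncertainty grows over time
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the new measurement
        self.p *= (1.0 - k)
        return self.x


def to_user_centric(joints, origin):
    """Translate every joint so that `origin` (e.g. the pelvis position captured
    in the user's initial pose) becomes (0, 0, 0)."""
    ox, oy, oz = origin
    return {name: (x - ox, y - oy, z - oz) for name, (x, y, z) in joints.items()}


# Example: smooth the head's Y coordinate over a few noisy frames.
head_y = ScalarKalman()
for measurement in (1.55, 1.58, 1.54, 1.56):
    print(round(head_y.update(measurement), 3))
```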
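
The "axis" module's ratio extraction and temporal normalization reduce to a few lines. The guard against division by a near-zero coordinate is an added assumption; the ratios themselves follow the formulas above:

```python
_EPS = 1e-6

def _safe(v):
    """Avoid dividing by a coordinate that is exactly or nearly zero."""
    return v if abs(v) > _EPS else _EPS

def axis_ratios(joints):
    """Per-axis head/pelvis ratios (T_x/P_x, T_y/P_y, T_z/P_z)."""
    tx, ty, tz = joints["head"]
    px, py, pz = joints["pelvis"]
    return (tx / _safe(px), ty / _safe(py), tz / _safe(pz))

def normalized_signals(r_t, r_0):
    """s[i] = R_t[i] / R_0[i]: current ratios relative to the initial pose."""
    return tuple(rt / _safe(r0) for rt, r0 in zip(r_t, r_0))

# R_0 is captured once from the user's initial pose, R_t on every later frame.
initial_pose = {"head": (0.02, 1.55, 2.10), "pelvis": (0.01, 0.95, 2.12)}
current_pose = {"head": (0.05, 1.40, 2.05), "pelvis": (0.01, 0.96, 2.11)}
r_0 = axis_ratios(initial_pose)
r_t = axis_ratios(current_pose)
print(normalized_signals(r_t, r_0))
```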
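
For the dynamics module, a finite-difference estimate of $ds_z/dt$ passed through a power-law curve gives the kind of non-linear velocity-to-volume mapping described above; the curve shape and the constants are illustrative assumptions:

```python
class DynamicsModule:
    """Maps the rate of change of the Z-axis ratio s_z to a volume in [0, 1]."""

    def __init__(self, sensitivity=0.5, curve=2.0):
        self.sensitivity = sensitivity  # scales |ds_z/dt| before shaping
        self.curve = curve              # exponent of the power-law response
        self.prev_sz = None
        self.prev_t = None

    def volume(self, s_z, t):
        if self.prev_sz is None:        # no velocity estimate on the first frame
            self.prev_sz, self.prev_t = s_z, t
            return 0.0
        dt = max(t - self.prev_t, 1e-3)
        ds_dt = (s_z - self.prev_sz) / dt           # finite-difference velocity
        self.prev_sz, self.prev_t = s_z, t
        x = min(abs(ds_dt) * self.sensitivity, 1.0)
        return x ** self.curve          # slow motion stays quiet, fast motion ramps up


dyn = DynamicsModule()
for t, s_z in [(0.000, 1.00), (0.033, 1.01), (0.066, 1.06), (0.100, 1.20)]:
    print(round(dyn.volume(s_z, t), 3))
```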
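
The base-tone formula can be applied directly and its result, which lies in $[-1, 1]$, mapped linearly onto a discrete frequency set. The formula is used verbatim, with $G_{dx}, G_{sx}, R_{dx}, R_{sx}, T_y, P_y$ passed in as measured values; the frequency table and the linear index mapping are assumptions:

```python
import math

# Hypothetical discrete frequency set (a C-major octave, in Hz).
FREQUENCIES = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

def base_tone(g_dx, g_sx, r_dx, r_sx, t_y, p_y):
    """t = sin( sqrt((G_dx/G_sx)^2 + (R_dx/R_sx)^2) * T_y/P_y ),
    with t in [-1, 1] mapped linearly onto an index of FREQUENCIES."""
    t = math.sin(math.hypot(g_dx / g_sx, r_dx / r_sx) * (t_y / p_y))
    index = round((t + 1.0) / 2.0 * (len(FREQUENCIES) - 1))
    return FREQUENCIES[index]

print(base_tone(0.40, 0.45, 0.30, 0.32, 1.55, 0.95))   # one base frequency in Hz
```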
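
A linear ADSR envelope is enough to illustrate the envelope module; the couplings between gesture features and ADSR parameters suggested in the closing comment are assumptions:

```python
def adsr(t, attack, decay, sustain, release, note_length):
    """Amplitude in [0, 1] of a linear ADSR envelope at time t (seconds).

    `sustain` is a level; `attack`, `decay`, and `release` are durations;
    `note_length` marks when the release phase begins."""
    if t < 0.0:
        return 0.0
    if t < attack:                              # attack: ramp 0 -> 1
        return t / attack
    if t < attack + decay:                      # decay: ramp 1 -> sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < note_length:                         # sustain: hold the level
        return sustain
    return max(0.0, sustain * (1.0 - (t - note_length) / release))  # release

# A faster gesture could shorten the attack, and a wider limb extension could
# raise the sustain level, e.g. attack = 0.2 / (1.0 + gesture_speed) and
# sustain = min(1.0, arm_extension).
print([round(adsr(i * 0.1, 0.1, 0.2, 0.7, 0.3, 0.8), 2) for i in range(12)])
```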
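
Sample-bank management in the sonority module can be as simple as a dictionary of banks plus the name of the active one; the bank contents below are placeholders, and the gesture or UI event that triggers a switch is left abstract:

```python
# Placeholder file names standing in for the project's actual sample libraries.
SAMPLE_BANKS = {
    "european":   ["eu_drum.wav", "eu_string.wav", "eu_voice.wav"],
    "arabic":     ["ar_drum.wav", "ar_string.wav", "ar_voice.wav"],
    "east_asian": ["ea_drum.wav", "ea_string.wav", "ea_voice.wav"],
}

class SonorityModule:
    """Tracks the active sample bank; a gesture detector or UI control would
    call switch_bank() to change timbre families at run time."""

    def __init__(self, initial="european"):
        self.active = initial

    def switch_bank(self, name):
        if name in SAMPLE_BANKS:       # ignore unknown bank names
            self.active = name
        return SAMPLE_BANKS[self.active]

sonority = SonorityModule()
print(sonority.switch_bank("arabic"))  # -> the Arabic sample file list
```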

Addressing Initial Challenges - Technical Solutions

The development of the Body Music Project involved overcoming initial technical hurdles, such as sensor noise and the variability of users' positions and body proportions, through the targeted filtering, normalization, and calibration strategies described above.

Conclusion

The Body Music Project, completed in 2010, represented a significant exploration into the realm of embodied musical interaction. Through careful attention to technical details encompassing data processing, mapping strategies, and system integration, the project provided a platform for real-time sound manipulation driven by human movement. The technical aspects outlined here illustrate the complexities involved in creating a responsive link between gesture and sonic expression.
