Body Music Project (Completed in 2010) - Technical Deep Dive

The Body Music Project, developed in Barcelona, Catalonia, Spain, and completed in 2010, explored the creation of real-time interactive musical and artistic experiences using the Microsoft Kinect sensor. This document provides a detailed look at the technical aspects of the system.

System Architecture and Data Flow

The system's architecture was designed for low-latency real-time processing, crucial for musical interaction. The data flow can be summarized as follows:

  1. Kinect Data Acquisition: The Kinect sensor provided depth and skeletal tracking data at a frame rate of approximately 30 Hz. The skeletal tracking identified and provided 3D coordinates for up to 20 joints per user; a minimal per-frame representation is sketched after this list.
  2. Motion Data Pre-processing:
    • Noise Filtering: A Kalman filter was implemented to smooth out jitter and noise in the raw skeletal data, ensuring more stable and predictable control signals for the audio engine.
    • Coordinate System Transformation: The Kinect's native coordinate system was transformed into a user-centric coordinate system, with the origin potentially adjustable based on the user's initial position. This facilitated more intuitive mappings between movements and sound parameters. Both pre-processing steps are sketched together after the list.
  3. Motion Feature Extraction and Normalization (Axis Module):
    • Head-Pelvis Ratio Analysis: The "axis" module focused on the dynamic relationship between the head and pelvis, calculating ratios of their X, Y, and Z coordinates. The initial reference pose captured these ratios as $R_0 = (T_x/P_x,\ T_y/P_y,\ T_z/P_z)$, where $T$ and $P$ denote the head and pelvis coordinates and $R_0[i]$ the component for axis $i$.
    • Temporal Normalization: The current ratios $R_t[i]$ at time $t$ were divided by the initial ratios $R_0[i]$ to produce normalized control signals $s[i] = R_t[i] / R_0[i]$, making the system less dependent on the user's absolute position (see the normalization sketch after the list).
  4. Sound Parameter Mapping and Control (Engine and Modules):
    • Sound Engine: Either a custom-built engine or a high-performance audio library (e.g., PortAudio, a SuperCollider client) was used for real-time sound synthesis and playback. The engine supported various synthesis techniques (initially sine waves, later more complex waveforms and sample playback).
    • Dynamics Module (Volume Control): The "dynamics" module implemented a mapping function that considered the velocity and acceleration of specific body parts (e.g., hand movement along the Z-axis) to control the overall volume. A non-linear mapping curve was likely employed to provide a more expressive dynamic range. The rate of change of $s_z$ (the Z-axis ratio) over time, $ds_z/dt$, and potentially its second derivative, $d^2s_z/dt^2$, were key input parameters; a sketch of such a mapping follows the list.
    • Tone Module (Frequency Control): The initial tone was determined by the formula $t = \sin(\sqrt{(\frac{G_{dx}}{G_{sx}})^2+(\frac{R_{dx}}{R_{sx}})^2} \cdot \frac{T_y}{P_y})$, mapping the result to a discrete set of frequencies (sketched below the list). Dynamic frequency modulation was achieved by mapping other motion features (e.g., hand height, arm extension) to pitch bend or frequency multipliers applied to the base tone.
    • Envelope Module (Amplitude Shaping): The "envelope" module controlled the attack, decay, sustain, and release (ADSR) parameters of the generated sounds. Motion features like the speed of a gesture or the extent of a limb movement were likely mapped to these parameters, allowing for dynamic shaping of the sound's temporal characteristics; an ADSR sketch appears after the list.
    • Sonority Module (Timbre Selection): This module managed the loading and switching of different audio samples. The selection of sample banks (European, Arabic, East Asian) could be triggered by specific gestures or through the user interface; bank selection is sketched after the list. The audio engine supported polyphonic playback and sample looping for sustained sounds.
  5. User Interface and Feedback:
    • Symbolic Representation: The display showed abstract "elementary symbols" that corresponded to different active parameter mappings or control modes, providing visual feedback to the user about the system's state.
    • Calibration and Mapping Configuration: The user interface likely included options for calibrating the system to the user's body size and for customizing the mappings between specific movements and sound parameters, enhancing the personalization and expressiveness of the interaction.
  6. Audio Output: The processed audio signals were outputted through standard audio interfaces, supporting various output configurations (stereo, multi-channel).
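
The sketches below illustrate several of the steps above. They are illustrative Python sketches rather than the project's original code; all names, constants, and helper structures are assumptions introduced for clarity. As a shared starting point, a per-frame skeleton snapshot can be modeled as a mapping from joint names to 3D coordinates (the joint names and the SkeletonFrame container are hypothetical; the Kinect SDK exposes equivalent data through its own skeletal-tracking types):

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A 3D joint position in meters, as reported by the skeletal tracker.
Joint = Tuple[float, float, float]

@dataclass
class SkeletonFrame:
    """One skeletal-tracking frame (~30 Hz), with up to 20 joints per user."""
    timestamp: float           # seconds since the session started
    joints: Dict[str, Joint]   # e.g. {"head": (x, y, z), "pelvis": (x, y, z), ...}

# Example frame containing only the joints used by the sketches below.
frame = SkeletonFrame(
    timestamp=0.033,
    joints={
        "head":   (0.02, 1.55, 2.10),
        "pelvis": (0.01, 0.95, 2.12),
    },
)
```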
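
A minimal sketch of the two pre-processing steps, assuming a simple constant-position Kalman filter applied independently to each joint coordinate and a translation-only user-centric transform; the project's actual filter model and tuning are not documented here:

```python
class ScalarKalman:
    """1-D constant-position Kalman filter for smoothing one joint coordinate."""

    def __init__(self, process_var=1e-4, measurement_var=1e-2):
        self.q = process_var      # how far the true position may drift per frame
        self.r = measurement_var  # expected sensor (measurement) noise
        self.x = None             # current filtered estimate
        self.p = 1.0              # variance of the estimate

    def update(self, z):
        if self.x is None:                # initialize on the first measurement
            self.x = z
            return self.x
        self.p += self.q                  # predict: uncertainty grows over time
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the new measurement
        self.p *= (1.0 - k)
        return self.x


def to_user_centric(joints, origin):
    """Translate every joint so that `origin` (e.g. the pelvis position captured
    in the user's initial pose) becomes (0, 0, 0)."""
    ox, oy, oz = origin
    return {name: (x - ox, y - oy, z - oz) for name, (x, y, z) in joints.items()}


# Example: smooth the head's Y coordinate over a few noisy frames.
head_y = ScalarKalman()
for measurement in (1.55, 1.58, 1.54, 1.56):
    print(round(head_y.update(measurement), 3))
```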
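
The "axis" module's ratio extraction and temporal normalization reduce to a few lines. The guard against division by a near-zero coordinate is an added assumption; the ratios themselves follow the formulas above:

```python
_EPS = 1e-6

def _safe(v):
    """Avoid dividing by a coordinate that is exactly or nearly zero."""
    return v if abs(v) > _EPS else _EPS

def axis_ratios(joints):
    """Per-axis head/pelvis ratios (T_x/P_x, T_y/P_y, T_z/P_z)."""
    tx, ty, tz = joints["head"]
    px, py, pz = joints["pelvis"]
    return (tx / _safe(px), ty / _safe(py), tz / _safe(pz))

def normalized_signals(r_t, r_0):
    """s[i] = R_t[i] / R_0[i]: current ratios relative to the initial pose."""
    return tuple(rt / _safe(r0) for rt, r0 in zip(r_t, r_0))

# R_0 is captured once from the user's initial pose, R_t on every later frame.
initial_pose = {"head": (0.02, 1.55, 2.10), "pelvis": (0.01, 0.95, 2.12)}
current_pose = {"head": (0.05, 1.40, 2.05), "pelvis": (0.01, 0.96, 2.11)}
r_0 = axis_ratios(initial_pose)
r_t = axis_ratios(current_pose)
print(normalized_signals(r_t, r_0))
```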
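
For the dynamics module, a finite-difference estimate of $ds_z/dt$ passed through a power-law curve gives the kind of non-linear velocity-to-volume mapping described above; the curve shape and the constants are illustrative assumptions:

```python
class DynamicsModule:
    """Maps the rate of change of the Z-axis ratio s_z to a volume in [0, 1]."""

    def __init__(self, sensitivity=0.5, curve=2.0):
        self.sensitivity = sensitivity  # scales |ds_z/dt| before shaping
        self.curve = curve              # exponent of the power-law response
        self.prev_sz = None
        self.prev_t = None

    def volume(self, s_z, t):
        if self.prev_sz is None:        # no velocity estimate on the first frame
            self.prev_sz, self.prev_t = s_z, t
            return 0.0
        dt = max(t - self.prev_t, 1e-3)
        ds_dt = (s_z - self.prev_sz) / dt           # finite-difference velocity
        self.prev_sz, self.prev_t = s_z, t
        x = min(abs(ds_dt) * self.sensitivity, 1.0)
        return x ** self.curve          # slow motion stays quiet, fast motion ramps up


dyn = DynamicsModule()
for t, s_z in [(0.000, 1.00), (0.033, 1.01), (0.066, 1.06), (0.100, 1.20)]:
    print(round(dyn.volume(s_z, t), 3))
```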
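
The base-tone formula can be applied directly and its result, which lies in $[-1, 1]$, mapped linearly onto a discrete frequency set. The formula is used verbatim, with $G_{dx}, G_{sx}, R_{dx}, R_{sx}, T_y, P_y$ passed in as measured values; the frequency table and the linear index mapping are assumptions:

```python
import math

# Hypothetical discrete frequency set (a C-major octave, in Hz).
FREQUENCIES = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

def base_tone(g_dx, g_sx, r_dx, r_sx, t_y, p_y):
    """t = sin( sqrt((G_dx/G_sx)^2 + (R_dx/R_sx)^2) * T_y/P_y ),
    with t in [-1, 1] mapped linearly onto an index of FREQUENCIES."""
    t = math.sin(math.hypot(g_dx / g_sx, r_dx / r_sx) * (t_y / p_y))
    index = round((t + 1.0) / 2.0 * (len(FREQUENCIES) - 1))
    return FREQUENCIES[index]

print(base_tone(0.40, 0.45, 0.30, 0.32, 1.55, 0.95))   # one base frequency in Hz
```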
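
A linear ADSR envelope is enough to illustrate the envelope module; the couplings between gesture features and ADSR parameters suggested in the closing comment are assumptions:

```python
def adsr(t, attack, decay, sustain, release, note_length):
    """Amplitude in [0, 1] of a linear ADSR envelope at time t (seconds).

    `sustain` is a level; `attack`, `decay`, and `release` are durations;
    `note_length` marks when the release phase begins."""
    if t < 0.0:
        return 0.0
    if t < attack:                              # attack: ramp 0 -> 1
        return t / attack
    if t < attack + decay:                      # decay: ramp 1 -> sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < note_length:                         # sustain: hold the level
        return sustain
    return max(0.0, sustain * (1.0 - (t - note_length) / release))  # release

# A faster gesture could shorten the attack, and a wider limb extension could
# raise the sustain level, e.g. attack = 0.2 / (1.0 + gesture_speed) and
# sustain = min(1.0, arm_extension).
print([round(adsr(i * 0.1, 0.1, 0.2, 0.7, 0.3, 0.8), 2) for i in range(12)])
```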
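
Sample-bank management in the sonority module can be as simple as a dictionary of banks plus the name of the active one; the bank contents below are placeholders, and the gesture or UI event that triggers a switch is left abstract:

```python
# Placeholder file names standing in for the project's actual sample libraries.
SAMPLE_BANKS = {
    "european":   ["eu_drum.wav", "eu_string.wav", "eu_voice.wav"],
    "arabic":     ["ar_drum.wav", "ar_string.wav", "ar_voice.wav"],
    "east_asian": ["ea_drum.wav", "ea_string.wav", "ea_voice.wav"],
}

class SonorityModule:
    """Tracks the active sample bank; a gesture detector or UI control would
    call switch_bank() to change timbre families at run time."""

    def __init__(self, initial="european"):
        self.active = initial

    def switch_bank(self, name):
        if name in SAMPLE_BANKS:       # ignore unknown bank names
            self.active = name
        return SAMPLE_BANKS[self.active]

sonority = SonorityModule()
print(sonority.switch_bank("arabic"))  # -> the Arabic sample file list
```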

Addressing Initial Challenges - Technical Solutions

The development of the Body Music Project involved overcoming initial technical hurdles, such as sensor noise and the variability of users' positions and body proportions, through the targeted filtering, normalization, and calibration strategies described above.

Conclusion

The Body Music Project, completed in 2010, represented a significant exploration into the realm of embodied musical interaction. Through careful attention to technical details encompassing data processing, mapping strategies, and system integration, the project provided a platform for real-time sound manipulation driven by human movement. The technical aspects outlined here illustrate the complexities involved in creating a responsive link between gesture and sonic expression.
