Gil Sperling

video, stage and music

Week 2 – Mockup

February 11, 2020 by gilsperling

Refined Project Prompt

An interactive installation that allows the user to hear themselves singing in the voice of a famous opera singer.

Target audience: anyone with some interest in music. There is a range of musical background/expertise for which this interaction is most effective – I'm assuming somewhere between very little and moderate.
Another potential target audience is people with speech or voice disabilities, but I am not designing exclusively for that group.

System Sketch

The user mouths words (without using their voice) into a camera. Their mouth positions are encoded into vowels, and the height of their hand controls the pitch of the sung note.
Sound playback is through speakers, so that the user can hear their singing resonate in the space (and an audience can hear them too). At the same time, the voice should feel to the user like their own – perhaps through a combination of headphones and something that transfers vibration to the neck and chest.
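As a rough sketch of the hand-to-pitch half of this mapping, here is how a reading from the ultrasonic range sensor could be turned into a note. The sensor window (5–50 cm) and the one-octave range starting at middle C are assumptions for illustration, not values from the actual build:

```javascript
// Map an ultrasonic sensor reading (hand height in cm) to a MIDI note.
// Assumed ranges: sensor usable between 5 and 50 cm; output spans one
// octave upward from MIDI note 60 (middle C).
function handHeightToMidi(distanceCm, minCm = 5, maxCm = 50, lowNote = 60, range = 12) {
  // Clamp the reading to the usable sensor window.
  const d = Math.min(Math.max(distanceCm, minCm), maxCm);
  // Linearly map distance to a note offset: higher hand, higher pitch.
  const t = (d - minCm) / (maxCm - minCm);
  return Math.round(lowNote + t * range);
}
```

Quantizing to whole MIDI notes (rather than continuous pitch) would make the instrument easier for the untrained target audience to control.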

[System diagram: visual inputs on the left, processing/synthesis in the middle, audio output on the right]
[Animated GIF: a man mouthing into a webcam and moving his hand vertically above an ultrasonic range sensor]


The audio source I used is a recording of Maria Callas singing "Ave Maria" by Franz Schubert. Using the Sampler instrument in Ableton Live, I extracted samples of Callas singing something close to pure vowels. Then I used a MIDI keyboard to modulate the pitch of the samples and recorded the result.
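The underlying math of repitching a sample this way is simple equal-temperament scaling: playing a sample n semitones above the pitch it was recorded at means speeding playback up by a factor of 2^(n/12). A minimal illustration (this is the general formula, not Ableton's internal code):

```javascript
// Playback-rate factor needed to shift a sample from its recorded (root)
// MIDI note to a target MIDI note, under equal temperament:
// rate = 2^((target - root) / 12).
function playbackRate(rootNote, targetNote) {
  return Math.pow(2, (targetNote - rootNote) / 12);
}
```

This is also why large shifts sound unnatural: the formants of the vowel move along with the pitch, which is one reason to sample each vowel at several pitches.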

The videos below show mockups of two user paths.
Path no. 1: The user improvises their own tune, and hears themselves in Maria Callas’ voice.

[Video: a man mouthing silently and moving one hand vertically]

Path no. 2: The user "sings" a known tune, in this case the Queen of the Night's aria from Mozart's "The Magic Flute".

Initial Interaction testing

I used Teachable Machine to test whether it could identify mouth positions for vowels. I trained it on images of my own mouth, with one class for each of three distinct vowels. The results were very reliable when testing on myself and fairly good when testing with a different person's mouth.
The video below is a screen capture of a p5.js sketch with the trained model imported.
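In the p5.js sketch, the imported model returns a ranked list of class guesses; a small helper can pick the winning vowel and ignore uncertain frames. The `[{label, confidence}]` result shape matches what ml5.js-style classifiers return, but the threshold value here is my own assumption, not taken from the original sketch:

```javascript
// Pick the most confident vowel label from classifier results of the form
// [{label, confidence}, ...]. Returns null if no class clears the
// (assumed) confidence threshold, so uncertain frames are ignored.
function topVowel(results, threshold = 0.8) {
  const best = results.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.confidence >= threshold ? best.label : null;
}
```

Holding the last confident vowel during uncertain frames would keep the synthesized voice from stuttering as the mouth moves between shapes.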

Posted in Spring 20 - Music Interaction Design
