MicrowaveXT/II Vocal Sound Workshop
Part 1

Originally published at:

Author: Snyxol (Alexander Eslava)
English translation: vanHouten (Fabian Logemann)
Final English proof: Carbon111 (James Maier)

"...how to make your MWII sing, moan or belch..."

Do you wish to create mystical, ghostlike or even angelic voices, from wispy female to deep male "oooooohhhhs" and "aaaaaahhhs", like those popular in trance and ambient tracks? Consider yourself lucky to have an MW II/XT at your side, as it's a sublime tool for the job - even better than a sampler. That's no surprise: bear in mind that it's an heir to the PPG Wave series, legendary for its synthetic choir sounds.

This article explores several ways of synthesizing vocal-like sounds, from natural human voices to alien-like and even metallic ones. Every so often, when you turn the knobs at random, a more or less human-sounding voice emanates from your machine. Every time this happens, you have created (intentionally or not) the dominant formants of a human voice speaking a vowel. Formants are specific peaks in the frequency spectrum that form a kind of signature of the sound. For the synthesis of vowel sounds, only the loudest formants are relevant. If vowels were not encoded in these few peaks, people with different voices would hardly be able to understand each other, because their voice spectra differ so much. This way of recognizing vowels makes it easier for us to simulate them. The vowel formants are independent of pitch: while singing an "aa-aa-aa-..." melody, the "aa"-related formants always remain at the same frequencies.

1) Using pure wavetables

Let's start at the most obvious place:
We'll use wavetable #64 ("chorus 2"). This is a kind of pseudo multi-sample of a spoken "Aaaa" at 61 different pitches. Such a collection of waves is necessary to make our "Aaaa" a realistic-sounding vowel: if we played back a single wave at different pitches, its formants would be shifted as well, resulting in a Mickey Mouse or monster-like voice. To counteract this, the wavetable was set up in the following way:
A Fourier analysis was made of a deep male choir sample, and a wave was derived from the resulting spectrum using Fourier synthesis (such a wave sounds like the original choir, but produces only a harmonic, thin and sterile tone). The other waves were made in the same manner, with their formants shifted accordingly within the spectrum. For each note there is a matching wave, so each formant stays in tune. For an example, let's have a look at the patch "Stenzels Chor". This pad plays the right "Aaa" on each note by modulating Wave1 and Wave2. In this case it is useful to link the Wave2 Startwave to the Wave1 Startwave (using the Wave2-menu); then you only need to modulate Wave1, and Wave2 follows automatically.
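
To make the idea of "one wave per note" more tangible, here is a minimal Python sketch. It is not how the factory wavetable was actually built: the formant frequency (800 Hz), the bell-shaped peak, the wave size and all function names are assumptions chosen for illustration only. It builds single-cycle waves by Fourier synthesis, checks them by Fourier analysis, and shows that a per-note wave keeps its peak in place while a reused wave drags the peak along with the pitch.

```python
import cmath
import math

FORMANT_HZ = 800.0  # illustrative value only, not taken from the MW II/XT


def formant_spectrum(f0_hz, formant_hz, count=16, width_hz=150.0):
    """Harmonic amplitudes shaped by one bell-shaped peak at formant_hz."""
    return [math.exp(-((k * f0_hz - formant_hz) / width_hz) ** 2)
            for k in range(1, count + 1)]


def synthesize_wave(amplitudes, size=128):
    """Fourier synthesis: sum sine harmonics into one single-cycle wave."""
    return [sum(a * math.sin(2 * math.pi * k * i / size)
                for k, a in enumerate(amplitudes, start=1))
            for i in range(size)]


def peak_harmonic(wave, count=16):
    """Fourier analysis: number of the loudest harmonic (plain DFT)."""
    size = len(wave)
    mags = []
    for k in range(1, count + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / size)
                  for i, s in enumerate(wave))
        mags.append(abs(acc))
    return mags.index(max(mags)) + 1


# One wave per note: each wave is made for the pitch it will be played at.
wave_100 = synthesize_wave(formant_spectrum(100.0, FORMANT_HZ))
wave_200 = synthesize_wave(formant_spectrum(200.0, FORMANT_HZ))

# Played at their own pitches, both peaks sit at the same frequency.
print(peak_harmonic(wave_100) * 100.0)  # peak of the 100 Hz note
print(peak_harmonic(wave_200) * 200.0)  # peak of the 200 Hz note

# Reusing the 100 Hz wave one octave up moves the peak up an octave too.
print(peak_harmonic(wave_100) * 200.0)
```

The loudest harmonic is number 8 in the first wave and number 4 in the second, so both land on the same frequency. Playing the first wave an octave higher keeps harmonic 8 loudest, which doubles the peak frequency: the Mickey Mouse effect described above.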

There are two ways of modulating Wave1:
* Wave1-menu: keytrack: +100% (elegant)
* ModMatrix: keytrack/keyfollow +56 -> Wave1 Pos (keyfollow also takes pitchbend and glide into account as well as key tracking, so it's the better choice)
Both Startwaves are sensibly set to 30 to cover the whole wave range of 0-61. Don't forget to "limit" the wavetable (Wave1/2-menu) to avoid triggering the analog waveforms at the end of each wavetable! Let's get to work and tune "Stenzels Chor" a little:


First we should try to get a more "floating" sound and make it denser and more blurred. The "basic" (Send-/Global-)Chorus is already doing its job. Detuning the oscillators further might not be enough, because the result still sounds rather weak and electronic. So first we need to increase the number of voices:

In DualAssign we effectively have 4 oscillators. To make them sound denser, we space their pitches equally (as is done with UnisonoAssign). Call the whole detune width "n" units. Our 4 pitches (4 oscillators) have 3 gaps between them, so each gap is n/3 detune units wide. This gives the rule for equidistant detuning in DualAssign with detune width "n":
1st method:  osc1 detune = -n/6
             osc2 detune = +n/6
             trigger2 menu: detune = 2/3*n
2nd method:  osc1 detune = -n/3
             osc2 detune = +n/3
             trigger2 menu: detune = n/3

Let's choose the 2nd method with n=45. Set osc1Detune=-15; osc2Detune=+15; trigger2Menu:detune=15
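
If you want to check the arithmetic, the following Python sketch computes the four resulting pitch offsets for both methods. It assumes that the second triggered voice is simply shifted by the Trigger2 detune value relative to the first one; the function names are made up for this example.

```python
def dual_assign_pitches(osc1, osc2, trigger2_detune):
    """Detune offsets of the four oscillators sounding in DualAssign.

    Assumption: the second voice equals the first voice shifted by the
    Trigger2 detune amount.
    """
    voice1 = [osc1, osc2]
    voice2 = [osc1 + trigger2_detune, osc2 + trigger2_detune]
    return sorted(voice1 + voice2)


def method1(n):
    """1st method: osc1 = -n/6, osc2 = +n/6, Trigger2 detune = 2/3*n."""
    return dual_assign_pitches(-n / 6, n / 6, 2 * n / 3)


def method2(n):
    """2nd method: osc1 = -n/3, osc2 = +n/3, Trigger2 detune = n/3."""
    return dual_assign_pitches(-n / 3, n / 3, n / 3)


def gaps(pitches):
    """Distances between neighbouring pitch offsets."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


n = 45
print(method1(n), gaps(method1(n)))
print(method2(n), gaps(method2(n)))
```

Under that assumption both methods give three equal gaps of n/3 = 15 units and a total width of 45 units. The 2nd method yields whole-number settings for n=45, which makes it the handier choice here.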
More stereo? -> De-Pan=64-80. (Don't overdo the De-Panning, 'cause it thins out the floating sensation!) Engaging Pan-Delay makes the sound even thicker! There are two ways to go from thick to thicker. The first, UnisonoAssign, has the advantage of sounding extremely dense and fat - and you still have some effects to play with! Its disadvantage is the loss of voices you might need for other purposes! The second is to engage both chorus effects, which can get you quite far even with NormalAssign, but you sacrifice the ability to use delay.

With the first method (UnisonoAssign), one oscillator should be sufficient, so turn down Wave2Level in the mixer first and enter
Trigger2-menu: Detune=40-55 ; De-Pan=127
If you then turn Wave2Level up again and detune Osc2 slightly against Osc1, the sound becomes more dynamic and morphs smoothly:
Osc1Detune=-2 ; Osc2Detune=+2
Add some Pan-Delay and you should now have a really dense, deep choir. The beautiful "floating" is especially pleasant when the voice is triggered with short notes.

c) Engaging both chorus effects:
Even a monophonic, constant, sterile tone can be widened without increasing the number of voices by using the (Insert-)Chorus of the effects section combined with the (Send-)Chorus (the one without parameters). The latter is fed by the summed output of the former (in series), so you effectively get 8 delay voices added to the original voice. Well, it's not really the same as Unisono because the pitches of the echoes keep shifting, but the result might be even warmer, more dynamic and wider in stereo separation than with UnisonoAssign. So don't be afraid to be a miser when it comes to saving voices.
In our case we will work with NormalAssign:
Chorus: Speed=65, Depth=127, Mix=0:127
-> The higher the Speed, the fatter and wobblier it gets. Use values up to 68 for short notes, while slower pad sounds work better with lower values around 53.
If the lower frequencies have too much tremolo, decrease the mix ratio or engage Filter2 as an HP.

Controlling the position of the formants:

a) Phase Shifting
  For this purpose you can adjust Startwave1/2 or the pitches. For experimenting, it makes sense to let only Wave1 through the mixer. Of course you can use the ModWheel to control Wave1; Wave2 will follow as long as "link" is set to "on" in the Wave2-menu. If you increase the Startwave positions, the formants move downward, and in doing so you can change the "Aaa" into an "Ooo". Have fun using modulation sources like the LFO or WaveEnvelope for formant shifting. It sounds like a kind of beautiful filter sweep.

b) No keytrack
  As realistically "vowel-like" as wave modulation by keytrack may sound, it is inflexible when it comes to playing melodies. It sounds too bland because it is limited to the vowel "A" - dull when it's triggered by high notes, brighter when triggered by lower notes. With analog waveforms such as sawtooth, of course, all the formants shift along with the pitch. So set Wave1Keytrack to zero and adjust Pitch and Startwave according to the kind of voice you want to hear, then play a melody that stays within a range of two octaves.

Appropriate Wavetables

-WT 28 ("FmntVocal") is also a formant sweep, quite narrow like a BP filter, and lends itself to being "formant-shifted" by the WaveEnvelope. To get more bass into it, you might set both Startwaves to zero while setting Wave1EnvAmount to a high value and Wave2EnvAmount to a lower one; envelope times should be pretty slow to avoid harsh transitions.
-WT 57 ("MaleVoice") consists of the vowels "a,e,i,o,u".
-WTs 008 ("MellowSaw"), 009 ("Feedback") and 010 ("AddHarm") are also useful for creating voice patches.
-Combining certain different waves of WT 001 ("Resonant") or 025 ("ResoHarms") can also produce smooth, high, voice-like sounds. In this case Osc1 and Osc2 act as the formants.
-Even ultra-harsh waves like WT 044 ("FuzzWave") might sound voicy if they are dampened by the 24dB-LP and/or the Filter2-LP.

That's all Folks!
...to be continued!