Generating audio signals in the analogue domain from a digital device such as a processor is quite difficult. Digital devices talk about '1's and '0's, whereas audio devices talk about a volume expressed as an analogue voltage. So how do we convert from one world to the other?
At the end of the day the speech program will end up trying to produce a sound wave from a volume in the range 0 to 15, where 0=quiet and 15=full volume. Quite how it makes this 0 to 15 value available, or audible, to the outside world is one of the most difficult aspects.
The easy answer is that we need a DAC (digital-to-analogue converter). But our processor doesn't have one!
So we will have to build one!
There are many, many ways to do this:-
1. One wire PWM
Look back at FM radios. They used a constant, high-frequency carrier as the transport, and the audio was imposed on it by making small changes to the frequency. At the receiver end you could cancel out the much higher carrier frequency to be left with just the 'audio' frequency. In the same way, if we use PWM with a very high frequency we can encode the audio on top (by changing the duty cycle) and then filter out the PWM base frequency at the receiver end to get back to the audio information. Why all this encode/decode stuff? Well, it allows us to send a signal over a single wire that can be turned back into an audio signal. The catch is that this is the most complex option. In return, we only need one I/O pin from our controller but can still try to achieve the 16 volume levels to make the speech as clear as possible.
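As a rough sketch of the 'changing the duty cycle' step, the 0 to 15 volume can be scaled to a PWM compare value. The function name volume_to_duty and the idea of a timer that counts from 0 up to some 'top' value (255 for an 8 bit timer) are assumptions made for illustration; they are not taken from the speech program, and setting up the real timer hardware is not shown.

```c
#include <stdint.h>

#define MAX_VOLUME 15u  /* the speech core produces volumes 0..15 */

/* Hypothetical helper: scale a 0..15 volume to a PWM compare value 0..top.
 * 'top' is assumed to be the timer's maximum count, eg 255 for 8 bit PWM.
 * A compare value of 0 means 0% duty cycle, 'top' means 100%. */
uint16_t volume_to_duty(uint8_t volume, uint16_t top)
{
    if (volume > MAX_VOLUME) {
        volume = MAX_VOLUME;  /* clamp anything out of range */
    }
    /* Multiply in 32 bits so that 15 * 65535 cannot overflow. */
    return (uint16_t)(((uint32_t)volume * top) / MAX_VOLUME);
}
```

With an 8 bit timer, volume 0 gives a compare value of 0, volume 8 gives 136 and volume 15 gives 255. The PWM frequency itself stays fixed; only the duty cycle follows the audio.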
2. One wire Digital.
This is by far the easiest to understand and the cheapest to implement. Given that the volume can be in the range 0 to 15, we can turn it into a digital signal by saying: if input is 0.....7 then output=0 else if input is 8......15 then output=1. So we have changed the volume from 0...15 to 0..1. Of course, the output signal is somewhat of a distortion (or simplification) of the input signal, ie signal in = 0...15, but signal out = 0...1. So something has been lost - and it is the nuances of the sounds. But we can turn the resultant 0 or 1 into 'speaker off' or 'speaker on' commands very easily to make our sound. So we only need one I/O pin from our controller and our resultant circuitry is simple, if less effective.
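The threshold rule above can be sketched in a couple of lines. The name volume_to_bit is hypothetical, and actually driving the I/O pin with the result depends on the controller, so that part is left out.

```c
#include <stdint.h>

/* Hypothetical helper: reduce a 0..15 volume to a single bit.
 * 0..7 -> 0 ('speaker off'), 8..15 -> 1 ('speaker on'). */
uint8_t volume_to_bit(uint8_t volume)
{
    return (volume >= 8u) ? 1u : 0u;
}
```

Volumes 7 and 8 sit next to each other on the input side but land on opposite outputs, which is exactly the loss of nuance described above.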
3. 4 wire Digital
The previous option is purely 'digital' and so can only say 'sound on' or 'sound off' - ie it 'shouts' or 'is quiet'. But the speech core generates sound envelope volumes from 0 to 15, so how do we reproduce them all? Four bits give us 2 to the power of 4 (ie 16) possible values, so if we have 4 output pins available on our controller then we can output every value from 0 to 15. The conversion back to the analogue world is done by the magic of an R/2R resistor ladder. See http://en.wikipedia.org/wiki/Resistor_Ladder. This lets us keep the full range of volumes but requires 4 I/O pins to do so.
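To make the 4 pin idea concrete, here is a minimal sketch that splits the volume into its 4 bits (one per output pin) and estimates the voltage an ideal, unloaded 4 bit R/2R ladder would produce, which is Vref x value / 16. The function names, the choice of pin 0 as the least significant bit and the 5000 mV logic supply used in the examples are all assumptions for illustration; a real ladder with resistor tolerances and a load attached will differ.

```c
#include <stdint.h>

/* Hypothetical helper: the logic level (0 or 1) to put on one of the
 * 4 output pins. Pin 0 is assumed to carry the least significant bit. */
uint8_t volume_bit(uint8_t volume, uint8_t pin)
{
    return (uint8_t)((volume >> pin) & 1u);
}

/* Hypothetical helper: ideal unloaded output of a 4 bit R/2R ladder in
 * millivolts, ie vref * value / 16, rounded down. 'vref_mv' is the logic
 * high voltage of the output pins, in millivolts. */
uint16_t ladder_output_mv(uint8_t volume, uint16_t vref_mv)
{
    uint32_t value = volume & 0x0Fu;  /* keep only the 4 bits we can output */
    return (uint16_t)((value * vref_mv) / 16u);
}
```

For example, volume 10 is binary 1010, so pins 3 and 1 go high and pins 2 and 0 go low. With an assumed 5000 mV supply, volume 8 gives 2500 mV and volume 15 gives 4687 mV; the ladder never quite reaches the full supply voltage because the top step is 15/16 of it.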
The additional hardware required for each of these options is given in the following sections:-