1 - The speech software

Submitted by Webbot on October 29, 2008 - 5:37pm.

Download the attached file.



The software currently requires about 12 KB of flash (program) memory and 500 bytes of SRAM.

This means that it is too big to load into the ATmega8 used in the standard $50 robot, which has only 8 KB of flash. But 'fear not' - see later for several deployment options.


The speech software is broken down into several files.


core.c - This contains all the code to generate speech, either from English text or from a list of phonemes. Don't worry if you don't understand what this means yet - just think of it as the code that does the hard work! You should never need to modify this code. The 'hackers' amongst us will find this the most 'interesting' file, as it is the one that 'does speech'. If you define a variable called _LOG_ then the system will output some information about the 'sounds' it is creating. On an AVR platform this is sent out via the UART, and on a Windows system it is sent to 'stdout'. This is mainly meant for me when debugging, since the time spent outputting all this info will make the speech unintelligible.
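To give a feel for how a switch like _LOG_ usually works, here is a minimal sketch. It is not the code from core.c: it assumes _LOG_ is tested as a preprocessor symbol (for example, set with -D_LOG_ on the compiler command line), and the names LOG_SOUND and play_sound are made up for this illustration. printf stands in for whatever output routine the real code uses.

```c
#include <stdio.h>

/* Hypothetical sketch, NOT taken from core.c.
 * Assumption: _LOG_ is a preprocessor symbol, e.g. passed as -D_LOG_. */
#ifdef _LOG_
#define LOG_SOUND(name, duration) \
    printf("sound %s for %u\n", (name), (unsigned)(duration))
#else
#define LOG_SOUND(name, duration) ((void)0)   /* logging compiled out */
#endif

/* Hypothetical helper: "plays" one sound and returns its duration. */
static unsigned play_sound(const char *name, unsigned duration)
{
    LOG_SOUND(name, duration);   /* costs nothing unless _LOG_ is defined */
    (void)name;                  /* real code would drive the audio output here */
    return duration;
}
```

Because the macro expands to nothing when _LOG_ is not defined, a normal build pays no time penalty, which matters here since the logging delay is what garbles the speech.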


english.h - This contains two vocabularies: one to help convert English text into phonemes, and another to convert phonemes into the list of their primitive sounds. This file is mainly made up of constant string definitions but looks a bit of a mess - this is due to how you need to write things on microcontrollers for fixed strings to be stored in program memory. A new file, say 'french.h', could be created to help produce a 'French speech synthesizer' without changing the rest of the code. I'll worry about doing that later - my French ain't great!! You should never need to modify this file - but, if you do, then ask me about the syntax of the dictionaries as they have some 'clever bits'.
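For anyone wondering why fixed strings need special treatment, here is a small sketch of the usual avr-libc approach: mark the string with PROGMEM so it stays in flash, then read it back a byte at a time with pgm_read_byte. The string demo_phoneme and the function copy_from_flash are invented for this example and do not come from english.h; the non-AVR fallback macros are only there so the sketch also builds on a PC.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>          /* avr-libc: PROGMEM, pgm_read_byte */
#else
/* Host fallback (assumption for this sketch): plain memory access. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#endif

/* Hypothetical entry, NOT from english.h: kept in flash on an AVR. */
static const char demo_phoneme[] PROGMEM = "AH";

/* Copy a flash-resident string into an SRAM buffer.
 * Always NUL-terminates (if dst_size > 0); returns the characters copied. */
static size_t copy_from_flash(char *dst, const char *src, size_t dst_size)
{
    size_t i = 0;
    if (dst_size == 0) {
        return 0;
    }
    while (i + 1 < dst_size) {
        char c = (char)pgm_read_byte(src + i);
        if (c == '\0') {
            break;
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

Without PROGMEM, constant strings on an AVR are typically copied into SRAM at startup, and with only about 500 bytes of SRAM to play with that would not leave room for two vocabularies.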


buffer.*, uart.*, rprintf.* - These files are just copied from AVRlib and are used to access the UART for user interaction.


speech.c - This is where your main program is executed, and it is purposely kept small. On an AVR platform it will just wait to receive some text (via the UART) terminated by a carriage return and/or line feed. This text input is echoed back over the UART. Once the carriage return / line feed has been received it will attempt to speak the preceding text. On a non-AVR platform it will create a file to save the details of the speech synthesis. If this is a new 'sentence' then it will create the file in the newly created 'ref' folder. But if the 'ref' folder already contains a file for this sentence then it will create a new file in the 'test' folder and compare its contents with the one in the 'ref' folder. This is basically a test suite - I can make changes to the code and verify that the software is still trying to do the same thing (i.e. I haven't messed it up!!).
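The 'collect text until carriage return / line feed' behaviour on the AVR side can be sketched roughly as below. This is an illustration only, not the contents of speech.c: the line_buffer type, the line_add_char function and the 64-character limit are all assumptions made for the example, and the UART read, echo and speak calls are left out.

```c
#include <stddef.h>

#define LINE_MAX_LEN 64   /* assumed buffer size, not taken from speech.c */

/* Hypothetical line buffer for text arriving one character at a time. */
typedef struct {
    char   text[LINE_MAX_LEN];
    size_t len;
} line_buffer;

/* Feed one received character into the buffer.
 * Returns 1 when a complete, non-empty line is ready in lb->text, else 0. */
static int line_add_char(line_buffer *lb, char c)
{
    if (c == '\r' || c == '\n') {
        if (lb->len == 0) {
            return 0;             /* empty line, or the LF of a CR LF pair */
        }
        lb->text[lb->len] = '\0';
        lb->len = 0;              /* start fresh for the next sentence */
        return 1;
    }
    if (lb->len + 1 < LINE_MAX_LEN) {
        lb->text[lb->len++] = c;  /* characters past the limit are dropped */
    }
    return 0;
}
```

A main loop built on this would read a character from the UART, echo it back, pass it to line_add_char, and hand lb->text to the speech code whenever the function returns 1. Treating both CR and LF as terminators, and ignoring empty lines, means a terminal that sends CR, LF, or CR LF all behave the same way.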




The source code for the TextToSpeech synthesizer is the exclusive copyright of Clive Webster (Webbot) whose contact details are registered on this site. The source code is made available to individual hobbyists for their own use. These users are allowed to freely modify the source code for their own use but are forbidden from publishing any versions of the software to anyone else in any form including, but not exclusively:


1. Human readable source files via any means,

2. Compiled code via Programmed integrated circuits or any other means,


without the express permission of the original author. Attempts to publish this code for individual or commercial profit will be treated as a violation of these terms.


Speech-v1.0.zip (38.74 KB)