Hello everyone. I've decided to start a project to develop a robotic talking head. I'm using a female head and I've named her Alysha. From a previous thread I started there appears to be some interest in robotic human heads. So I thought I'd start a thread dedicated to this type of project. Everyone is welcome to comment, contribute, or add information from similar projects.
In the meantime I thought I'd share what I've done thus far. To begin with, I just found a fairly decent female mask that I might be able to use for the actual project. CyberJeff contributed a link to a "Party City" mask that may prove to be useful. So I would like to thank CyberJeff for that important contribution. The mask is designed to be worn by an actual human, however. It's meant to be a Halloween mask, even though it appears to cover the entire head and even has its own ears and hair! But it has no eyes, teeth, or tongue. However, I already have plans for securing those additional items.
Meanwhile, I thought I would begin with the actual design work. This is going to be a HUGE project. Make no mistake about that. But at the same time I believe that it is doable and within the reach of the average hobbyist. The main thing to realize is that this head is indeed going to appear as a "Talking Manikin". It's in no way going to be convincing as an actual living entity. But hey, what should hobbyists expect, right?
In any case, I've been working out what it might take to make this thing talk. I'm not concerned with speech synthesis, since computers already do that. I plan on using standard speech synthesis and a speaker. So the actual speech synthesis is basically already done. There are tons of resources for that.
In this project the goal is to make a head that merely "appears" to be talking. It is with this goal in mind that I offer the following brainstorming ideas:
To begin with I started a spreadsheet and simply typed down one column the letters of the alphabet. Then I set about saying each letter and trying to decide how many articulation motions I would need. Obviously I could get carried away if I got lost in trying to recreate too many details. So instead I chose the exact opposite extreme. What is the absolute LEAST motion that I could get away with and still have a fairly convincing talking head?
Well, after analyzing the entire alphabet and the first ten numerical digits I came to the realization that I only need 6 basic motions. Imagine that. This means that the entire speech mechanism could be accomplished with as few as 6 servos.
Here are the six motions that I came up with:
The first column marked Key_# is just the number of the important motions. As I went through the alphabet I realized that most letters would look the same in terms of visual appearance so I marked those as duplicates of previous letters. When all is said and done I only ended up with these six "Primal Motions".
The second column is just the letter of the alphabet where these motions FIRST appeared. Believe it or not, you can visually reproduce the entire alphabet with just these six motions of your mouth, teeth, and tongue. And it does require all three of these objects. The third column tells how many motions are required for a letter. Notice that I have A marked as zero. That's because A will be the "resting position" or the "home" position of the mouth. So nothing needs to be done to articulate the letter A (or any other key #1 letter). (By the way, I have the full spreadsheet if anyone would like to see the entire 26 letters and how they are made up of just these six motions.) Keep in mind we are only talking about "visual appearance" here, not actually making any sounds. Sound making is not required for this project. That will be taken care of by speakers and a computer-simulated voice.
All we need to do here is construct a head that "appears" to be making these sounds. And these six motions will suffice.
The column marked Comb_1 describes how the motions are made up from other primal motions.
The column marked "Mouth Pos" is the position of the mouth (simply O-pen, or C-losed). That's all we need, except for the O sound, which requires a special effect on the lips.
The column marked "Teeth Pos" is the position of the teeth (again simply O-pen, or C-losed). That's all we need except for the special F sound which requires the lower jaw to retreat under the upper teeth. So that's another "special effect".
And finally we have the "Tongue Pos" where the tongue is either lying on the bottom of the mouth (its home or N-ormal position) or it curves up to touch the roof of the mouth, the UP position. Moving the tongue will be subtle, but having a tongue actually take part in speech could make the head look quite a bit more realistic.
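For anyone who would rather tinker in software first, here is one way the three position columns could be captured in Python. It is only a rough sketch: the names `Mouth`, `Teeth`, `Tongue`, `Pose`, `HOME`, and `motions_from_home` are made up for illustration, and the home values are an assumed placeholder, so swap in whatever the A row of the table actually says.

```python
from dataclasses import dataclass
from enum import Enum


class Mouth(Enum):
    CLOSED = "C"
    OPEN = "O"
    ROUNDED = "special-O"  # the special lip effect needed for the O sound


class Teeth(Enum):
    CLOSED = "C"
    OPEN = "O"
    TUCKED = "special-F"  # lower jaw retreats under the upper teeth (F sound)


class Tongue(Enum):
    NORMAL = "N"  # resting on the bottom of the mouth
    UP = "UP"     # curved up to touch the roof of the mouth


@dataclass(frozen=True)
class Pose:
    mouth: Mouth
    teeth: Teeth
    tongue: Tongue


# ASSUMPTION: placeholder home values; the real ones come from the A row.
HOME = Pose(Mouth.CLOSED, Teeth.CLOSED, Tongue.NORMAL)


def motions_from_home(pose: Pose, home: Pose = HOME) -> int:
    """Count how many of the three parts differ from the home pose."""
    return sum(
        getattr(pose, part) != getattr(home, part)
        for part in ("mouth", "teeth", "tongue")
    )
```

With this layout the home pose itself needs zero motions, which matches the zero marked for A, and any other pose scores between one and three depending on how many parts have to move.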
So that's it. Just six motions for the speech. Six Servos! Not too bad.
I also have in mind an idea for a jaw that kind of floats around a bit on its own to add a more realistic bounce to the speech. The head itself may have other features that could change even during speech. But right now I'm just focusing on what it takes to articulate speech motion.
What's Next?
Well, after I figured out these motions, I decided to draw them up and then make a simple animated GIF of them.
Here are the six basic motions:
You'll probably notice there are actually seven drawings here, but E is the same as A. Visually they are the same. And keep in mind, I'm reducing this to the bare essentials. Clearly a person could easily get carried away trying to reproduce every little human motion. But let's not go there. It's simply not required. These six motions will do, at least for a hobby head.
We're not working on a Hollywood movie BUDGET! (or at least I'm not) Gotta keep things simple.
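Since many letters share the same look (E is the same as A, for instance), turning text into a string of motions boils down to a lookup table. Here is a minimal Python sketch of that idea. The names `REST_KEY`, `LETTER_TO_KEY`, and `text_to_keys` are hypothetical, and only A and E are filled in because those are the only two stated above to match; the rest of the table would have to come from the full spreadsheet.

```python
REST_KEY = 1  # key #1 is the resting/home position (the letter A)

# Only A and E are known from the post to look the same.
# The remaining letters and digits are deliberately left out, not guessed.
LETTER_TO_KEY = {"A": REST_KEY, "E": REST_KEY}


def text_to_keys(text, table=LETTER_TO_KEY):
    """Return one key-motion number per letter or digit in `text`.

    Characters missing from the table fall back to the rest position;
    spaces and punctuation are skipped.
    """
    return [table.get(ch, REST_KEY) for ch in text.upper() if ch.isalnum()]
```

Falling back to the rest position for unknown characters is just one design choice; it keeps the head from freezing in an odd pose when the table is incomplete. A fuller table can be passed in as the second argument once the groupings are known.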
I made an animated GIF of these six motions. But my photo hosting site doesn't seem to handle animated GIFs, so I'll try to add it here as an attachment. Not sure if that will work or not.
Where do we go from here?
I guess the next step is to get out some servos and some prototyping materials to see what it's going to take to articulate these six motions.
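Before wiring anything up, the six servo targets could be worked out on paper, or in a few lines of Python. The sketch below turns a normalized position (0.0 to 1.0) for each of the six servos into a pulse width. It assumes the nominal 1000 to 2000 microsecond range that many hobby servos use, which should be checked against the actual servos, and all of the names are hypothetical. It does not talk to any hardware.

```python
# ASSUMPTION: nominal hobby-servo pulse range; check your servo's datasheet.
MIN_PULSE_US = 1000
MAX_PULSE_US = 2000
NUM_SERVOS = 6  # one servo per motion


def position_to_pulse_us(position: float) -> int:
    """Map a normalized position (0.0 to 1.0) to a pulse width in microseconds."""
    clamped = min(1.0, max(0.0, position))  # never drive past the end stops
    return round(MIN_PULSE_US + clamped * (MAX_PULSE_US - MIN_PULSE_US))


def frame_to_pulses(positions):
    """Convert one frame of six normalized positions into six pulse widths."""
    if len(positions) != NUM_SERVOS:
        raise ValueError(f"expected {NUM_SERVOS} positions, got {len(positions)}")
    return [position_to_pulse_us(p) for p in positions]
```

Clamping the input is a cheap safeguard while prototyping, since a servo pushed past its mechanical limit can strain whatever linkage ends up inside the mask.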