Author Topic: Robotic Vocal Tract Concept  (Read 2947 times)

Offline SomeNortherner (Topic starter)

  • Beginner
  • *
  • Posts: 4
  • Helpful? 2
Robotic Vocal Tract Concept
« on: September 11, 2015, 09:04:37 AM »
Greetings all!

I figured I would post this in misc as it is merely theoretical.

I'm sure many of us have seen the demonstrations of a robotic vocal tract designed in East Asia, constructed from some kind of silicone[?] and manipulated by a series of metal supports.  Mechanically speaking it is wonderful, but the speech it produces is eerie and remains artificial [uncanny valley]... and slow... -- why bother then? Why bother to replicate actual speech rather than synthesised "speech"?

Well, there is a push to make robots more sociable, and much of human speech recognition relies on the mechanical articulation of sound.

But I personally find it fallacious to try to replicate truly human speech, much as I find bipedal, humanoid robots to be somewhat of a pipe-dream fuelled by nostalgic science fiction.

As an extension of this I find it silly that we attempt to code AI that can fluently speak "English", or any other natural human language for that matter.

So my aim is to conceptualise a vocal tract, inspired by humans', but that is distinctly robotic, so that I may also construct a mother language for the robot that we have to learn, rather than the other way around.  I hope this will encourage self-actuating language and environmental labelling, as the AI fills in blanks by combining existing vocabulary from its mothertongue.

So here is a hasty diagram of how I imagine the workings of such a voxtrac:



any questions, thoughts, etc. are much appreciated ^_^

Thanks for reading

Offline cyberjeff

  • Full Member
  • ***
  • Posts: 114
  • Helpful? 7
Re: Robotic Vocal Tract Concept
« Reply #1 on: September 11, 2015, 07:10:54 PM »
Quote

Greetings all!

I figured I would post this in misc as it is merely theoretical.

I'm sure many of us have seen the demonstrations of a robotic vocal tract designed in East Asia, constructed from some kind of silicone[?] and manipulated by a series of metal supports.  Mechanically speaking it is wonderful, but the speech it produces is eerie and remains artificial [uncanny valley]... and slow... -- why bother then? Why bother to replicate actual speech rather than synthesised "speech"?


This was completely unknown to me:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2279150/

https://en.wikipedia.org/wiki/Linear_predictive_coding

http://www.takanishi.mech.waseda.ac.jp/top/research/voice/

Animals of all kinds share vocabularies, so that one member of a species has little trouble knowing what another is thinking. The vocal part of these is determined by the equipment at hand. Parrots, cats, birds and bullfrogs all have something to say about their present situation. A bullfrog seems to have but one thing to say.

Now, let's say an alien society has sent a robotic cat to find out about earthlings. He would have his language and learn some of yours. And vice versa.

I realize I have driven this off on a tangent, but I live on tangents.

To construct this mechanically is quite a feat, certainly easier with software. But there is a charm to something that is real rather than replicated.

Quote

Well, there is a push to make robots more sociable, and much of human speech recognition relies on the mechanical articulation of sound.

But I personally find it fallacious to try to replicate truly human speech, much as I find bipedal, humanoid robots to be somewhat of a pipe-dream fuelled by nostalgic science fiction.

As an extension of this I find it silly that we attempt to code AI that can fluently speak "English", or any other natural human language for that matter.

So my aim is to conceptualise a vocal tract, inspired by humans', but that is distinctly robotic, so that I may also construct a mother language for the robot that we have to learn, rather than the other way around.  I hope this will encourage self-actuating language and environmental labelling, as the AI fills in blanks by combining existing vocabulary from its mothertongue.

So here is a hasty diagram of how I imagine the workings of such a voxtrac:



any questions, thoughts, etc. are much appreciated ^_^

Thanks for reading

Sociability is far more than a robot answering questions. It's interacting and making mistakes (and asking when it does not know) and so the same with  the human. Each has an opportunity to figure out the other.

I like your idea. I think it is difficult and rather unlikely to develop far, but as a simple mechanism it has great merit. I'd say run with it, but I'd also advise you to not take my advice.

BTW, I'm working on an alien robot cat, so this is not all just theoretical on my part! Human speech has great complexity, far more than is needed for a companion.

Offline SomeNortherner (Topic starter)

  • Beginner
  • *
  • Posts: 4
  • Helpful? 2
Re: Robotic Vocal Tract Concept
« Reply #2 on: September 12, 2015, 07:27:51 AM »
Hey!

Thanks for the reply :)

Much of what you elaborate upon resonates with me; oh, and thank you for supplying links to the kinds of machines I was referencing.

"Human speech has great complexity, far more than is needed for a companion."

This is well said, and is essentially my design premise:  a language with a tiny vocabulary will suffice (100 [root] words or so; such languages exist and are perfectly functional - more complex words are simply formed via compounding) and a CVCV (consonant-vowel) syllable structure is 'binary' enough for an artificial voxtrac to be able to 'get its mouth around' :p
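To give a feel for how much room a strict consonant-vowel scheme leaves, here is a minimal Python sketch that enumerates every CVCV root from a small phoneme inventory. The inventory itself is purely hypothetical (it is not a proposal from this thread); swap in whatever sounds the voxtrac can actually produce.

```python
from itertools import product

# Hypothetical phoneme inventory, chosen only for illustration.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l"]
VOWELS = ["a", "i", "u"]


def cv_syllables(consonants, vowels):
    """Return every consonant-vowel (CV) syllable the inventory allows."""
    return [c + v for c, v in product(consonants, vowels)]


def cvcv_roots(consonants, vowels):
    """Return every two-syllable (CVCV) candidate root word."""
    syllables = cv_syllables(consonants, vowels)
    return [first + second for first, second in product(syllables, repeat=2)]


syllables = cv_syllables(CONSONANTS, VOWELS)
roots = cvcv_roots(CONSONANTS, VOWELS)
print(len(syllables), "syllables,", len(roots), "candidate roots")
```

With 7 consonants and 3 vowels this gives 21 syllables and 441 possible roots, so a vocabulary of about 100 roots would use less than a quarter of the available forms and could keep similar-sounding words well apart.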

If you are genuinely working on an "alien robot cat" I should very much like to hear more ^_^

But yes, I believe in an actual mechanism that manipulates air much as a human vocal tract does (an automatic wind instrument, if you will). The main design idea, though, is that "Humans learn language more easily than robots" - after all, we are programmed too.  Thus Humans could learn a "robot" language.  This I believe is superior and more realistic than trying to teach AI all the nuance of a natural Human language, and by giving AI its own mothertongue, I believe we would replicate much of what we regard as "intelligence".  Indeed, at the moment AI seems "stupid" simply because it struggles to believably reproduce accurate "English", but is this not bigotry on our behalf? - the same kind of bigotry that convinces people that non-native speakers of their mothertongue are "stupid" when they make grammatical mistakes?

" The  vocal part of these is determined by the equipment at hand."

Nailed it ;) -- if Robots had their own vocal apparatus, that was derivative of our own, I feel we would be far more ready to accept them and converse with them - I suppose it feels more 'organic' to us - and thus, more 'trustworthy'.

"To construct this mechanically is quite a feat, certainly easier with software. But there is a charm to something that is real rather than replicated."

Indeed: I am no engineer sadly, and am solely a theorist. But, if I develop my idea substantially, I'm sure I could find a team of fellow students to help me on the project.  I mean, I have a rough idea of how such a mechanism might work, even what components it would have, but not how to assemble, power or tune it.

I too would be reliant on software for the "brain" of the device; I doubt there exists any kind of reactive engineering for sound in the way that I might need it (like there is for self-stabilising bipedals for instance) - but this is no problem.

"Sociability is far more than a robot answering questions. It's interacting and making mistakes (and asking when it does not know) and so the same with  the human. Each has an opportunity to figure out the other."

I think that this is [hopefully] the direction non-military robotics teams are heading in.  I saw a few TED Talks (I will reference if requested) where leading designers basically say things like "robots need to be allowed to make mistakes if they are to have character". Interestingly enough, in tests where two machines, identical in every way but their AI, were made to do tasks alongside humans, the humans responded more positively to the "imperfect" AI - as if it had a personality, and was thus more "intelligent" - which I believe is promising for our particular school of thought.

Thanks again for the reply,

I look forward to continuing the chat!

Offline cyberjeff

  • Full Member
  • ***
  • Posts: 114
  • Helpful? 7
Re: Robotic Vocal Tract Concept
« Reply #3 on: September 12, 2015, 11:47:02 AM »
Quote

Hey!

Thanks for the reply :)

Much of what you elaborate upon resonates with me; oh, and thank you for supplying links to the kinds of machines I was referencing.

"Human speech has great complexity, far more than is needed for a companion."

This is well said, and is essentially my design premise:  a language with a tiny vocabulary will suffice (100 [root] words or so; such languages exist and are perfectly functional - more complex words are simply formed via compounding) and a CVCV (consonant-vowel) syllable structure is 'binary' enough for an artificial voxtrac to be able to 'get its mouth around' :p

If you are genuinely working on an "alien robot cat" I should very much like to hear more ^_^


The distance between our language and Arabic is enormous, and yet it is the common language of a diverse population. Its script is an abjad: no uppercase, traditionally little punctuation, and short vowels usually left unwritten. Completely condensed and yet it can be written so beautifully.

How is that even possible? And yet it works magnificently.

On another note.

I've had occasion to think about sleeve valves for low pressure air:

https://en.wikipedia.org/wiki/Sleeve_valve

Such valves could probably be made cheaply from 2 mil Mylar inside a slice of PVC pipe.  Cut them to the shapes you need and modulate as needed.

And finally:

The CatBot.

I'm in the process of assembling this. With the resources at hand, he is sliced from a white pine 2x4 (9 oz, so light) and cut out on a borrowed scroll saw (French curves). The proportions are drawn from those of house cats, and it will amble the way a cat would. It's in the geometry. The servos are fed quadratic Béziers, the return of the French curve. That seems like the way to do it, to me.
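For anyone following along, a quadratic Bézier is just a blend of three values: a start, a control value that bends the curve, and an end. The Python sketch below shows the idea for a single servo angle. It is not the CatBot's actual code; the function names and the example angles are made up for illustration, and a real build would run something equivalent on the microcontroller.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bezier at t in [0, 1].

    p0 is the start value, p1 the control value, p2 the end value.
    """
    u = 1.0 - t
    return u * u * p0 + 2.0 * u * t * p1 + t * t * p2


def servo_sweep(start_deg, control_deg, end_deg, steps):
    """Sample the curve at evenly spaced t values (steps must be >= 2)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [
        quadratic_bezier(start_deg, control_deg, end_deg, i / (steps - 1))
        for i in range(steps)
    ]


# Hypothetical joint move: start at 0 degrees, end at 60, control value 90.
angles = servo_sweep(0.0, 90.0, 60.0, 5)
print(angles)
```

Because the control value sits beyond the end value here, the joint swings a little past 60 degrees and settles back, which is the kind of soft, non-linear motion a straight interpolation cannot give.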

The mechanical brain is an Arduino DUE. Cheap, light and mucho processing power.

A Raspberry Pi 2 with a Kinect handles vision. I am just now thinking about the vocal end and haven't hashed out the details. Just what should Rowl, the Intergalactic Catbot sound like? He'll have an inquisitive personality but be rather ignorant of our ways. A 200 word vocabulary should be more than enough.

That's the plan, nearly through with the mechanics, and much of the DUE software.

We should collaborate some time.

Offline SomeNortherner (Topic starter)

  • Beginner
  • *
  • Posts: 4
  • Helpful? 2
Re: Robotic Vocal Tract Concept
« Reply #4 on: September 12, 2015, 12:40:25 PM »
Your sleeve valve suggestion intrigues me - I shall research further.

Your cat sounds fascinating indeed! - I like the notion of using the mechanics already found in nature as a method of engineering; if it ain't broke...

Have you any pictures to share?

Sure, if you need help on the language (deciding whether it is just a simplified English or something totally new) - I can be of service; I'm something of a linguist by trade ;p

Offline cyberjeff

  • Full Member
  • ***
  • Posts: 114
  • Helpful? 7
Re: Robotic Vocal Tract Concept
« Reply #5 on: September 13, 2015, 05:06:34 AM »
Quote

Your sleeve valve suggestion intrigues me - I shall research further.

Your cat sounds fascinating indeed! - I like the notion of using the mechanics already found in nature as a method of engineering; if it ain't broke...


Nature works because the geometry works. I have a partial understanding of the physics behind it. Its moving coordinates, with a shifting center of gravity, throw me for a loop. I can almost model this with barycentric coordinates, which are based on the vertices of triangles.
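As a rough illustration of the barycentric idea, the Python sketch below expresses a 2-D point as three weights on the corners of a triangle. The reading of the triangle as three planted feet and the point as the centre of gravity projected onto the ground is my own assumption for the example, not something taken from the CatBot design.

```python
def barycentric(p, a, b, c):
    """Return the weights (wa, wb, wc) with p = wa*a + wb*b + wc*c.

    All arguments are (x, y) tuples. The weights always sum to 1.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle: the corners are collinear")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def inside(weights):
    """A point lies inside (or on) the triangle when no weight is negative."""
    return all(w >= 0.0 for w in weights)


# Hypothetical foot positions and centre-of-gravity projections.
feet = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
stable = barycentric((0.25, 0.25), *feet)
tipping = barycentric((1.0, 1.0), *feet)
print(stable, inside(stable))
print(tipping, inside(tipping))
```

The appeal is that the weights move smoothly as the point shifts, so a negative weight is an early sign that the centre of gravity has left the support triangle.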

A wide variety of creatures (birds, cats, horses and people) have the same set of bones but in very different proportions. The proportions have adapted to each creature's niche, whether it needs speed or agility or, in our case (or a bird's), bipedalism. There are only a handful of robots that follow that architecture, but they include the most successful, among them Boston Dynamics' work with its BigDog.

It's not so much the widely quoted number of degrees of freedom, but what they are and how they work.

For example, in our case, and a cat's too, the lower arm dominates the vertical axis and the upper arm dominates "x" distance, i.e. elbow to lift and shoulder to stretch out.  This vastly simplifies the tasks at hand.

Quote
Have you any pictures to share?


Not yet. I have not taken any, for one.

For another I'm not yet positive this will all work.

I nearly have this assembled, as parts drift in, mostly from China. Today I got the servo hubs I need for shoulders and hips. I am still waiting on a prototype for my Arduino DUE to clean up the wiring. I thought I could use my existing one, but when it came time to assemble I saw I was wrong. There's been a slew of similar surprises.

Because this is different than the way most robots look, I really desire to put up the working model rather than the one that I say should work. Within the week, I think.

Quote

Sure, if you need help on the language (deciding whether it is just a simplified English or something totally new) - I can be of service; I'm something of a linguist by trade ;p

Good, I like  that.  I'd like a simplified English with a sentence structure that is easy to deconstruct and a pointer at some software that can do that and run on some flavor of *nix. I have my own primitive code but there is a lot, off the shelf, that can run on the Pi or similar:

https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/

I don't have a good grasp on this yet.

Offline cyberjeff

  • Full Member
  • ***
  • Posts: 114
  • Helpful? 7
Re: Robotic Vocal Tract Concept
« Reply #6 on: September 14, 2015, 04:24:53 AM »
Quote

Sure, if you need help on the language (deciding whether it is just a simplified English or something totally new) - I can be of service; I'm something of a linguist by trade ;p

I don't know enough yet to know what I want. I want a language of its own at some point. It can have, and should have, its own phonology. A fusional language like English is too complex, I think. Perhaps much of your original point?

I've used dictionaries and word lists before, but for either spell checking or searching for related keywords or something similar.

I have no interest in doing work that  has already been done. It doesn't need to know the time or the weather or how to look up something or, in general, be a servant. That path is well worn.

If you have a fairly small vocabulary you can plot distances between individual words but, more importantly, how an object acts on a subject, how an adjective acts on a noun or how an adverb acts on a verb. I would call those concepts and descriptions, not knowing the correct terminology. If you store these as barycentric coordinates you can find similar concepts or descriptions. That is at the root of intelligence and also of personality, as there is variation not only in what the source material is but in how it is initially scored and whether that ever gets adjusted.
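One way to picture this, as a sketch only: give each word three weights that sum to 1, one per "corner" concept, and call two words similar when their weights are close. The three corners (calm, threat, play), the words and every score below are invented for the example; how the scores would really be assigned is exactly the open question.

```python
import math

# Hypothetical scores: (calm, threat, play), each triple summing to 1.
LEXICON = {
    "purr": (0.8, 0.1, 0.1),
    "nap": (0.7, 0.1, 0.2),
    "hiss": (0.1, 0.8, 0.1),
    "pounce": (0.2, 0.3, 0.5),
}


def nearest(word, lexicon):
    """Return the other word whose weights lie closest to those of `word`."""
    target = lexicon[word]
    others = [w for w in lexicon if w != word]
    return min(others, key=lambda w: math.dist(target, lexicon[w]))


def rescore(lexicon, word, weights):
    """Return a copy of the lexicon with one word re-scored and renormalised."""
    total = sum(weights)
    updated = dict(lexicon)
    updated[word] = tuple(w / total for w in weights)
    return updated


print(nearest("purr", LEXICON))
print(nearest("hiss", LEXICON))
```

The `rescore` helper stands in for the "whether that ever gets adjusted" part: two robots starting from the same word list but re-scoring after different experiences would drift toward different neighbours, which is one crude way to get distinct personalities.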

That is as far as I have it, and that may be a dead end path...

I'll let this lie for the time being.



Offline SomeNortherner (Topic starter)

  • Beginner
  • *
  • Posts: 4
  • Helpful? 2
Re: Robotic Vocal Tract Concept
« Reply #7 on: September 17, 2015, 06:40:39 AM »
Hi again,

You should check out Toki Pona; it's an artificial language of about 120 words and uses particles to indicate lexical roles (much like Japanese) - and it is of the consonant-vowel structure I propose as superior for our purposes.
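To show why particles make a sentence easy to deconstruct in software, here is a minimal Python sketch that splits a simple Toki Pona sentence into roles. It relies on two facts about the language: "li" separates the subject from the predicate (and is dropped when the subject is just "mi" or "sina"), and "e" introduces the direct object. Everything else (the function name, the role labels, ignoring questions, prepositions and the other particles) is a simplification of mine.

```python
def parse(sentence):
    """Split a simple Toki Pona sentence into subject, predicate and object."""
    words = sentence.lower().split()
    # A bare "mi" or "sina" subject takes no "li", so put one back in.
    if words and "li" not in words and words[0] in ("mi", "sina"):
        words = [words[0], "li"] + words[1:]

    roles = {"subject": [], "predicate": [], "object": []}
    current = "subject"
    for word in words:
        if word == "li":
            current = "predicate"
        elif word == "e":
            current = "object"
        else:
            roles[current].append(word)
    return roles


print(parse("jan li moku e kili"))  # "a person eats fruit"
print(parse("mi moku e kili"))      # "I eat fruit"
```

No word order guessing or inflection tables are needed: the parser only has to watch for two marker words, which is about as light as grammar gets for a small processor.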

If 120 is too minimal for you, I suggest you look at "Swadesh Lists"; google it*.  They are wordlists composed of 100-1000 words for concepts that are most frequent in human languages.

Regarding minimalism, however, I would personally be quite philosophical about what a vocabulary actually needs. Consider how many idiomatic phrases there are that exist from our pre-scientific roots: despite knowing the Sun is a stationary body, we still call it a "sunrise" - it could more accurately be a "fullface" (for midday; as it is we who are 'fully facing the sun') ;D. Consider too the words "dark" and "cold" - neither truly exists; they are the states of absence of light and heat respectively. They are default states, and therefore having words for them is somewhat redundant: by specifying that there is 'no light/heat', their existence is implied, meaning we only need words for 'light' and 'heat' plus a negator.

What I am proposing is that you endeavour to be economical in your word choice, for the sake of efficiency, practicality and ease of programming.

As you have already affirmed, a large and complex language is beyond requirement for a simple robotic companion.

I would also not worry about how one writes a language, and simply focus on the spoken aspect: contrary to popular belief, written language is merely a means of representing spoken language; much to my peers' despair I have often declared that English could be just as well depicted by "Chinese" characters.  Thus, unless you intend your robot to write (which I suspect you don't, as that is crazy difficult to implement xD; not least for a cat ;P) - don't get hung up on aesthetics.

You know the software available better than I; if you inform me of the capabilities and limitations of speech producing software, I can certainly help you construct a bespoke phonology.

*[EDIT] Here is the wiktionary archive of Swadesh Lists; choose a language and size and you're away - note that much of what humans regard as distinct concepts are things like natural phenomena (fire, wind, etc.); human anatomy (hands; heads.  I believe this is a most important category for human-robot relations - something Toki Pona struggles to convey accurately); and 'person' (first, second, etc; I personally favour a speaker-listener model).

https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
« Last Edit: September 17, 2015, 06:47:49 AM by SomeNortherner »

Offline cyberjeff

  • Full Member
  • ***
  • Posts: 114
  • Helpful? 7
Re: Robotic Vocal Tract Concept
« Reply #8 on: September 17, 2015, 04:26:11 PM »
Hello,

Quote

Hi again,

You should check out Toki Pona; it's an artificial language of about 120 words and uses particles to indicate lexical roles (much like Japanese) - and it is of the consonant-vowel structure I propose as superior for our purposes.


This looks about right. I'll tinker with it.

My understanding of linguistics is still rather low, but I am mulling over a few things...

I think that having a simple common vocabulary with an understandable grammar is the key. As long as each party to a conversation understands the other, each can have their own dialect, or even their own words. Imitating the other's dialect seldom enhances the understandability and often just irritates the other party.

When you have two parties that are learning to understand each other, each can look to the other to see if the other party looks lost, almost has it, or is sure. Small gestures are all it takes. Robots are not so good at that, but they can do other things. I have on order a color changing LED ribbon. Emotions can be translated to colors and pulsations. Thus, a confused robot may flash orange.  For that matter visual phonemes could be built and the language made completely visual, a different take on "Close Encounters of the Third Kind".
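A small Python sketch of the colour idea: each state gets an RGB colour and a pulse rate, and the brightness is faded in and out with a cosine. Only "confused flashes orange" comes from the post above; the other states, the colours, the rates and the function name are placeholders, and nothing here talks to real LED hardware.

```python
import math

# state -> ((red, green, blue), pulses per second); 0 means steady light.
# Only the orange "confused" entry comes from the thread; the rest is made up.
SIGNALS = {
    "confused": ((255, 165, 0), 2.0),
    "almost": ((255, 255, 0), 1.0),
    "sure": ((0, 255, 0), 0.0),
}


def led_colour(state, t):
    """Return the (r, g, b) value to show for `state` at time t in seconds."""
    (r, g, b), rate = SIGNALS[state]
    if rate == 0:
        level = 1.0
    else:
        # Cosine fade between full brightness (1.0) and off (0.0).
        level = 0.5 * (1.0 + math.cos(2.0 * math.pi * rate * t))
    return tuple(round(channel * level) for channel in (r, g, b))


for t in (0.0, 0.125, 0.25):
    print(t, led_colour("confused", t))
```

Keeping the table separate from the fade function means the "vocabulary" of signals can grow, or be re-tuned after watching how people react, without touching the timing code.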

 I see that there are different kinds of consonants. I've been trying to map some of these to sounds my cats actually make. Some are easy, a hiss is a fricative. And cats certainly have "plosive"  sounds. With such a small vocabulary I can build in as much onomatopoeia  as possible. People talk to their animals in much that way.

Quote

If 120 is too minimal for you, I suggest you look at "Swadesh Lists"; google it*.  They are wordlists composed of 100-1000 words for concepts that are most frequent in human languages.

Regarding minimalism, however, I would personally be quite philosophical about what a vocabulary actually needs. Consider how many idiomatic phrases there are that exist from our pre-scientific roots: despite knowing the Sun is a stationary body, we still call it a "sunrise" - it could more accurately be a "fullface" (for midday; as it is we who are 'fully facing the sun') ;D. Consider too the words "dark" and "cold" - neither truly exists; they are the states of absence of light and heat respectively. They are default states, and therefore having words for them is somewhat redundant: by specifying that there is 'no light/heat', their existence is implied, meaning we only need words for 'light' and 'heat' plus a negator.

What I am proposing is that you endeavour to be economical in your word choice, for the sake of efficiency, practicality and ease of programming.

As you have already affirmed, a large and complex language is beyond requirement for a simple robotic companion.

I would also not worry about how one writes a language, and simply focus on the spoken aspect: contrary to popular belief, written language is merely a means of representing spoken language; much to my peers' despair I have often declared that English could be just as well depicted by "Chinese" characters.  Thus, unless you intend your robot to write (which I suspect you don't, as that is crazy difficult to implement xD; not least for a cat ;P) - don't get hung up on aesthetics.


The written language came late to Arabic. The Quran started as an oral recitation and it remains that way, with schools to memorize the sound of it.

With that said, a robot can easily show the written word as well as images. Both go toward learning.

Quote

You know the software available better than I; if you inform me of the capabilities and limitations of speech producing software, I can certainly help you construct a bespoke phonology.

*[EDIT] Here is the wiktionary archive of Swadesh Lists; choose a language and size and you're away - note that much of what humans regard as distinct concepts are things like natural phenomena (fire, wind, etc.); human anatomy (hands; heads.  I believe this is a most important category for human-robot relations - something Toki Pona struggles to convey accurately); and 'person' (first, second, etc; I personally favour a speaker-listener model).

https://en.wiktionary.org/wiki/Appendix:Swadesh_lists

This will probably live on a Raspberry Pi; pocketsphinx is the choice for speech to text.

Going the other way:

https://en.wikipedia.org/wiki/ESpeak

ESpeak has two models including the Klatt synthesizer. That shapes sounds rather than building them out of sine waves. It appears to me to be along the same lines as the robotic vocal tract which started this thread!

As far as the speaker/listener, I  was unaware of this term but it seems like the perfect model for this kind of emotional artificial intelligence. Here it is being used to get couples to talk and relate to each other:

http://marriagemissions.com/the-speaker-listener-technique/

That would dovetail with the primitive pattern AI I had in mind and would build on itself as it had more experiences (and conversations) it could draw on.

It's all a bit of a mystery still but I love the one-off phonology! Thanks for your input.

I look at this CatBot as a test bed for ideas. I have no ideal end result in mind, I'm driven by the interactive possibilities. Most robots are introverts, I think they should be extroverts.

 

