The ability to understand, to some limited degree, natural language aurally (that is, through hearing).

Sci Fi

Because humans take the ability to recognise speech for granted, it is also assumed that an advanced robot will be able to do the same. As a result, most science fiction robots and computers possess some level of speech processing, often far in advance of contemporary technology.

Even when a robot can only respond in beeps and whistles, it is often able to follow whole conversations. Simpler ones merely obey commands.

In reality, most speech recognition systems only deal with short, distinctly spoken commands. They can rarely follow a full conversation, as the higher-level AI needed for that kind of system simply does not exist.

Such systems must be trained, and they often fail to recognise speech spoken with a different accent or dialect, or even by someone in a different mood. A speaker having a cold may seriously affect a machine's ability to recognise their speech.

While each person's speech patterns are thought to be unique, recognition technology is generally not advanced enough to reliably tell individuals apart, and where it can, it may only recognise a small number of known speakers in the first place.

Vocabularies are often limited. Some software can learn new words, but only after trying too hard to fit them to words it already knows, often resulting in amusing misunderstandings. Interestingly, the same thing happens with humans: how often have you misheard song lyrics?
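The "trying too hard to fit" behaviour can be illustrated with a toy, text-only analogue: a recogniser with a fixed vocabulary that snaps whatever it hears to the closest word it knows. The sketch below uses the standard library's `difflib.get_close_matches` on spelled-out words as a stand-in for acoustic similarity. The vocabulary and the function name `force_to_vocabulary` are hypothetical, and a real recogniser compares sound features, not spellings.

```python
import difflib

# Hypothetical fixed vocabulary of a small command recogniser.
VOCABULARY = ("forward", "backward", "left", "right", "stop", "light")


def force_to_vocabulary(heard: str, vocabulary=VOCABULARY, cutoff: float = 0.5):
    """Snap a heard word to the most similar known word.

    Spelling similarity (difflib's SequenceMatcher ratio) stands in for
    acoustic similarity. Returns None only when no known word scores at
    least `cutoff`, i.e. the input is "not understood" at all.
    """
    matches = difflib.get_close_matches(heard.lower(), list(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for word in ["forwards", "lift", "shop", "banana"]:
        print(word, "->", force_to_vocabulary(word))
    # forwards -> forward
    # lift -> left
    # shop -> stop
    # banana -> None
```

Note how "shop" becomes "stop": the unknown word is not rejected, it is confidently misunderstood as the nearest known word, much like a misheard lyric.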

Current Technology

All current speech recognition uses pattern recognition and analysis techniques, since speech is essentially a pattern of sounds. However, pattern recognition still has a long way to go in AI.

A number of single-chip devices on the market offer speech recognition. They are somewhat limited, but for a device that only needs simple one- or two-word commands, they are more than sufficient.
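One classic pattern-matching approach for small, speaker-trained command vocabularies is template matching: the user trains the device by speaking each command, the recorded feature sequences are stored as templates, and a new utterance is labelled with the closest template under dynamic time warping (DTW), which tolerates differences in speaking rate. The sketch below is a minimal illustration of that technique, not a description of how any particular chip works. To stay self-contained it uses made-up one-dimensional feature sequences (think of per-frame loudness) rather than real audio features, and the names `dtw_distance`, `recognise` and `TEMPLATES` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]. Each step may
    advance in a, in b, or in both, so a slowly spoken word can still line
    up with a quickly spoken template.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of b
                                 cost[i][j - 1],      # repeat frame of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognise(utterance: Sequence[float],
              templates: Dict[str, Sequence[float]],
              reject_above: Optional[float] = None) -> Optional[str]:
    """Return the command whose template is nearest, or None if rejected.

    Without a rejection threshold, every input is forced onto some known
    command, however little it resembles it.
    """
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    if reject_above is not None and best_dist > reject_above:
        return None
    return best_word


# Hypothetical templates recorded during training (one per command).
TEMPLATES = {
    "go": [0, 5, 9, 5, 0],
    "stop": [0, 8, 2, 2, 8, 0],
}

if __name__ == "__main__":
    print(recognise([0, 5, 5, 9, 9, 5, 0], TEMPLATES))            # go (spoken slowly)
    print(recognise([0, 8, 8, 2, 2, 2, 8, 0], TEMPLATES))         # stop
    print(recognise([9, 9, 9, 9], TEMPLATES, reject_above=10.0))  # None (rejected)
```

Because the templates come from one person's training session, a different accent, mood or a cold changes the feature sequences and pushes every distance up, which is one reason such systems struggle with untrained speakers.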

Universities and research groups are constantly striving to improve current offerings and remove many of the known limitations. Desktop software such as IBM ViaVoice has certainly improved as the software has matured and the machines that run it have become far more powerful. It has begun to move on from simple command recognition towards dictation and text capture. Of course, its ability to handle these tasks is still limited by its inability to infer context or to cope with new words.

Speech recognition is now widely used in the toy and hobbyist robot community: Cybot and some of the more expensive toy robots can respond to basic spoken commands. These are often based on single-chip recognition systems.

Some offices use speech recognition for dictation, although a well-trained typist will be considerably faster.

Mobile phones commonly offer basic recognition of spoken contact names for address-book lookup, though this is quite unreliable.

Perhaps a more interesting current use is in subtitling. In a technique known as respeaking, a human operator listens to a programme and then re-speaks the dialogue into a capture program, without inflection, dialect or accent. It is a fledgling technology, and the ultimate goal is to take the human operator out of the loop. This use will certainly start to push the bounds of context inference.

Our Future

This feature is certainly worth having in domestic robots, and as the technology improves, there will come a point where you can instruct your robot vocally, with the robot responding in any number of ways (a simple beep pattern would suffice).

Industrial robots in dangerous environments should have enough speech recognition to at least recognise a STOP command yelled at them. This would enhance safety a great deal.

In-car navigation systems may be enhanced to let the driver speak a destination for the system to route to.