Speech Processing

Learning to write speech-controlled systems

Danny Staple’s Learn Robotics Programming includes a chapter on the Mycroft speech recognition system, writing Python code for a Raspberry Pi interface that lets you control a robot by talking to it.

Definition

The ability to understand (to some limited degree) natural language aurally (through hearing).

Sci Fi

As recognising speech is an ability taken for granted by humans, it is also taken for granted that an advanced robot will do so too. In light of this, most Sci Fi robots and computers possess some level of speech processing, often far in advance of contemporary technology.

While these bots may only be able to respond in beeps and whistles, they often have the ability to follow whole conversations. Some simpler ones simply obey a command.

Limitations

While it is rapidly improving, most speech recognition systems still prefer short, abruptly spoken commands. They can rarely follow a full conversation, as the higher-level AI for that kind of system just does not exist.

They must be trained, and can fail to recognise speech in a different accent or dialect, or from someone who has a cold.

Current Technology

The Google, Siri and Alexa systems have started to make machines that can listen and talk commonplace, and their abilities and speed are growing rapidly.

The question with these is how much of the processing is done locally on the device, and how much involves sending your words to remote computers in the cloud. Cloud processing raises speed and privacy concerns, and it is less robust, since it relies on a connection to the internet.

There are a number of single-chip devices on the market which offer speech recognition. They are often a little limited, but for a device which requires simple one- or two-word commands, they more than suffice.

Universities and research groups are constantly striving to improve the current offerings and remove many of the known limitations.
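
As a rough illustration of why a small vocabulary is enough for simple commands, the sketch below maps whatever text a recogniser reports onto the nearest word in a fixed command list. It is only a sketch: the command list and the match_command helper are made up for this example and do not come from any particular chip or library, and the recogniser itself is assumed to exist elsewhere and hand over plain text.

```python
import difflib

# Hypothetical vocabulary for a small robot; not taken from any real device.
COMMANDS = ["forward", "back", "left", "right", "stop"]


def match_command(heard, cutoff=0.7):
    """Return the known command closest to the heard text, or None.

    `heard` is assumed to be plain text produced by some recogniser.
    `cutoff` is an illustrative similarity threshold between 0 and 1.
    """
    word = heard.strip().lower()
    # get_close_matches returns the best candidates at or above the cutoff.
    matches = difflib.get_close_matches(word, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With a list this short, a slightly misheard word such as "forwad" still resolves to a command, while unrelated words are rejected rather than guessed at.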

Uses

It is used in the hobbyist robot community with various Raspberry Pi offerings, with the Learn Robotics Programming book showing a way to add this into your own projects.

Some offices use it for dictation, although a well-trained typist will be considerably faster.

Mobile phones commonly have basic recognition of names for a lookup, though this is quite an unreliable system.

Perhaps a more interesting use at the moment is in subtitling. A current technique known as respeaking means a human operator listens to a program, and then respeaks the dialogue, without inflection, dialect or accent, into a capture program. It is a fledgling technology, and the final goal is to take the human operator out of the loop. This use will certainly start to push the bounds of context inference.

In-car navigation systems allow a destination to be spoken aloud, and then plan a route to it.

Our Future

Certainly for domestic robots, it is worth having this feature, and as the technology gets better, there will come a point where you can instruct your robot vocally, with the robot responding in all manner of ways (a simple beep pattern would suffice).

Industrial robots in dangerous environments should have enough speech recognition to at least recognise the STOP command yelled at them. This would enhance safety a great deal.
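
A minimal sketch of that idea follows, assuming some recogniser already supplies text for whatever it hears. The word list, the contains_stop function and the RobotSafety class are all hypothetical names invented for this example. The design choice shown is that a stop word anywhere in the text halts the robot, and the robot stays halted until someone resets it.

```python
import re

# Hypothetical set of words treated as an emergency stop.
STOP_WORDS = {"stop", "halt"}


def contains_stop(transcript):
    """Return True if a stop word appears anywhere in the recognised text."""
    # Split into lowercase words so punctuation and shouting do not matter.
    words = re.findall(r"[a-z']+", transcript.lower())
    return any(word in STOP_WORDS for word in words)


class RobotSafety:
    """Latches into a stopped state until explicitly reset."""

    def __init__(self):
        self.stopped = False

    def hear(self, transcript):
        """Process one piece of recognised text and return the stopped state."""
        if contains_stop(transcript):
            self.stopped = True
        return self.stopped

    def reset(self):
        """Clear the latch; assumed to be a deliberate operator action."""
        self.stopped = False
```

Matching whole words means "STOP!!" triggers the latch but "unstoppable" does not. A real installation would still need a physical emergency stop alongside anything like this.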