Voice Recognition

Voice recognition (also known as automatic voice recognition or computer voice recognition) converts spoken words to text. The term is often used for systems that are trained on a particular speaker's voice, as is the case for most desktop recognition software; knowing who is speaking helps the system recognize that person's words. In its broader sense, the term also covers systems that can recognize almost anyone's speech, for example a call-centre system intended to handle many voices.

Applications vary widely. Typical ones include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), home appliance control, content-based spoken audio search (e.g., finding a podcast where certain words were said), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and use in aircraft cockpits (typically called Direct Voice Input).

History

The first voice recognizer appeared in 1952 and consisted of a device for recognizing single spoken digits. Another early device was the IBM Shoebox, demonstrated at the 1964 New York World's Fair.

One of the most important areas for the commercial application of voice recognition in the United States has been health care, and in particular the work of the medical transcriptionist (MT). According to industry experts, voice recognition (VR) was sold at its launch as a way to eliminate transcription entirely rather than to make the transcription process more efficient, and so it was not accepted. VR at that time was also often technically deficient. In addition, using it effectively required changes to the way physicians worked and documented clinical encounters, which many, if not all, were reluctant to make. The biggest limitation to voice recognition automating transcription, however, is seen as the software itself. Narrative dictation is highly interpretive and often requires judgment that a human can provide but an automated system cannot yet. Another limitation has been the extensive amount of time the user and/or system provider needs to train the software.

In automatic voice recognition (AVR), a distinction is typically made between "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these kinds of application presents its own particular goals and challenges.
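To make the idea of a domain-specific "artificial syntax system" concrete, here is a minimal sketch in Python. It assumes the audio has already been converted to a text transcript by some recognizer, and it only shows the constrained-grammar step for a voice-dialing domain. The names `GRAMMAR` and `parse_command`, and the command phrases themselves, are hypothetical and chosen for illustration; they are not taken from any particular product.

```python
import re

# Hypothetical command grammar for a voice-dialing domain.
# Each entry maps an intent name to the only phrasings the system accepts.
GRAMMAR = {
    "call": re.compile(r"^(?:call|dial|phone) (?P<target>[a-z ]+)$"),
    "redial": re.compile(r"^redial$"),
}


def parse_command(transcript):
    """Match a recognized transcript against the fixed grammar.

    Returns (intent, slots) when the utterance fits the grammar,
    or None when it falls outside the domain.
    """
    # Normalize case and collapse repeated whitespace.
    text = " ".join(transcript.lower().split())
    for intent, pattern in GRAMMAR.items():
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None


if __name__ == "__main__":
    print(parse_command("Call home"))
    print(parse_command("Tell me a story"))
```

Because the grammar is closed, anything outside it is simply rejected. That is the trade-off such systems make: a small, predictable vocabulary in exchange for not having to interpret free-form language.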

AVR is now commonplace in telephony and is becoming more widespread in computer gaming and simulation. Despite the high level of integration with word processing in general personal computing, however, AVR for document production has not seen the expected increases in use.

Improvements in mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for creating predefined or custom speech commands. Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator) and SVOX.

 
