TTS Software Interface Options

If you are a technically minded reader looking for text-to-speech (TTS) software interfaces, read on for four good options.

Speech Application Programming Interface

SAPI is an interface between applications and speech technology engines, for both speech synthesis and speech recognition. It allows multiple applications to share the speech resources available on a computer without having to program the speech engine itself. Speech synthesis and recognition applications typically need substantial processing resources, and the SAPI approach can save many of these resources. Users of an application can also select which synthesizer it uses, as long as that synthesizer supports SAPI. Today, speech APIs are offered in several environments, such as Microsoft's MS-SAPI and the Java Speech API (JSAPI).
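To make the engine-sharing idea concrete, here is a minimal Python sketch built on a hypothetical registry. The names `SpeechEngine`, `ToyEngine`, `EngineRegistry`, and `read_aloud` are illustrative only and are not part of MS-SAPI or JSAPI. Applications depend only on the common interface, and the user can pick any registered engine by name.

```python
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical common interface that every compliant TTS engine implements."""

    name: str

    @abstractmethod
    def speak(self, text: str) -> str:
        """Render text; this sketch returns a description instead of audio."""


class ToyEngine(SpeechEngine):
    """Stand-in engine; a real one would drive an actual synthesizer."""

    def __init__(self, name: str) -> None:
        self.name = name

    def speak(self, text: str) -> str:
        return f"[{self.name}] {text}"


class EngineRegistry:
    """Shared by all applications, so no application has to program an engine itself."""

    def __init__(self) -> None:
        self._engines: dict[str, SpeechEngine] = {}

    def register(self, engine: SpeechEngine) -> None:
        self._engines[engine.name] = engine

    def available(self) -> list[str]:
        return sorted(self._engines)

    def get(self, name: str) -> SpeechEngine:
        return self._engines[name]


def read_aloud(registry: EngineRegistry, engine_name: str, text: str) -> str:
    """Application code: it knows only the interface and the user's chosen engine."""
    return registry.get(engine_name).speak(text)


registry = EngineRegistry()
registry.register(ToyEngine("toy-formant"))
registry.register(ToyEngine("toy-concat"))
print(registry.available())
print(read_aloud(registry, "toy-concat", "Hello"))
```

Swapping synthesizers is then just a matter of passing a different engine name; the application code in `read_aloud` does not change.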

The TTS part of SAPI comprises three interfaces. The voice text interface provides methods to launch, pause, resume, fast-forward, rewind, and stop the TTS engine during speech. The attribute interface gives access to the basic behavior of the TTS engine, such as the audio device used, the playback speed, and switching speech on and off. With some TTS systems, the attribute interface can also be used to choose the speaking style from a predefined list of voices, such as female, male, child, or alien. Finally, the dialog interface is used to set and retrieve information related to the TTS engine, for example to identify the engine or to modify the pronunciation lexicon.
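The split into three interfaces can be sketched with Python protocols. This is an illustration under stated assumptions, not SAPI's actual COM interfaces: the protocol and method names (`VoiceText`, `Attributes`, `Dialog`, `seek`, `add_pronunciation`, and so on) are hypothetical, and `ToyTTSEngine` only tracks state instead of producing audio.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VoiceText(Protocol):
    """Playback control while the engine is speaking."""
    def launch(self, text: str) -> None: ...
    def pause(self) -> None: ...
    def resume(self) -> None: ...
    def seek(self, count: int) -> None: ...  # >0 fast-forwards, <0 rewinds (in words)
    def stop(self) -> None: ...


class Attributes(Protocol):
    """Basic engine behaviour."""
    def set_audio_device(self, device: str) -> None: ...
    def set_speed(self, words_per_minute: int) -> None: ...
    def set_enabled(self, on: bool) -> None: ...
    def set_voice(self, voice: str) -> None: ...


class Dialog(Protocol):
    """Engine identification and lexicon maintenance."""
    def identify(self) -> str: ...
    def add_pronunciation(self, word: str, phonemes: str) -> None: ...


@dataclass
class ToyTTSEngine:
    """Satisfies all three protocols; tracks state instead of producing audio."""
    voices: tuple[str, ...] = ("female", "male", "child", "alien")
    words: list[str] = field(default_factory=list)
    position: int = 0
    state: str = "stopped"
    device: str = "default"
    speed_wpm: int = 180
    enabled: bool = True
    voice: str = "female"
    lexicon: dict[str, str] = field(default_factory=dict)

    # Voice text interface
    def launch(self, text: str) -> None:
        self.words, self.position = text.split(), 0
        self.state = "speaking" if self.enabled else "stopped"

    def pause(self) -> None:
        if self.state == "speaking":
            self.state = "paused"

    def resume(self) -> None:
        if self.state == "paused":
            self.state = "speaking"

    def seek(self, count: int) -> None:
        # Clamp so fast-forward and rewind never leave the text.
        self.position = max(0, min(len(self.words), self.position + count))

    def stop(self) -> None:
        self.state, self.position = "stopped", 0

    # Attribute interface
    def set_audio_device(self, device: str) -> None:
        self.device = device

    def set_speed(self, words_per_minute: int) -> None:
        self.speed_wpm = words_per_minute

    def set_enabled(self, on: bool) -> None:
        self.enabled = on
        if not on:
            self.stop()

    def set_voice(self, voice: str) -> None:
        if voice not in self.voices:
            raise ValueError(f"unknown voice: {voice}")
        self.voice = voice

    # Dialog interface
    def identify(self) -> str:
        return "ToyTTSEngine 0.1"

    def add_pronunciation(self, word: str, phonemes: str) -> None:
        self.lexicon[word.lower()] = phonemes


engine = ToyTTSEngine()
controls: VoiceText = engine  # one object can satisfy all three protocols
settings: Attributes = engine
dialog: Dialog = engine
settings.set_voice("child")
controls.launch("Hello from the toy engine")
controls.pause()
print(dialog.identify(), engine.state, engine.voice)
```

Keeping the three roles as separate protocols means an application that only needs playback control can depend on `VoiceText` alone.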

Infovox

This application is based on formant synthesis; its speech is clear but has something of a Swedish accent. The system has five built-in voice variants, including male, female, and child voices, and users can also create and store their own voices. Aspiration and accent features are also adjustable. Separate pronunciation lexicons can be built for each language, and for words that do not follow the pronunciation rules, such as foreign names, the system provides a dedicated lexicon where the user can store them. The speech rate can be set as high as 400 words per minute. The text can also be spoken word by word or even letter by letter. DTMF tones can also be generated for telephony applications.
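The exception lexicon, the rate limit, and the word-by-word and letter-by-letter modes can be illustrated with a small front-end sketch. Everything here is hypothetical: `Frontend` and its methods are made up for illustration, `letter_to_sound` is a placeholder for real pronunciation rules, and the transcription used in the usage lines is invented. Only the 400 words-per-minute ceiling comes from the text above.

```python
MAX_WPM = 400  # maximum speech rate quoted for Infovox


class Frontend:
    """Hypothetical text front end with a user exception lexicon."""

    def __init__(self, rate_wpm: int = 180) -> None:
        self.exceptions: dict[str, str] = {}
        self.rate_wpm = rate_wpm

    def set_rate(self, wpm: int) -> int:
        # Clamp requests into the supported range instead of failing.
        self.rate_wpm = max(1, min(wpm, MAX_WPM))
        return self.rate_wpm

    def seconds_per_word(self) -> float:
        return 60 / self.rate_wpm

    def add_exception(self, word: str, phonemes: str) -> None:
        self.exceptions[word.lower()] = phonemes

    def letter_to_sound(self, word: str) -> str:
        # Placeholder "rules": one symbol per letter, space separated.
        return " ".join(word.lower())

    def pronounce(self, word: str) -> str:
        # The user's exception lexicon takes priority over the general rules.
        key = word.lower()
        if key in self.exceptions:
            return self.exceptions[key]
        return self.letter_to_sound(key)

    def units(self, text: str, mode: str = "word") -> list[str]:
        """Split text into speaking units: whole words, or single letters when spelling."""
        if mode == "word":
            return text.split()
        if mode == "letter":
            return [ch for ch in text if not ch.isspace()]
        raise ValueError(f"unknown mode: {mode}")


fe = Frontend()
fe.add_exception("Nguyen", "w i n")  # hypothetical transcription, for illustration only
print(fe.pronounce("Nguyen"), "|", fe.pronounce("cat"))
print(fe.set_rate(650), fe.seconds_per_word())
print(fe.units("Hi there", mode="letter"))
```

Checking the exception lexicon before the rules is what lets irregular words such as foreign names bypass rules that would otherwise mispronounce them.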

DECtalk

Digital Equipment Corporation (DEC) also has a long tradition of speech synthesizers. The DECtalk system is derived mainly from MITalk and Klattalk. The current TTS software is available in American English, German, and Spanish and offers nine different voices: four male, four female, and one child. The system has perhaps one of the best-designed text preprocessing and pronunciation features. It can correctly pronounce most names, e-mail addresses, and URLs, and it supports a customized pronunciation dictionary. It also has punctuation controls for pauses, pitch, and loudness, and voice control commands can be embedded in a text file for use by DECtalk software applications. The speaking rate is adjustable from 75 to 650 words per minute. The system can also generate single tones and DTMF signals for telephony applications.
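DTMF signalling, mentioned for both Infovox and DECtalk, is simple to illustrate: each key is the sum of one low (row) and one high (column) sine tone from the standard DTMF keypad grid. The sketch below is a generic generator in plain Python, not DECtalk's own implementation; `dtmf_pair`, `dtmf_tone`, and `dial` are illustrative names, and the 8 kHz sample rate, 100 ms tone length, and 50 ms gap are assumptions.

```python
import math

ROW_HZ = (697, 770, 852, 941)      # low-frequency group
COL_HZ = (1209, 1336, 1477, 1633)  # high-frequency group
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low, high) frequencies in Hz for one keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key: str, duration_s: float = 0.1, sample_rate: int = 8000,
              amplitude: float = 0.5) -> list[float]:
    """Samples in [-amplitude, amplitude]: the two sines are averaged."""
    low, high = dtmf_pair(key)
    n = round(duration_s * sample_rate)
    return [
        amplitude * 0.5 * (math.sin(2 * math.pi * low * i / sample_rate)
                           + math.sin(2 * math.pi * high * i / sample_rate))
        for i in range(n)
    ]


def dial(number: str, gap_s: float = 0.05, sample_rate: int = 8000) -> list[float]:
    """Concatenate key tones, each followed by a short silence."""
    silence = [0.0] * round(gap_s * sample_rate)
    out: list[float] = []
    for key in number:
        out += dtmf_tone(key, sample_rate=sample_rate) + silence
    return out


print(dtmf_pair("5"), len(dtmf_tone("5")), len(dial("911")))
```

The samples are plain floats; writing them to a sound device or WAV file is left out to keep the sketch dependency-free.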

Bell Labs Text-to-Speech

The current TTS software is available for English, French, Spanish, Italian, German, Russian, Romanian, Chinese, and Japanese, and other languages are still under research. Development concentrates mainly on American English, with several voices, but the system is multilingual: the software is essentially the same for all languages except English. Some language-specific information is typically needed, and it is stored externally in separate tables and attribute files.
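The idea of sharing the engine code across languages while moving language-specific knowledge into external tables can be sketched as follows. This is an illustration under assumptions, not the Bell Labs file format: the JSON layout, the `TABLE_FILES` stand-in for files on disk, and `load_table` and `read_digits` are hypothetical. The same function reads numbers in English, German, or Spanish; only the loaded table changes.

```python
import json

# Stand-ins for external per-language table files (normally read from disk).
TABLE_FILES = {
    "en": """{"minus": "minus",
              "digits": ["zero", "one", "two", "three", "four",
                         "five", "six", "seven", "eight", "nine"]}""",
    "de": """{"minus": "minus",
              "digits": ["null", "eins", "zwei", "drei", "vier",
                         "fünf", "sechs", "sieben", "acht", "neun"]}""",
    "es": """{"minus": "menos",
              "digits": ["cero", "uno", "dos", "tres", "cuatro",
                         "cinco", "seis", "siete", "ocho", "nueve"]}""",
}


def load_table(lang: str) -> dict:
    """Parse one language's table; the engine code below never changes."""
    table = json.loads(TABLE_FILES[lang])
    if len(table["digits"]) != 10:
        raise ValueError(f"{lang}: expected 10 digit words")
    return table


def read_digits(token: str, table: dict) -> str:
    """Read a (possibly negative) integer digit by digit using the given table."""
    words = []
    if token.startswith("-"):
        words.append(table["minus"])
        token = token[1:]
    if not (token.isascii() and token.isdigit()):
        raise ValueError(f"not an integer: {token!r}")
    words.extend(table["digits"][int(ch)] for ch in token)
    return " ".join(words)


for lang in ("en", "de", "es"):
    print(lang, read_digits("-42", load_table(lang)))
```

Adding a language in this design means adding a table, not changing `read_digits`.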

The system also has excellent text-analysis facilities, as well as good word and proper-name pronunciation, prosodic phrasing, accenting, seg
