The Speech Platform Prompt Text-to-Speech Engine

The prompt text-to-speech engine is the component of the Speech Platform that accepts text input and produces speech output by concatenating recordings of words and phrases that correspond to that text. The prompt engine stores the recordings it uses on disk and indexes them in one or more prompt database files.

Functions in the speech application code issue the requests to process text input and produce speech output. A request to the prompt text-to-speech engine typically includes prompt engine markup language (PEML) that specifies the database or databases the engine should use, together with the text for which the engine should produce speech output. The engine can search multiple prompt databases simultaneously. Each database is stored on disk and consists of an index of the available word and phrase segments and the corresponding audio data.

When text is passed to the prompt text-to-speech engine, the engine searches the relevant databases for prerecorded segments that match the text. In practice, the engine typically searches the database indices, which are loaded into memory, and reads the audio data itself from disk as needed.

If the Speech Platform prompt text-to-speech engine cannot construct a segment from the set of recorded words and phrases in the prompt database, the engine invokes a "fallback" text-to-speech (TTS) engine.
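
The lookup and fallback behavior described above can be illustrated with a minimal Python sketch. This is not the Speech Platform API: the `PromptIndex` layout, the `plan_prompt` function, and the greedy longest-match strategy are all assumptions made for illustration, as is the choice to fall back per unmatched span rather than for the whole request. The sketch only consults in-memory indices; reading the audio from disk would happen later, when the plan is played.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory index: phrase -> (audio file, byte offset, length).
# The real prompt database format is not described in this article.
PromptIndex = Dict[str, Tuple[str, int, int]]


def plan_prompt(text: str, indexes: List[PromptIndex]) -> List[Tuple[str, str]]:
    """Cover `text` with recorded phrases using a greedy longest match.

    Returns ("recording", phrase) for spans found in any index and
    ("tts", words) for spans that would go to the fallback TTS engine.
    """
    words = text.lower().split()
    plan: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        match_len = 0
        # Try the longest candidate phrase first, then shrink it.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if any(phrase in index for index in indexes):
                match_len = j - i
                break
        if match_len:
            plan.append(("recording", " ".join(words[i:i + match_len])))
            i += match_len
        else:
            # Merge consecutive unmatched words into one fallback span.
            if plan and plan[-1][0] == "tts":
                plan[-1] = ("tts", plan[-1][1] + " " + words[i])
            else:
                plan.append(("tts", words[i]))
            i += 1
    return plan


if __name__ == "__main__":
    databases = [
        {"your balance is": ("bank.wav", 0, 100), "dollars": ("bank.wav", 100, 50)},
        {"thank you": ("common.wav", 0, 80)},
    ]
    for kind, span in plan_prompt("Your balance is forty two dollars", databases):
        print(kind, "->", span)
```

Searching every index for each candidate phrase mirrors the idea of consulting several prompt databases in one request; a production engine would likely use a more efficient structure than repeated dictionary lookups.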

The Speech Platform TTS Engine

The TTS engine (also known as a text-to-speech voice) is the component that synthesizes speech output by:

1.  Breaking down the words of the text into phonemes.

2.  Analyzing the input for occurrences of text that must be converted from symbols to spoken words, such as numbers, currency amounts, and punctuation (a process known as text normalization, or TN).

3.  Generating the digital audio for playback.
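
As an illustration of the text normalization step, here is a minimal Python sketch that expands small whole numbers and whole-dollar amounts into words. It is a toy, not the Speech Platform's normalizer: the function names `number_to_words` and `normalize`, the 0 to 999 range limit, and the English-only rules are assumptions chosen to keep the example short.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return ONES[hundreds] + " hundred" + tail


def _currency(match: "re.Match[str]") -> str:
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return f"{number_to_words(amount)} {unit}"


def normalize(text: str) -> str:
    """Replace whole-dollar amounts and 1-3 digit numbers with words."""
    # Currency first, so "$5" is not consumed by the plain-number rule.
    text = re.sub(r"\$(\d{1,3})\b", _currency, text)
    # Longer digit runs are left untouched by this toy rule.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    return text


if __name__ == "__main__":
    print(normalize("You have 3 messages and owe $42"))
```

A real normalizer also has to handle dates, times, abbreviations, ordinals, and locale-specific conventions, which is why TN is usually treated as its own stage before phoneme conversion.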

 

TTS engines generally use one of two methods:

• Formant TTS

• Concatenative TTS

 

Generally, developers consider using TTS when:

• Audio recordings are too large to keep on disk, or are too expensive to record.

• The developer cannot predict what responses users will require from the application (such as requests to read e-mail over the telephone).

• The number of alternate responses required makes recording and storing prompts unmanageable or too expensive.

• The user prefers audible feedback or notification from the application.

 

Major Applications for Text-to-Speech Synthesis

The typical uses of TTS technology depend on the application category. Some of the applications that are good candidates for employing TTS include:

Telephony

Text-to-speech synthesis plays a crucial role in telephony applications. Because telephony applications have no visual interface, TTS is an ideal method for confirming customer selections. TTS can also be the preferred method for delivering information that users request.

Data Entry

In a spreadsheet or database application, developers can have TTS read data values back as users enter them. This gives users a way to confirm correct entry of data that would otherwise be tedious to check.

Games and Edutainment

Text-to-speech synthesis enables the characters in an application to talk to the user instead of merely displaying speech balloons. Although it is possible to use digital recordings of speech, TTS can be preferable to recordings in certain cases.

 

In addition, TTS can be particularly useful for application prototyping. In some cases TTS may even be the only practical option.

 
