Speech Synthesis

The voice of the story teller


For this project, Acapela offers a new computer voice designed for reading stories. The figure below illustrates how a computer voice is created.

This software offers several options: correcting errors, enriching the audio output with added sounds, and creating characters and assigning a voice to each of them. As a first step, we chose Antoine's voice (cf. Corpus).

The upper part of the figure shows how the text is analyzed to identify the linguistic base units that must be retrieved from a database for reading.

The lower part shows how the database of linguistic base units is built: an actor records a large corpus of texts (the recording can last several weeks), and the voice is then split into the elementary blocks used for synthesis. The project partners jointly selected the story teller from among several candidates.

For the needs of the project, Acapela optimized the text corpus so that the recording time could be reduced to one week. This made it possible to record several types of voices with the same story teller: in addition to a neutral voice, happy, sad, projected, and close voices were also recorded.

The table below gives the size of each of these corpora.

Corpus      Sentences   Phonemes   Duration (sec.)
Neutral          5742      94421             11032
Happy            1122      37319              3907
Sad              1033      34262              3834
Projected        1301      35692              4188
Close            1380      45705              4861
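
To put the figures in the table above into perspective, the short sketch below converts the durations to hours and derives an average phoneme rate per corpus. The numbers are copied from the table; the function names and the derived measures are illustrative choices only, not part of Acapela's tooling.

```python
# Figures copied from the table above: (sentences, phonemes, duration in seconds).
CORPORA = {
    "Neutral": (5742, 94421, 11032),
    "Happy": (1122, 37319, 3907),
    "Sad": (1033, 34262, 3834),
    "Projected": (1301, 35692, 4188),
    "Close": (1380, 45705, 4861),
}


def summarize(corpora):
    """Return derived figures per corpus: hours of audio and average rates."""
    summary = {}
    for name, (sentences, phonemes, seconds) in corpora.items():
        summary[name] = {
            "hours": seconds / 3600,
            "phonemes_per_second": phonemes / seconds,
            "phonemes_per_sentence": phonemes / sentences,
        }
    return summary


def total_seconds(corpora):
    """Total duration of recorded audio across all corpora, in seconds."""
    return sum(seconds for _, _, seconds in corpora.values())


if __name__ == "__main__":
    for name, stats in summarize(CORPORA).items():
        print(f"{name:<10} {stats['hours']:.2f} h  "
              f"{stats['phonemes_per_second']:.1f} phonemes/s")
    print(f"Total: {total_seconds(CORPORA) / 3600:.2f} h")
```

Summed over the five corpora, the durations in the table amount to 27822 seconds, or about 7.7 hours of audio.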

During synthesis, one of these voices can then be chosen according to the desired expressiveness.
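
As a rough illustration of this selection step, the sketch below maps a requested expressiveness label to one of the five recorded voices and falls back to the neutral voice otherwise. Only the voice names come from the table; the function `choose_voice` and the fallback rule are hypothetical and do not describe Acapela's actual interface.

```python
# Voice names taken from the corpus table; everything else is hypothetical.
AVAILABLE_VOICES = ("neutral", "happy", "sad", "projected", "close")


def choose_voice(expressiveness, default="neutral"):
    """Map a requested expressiveness label to one of the recorded voices.

    Unknown labels fall back to the default voice (an assumed rule).
    """
    label = expressiveness.strip().lower()
    return label if label in AVAILABLE_VOICES else default


if __name__ == "__main__":
    # Hypothetical story sentences, each tagged with a wanted expressiveness.
    story = [
        ("Once upon a time there was a small robot.", "neutral"),
        ("It had finally found a friend!", "happy"),
        ("But the friend had to leave.", "sad"),
        ("Come back soon, it shouted.", "angry"),  # not recorded, so falls back
    ]
    for sentence, wanted in story:
        print(f"{choose_voice(wanted):<8} | {sentence}")
```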

More elements specific to reading a story were added:

In the specific case of a robot voice, a digital modulation will be applied to adapt Antoine's voice to the robot's personality.
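
The text does not say which modulation is used. Purely as an illustration, the sketch below applies ring modulation, a classic way to give a voice a robotic timbre, by multiplying each audio sample by a sine carrier. The choice of ring modulation, the carrier frequency, and the function name are all assumptions, not the project's actual processing.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=50.0):
    """Multiply each sample by a sine carrier (ring modulation).

    samples: sequence of floats in [-1.0, 1.0]
    sample_rate: number of samples per second
    carrier_hz: carrier frequency (assumed value, chosen for illustration)
    """
    return [
        s * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]


if __name__ == "__main__":
    rate = 16000
    # Stand-in for a voice signal: 10 ms of a 220 Hz tone.
    voice = [math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(rate // 100)]
    robot = ring_modulate(voice, rate)
    print(len(robot), "samples processed")
```

Because the carrier never exceeds an amplitude of 1, the modulated signal stays within the range of the input signal.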

Manual alteration


The computer-based text analysis (introduced on the Linguistic Aspects page) produces a first annotation of the text by automatically inserting vocal and gesture interpretation instructions. Because this automatic annotation can be incorrect, and because the robot or avatar developer may want to add instructions manually, Aldebaran and Acapela developed dedicated tools. Aldebaran developed the tool Narrateur (French for "Narrator" or "Story Teller") to add markup to the raw text. Acapela offers the tool Virtual Story Teller, which is used at a later step of the text processing.
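
The idea of manual corrections taking precedence over the automatic annotation can be sketched as follows. The data layout (a voice label per sentence index) and the `[voice=...]` tag syntax are invented for this example; they are not the markup used by Narrateur or Virtual Story Teller.

```python
def merge_annotations(automatic, manual):
    """Combine annotations so that manual entries override automatic ones.

    Both arguments map a sentence index to a voice label (hypothetical layout).
    """
    merged = dict(automatic)
    merged.update(manual)
    return merged


def render(sentences, annotations, default="neutral"):
    """Prefix each sentence with a made-up [voice=...] tag for display."""
    lines = []
    for index, sentence in enumerate(sentences):
        voice = annotations.get(index, default)
        lines.append(f"[voice={voice}] {sentence}")
    return "\n".join(lines)


if __name__ == "__main__":
    sentences = ["The door opened.", "Nobody was there.", "Then the cat jumped in!"]
    automatic = {0: "neutral", 1: "sad", 2: "neutral"}  # result of automatic analysis
    manual = {2: "happy"}                               # correction by the developer
    print(render(sentences, merge_annotations(automatic, manual)))
```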

Graphical interface of the application Virtual Story Teller by Acapela.