Speech Synthesis
The voice of the story teller
For this project, Acapela offers a new computer voice designed for reading stories. The figure below illustrates how a computer voice is created.
The software offers several options, such as correcting errors, improving the audio result by adding sounds, and creating characters and associating a voice with each of them. As a first step, we chose Antoine's voice (cf. Corpus).

The upper part of the figure illustrates how the text is analyzed to identify the linguistic base units that must be retrieved from a database for reading.
The lower part shows how that database of linguistic base units is built: an actor records a large corpus of texts (the recording can last several weeks), and the voice is then split into the elementary blocks used for synthesis. The project partners jointly selected the story teller from among several candidates.
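The two halves of the figure can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the units here are whole words (a real system would work with much smaller linguistic units), the database is a plain dictionary mapping each unit to a time span in the recording, and the names `analyze`, `synthesize` and `database` are hypothetical, not part of any Acapela software.

```python
def analyze(text):
    """Toy text analysis: split the text into lowercase word units."""
    units = []
    for word in text.split():
        cleaned = word.strip(".,!?;:").lower()
        if cleaned:
            units.append(cleaned)
    return units


def synthesize(text, database):
    """Look up each unit in the database of recorded blocks.

    Returns the list of (start_ms, end_ms) spans to concatenate,
    plus the units that have no recording.
    """
    segments = []
    missing = []
    for unit in analyze(text):
        if unit in database:
            segments.append(database[unit])
        else:
            missing.append(unit)
    return segments, missing


# Hypothetical database: unit -> (start, end) in milliseconds in the recording.
database = {
    "once": (0, 300),
    "upon": (300, 550),
    "a": (550, 600),
    "time": (600, 1000),
}

if __name__ == "__main__":
    print(synthesize("Once upon a time.", database))
```

Units missing from the database are reported rather than skipped silently, which hints at why the recorded corpus has to be large.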
For this project, Acapela optimized the text corpus to reduce the recording time to one week. This made it possible to record several types of voice with the same story teller. In addition to a neutral voice, the following voices were recorded:
- happy voice
- sad voice
- projected voice
- close voice
The table below gives the figures for these corpora.
| Corpus | Sentences | Phonemes | Duration (s) |
| --- | --- | --- | --- |
| Neutral | 5742 | 94421 | 11032 |
| Happy | 1122 | 37319 | 3907 |
| Sad | 1033 | 34262 | 3834 |
| Projected | 1301 | 35692 | 4188 |
| Close | 1380 | 45705 | 4861 |
This makes it possible to choose one of the voices during synthesis, according to the desired expressiveness.
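As a rough illustration, the corpus figures can be held in a small lookup structure that also serves to pick a voice. The numbers are copied from the table; the function names and the fallback to the neutral voice for an unknown expressiveness are assumptions of this sketch, not documented behaviour of the synthesizer.

```python
# Figures copied from the corpus table: (sentences, phonemes, duration in seconds).
CORPORA = {
    "neutral": (5742, 94421, 11032),
    "happy": (1122, 37319, 3907),
    "sad": (1033, 34262, 3834),
    "projected": (1301, 35692, 4188),
    "close": (1380, 45705, 4861),
}


def choose_voice(expressiveness):
    """Return the corpus name for the requested expressiveness.

    Falling back to 'neutral' is an assumption made for this sketch.
    """
    key = expressiveness.strip().lower()
    return key if key in CORPORA else "neutral"


def phonemes_per_second(name):
    """Average phoneme rate of one corpus."""
    _, phonemes, seconds = CORPORA[name]
    return phonemes / seconds


# Total recorded speech across the five corpora, in seconds.
total_seconds = sum(duration for _, _, duration in CORPORA.values())

if __name__ == "__main__":
    for name in CORPORA:
        print(f"{name:10s} {phonemes_per_second(name):.2f} phonemes/s")
    print(f"total recorded speech: {total_seconds / 3600:.2f} h")
```

Summing the duration column gives 27822 seconds, or a little under eight hours of recorded speech in total.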
More elements specific to reading a story were added:
- Typical story expressions (Watch out, All's well that ends well, Once upon a time, They were married and had many children, …)
- Specific sounds (crying, laughing, breathing, snoring, coughing, sneezing, yawning, panting, disgust, mouth noises, …)
In the specific case of a robot voice, a digital modulation will be applied to adapt Antoine's voice to the robot's personality.
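The text does not say which modulation is used. Purely as an example of what modulating a recorded voice can look like, the sketch below applies ring modulation (multiplying the signal by a sine carrier), a common way to obtain a robotic timbre. The function name, the 50 Hz carrier and the synthetic test tone are all assumptions, not the project's actual processing.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=50.0):
    """Multiply each sample by a sine carrier (ring modulation)."""
    step = 2.0 * math.pi * carrier_hz / sample_rate
    return [s * math.sin(step * n) for n, s in enumerate(samples)]


if __name__ == "__main__":
    sample_rate = 8000
    # A tenth of a second of a 220 Hz tone stands in for recorded speech.
    voice = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate) for n in range(800)]
    robot = ring_modulate(voice, sample_rate)
    print(len(robot))
```

The output has the same length as the input, so the effect can be applied to a voice recording without changing its timing.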
Manual alteration
The computer-based text analysis (introduced on the Linguistic Aspects page) gives a first annotation of the text by automatically inserting vocal and gesture interpretation instructions. Because this first automatic annotation can be incorrect, or because the robot or avatar developer may want to add instructions manually, Aldebaran and Acapela developed dedicated tools. Aldebaran developed the tool Narrateur (meaning "Narrator" or "Story Teller" in French) to add markup to the raw text. Acapela provides the tool Virtual Story Teller, which comes into play at a later step of the text processing.
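The idea of an automatic first pass that is later corrected by hand can be sketched as follows. The bracketed tag syntax, the `voice` and `gesture` keys and both function names are invented for illustration; they are not the markup format of Narrateur or Virtual Story Teller.

```python
def merge_annotations(automatic, manual):
    """Combine per-sentence annotations; manual values override automatic ones."""
    merged = {}
    for index in sorted(set(automatic) | set(manual)):
        merged[index] = {**automatic.get(index, {}), **manual.get(index, {})}
    return merged


def render(sentences, annotations):
    """Prefix each sentence with its instructions in a made-up tag syntax."""
    lines = []
    for index, sentence in enumerate(sentences):
        tags = annotations.get(index, {})
        prefix = "".join(f"[{key}={value}]" for key, value in sorted(tags.items()))
        lines.append(prefix + sentence)
    return lines


if __name__ == "__main__":
    sentences = ["Once upon a time.", "Watch out!"]
    # Hypothetical output of the automatic text analysis.
    automatic = {0: {"voice": "neutral"}, 1: {"voice": "happy", "gesture": "wave"}}
    # Hypothetical manual correction: the second sentence should sound projected.
    manual = {1: {"voice": "projected"}}
    for line in render(sentences, merge_annotations(automatic, manual)):
        print(line)
```

Merging key by key means a manual correction to the voice leaves an automatically chosen gesture in place, so the developer only has to state what should change.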
Graphical interface of the application Virtual Story Teller by Acapela.