Corpus

Several corpora were gathered for the needs of the project:


Corpus of stories read by Antoine

The aim of this corpus is to analyze how a professional storyteller uses their voice to make narration more expressive. LIMSI developed a tool for annotating an audio corpus, which enables systematic analysis of the 12 stories read "correctly".

The annotation starts by tagging the signal using the following graphical interface:

The tool then aligns the text with the linguistic analysis that was first performed on the raw text.

The prosodic analysis produced the following results:

The aim of this corpus is to achieve, in the stories read "correctly":

This annotated corpus is a good starting point for drawing up the first rules of prosodic text annotation.

Audio-visual corpus

LIMSI gathered a video corpus (named "ContAct", a contraction of the French for "acted story") in which actors tell the story "The 3 little pieces of the night". The aim is to create a reference base of the expressive behaviors associated with texts and to provide a lexicon of expressive gestures usable by NAO and Greta in a context close to the target application.

Each of the 6 actors told the story twice (in French) and was filmed from two camera angles (front view and profile view).

The videos were annotated with the ANVIL tool developed by Kipp.

Each gesture was annotated with its category, the hands used, and its lexeme.
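As an illustration only, one gesture annotation could be represented as a small record and the share of each hand configuration derived from a list of such records. This is a minimal sketch, not the ANVIL data model: the class name, field names, label values ("both", "right", "left"), and the sample gestures are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureAnnotation:
    """One annotated gesture (hypothetical layout, not the ANVIL format)."""
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds
    category: str    # gesture category label (label set assumed)
    hands: str       # "both", "right", or "left" (labels assumed)
    lexeme: str      # entry in the gesture lexicon (names assumed)


def hand_usage_percent(annotations):
    """Return the rounded percentage of gestures per hand configuration."""
    total = len(annotations)
    if total == 0:
        return {}
    counts = Counter(a.hands for a in annotations)
    return {hands: round(100 * n / total) for hands, n in counts.items()}


# Invented sample data, for illustration only.
sample = [
    GestureAnnotation(0.0, 1.5, "iconic", "both", "big"),
    GestureAnnotation(2.0, 3.0, "deictic", "right", "point"),
    GestureAnnotation(4.0, 6.0, "iconic", "both", "small"),
    GestureAnnotation(7.0, 8.0, "beat", "both", "beat"),
]

print(hand_usage_percent(sample))
```

Keeping the three annotated dimensions as separate fields makes it straightforward to count gestures along any one of them, which is the kind of per-actor summary reported for this corpus.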

The table below shows the individual variations between actors.

                                  Actor 1   Actor 2   Actor 3
Number of gestures                163       82        94
Duration of gestures (min’sec)    5’41      3’33      4’17
Gestures/min                      19        13        13
% of two hands                    72        56        90
% of right hand only              21        16        10
% of left hand only               7         28        0
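The figures in the table can be checked and explored with a short script. The sketch below assumes that "Duration of gestures" is a total written as minutes’seconds (typed here with a plain apostrophe) and that the three hand percentages are meant to cover all gestures. The dictionary keys and the function name are hypothetical, and the mean duration per gesture is a derived value that the source does not report.

```python
def parse_duration(text):
    """Convert a "minutes'seconds" string such as "5'41" to seconds."""
    minutes, seconds = text.split("'")
    return int(minutes) * 60 + int(seconds)


# Values copied from the table; key names are illustrative.
actors = {
    "Actor 1": {"gestures": 163, "duration": "5'41",
                "two_hands": 72, "right_only": 21, "left_only": 7},
    "Actor 2": {"gestures": 82, "duration": "3'33",
                "two_hands": 56, "right_only": 16, "left_only": 28},
    "Actor 3": {"gestures": 94, "duration": "4'17",
                "two_hands": 90, "right_only": 10, "left_only": 0},
}

for name, row in actors.items():
    # The three hand configurations should account for every gesture.
    hand_total = row["two_hands"] + row["right_only"] + row["left_only"]
    assert hand_total == 100, f"{name}: hand percentages sum to {hand_total}"

    total_s = parse_duration(row["duration"])
    mean_s = total_s / row["gestures"]  # derived, not reported in the source
    print(f"{name}: {total_s} s of gesture, {mean_s:.2f} s per gesture")
```

Under these assumptions the hand percentages sum to 100 for each actor. The table does not state the time base of the "Gestures/min" row, so the script does not attempt to recompute it.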