The ETAPE corpus for the evaluation of speech-based TV content processing in the French language





1 TEXMEX - Multimedia content-based indexing, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 Traitement du Langage parlé, LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
3 ELDA - Evaluations and Language resources Distribution Agency
4 DGA - Délégation générale de l'armement
5 LNE-INM - Laboratoire National de Métrologie et d'Essais - Institut National de Métrologie

Abstract: The paper presents a comprehensive overview of existing data for the evaluation of spoken content processing in a multimedia framework for the French language. We focus on the ETAPE corpus, which will be made publicly available by ELDA at the end of 2012, after completion of the evaluation, and recall existing resources resulting from previous evaluation campaigns. The ETAPE corpus consists of 30 hours of TV and radio broadcasts, selected to cover a wide variety of topics and speaking styles, emphasizing spontaneous speech and multiple speaker areas.

Keywords: evaluation, speech recognition, speaker diarization, named entity





Authors: Guillaume Gravier, Gilles Adda, Niklas Paulson, Matthieu Carré, Aude Giraudel, Olivier Galibert

Source: https://hal.archives-ouvertes.fr/






