These pages are no longer up to date.
The activities are now carried on in the HLT research unit.


Comprehensive Automatic SPEech Recognition



Casper is actively involved in the TC-STAR (Technology and Corpora for Speech to Speech Translation) project.

The TC-STAR project, financed by the European Commission within the Sixth Framework Programme, is envisaged as a long-term effort to advance research in all core technologies for Speech-to-Speech Translation (SST). SST technology combines Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text-to-Speech (TTS) synthesis. The project's objective is ambitious: a breakthrough in SST that significantly reduces the gap between human and machine translation performance.

The project targets a selection of unconstrained conversational speech domains (speeches and broadcast news) and three languages: European English, European Spanish, and Mandarin Chinese. Accurate translation of unrestricted speech is well beyond the capability of today's state-of-the-art research systems. Advances are therefore needed in the state-of-the-art technologies for speech recognition and speech translation.

Long-term research goals of the project are:

  • Effective SLT of unrestricted conversational speech on large domains of discourse.
  • Speech recognition that performs reliably under varying speaking styles and recording conditions and for different user communities, and that adapts transparently to the particular conditions.
  • Effective integration of speech recognition and translation into a single, statistically sound framework. A major challenge will be extending current statistical machine translation models to account for the multiple sentence hypotheses produced by the speech recognition algorithm.
  • General expressive speech synthesis imitating the human voice. To overcome the barriers of reading and talking styles and of language, a breakthrough in speech synthesis requires the development of new models for prosody, emotions, and expressive speech in general.

Updated: August 4, 2006
by myself