The TAUS Asia Conference in Beijing on March 22-23 is shaping up to be an exciting one. Here’s a sneak preview from my own vantage point.
First and most significant, the meeting will host a milestone in speech translation: for the first time anywhere, real-time automatic on-screen transcription and on-device multilingual text translation will be provided for all conference speakers. (Human interpreters for English and Mandarin will also be on the job.) Microsoft’s Presentation Translator, an add-on to PowerPoint, will power the experiment. The software represents a considerable development effort, supervised by translation chief Chris Wendt. Of particular interest for speech translation is the TrueText component, designed to handle the disfluencies of spontaneous speech – repetitions, hesitation syllables, stutters, etc. TrueText revises and cleans the preliminary speech recognition result for each utterance before displaying the regularized text on-screen as subtitles and then handing it off to machine translation. The translated text then streams to audience members on their own devices. The Presentation Translator also contains a built-in training tool, which can pre-scan the slides and speaker notes of a PowerPoint deck to familiarize itself with the vocabulary. Speech recognition error rate reductions of as much as 30% are claimed. I'll be handling that training, along with the setup of wireless mics, etc., for the Beijing presentations.
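To make the pipeline concrete, here is a minimal sketch of the processing stages described above: raw recognition output, TrueText-style disfluency cleanup, then machine translation for on-device display. Everything here is illustrative – the function names, the filler list, and the toy glossary are my own stand-ins, not Microsoft's actual API.

```python
# Hypothetical sketch of a subtitle pipeline in the spirit of
# Presentation Translator: ASR result -> disfluency cleanup -> MT.
# All names and rules below are illustrative assumptions.

FILLERS = {"um", "uh", "er", "hmm"}  # hesitation syllables to strip

def clean_transcript(raw: str) -> str:
    """TrueText-like step: drop fillers and immediate word repetitions."""
    words = [w for w in raw.split() if w.lower().strip(",.") not in FILLERS]
    deduped = []
    for w in words:
        if not deduped or w.lower() != deduped[-1].lower():
            deduped.append(w)  # keep a word only if it isn't a stutter repeat
    return " ".join(deduped)

def translate(text: str) -> str:
    """Stand-in for a real MT call; a one-entry toy glossary for illustration."""
    glossary = {"welcome to the conference": "欢迎参加会议"}
    return glossary.get(text.lower(), text)

def subtitle_pipeline(raw_utterance: str) -> tuple:
    """Return (cleaned subtitle text, translated text) for one utterance."""
    cleaned = clean_transcript(raw_utterance)
    return cleaned, translate(cleaned)

source, target = subtitle_pipeline("um welcome welcome to the conference")
print(source)  # welcome to the conference
print(target)  # 欢迎参加会议
```

The key design point the sketch reflects is ordering: regularizing the transcript *before* translation, since disfluencies that a human listener filters out automatically would otherwise propagate into the MT output.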
As my second contribution, I'll be continuing my advocacy for TAUS collection of speech as well as text data. I'll reprise my San Jose talk on the topic, starting from the uncontroversial observation that big data has been a decisive enabler in the separate development of machine translation and speech recognition. I'll go on to make the case that big (and good quality) data will be equally important in developing the combination of these technologies – speech translation – and that TAUS can play an important role in coordinating data collection and sharing. The hope is to foster a virtuous circle, in which speech translation data improves speech translation, which in turn produces more data. I'll also argue the importance of correction data – naturally enough, since my own company has concentrated on verification and correction of speech translation. I'll consider the economic impact of speech and speech translation data: Who will own it? How will it eventually affect employment? I'll touch on related ethical issues (privacy, security, prejudice), and on the use cases which might be lucrative or important enough to justify TAUS’s data collection efforts. Lastly, I'll consider the types of speech translation data that can be collected (monolingual vs. bilingual, spontaneous vs. scripted, etc.).
My last contribution will address a topic especially dear to my heart: the role of semantics in machine translation, as it has been played in the past and as I think it may be played going forward. For this talk, the departure point will be the argument by John Searle and other influential theorists that machine translation and other natural language processing programs can never appreciate meaning in the deepest sense – in other words, that they can never really exhibit semantics at all. I'll concede that MT and many other NLP systems have made steady and impressive progress while use of explicit semantic processing has undergone a rise and fall; and I'll recognize that researchers haven’t yet agreed on the meaning of meaning. Still, I'll point to renewed interest in semantic representation and processing. And – my central point – I also foresee gradual adoption throughout natural language processing of semantic approaches grounded in audio, visual, or other sensor-based input. These perceptually grounded semantic approaches are distinct from most current perception-free methods.
From a philosophical viewpoint, I'll risk the suggestion that perceptually-grounded approaches to MT and other NLP can display intentionality, and consequently can provide the foundation for truly meaningful semantics. I think that perceptual grounding of this sort depends on the ability to learn and associate categories, and that this ability, in turn, is a necessary – though not sufficient – condition for higher cognitive processes. To lay the groundwork for these somewhat presumptuous claims, I'll be surveying the role of semantics in machine translation until now in terms of three paradigms: rule-based, statistical, and neural MT. Within the rule-based paradigm, we'll revisit direct, transfer-based, and interlingua-based variants.
Well, that ought to be enough for one conference!
To hear more from Mark Seligman and be a part of the live automatic interpretation experience while hearing what the Asian translation/localization market is up to, join us in Beijing at the TAUS Asia Conference!
Dr. Mark Seligman is founder, President, and CEO of Spoken Translation, Inc. His early research concerned automatic generation of multi-paragraph discourses, inheritance-based grammars, and automatic grammar induction. During the 1980s, he was the founding software trainer at IntelliCorp, Inc., a leading developer of artificial intelligence programming tools. His research associations include ATR Institute International near Kyoto, where he studied numerous aspects of speech-to-speech translation; GETA (the Groupe d’Étude pour la Traduction Automatique) at the Université Joseph Fourier in Grenoble, France; and DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz) in Saarbrücken, Germany. In the late 1990s, he was Publications Manager at Inxight Software, Inc., commercializing linguistic and visualization programs developed at PARC. In 1997 and 1998, in cooperation with CompuServe, Inc., he organized the first speech translation system demonstrating broad coverage with acceptable quality. He established Spoken Translation, Inc. in 2002.