Datafication in Europe: A TAUS Webinar

4 October, 2017, 05:00 - 06:00 pm CEST

Overview

The datafication of translation started with “The Unreasonable Effectiveness of Data”, the article written by the Google scientists Fernando Pereira, Peter Norvig and Alon Halevy in 2009, or perhaps even earlier, when the TAUS Data Cloud was launched in 2008. Translation learns from data. In those early days there was indeed no better data than ‘more data’. The English-French Google machine translation engine was trained on a corpus of 100 billion words. Now, with the new generation of Neural MT, very large quantities of data belong to the past. The pursuit of high-quality in-domain translation data will challenge the protectionists and create opportunities for pirates.

Either way, data is becoming an obsession in the translation industry. And it does not stop with translation memory data. We need speech data too. And we want to have the edits and annotations on human as well as machine translations, plus the attributes for content types, industry sectors, translators’ locations, the process applied and the technology used. And why not correlate it with weather reports, the translators’ social graphs and their eye-movement tracking? There is always something we can learn from new data.

The internet giants had a competitive edge in translation data, but they spoiled it by polluting their own fishing grounds with machine translations. Now, the hunt is open for new data marketplaces. The European Commission is investing in the Connecting Europe Facility. But watch out also for the greenfield translation data ventures in China, or perhaps closer to home: the TAUS Data Cloud.

Agenda

  1. The story of the translation industry in 2022: Datafication of Translation. Quantity or Quality? By Jaap van der Meer, director of TAUS
  2. Datafication in Europe: Meeting the needs for multilingual data at the EU, national, and industry levels. By Rihards Kalnins (Tilde)
    Over the years Tilde has helped the European Commission, EU public administrations, and private companies in various industries to identify, gather, process, and leverage multilingual data for developing highly customized language technology solutions. What are the data requirements and expectations of these organizations? How can their needs be met – and their expectations managed (perhaps a greater challenge!) – by language technology providers? What tools, skills, workflows, and processes are necessary to meet these requirements? Tilde will answer these and other questions in this brief presentation.
  3. The European Language Resource Coordination: a Europe-wide data collection action for CEF eTranslation. By Maria Giagkou (ILSP)
    eTranslation is a Digital Service Infrastructure of the Connecting Europe Facility, helping overcome one of the main challenges that European and national public administrations are facing today: the exchange of information across language barriers. In order to fine-tune eTranslation to the needs of national administrations and to enable multilingualism of public digital services, the European Commission has initiated an unprecedented data collection action, the European Language Resource Coordination. The presentation will go through the methodology adopted by the ELRC consortium for mobilizing the public sector in CEF-affiliated countries and for identifying, collecting and managing language resources for CEF eTranslation.
  4. Q&A with panelists
  5. Next steps: TAUS reports and User Groups
  6. Questions & answers

Speakers

Maria Giagkou | ILSP

Dr. Maria Giagkou is a Corpus Linguist. Her research interests mainly focus on the fields of Quantitative Linguistics, Natural Language Processing, and Technology-Enhanced Learning. Maria is a research associate at the Institute for Language and Speech Processing, “Athena” Research Center, Athens, Greece. She is currently acting as Site Project Manager for the European Language Resource Coordination action.


Rihards Kalnins | Tilde

Rihards Kalnins is the Head of MT Solutions at Tilde, a leading European language technology and localization services company that specializes in custom machine translation. At Tilde, Kalnins manages key accounts and strategic partnerships, steers the MT product development roadmap, and coordinates the implementation of custom MT solutions for global customers. He is currently overseeing development of a Neural MT service for the 2017-2018 EU Council Presidencies and is helping the European Commission extend its automated translation platform CEF eTranslation. A former Fulbright scholar with a degree in Philosophy, Kalnins has written about language and multilingual policy for EurActiv.com and The Guardian.


Event Properties

Event Date 04-10-2017 5:00 pm
Event End Date 04-10-2017 6:00 pm
Capacity Unlimited
Individual Price Free
Created By Anne-Maj van der Meer
Registration link https://attendee.gotowebinar.com/register/7704567530860435971