If AI is a Fancy New Car, Data is the Fuel that Feeds it!
01/05/2019
8 minute read
In a world full of machines and robots, will human interaction still matter? Will the evolving workforce of the 21st century inevitably contain a higher percentage of employees who do not work in a central office? Does data ownership contribute to a more fragmented or a more consolidated industry? We've gathered the answers from the TAUS Global Content Summit Tokyo.

Japan in springtime is a truly magical place. Every year thousands of visitors come to Japan to celebrate the iconic sakura season, when the whole country lights up in the rosy glow of cherry blossoms. Strolling around the petal-covered streets of Tokyo, many of us don’t realize that sakura has a deep symbolic meaning in Japanese culture. In Japan, the cherry blossom represents the fragility and the beauty of life. Carpe diem, seize the moment while it lasts.

Naturally, there could be no better time for TAUS to return to Tokyo and mobilize the local translation and localization community with the slogan “Nunc Est Tempus (now it’s time) to fix the translation ecosystem”. Opening the day, TAUS director Jaap van der Meer pointed out: “We are too process-oriented and inward-looking. Now that we are more aware of the importance of reaching the end-user in the right way and in the right language, it is our task to re-engineer our systems and workflows to become more solution-driven”.

Presenting Microsoft’s endeavours to reach human parity in machine translation, Chris Wendt explained that a gap between the quality of human and machine translation will always remain. However, this gap is steadily decreasing, so it is important to establish objective, reliable methods to measure progress. Human evaluation remains the most effective MT evaluation method, albeit an expensive and time-consuming one. With Microsoft as the automatic translation provider on Twitter, Chris’ team has the benefit (and burden) of receiving immediate end-user feedback on the quality of automatically translated tweets. This feedback allows them to correct potentially catastrophic mistakes caused by, for example, the neural MT engines’ lack of real-world knowledge.

In a world full of machines and robots, will human interaction still matter? Definitely, argued Eiji Sano from SAP. His presentation was a plea for an open, fear-free vendor-customer relationship. With content volumes growing drastically year by year, SAP Language Services (SLS) is looking for a new collaboration model with its translation vendors. “Our current infrastructure and workflow cannot support it,” pointed out Eiji. SLS works with more than 120 suppliers in 41 countries and translates SAP content into 61 languages and locales. To manage and ‘fix’ this complex translation ecosystem, SLS has concluded that it depends not only on ‘real’ APIs (application programming interfaces) that enable smooth connections between various business tools, but also on symbolic APIs between people who work together in harmony. Everyone knows that building an excellent customer experience (CX) is an important element of a successful business, said Eiji. In the same fashion, he suggested, we should consider partner and supplier experience (PX and SX) to be equal indicators of the maturity of every well-established company.

Gig-Economy Becomes a Norm

The evolving workforce of the 21st century will inevitably contain a higher percentage of employees, or contractors, who do not work in a central office but are distributed all over the world. What’s more, “changes in the world of work through digitization are not limited to the ways in which work is executed, but have also created new forms of work organization.” We see more and more businesses responding to this trend, and some of them were present at the TAUS Summit in Tokyo, namely SDL, Flitto and Knowledge on Demand.

Customer experience has changed radically in the last ten years, argued Jim Saunders, Chief Product Officer at SDL. Intelligent content is crucial for all the phases of the new customer journey, from pre-purchase to post-purchase. Language is the most important dimension of personalization, argued Jim. It is core to the human experience, and core to the consumer experience. Today more than ever the consumer expects the content to be delivered in their language, be it a user manual or a post on Facebook. This new content reality strains the traditional content production supply chain, as it requires short turnaround times while the volumes rise exponentially, distributed in small chunks, which complicates the process even further.

The new SDL Language Cloud, to be released later this year, will be a highly automated, AI-driven content management platform that pulls together three core areas of the company - content technology, language technology and linguistic AI. The linguistic AI will influence each step of the content supply chain - it will connect content repositories to ensure that content is created, used and re-used intelligently. Machines will help manage and accelerate the translation process, while the AI component will transform content delivery, ensuring that it is personalized and optimized. Every step in the workflow will follow the “machine-first, human-assisted” approach.

AI is data-hungry and needs more and more data to guarantee its quality. Crowdsourcing promises to be the fastest and most effective way to collect data at massive scale. However, how can you assure the quality of crowd-sourced data? Simon Lee from Flitto showed how the company has gamified the translation and evaluation process to distill good quality (mostly colloquial) data. The company provides text-to-text, image-to-text and speech-to-text content production features, which allow various content types to be translated almost instantly using the crowdsourcing platform. Flitto app users earn points for every new translation that they produce or verify and get paid when the translated content is used by the requester.

We have heard of cases where translators working on highly sensitive, confidential content are asked to work in secure office rooms with no access to the Internet or their phones. Can a similar level of security be achieved when translators work remotely?

While most of us remain somewhat sceptical about the use of blockchain technology in the translation ecosystem, some companies are already using it, particularly in the field of supply management. Tomoki Miyashita, CEO of Knowledge on Demand, talked about an experiment his team conducted in cooperation with Honyaku Center Inc. The goal of the project was to explore how blockchain can help prevent information leakage in distributed supply chains. As Tomoki explained, blockchain was utilized to ensure proper usage of anti-virus software and to prevent content sharing. The translator receives a job assignment in encrypted form and decrypts it using a specific program. From that moment on, every activity on the decrypted content is logged by the blockchain technology and shared with the project initiator in case any suspicious activity is detected. Interestingly, while some of the translators expressed concerns about being monitored, others praised the new methodology, saying that it ensured they were indeed working in a secure environment and that they could prove their trustworthiness to their clients this way.
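
To make the logging idea a little more concrete, here is a minimal sketch of a tamper-evident activity log built as a hash chain, which is the basic building block behind blockchain-style record keeping. It is not the system used by Knowledge on Demand or Honyaku Center; the class `ActivityLog`, the event fields and the job identifier are all hypothetical names chosen for illustration, and a real deployment would add encryption, distribution across several parties and much more.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the previous entry."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ActivityLog:
    """Hypothetical append-only log; each entry is chained to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, event)
        self.entries.append({"prev": prev, "event": dict(event), "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited or removed entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != entry_hash(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


# Illustrative events only: a translator decrypts, edits and delivers a job.
log = ActivityLog()
log.append({"actor": "translator-1", "action": "decrypt", "job": "job-42"})
log.append({"actor": "translator-1", "action": "edit", "job": "job-42"})
log.append({"actor": "translator-1", "action": "deliver", "job": "job-42"})
```

Because each hash depends on the previous one, a project initiator who holds the latest hash can check that no earlier activity record was quietly changed. This may also explain why some translators saw the log as a way to prove their trustworthiness.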

AI and Data

As Andrew Joscelyne argues in a recent TAUS blog post, it is worth exploring “whether data ownership contributes to a more fragmented or a more consolidated industry”. At the Tokyo Summit, we heard three testimonials of new data collection and sharing initiatives, presented by NICT, Systran and Toyohashi University of Technology.

Eiichiro Sumita from NICT shared the results of a project conducted together with the Japanese government. The project aimed to bring together researchers from Japanese companies and establish an advanced speech translation research and development center to improve the quality of Japanese machine translation. They intend to use the data donated by private companies and governmental bodies to improve the quality of NMT output. In a pilot run in cooperation with AstraZeneca, a pharmaceutical company, the group succeeded in cutting the translation turnaround time by 50%.

Now that MT is increasingly seen as a mainstream rather than a niche product, it is essential that resources are shared with the wider community, contended Satoshi Enoue from Systran. “The value is no longer in technology itself, but rather in the fact that it’s offered in a complete end-to-end solution”, he said. Systran MarketPlace, to be available to the public this summer, is an open online platform where language experts can build, share and sell their language models. The platform is based on the OpenNMT framework and integrated with Systran Pure Neural Server, which means that the marketplace users can benefit from state-of-the-art technology without any prior investment. Asked about the advantages of the new marketplace, Satoshi argued that building new language models from scratch is very expensive, and making use of the available data library can offer great savings. Systran also hopes to get more data in the long-tail languages, for which data availability is still not optimal.

AI Data Consortium is yet another data collection initiative, bringing together Japanese academia, government representatives, as well as businesses in various industries. Hitoshi Isahara from Toyohashi University of Technology explained that the consortium was created to help accelerate the development of AI through the smooth distribution of data, open innovation and resolving social issues. The main activities of the consortium include construction of intellectual property and contract models to enable smooth data distribution, building a cloud-infrastructure for diverse data distribution, and promoting the implementation and utilization of data distribution foundations in society and enterprises.

Data seems to be the hottest topic in our industry today. Can we consolidate our efforts, share the tricks and work together on building better language technology? 

Author
Dace Dzeguze

Dace is a product and operations management professional with 15+ years of experience in the localization industry. Over the past 7 years, she has taken on various roles at TAUS ranging from account management to product and operations management. Since 2020 she has been a member of the Executive Team and leads the strategic planning and business operations of a team of 20+ employees. She holds a Bachelor’s degree in Translation and Interpreting and a Master’s degree in Social and Cultural Anthropology.
