Data is no longer just a good idea. Now that so many businesses are using and monetizing it, becoming data-driven is essential for keeping up with the competition. According to the technology adoption lifecycle model, the first group of people to use a new technology is called “innovators,” followed by “early adopters.” Next come the “early majority” and “late majority,” and the last group to adopt a new technology is called “laggards.” Being among the early adopters brings a head start advantage.
A head start advantage can be simply defined as a company’s ability to be better off than its competitors as a result of being first to market in a new category. Although no advantage lasts forever, companies that succeed in building durable head start advantages tend to dominate their categories for many years, from a market’s infancy until well into its maturity.
The use of language data in AI and ML applications is a fast-growing technology that has almost passed the early adoption phase. The linguists and language service providers who see the business opportunity this new data realm has in store for them will have the advantage over the late majority and laggards.
TransLink, based in Russia and ranked #84 among global LSPs, is one of the early adopters of the language data monetization trend. “Having been among the earliest adopters of advanced technologies such as NMT, our team couldn’t have let the opportunity the Data Marketplace offers slip,” says Mikhail Gilin, Head of R&D at TransLink. The company sees its participation in the Data Marketplace as an opportunity to monetize the language data it generates, and sees its early adoption as a significant head start for growth in this new space. It also believes that making its multilingual data available provides a valuable resource for research into technologies that benefit the language industry as a whole.
On the Data Marketplace, TransLink has published five corpora in the news and sports domains, from Russian into English, German, Spanish and French. This bilingual translation data was collected over the course of the 2018 World Cup, for which TransLink was the main language service provider for written communications. Whenever data sharing comes up, many LSPs return to the same question of data ownership and privacy. “It’s significant to differentiate between the Translation Memories (TM) and the corpora we upload,” says Mikhail. “The datasets can be crawled or based on an existing TM and in that case the segments are altered beyond recognition.” He also highlights that TransLink reduces all numeric units so that they no longer represent personally identifiable information, and that private information is removed or replaced before a dataset is made available for purchase.
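To make the idea of removing or replacing private information concrete, here is a minimal Python sketch of how a bilingual segment pair might be masked before publication. It is not TransLink's actual pipeline, which the article does not describe. The patterns, the `<EMAIL>` and `<NUM>` placeholders, and the function names `mask_segment` and `mask_pair` are all assumptions made for illustration, and a real workflow would need much broader coverage (names, addresses, phone formats and so on).

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration only; not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A run of digits, optionally continued by "." or "," groups (e.g. 14.06.2018).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def mask_segment(text: str) -> str:
    """Replace e-mail addresses and numeric tokens with placeholders."""
    # Mask e-mails first so digits inside an address are not split out.
    text = EMAIL_RE.sub("<EMAIL>", text)
    return NUMBER_RE.sub("<NUM>", text)


def mask_pair(source: str, target: str) -> Tuple[str, str]:
    """Apply the same masking to both sides of a translation pair."""
    return mask_segment(source), mask_segment(target)


if __name__ == "__main__":
    src, tgt = mask_pair(
        "Write to ivan.petrov@example.com by 14.06.2018",
        "Schreiben Sie bis 14.06.2018 an ivan.petrov@example.com",
    )
    print(src)
    print(tgt)
```

Applying the same function to the source and the target keeps the placeholders aligned across languages, which matters if the masked corpus is later used to train a translation model.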
As the Data Marketplace continues to grow, so do the expectations of existing and potential data sellers and buyers. “We have multilingual corpora, and we are considering uploading multilingual data into the Data Marketplace. That improvement could let LSPs form quality multilingual MT systems, both for production and research purposes,” says Mikhail, adding that “the latter is very important, too, because the collection of the exact same corpus in different languages allows for adequate research of the NMT algorithms. We see that as a great scientific opportunity for those doing linguistic research in the translation industry.”
By joining the Data Marketplace ahead of many other LSPs, TransLink is securing its share of the emerging market for language data for AI. “This platform is unique for the moment. And by the time there are any other similar platforms, Data Marketplace will provide a huge advantage both in terms of data processing expertise and overall volume of the uploaded corpora,” says Mikhail Gilin.
Şölen is the Head of Digital Marketing at TAUS where she leads digital growth strategies with a focus on generating compelling results via search engine optimization, effective inbound content and social media with over seven years of experience in related fields. She holds BAs in Translation Studies and Brand Communication from Istanbul University in addition to an MA in European Studies: Identity and Integration from the University of Amsterdam. After gaining experience as a transcreator for marketing content, she worked in business development for a mobile app and content marketing before joining TAUS in 2017. She believes in keeping up with modern digital trends and the power of engaging content. She also writes regularly for the TAUS Blog/Reports and manages several social media accounts she created on topics of personal interest with over 100K followers.