Interaction between humans and computers has intensified greatly as we sail through the twenty-first century. Access to computers and the internet has become essential for taking full part in the economic, political, and social life of the world. However, not everyone has access to this technology. The "digital divide" refers to the growing gap between the underprivileged members of society who do not have access to computers or the internet and those who do. Education and learning lie at the heart of these issues and their solutions, and learning depends on resources being available in a language the learner understands.
Adéṣínà Ayẹni is a Nigerian Yorùbá language and culture advocate, anthropological researcher, and translator. His lifelong work testifies to his passion for bridging the cultural and linguistic digital divide for his native language, Yorùbá.
Yorùbá is a language spoken in West Africa, most prominently in southwestern Nigeria, by the ethnic Yorùbá people. The number of Yorùbá speakers is estimated at between 45 and 55 million. “Despite so many speakers, it’s still a low-resource language because of Eurocentrism and colonialism,” adds Adéṣínà.
Over the years, he has advocated for global service providers and institutions to support digitally underrepresented languages such as Yorùbá. He volunteered at the United Nations, providing Yorùbá translations to help his community access important information. He also translated several TED Talks into Yorùbá and sent numerous emails to companies asking them to localize their products for Yorùbá. “Why can’t the kids in a Nigerian village have a Yorùbá option next to English and French in their video game console?” asks Adéṣínà. “There was no response from most of those global companies. Data Marketplace is therefore a great opportunity to put digitally rare languages like Yorùbá in front of technology providers.”
Some might think that a language available in Google Translate is largely safe from digital discrimination. Yorùbá is among the languages Google Translate supports, but “it’s not perfect and needs a lot of work,” says Adéṣínà, “and it alone doesn’t mean much while many other African and/or indigenous languages are often forgotten in AI applications and online platforms as opposed to popular languages.” For instance, Twitter’s automatic translation feature still misidentifies Yorùbá as Vietnamese. He rightly points out that his community also wants to take part in online discussions and benefit from the latest technologies, such as a self-driving car that takes commands in Yorùbá.
Adéṣínà heard about the Data Marketplace at an online conference and saw it as an opportunity he could not miss. All the Yorùbá translation data he had given to companies and institutions without compensation over the years could now be listed on a global marketplace, at a price he sets himself, alongside Spanish, German, and Italian datasets. “My main motivation has always been contributing to the digital survival and representation of my native language Yorùbá. It’s just a plus that I can make money while serving this higher purpose,” says Adéṣínà.
He currently sells a dataset of about 2,000 segments in the English-Yorùbá language pair, covering the science, technology, and medicine domains. His own translations make up 80% of the data, and the remaining 20% come from other translators on his online Yorùbá platforms.
He sidesteps data privacy and ownership concerns by working with open-source data. “For instance, I translated a great deal on climate change using English resources available on Wikipedia,” says Adéṣínà. “In Nigeria, people don’t know about climate change as much as the rest of the world so I am providing some kind of public service to educate my community through my translations.” By making these translations available for commercial use, he hopes more services and products will become available in the language his community understands best.
Looking ahead, he is excited about the prospect of more resources becoming available in Yorùbá. “Most high-resource languages have been online for a long time. For languages like Yorùbá, Data Marketplace is a unique opportunity to compete for the good of our digitally underrepresented communities,” says Adéṣínà. “We have stories to tell, stories the world has never seen before. Give us the opportunity to tell them, and the world would be a better place for all.”
His datasets are available for purchase on the Data Marketplace, where AI and ML service providers can acquire them to train their systems to work in Yorùbá, giving underserved communities such as Yorùbá speakers equal access to technology.
Şölen is the Head of Digital Marketing at TAUS. With over seven years of experience in related fields, she leads digital growth strategies focused on delivering results through search engine optimization, inbound content, and social media. She holds BAs in Translation Studies and Brand Communication from Istanbul University and an MA in European Studies: Identity and Integration from the University of Amsterdam. After working as a transcreator for marketing content, she worked in business development for a mobile app and in content marketing before joining TAUS in 2017. She believes in keeping up with modern digital trends and in the power of engaging content. She also writes regularly for the TAUS Blog and Reports and manages several social media accounts she created on topics of personal interest, with over 100K followers.