Human Language Project contributes to the digital representation of Igbo


Participation in the language data projects offered by the Human Language Project leads to new professional opportunities.

TAUS kick-started the Human Language Project (HLP) in 2019 in an effort to bridge the language data gap for digitally less-represented languages and dialects. We work with contributors from around the world to generate, annotate, post-edit and evaluate text and speech data for various MT, ML and AI applications. There is no limit to the language coverage, and since its very beginning the HLP has had a strong focus on long-tail languages that are often spoken by millions of people but are less represented in the language services market. We have worked on data creation in languages such as Nepali, Sinhalese and Kurmanji, to name a few, on speech collection in Bambara and, most recently, on translation projects in the Nigerian languages Igbo, Hausa and Yoruba.

For some of the contributors, participation in the HLP led to new professional opportunities. One of them is Onyekachi Raphael Ogbu, a native Igbo speaker living in Nigeria. He recently graduated from university and currently works as an Igbo language translator, transcriptionist, voiceover artist and online tutor.

The HLP Igbo project was his first paid translation assignment. “I decided to sign up since I have a passion for the Igbo language and its representation. Igbo is a low-resourced language; thus, there needs to be more representation of the language in the mainstream. This led me to contribute to the HLP project.”

Asked to describe his HLP experience, Onyekachi responds: “I loved how the content of the tasks were from different industries, allowing translators to be flexible and showcase their skills from diverse specializations. Every new task was a new adventure and the word count was adequate with enough time to work on them. It was a bit challenging, though. Igbo being a low-resourced language demanded intensive research and neologism to provide the correct, or at least close-to-correct, translation of the segments.”

This HLP project was the biggest and highest-paid project that Onyekachi has worked on so far. “Igbo language translation projects don’t come by easily and so there is often little to work on. And due to the high number of translators, most clients offer little pay knowing that translators would still fight for the bones they toss. The HLP project ensured that translators were listened to and everyone involved in the project was treated equally.”

Onyekachi adds, “Actually, I was able to get a new laptop for myself and start a clothing business from what I earned from this project. After working on the HLP projects, I decided to take language services seriously by developing my translation skills further with other services like transcription, voiceover and others.” 

According to UNESCO, 29 minor Nigerian languages have become extinct, while another 29 are in danger of extinction. The UNESCO World Atlas of Languages lists Igbo as ‘potentially vulnerable’.

“Colonization, rapid globalization and migration is forcing our language into the shadows”, argues Onyekachi. “Digital representation is very important. You’re encouraged to use your language more when you see that it matters. Increased representation would help increase the development and generation of resources for the growth of the language. Taking part in the HLP project felt like I was contributing to something really big, something that would benefit my people of all generations. I also felt empowered.”

In March, Microsoft added 13 new African languages, including Igbo, Yoruba and Hausa, to Microsoft Translator. Machine translation to and from Igbo is also supported by a few other MT providers. Onyekachi hopes to see more initiatives that focus on strengthening the digital representation of African languages. “Initiatives that involve long-tail languages help to facilitate language development. HLP, and other platforms, can begin by making their websites available in Nigerian languages and organizing programs (webinars, events, conversations) related to native Nigerian languages and other African languages too. It’s important that platforms involved in languages reflect language diversity in their resources and initiatives too.”

From 21 to 25 August, Adéṣínà Ayẹni, a Nigerian language activist and the HLP Ambassador in Nigeria, is organizing the Nigerian Languages Data and Sci-Tech Conference. It will discuss the development of Nigerian languages and their inclusion in sci-tech, and will also feature a TAUS speaker. Contact Adéṣínà to learn more about the event and contribute to the digital representation of Nigerian languages.


Dace is a product and operations management professional with 15+ years of experience in the localization industry. Over the past 7 years, she has taken on various roles at TAUS, ranging from account management to product and operations management. Since 2020, she has been a member of the Executive Team, leading the strategic planning and business operations of a team of 20+ employees. She holds a Bachelor’s degree in Translation and Interpreting and a Master’s degree in Social and Cultural Anthropology.
