Milica is a marketing professional with over 10 years of experience in the field. As TAUS Head of Product Marketing, she manages the positioning and commercialization of TAUS data services and products, as well as the development of taus.net. Before joining TAUS in 2017, she worked in various roles at Booking.com, including localization management, project management, and content marketing. Milica holds two MAs in Dutch Language and Literature, from the University of Belgrade and Leiden University. She is passionate about continuously inventing new ways to teach languages.
The amount of content produced worldwide that needs translation has been surging for years. For the vast majority of players in the language industry, the COVID-19 pandemic didn’t slow things down; it accelerated them. According to Nimdzi, the language services industry reached USD 55 billion in 2020 and is on a growth path expected to hit a whopping USD 73.6 billion by 2025. This will only be possible if the industry continues to show the same resilience and adaptability, embracing new technologies and digital transformation.
People create vast amounts of data daily through many touchpoints in their use of IoT (Internet of Things) devices, often without even realizing it. Think of all the apps you use, the messages you send, and the pictures you take and share. And these are just the byproducts of your leisure activities (that you might not necessarily want to use elsewhere). Now, try to imagine how much data you create as part of your work. If you are a translator or a language service provider, the amount of language data you have generated over time while working on projects, building glossaries or translation memories, or even just translating your favorite song or a paragraph from a book for fun, is immense. More importantly, even if some years have passed and those specific lines of text are no longer used for their original purpose, they still have value as training data for machine learning (ML) applications.
We live in an increasingly digitalized world, where more and more of our day-to-day decisions are made by algorithms in our cars, phones, computers, and TVs. AI touches almost all aspects of our lives, from smart self-learning home systems and assistive devices to simple shopping apps suggesting what to buy based on our previously observed behavior.
TAUS has come a long way since it was founded in 2005 as a think tank with a mission to help businesses automate and innovate translation. The language industry and our customer base have evolved over the years, and we have evolved with them.
In November 2020, TAUS launched the Data Marketplace, an open data market where translators, language service providers, technology developers and enterprises come together to sell and buy language data for machine translation and other machine learning applications.
Online marketplaces are a popular business model in the digital era, and some of them rank among the biggest and most highly valued tech companies today (think of eBay, Airbnb, Amazon, etc.). They connect sellers and buyers of certain types of goods and services and facilitate processes like search, transactions, ratings, and more.
Automatic evaluation of Machine Translation (MT) output refers to the evaluation of translated content using automated metrics such as BLEU, NIST, METEOR, TER, and CharacTER. Automated metrics emerged to address the need for objective, consistent, quick, and affordable assessment of MT output, as opposed to human evaluation, where translators or linguists are asked to assess segments manually.
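To make the idea of an automated metric concrete, here is a minimal, self-contained sketch of sentence-level BLEU in Python. This is a simplified toy version for illustration only: real BLEU is computed over a whole corpus, with smoothing options this sketch omits, and production work would use an established implementation such as sacreBLEU.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = sum(cand_ngrams.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this toy version
        precisions.append(overlap / total)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(bleu("the cat sat on a mat", "the cat sat on the mat"))    # a partial match scores lower
```

Note the trade-off this illustrates: the score is purely a surface n-gram comparison against a reference, which is what makes automated metrics fast and consistent, but also why they can disagree with human judgments of adequacy and fluency.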
Machine translation (MT) technology has been around for seven decades now. It is praised for its speed and cost-effectiveness, and its quality has improved considerably since the arrival of neural machine translation (NMT). Higher throughput, quicker turnaround time, and the need to reduce overall cost are the main reasons for implementing MT in almost every case. Sounds great, right? Still, to understand how to implement machine translation to meet your translation needs, you should first consider a few factors.
You’ve probably heard that there are around 7,000 spoken languages in the world, about 4,000 of which have an established writing system. You might also know that only 23 languages account for more than half of the world’s population (see the infographic by Alberto Lucas). But did you know that a quarter of those 23, including Bengali, Tamil, Telugu, Urdu, Marathi, and Lahnda, are not even in the top 40 languages on the internet when it comes to the availability of online content?
While the whole of Europe seemed to be gripped by the GDPR (General Data Protection Regulation) frenzy in 2018, at TAUS we welcomed it. We have always known that data is key to process improvements, quality control, and automation, but that this doesn’t have to come at the cost of misusing personal data.
The digital age and new technologies are giving intellectual property (copyright) and data ownership laws a hard time. Practically any idea, document, or page on the web can be copied instantaneously. Authored texts can be translated using Google Translate or some other machine translation (MT) technology; in fact, browser add-ons do that automatically. In the case of copying original text, the breach is evident, but what about translation? While a translation of an original work is automatically protected by copyright, does the fact that MT was used play a role?
Since its launch in 2012, TAUS DQF (Dynamic Quality Framework) has gone through a few rounds of changes. Originally a quality framework built around the DQF-MQM error typology, it was upgraded to a translation performance analytics tool in 2015, with the release of the API and the DQF Dashboard. The API enables CAT tools and translation management systems (TMS) to build plugins and connect directly to the DQF Dashboard, where users can see their reports in real time. Today, the DQF Dashboard is an integrated and robust tool offering reporting on various levels: segment level, project level, or aggregated as benchmarks (across organization and industry) and trends (over time).
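To illustrate the kind of data such a plugin integration exchanges, here is a minimal sketch of assembling a segment-level report record. All field names here are hypothetical placeholders for illustration; the actual DQF API defines its own schema, which an implementer would take from the official TAUS documentation.

```python
import json

def build_segment_report(project_id, segment_id, source, target, post_edit, edit_seconds):
    """Assemble a hypothetical DQF-style segment-level record.
    Field names are illustrative, NOT the actual DQF API schema."""
    return {
        "projectId": project_id,
        "segmentId": segment_id,
        "sourceSegment": source,
        "targetSegment": target,       # raw MT output
        "editedSegment": post_edit,    # post-edited result
        "editTimeSeconds": edit_seconds,
    }

report = build_segment_report(
    project_id="demo-001",
    segment_id=42,
    source="Hello world",
    target="Hallo wereld",
    post_edit="Hallo, wereld",
    edit_seconds=7.5,
)
payload = json.dumps(report)  # what a CAT-tool plugin would send to the dashboard
print(payload)
```

The design point is that each segment carries both the raw MT output and the post-edited result plus timing, which is exactly what allows the dashboard to aggregate edit distance and productivity into project-level benchmarks and trends.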
Not so long ago, I was at an airport, listening to a voice announcing a gate change. I couldn’t help noticing that the sentences sounded somewhat unnatural, as if parts of them had been cut and pasted together. Shortly after, I had a chance to see behind the scenes of the technology used by a company providing natural-voice announcements. I was surprised to learn that it consisted of cutting and pasting parts of pre-recorded sentences into new ones, with the guidance of native speakers. I’m no expert in the field, but opting for what seems a largely manual process struck me as old-fashioned in this age of data and technology.
Amsterdam, September 2019 - TAUS is delighted to announce that Unbabel joined the TAUS Partner Board. Together with eight of the largest stakeholders in the industry, Unbabel will navigate the TAUS roadmap and assist TAUS in fulfilling its mission to further expand the TAUS Language Data Network and strengthen its position as a source of data for business intelligence and translation automation.
Machine Translation (MT) is a technology that has been around for decades, but it is only in recent years that it has taken center stage, represented at every language-related conference and buoyed by human-parity claims.
The demand for post-editing of machine translation (PEMT) is growing, according to the 2018 report from Slator. But before post-editing becomes an inherent part of every production workflow, the industry should agree on the most effective methods to evaluate the quality of post-edited machine translation output.
We rarely think of linguistic quality when it is done right, because then it is not necessarily seen as a differentiator or a success factor. It is when the mark is missed that quality takes (back) the spotlight. This was one of the opening thoughts of the TAUS QE Summit, delivered by James Douglas (Microsoft) at the Microsoft premises in Redmond.