The AI scene of the 2010s was shaped by breakthroughs in vision-enabled technologies, from advanced image search to computer vision systems for medical image analysis and for detecting defective parts in manufacturing and assembly. The 2020s, however, are expected to be all about natural language technologies and language-based AI tasks. NLP, NLG, NLQ, NLU… the list of abbreviations starting with NL (Natural Language) seems to grow by the day. Whatever the technology domain, natural language technologies are set to shape a variety of fields, from business intelligence and healthcare to fintech.
Speech recognition is a complex mélange of linguistics, mathematics, and statistics. Also known as speech-to-text, it identifies spoken words and converts human speech into written form. To do so as naturally and precisely as possible, AI and ML models combine grammar, syntax, and structure with the composition of audio and voice signals to understand and process human speech.
Amsterdam, June 8, 2022 – TAUS, a leading provider of human-powered language data for AI solutions, has published the TAUS DeMT™ Evaluation Report 2022. The report compares TAUS DeMT™ performance against major available machine translation engines in 8 language pairs for the eCommerce domain, 18 language pairs for the Medical/Pharma domain, and 4 language pairs for the Financial domain. The report's findings are based on an independent analysis by Polyglot Technology LLC benchmarking the machine translation quality of DeMT™ against other major online machine translation providers.
AMSTERDAM, June 8, 2022 – TAUS, a leading provider of human-powered language data for AI solutions, announced today that it has been listed as a Representative Vendor in the Gartner Market Guide for AI-Enabled Translation Services 2022 report.
Audio transcription is a service that has seen growing demand in recent years as businesses look for better ways to communicate with their stakeholders. The rise in demand comes from the shift away from written content toward other types of multimedia, such as video and audio, which are increasingly common in day-to-day business activities. Written reports, however, are still necessary for conveying information easily.
It doesn’t happen very often nowadays, but every now and then I still find in my inbox a great example of what is becoming a relic of the past: a spam email with a cringeworthy translation. Like everyone else, I’m certainly not too fond of spam, but the ones with horrendous translations do get my attention. The word-for-word translation is like a puzzle to me: I want to know whether I can ‘reverse-translate’ it back to its original phrasing.
Amsterdam, February 16, 2022 - TAUS, the one-stop language data shop, is pleased to announce that TAUS data products and data services are now available to users of Amazon Translate on the AWS Marketplace. This first step marks the beginning of an ongoing collaboration between TAUS and Amazon Translate. As of today, AWS customers can review and buy 30 bilingual corpora in the Ecommerce, Medical/pharmaceutical and Finance domains. The languages available are as follows:
Amsterdam, January 19, 2022 - TAUS, the one-stop language data shop established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise, launches a new service: Data-Enhanced Machine Translation (DEMT) on their Data Marketplace.
This is the third article in my series on Translation Economics of the 2020s. In the first article published in Multilingual, I sketched the evolution of the translation industry driven by technological breakthroughs from an economic perspective. In the second article, Reconfiguring the Translation Ecosystem, I laid out the emerging new business models and ended with the observation that new smarter models still need to be invented. This is where I will now pick up the thread and introduce you to the next logical translation solution. I call it: Data-Enhanced Machine Translation.
Technologies such as Natural Language Processing (NLP), deep learning, and computer vision have thrived as data science has matured into a well-established field of study and expertise. These developments have paved the way for the rise of machine learning (ML) as a route toward artificial intelligence (AI). The transformative effects of these technologies are being felt in our daily lives at an ever-increasing pace as we move into 2022.
The Google Research team recently published a paper titled Data Cascades in High-Stakes AI. Its six authors, Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora Aroyo, shed light on a profound pattern of data undervaluation in high-stakes fields where AI models are critical and prevalent. They conclude that although there is great interest in building MT and ML models, there is far less interest in doing the actual data work.
Amsterdam, 11 November 2021 - TAUS launches a new course on language data management on their eLearning Platform. The course gives a comprehensive overview of language data for the AI sector. It comprises five modules that extensively cover language data for AI applications and language data services; the language data market and pricing; data bias and ethics; and privacy and copyright concerns.
As AI becomes more prominent in high-stakes industries like healthcare, education, construction, the environment, autonomous machines, and law enforcement, there is an increased need to trust its decision-making process. Predictions often need to be extremely accurate, for example in life-or-death situations in healthcare. Because of the critical and direct impact AI has on our day-to-day lives, decision-makers need more insight and visibility into the mechanics of AI systems and the prediction process. At present, often only technical experts such as data scientists or engineers understand the backend processes and algorithms being used, such as highly complex deep neural networks. This lack of interpretability has proven to be a source of disconnect between technical and non-technical practitioners. In an effort to make AI systems more transparent, the field of Explainable AI (XAI) came into existence.
Amsterdam, 7 October 2021
A machine learning algorithm uses data to learn and make decisions. The algorithm develops confidence in its decisions by understanding the underlying patterns, relationships, and structures within a training dataset. The higher the quality of the training data, the better the algorithm will perform. So what is training data exactly?
Training data is one of the most integral pieces of machine learning and artificial intelligence. Without it, models could not learn, make predictions, or extract useful information. It’s safe to say that training data is the backbone of machine learning and artificial intelligence.
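One way to see why the quality of the training data matters so much is a toy experiment in plain Python: the same one-nearest-neighbor rule is fit once to clean labels and once to a copy with half the labels flipped, then scored on the same held-out points. All datasets and values here are hypothetical, chosen only to illustrate the point.

```python
# Toy illustration (plain Python, hypothetical data) of the claim that
# higher-quality training data yields a better-performing model.

def nearest_label(x, training_set):
    """Toy 1-nearest-neighbor 'model': predict the label of the closest example."""
    return min(training_set, key=lambda ex: abs(x - ex[0]))[1]

# True rule: label is 0 below 0.5, otherwise 1.
train_xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
clean = [(x, int(x >= 0.5)) for x in train_xs]

# Same features, but half the labels flipped (low-quality data).
noisy = [(x, 1 - y if i % 2 == 0 else y) for i, (x, y) in enumerate(clean)]

# Held-out points, scored against the true rule.
test_xs = [0.05, 0.12, 0.33, 0.48, 0.52, 0.67, 0.83, 0.96]

def accuracy(training_set):
    correct = sum(nearest_label(x, training_set) == int(x >= 0.5) for x in test_xs)
    return correct / len(test_xs)

print(accuracy(clean))  # -> 1.0   (clean data: every test point correct)
print(accuracy(noisy))  # -> 0.375 (corrupted labels drag accuracy down)
```

The features are identical in both runs; only the label quality differs, which is exactly the gap between the two accuracy scores.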
Training data is used in three primary types of machine learning: supervised, unsupervised, and semi-supervised learning. In supervised learning, the training data must be labeled, which allows the model to learn a mapping from features to their associated label. In unsupervised learning, labels are not required: the model looks for underlying structure in the features of the training set to form generalized groupings or predictions. In semi-supervised learning, the training dataset contains a mix of labeled and unlabeled examples.
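The three settings above differ in the shape of the training data itself, which can be sketched in plain Python. The feature values and labels below are hypothetical, chosen only to show the structure.

```python
# A minimal sketch of how training data differs across the three settings.

# Supervised: every example pairs a feature vector with a label.
supervised = [
    ([1.0, 0.9], "spam"),
    ([0.9, 1.1], "spam"),
    ([0.1, 0.2], "ham"),
    ([0.2, 0.1], "ham"),
]

# Unsupervised: feature vectors only; the model must discover structure itself.
unsupervised = [features for features, _ in supervised]

# Semi-supervised: a mix -- some examples labeled, the rest unlabeled (None).
semi_supervised = [
    ([1.0, 0.9], "spam"),  # labeled
    ([0.9, 1.1], None),    # unlabeled
    ([0.1, 0.2], None),    # unlabeled
    ([0.2, 0.1], "ham"),   # labeled
]

labeled = sum(1 for _, label in semi_supervised if label is not None)
print(f"{labeled} of {len(semi_supervised)} semi-supervised examples are labeled")
# -> 2 of 4 semi-supervised examples are labeled
```

In practice the unlabeled portion of a semi-supervised set is usually much larger than the labeled portion, since labeling is the expensive part of the data work.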