9 Nov 2023
Explore the crucial role of language data in training and fine-tuning LLMs and GenAI to ensure high-quality, context-aware translations and foster the symbiosis of human and machine in the localization sector.
19 Dec 2022
Domain Adaptation can be classified into three types (supervised, semi-supervised, and unsupervised) and three methods (model-centric, data-centric, and hybrid).
19 Dec 2022
Machine learning and AI applications need data to work, and the cleaner the data, the better the results.
19 Dec 2022
Text Summarization can be categorized under two types: Extraction and Abstraction. With the power of AI, summarization is becoming more popular and accessible.
7 Oct 2022
Synthetic parallel data generation by back-translation as a solution for the problem of translating low-resource languages and texts from low-resource domains.
3 Mar 2022
Natural Language Technologies are on the rise: making optimal use of NLT and its subcategories is crucial to remain up to date with the latest AI solutions.
3 Jan 2022
What can word clouds driven by NLP tell you about your training datasets? Here is how we create word clouds on TAUS Data Marketplace.
2 Dec 2021
The next logical translation solution: Data Enhanced Machine Translation (DEMT)
1 Dec 2021
The language data for AI trends to expect in 2022: expansion of multilingual AI data and models, more companies joining the data market, data diversity, and lifelong learning machines.
18 Nov 2021
A thorough overview of the paper by six Google researchers, "Data Cascades in High-Stakes AI", with a focus on why data-centric AI matters.
4 Nov 2021
Explaining what Explainable AI (XAI) entails and diving into five major XAI techniques for Natural Language Processing (NLP).
4 Oct 2021
A brief definition of what training data is.
4 Oct 2021
Reasons why training data is important for AI and ML practices.
4 Oct 2021
A brief introduction to types of training data including structured, unstructured, and semi-structured data.
4 Oct 2021
Here are some pointers on how much training data you need to train your ML models.
4 Oct 2021
Data cleaning and data anonymization are critical in training ML models. Here are the reasons why.
4 Oct 2021
Training data can be sourced via synthetic data generation, public datasets, data marketplaces, and crowd-sourced platforms.
1 Oct 2021
Web scraping is a common way to generate parallel data, making use of the immense source of multilingual data offered on the web. Here is how to do web scraping.
7 Sep 2021
Definition and common use cases of intent recognition as an essential element of NLP.
19 Aug 2021
A tutorial on automatic domain classification with NLP: from data preprocessing to the training and evaluation of an artificial neural network.