19 Dec 2022
Domain adaptation can be classified into three types (supervised, semi-supervised, and unsupervised) and three methods (model-centric, data-centric, and hybrid).
19 Dec 2022
Machine learning and AI applications need data to work, and the cleaner the data, the better the results.
19 Dec 2022
Text summarization falls into two types: extraction and abstraction. With the power of AI, summarization is becoming more popular and accessible.
7 Oct 2022
Synthetic parallel data generation by back-translation as a solution for the problem of translating low-resource languages and texts from low-resource domains.
3 Mar 2022
Natural Language Technologies are on the rise: making optimal use of NLT and its subcategories is crucial to remain up to date with the latest AI solutions.
3 Jan 2022
What can word clouds driven by NLP tell you about your training datasets? Here is how we create word clouds on TAUS Data Marketplace.
2 Dec 2021
The next logical translation solution: Data Enhanced Machine Translation (DEMT)
1 Dec 2021
Language data for AI trends to expect in 2022: the expansion of multilingual AI data and models, more companies joining the data market, data diversity, and lifelong learning machines.
18 Nov 2021
A thorough overview of the paper by six Google researchers, "Data Cascades in High-Stakes AI", with a focus on why data-centric AI matters.
4 Nov 2021
Explaining what Explainable AI (XAI) entails and diving into five major XAI techniques for Natural Language Processing (NLP).
4 Oct 2021
A brief definition of what training data is.
4 Oct 2021
Reasons why training data is important for AI and ML practices.
4 Oct 2021
A brief introduction to types of training data including structured, unstructured, and semi-structured data.
4 Oct 2021
Here are some pointers on how much training data you need to train your ML models.
4 Oct 2021
Data cleaning and data anonymization are critical when training ML models. Here are the reasons why.
4 Oct 2021
Training data can be sourced via synthetic data generation, public datasets, data marketplaces, and crowd-sourced platforms.
1 Oct 2021
Web scraping is a common way to generate parallel data, making use of the immense source of multilingual data offered on the web. Here is how to do web scraping.
7 Sep 2021
Definition and common use cases of intent recognition as an essential element of NLP.
19 Aug 2021
A tutorial on automatic domain classification with NLP: from data preprocessing to the training and evaluation of an artificial neural network.
7 Jun 2021
Understanding sentiment analysis, a popular subfield of NLP in ML and AI, including its definition, types, and use cases.