Text summarization is the process of condensing a longer text into a shorter summary that preserves the key elements and meaning of the original. Doing this manually is time-consuming and strenuous. Powered by the data and AI revolution, however, automating this task is becoming increasingly popular.
We can distinguish two types of text summarization: extraction and abstraction.
Extractive summarization is the easiest approach to automatic text summarization, as it requires little linguistic analysis. Each sentence in the document is assigned a score, and the highest-scoring sentences are picked directly from the document and put together to form a coherent summary. In other words, the important sections of the text are identified, cropped out, and stitched together to produce a condensed version of the full document.
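Extractive summarization consists of three steps: scoring the sentences, selecting the highest-scoring ones, and stitching them together into a summary.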
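The extractive approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production method: it assumes a simple word-frequency score, a naive regex sentence splitter, and a tiny stop-word list, and the names `STOP_WORDS`, `split_sentences`, `tokenize`, and `summarize` are hypothetical helpers introduced only for this example.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in",
              "and", "it", "this", "that", "on", "for", "with", "as", "by"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercase words with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]

def summarize(text, num_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Score each sentence by the average document frequency of its words.
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)

    # Pick the indices of the highest-scoring sentences.
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:num_sentences]

    # Stitch the picked sentences together in their original order.
    return " ".join(sentences[i] for i in sorted(top))
```

Calling `summarize(article_text, num_sentences=3)` would return three sentences copied verbatim from the input. Averaging the word frequencies rather than summing them is one possible design choice; it keeps long sentences from winning on length alone.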
Abstractive summarization requires more advanced NLP techniques, as it aims to produce a summary by interpreting the text. An AI model takes in the important information and generates new, rephrased sentences, parts of which may not appear in the original text. The resulting summaries tend to be more linguistically fluent and closer to human-made summaries.
Abstractive summarization can be regarded as a “sequence mapping task”, in which the source text is mapped to the target summary. As such, it can take advantage of advancements in deep learning and “sequence-to-sequence models”. Just like machine translation models, these sequence-to-sequence models consist of an encoder and a decoder: a neural network reads the text, encodes it, and then generates the target text.
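To make the encoder and decoder roles concrete, here is a toy sketch in plain Python. It is only a structural illustration under loud assumptions: the weights are random and untrained, so the generated words are meaningless, and every name (`VOCAB`, `make_rnn`, `encode`, `decode`) is hypothetical. A real system would use a trained neural network from a deep learning library.

```python
import math
import random

DIM = 8  # size of the hidden state (arbitrary for this toy)
VOCAB = ["<eos>", "text", "summary", "model", "short", "key"]
_rng = random.Random(0)  # fixed seed so the toy is reproducible

def _matrix(rows, cols):
    return [[_rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def embed(word):
    # Assumption: a pseudo-random vector per word stands in for learned embeddings.
    r = random.Random(word)
    return [r.uniform(-1, 1) for _ in range(DIM)]

def make_rnn():
    # A simple recurrent cell with its own (untrained) weights.
    w_state, w_input = _matrix(DIM, DIM), _matrix(DIM, DIM)
    def step(state, word):
        pre = [a + b for a, b in zip(matvec(w_state, state),
                                     matvec(w_input, embed(word)))]
        return [math.tanh(x) for x in pre]
    return step

encoder_step = make_rnn()
decoder_step = make_rnn()
W_OUT = _matrix(len(VOCAB), DIM)  # maps a decoder state to one score per word

def encode(source_tokens):
    # Encoder: read the source token by token into one fixed-size state.
    state = [0.0] * DIM
    for token in source_tokens:
        state = encoder_step(state, token)
    return state

def decode(state, max_len=6):
    # Decoder: start from the encoded state and greedily emit one word at a time.
    output, prev = [], "<bos>"
    for _ in range(max_len):
        state = decoder_step(state, prev)
        scores = matvec(W_OUT, state)
        word = VOCAB[scores.index(max(scores))]
        if word == "<eos>":
            break
        output.append(word)
        prev = word
    return output
```

A call such as `decode(encode("a long source text".split()))` returns a short list of words from `VOCAB`. The point is the flow of data: the decoder never sees the source text directly, only the state the encoder produced, which is why it can generate words that do not appear in the source.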
Because it involves complex language modeling, automatically generating human-like abstractive summaries remains a challenging task.
Anne-Maj van der Meer is a marketing professional with over 10 years of experience in event organization and management. She has a BA in English Language and Culture from the University of Amsterdam and a specialization in Creative Writing from Harvard University. Before her position at TAUS, she was a teacher at primary schools in regular as well as special needs education. Anne-Maj started her career at TAUS in 2009 as the first TAUS employee, where she became a jack of all trades, taking care of bookkeeping and accounting as well as creating and managing the website and customer services. For the past 5 years, she has worked as Events Director, chief content editor, and designer of publications. Anne-Maj has helped organize more than 35 LocWorld conferences, where she takes care of the program for the TAUS track and hosts and moderates its sessions.