Chris Dell is Director of Booking.com’s in-house Content Agency. He started nine years ago in the travel platform’s writing team in Barcelona; today he is based in Amsterdam and is responsible for visual content services (photos, videos); language services (translation and localization, through 200 global Language Specialists, >1,000 freelancers and an MT team); geodata services (defining regions and landmarks in 228 countries and territories); copywriting and content writing services (including property descriptions); and content moderation services (for example, of the >170 million property reviews currently live on the platform). Not what you might expect from someone who originally planned a career in art! Luckily Chris found a few free minutes to tell TAUS about the world of multilingual content development.
What kind of tech support does Booking.com use for its content?
We handle very large amounts of content, ranging from copy localization in up to 44 languages to a broad collection of user-generated content from our customers and partners. We want every part of the Booking platform to be accessible in all languages, but also relevant to every user. So an ability to manage scale, but also localize and globalize, is key.
A good example is the set of descriptions for the more than 28 million accommodation listings on our platform. For these, we use a homegrown software script that writes content in English - this automatic text generation uses a flexible template system with an additional logic layer to segment and potentially personalize the content. On top of this, our Content Executives do some post-editing for specific features, and then we translate these descriptions using our neural MT system or with the help of freelance translators, depending on the language. We’ve also been running some internal tests on neural MT models that “translate” data into the English language. This means feeding a model with a number of existing descriptions and then asking it to write something based on certain parameters - we see that it’s about 80% effective, but potentially very cool for the future.
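The template-plus-logic-layer approach described above can be illustrated with a minimal sketch. This is not Booking.com's script: the `Listing` fields, the `describe` function, the segment names and the one-kilometre threshold are all hypothetical, chosen only to show how fixed sentence templates and a few rules might combine to produce an English description from structured data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Listing:
    """Hypothetical structured data for one accommodation listing."""
    name: str
    city: str
    property_type: str
    features: List[str] = field(default_factory=list)
    distance_to_center_km: Optional[float] = None


def with_article(noun: str) -> str:
    """Prefix a noun with 'a' or 'an' (simple vowel check, an assumption)."""
    article = "an" if noun[:1].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def join_features(features: List[str]) -> str:
    """Join items as 'x', 'x and y', or 'x, y and z'."""
    if len(features) <= 1:
        return "".join(features)
    return ", ".join(features[:-1]) + " and " + features[-1]


def describe(listing: Listing, segment: str = "general") -> str:
    """Fill sentence templates; the rules below stand in for the logic layer."""
    sentences = [
        f"{listing.name} is {with_article(listing.property_type)} "
        f"in {listing.city}."
    ]

    # Rule: only mention location when the property is close to the centre.
    distance = listing.distance_to_center_km
    if distance is not None and distance <= 1.0:
        sentences.append("It is within walking distance of the city centre.")

    # Rule: for the (hypothetical) family segment, lead with family rooms.
    features = list(listing.features)
    if segment == "family" and "family rooms" in features:
        features.remove("family rooms")
        features.insert(0, "family rooms")

    if features:
        sentences.append(f"Guests can enjoy {join_features(features)}.")

    return " ".join(sentences)


example = Listing(
    name="Hotel Example",
    city="Amsterdam",
    property_type="hotel",
    features=["free WiFi", "a terrace", "family rooms"],
    distance_to_center_km=0.5,
)
print(describe(example))
print(describe(example, segment="family"))
```

In a setup like this, the same structured record yields slightly different copy per audience segment, and the English output would then go on to post-editing and translation as described.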
Our user-generated content (UGC), on the other hand, takes the form of comments and recommendations from clients and site visitors. One crucial question is how to handle offensive content that is submitted from time to time. We have to take such content seriously, but try to ensure that we can decide very quickly whether or not to display it. For this, we have a large team checking content, and with the help of machine learning, over 90% of UGC will be displayed on our platform within a couple of hours of submission.
In the future, we’ll have to pay more attention to monitoring visual user-generated content such as photos. This isn’t a significant challenge so far, but the moderating process will become harder as more photo and video content goes online. For this, we’re working on an automatic moderation system, using semantic analysis of video content to identify “intent”. Happily, we don’t yet face the visual content moderation challenges seen on platforms such as YouTube or Facebook, but as we move into more visual content we can expect more challenging material.
How did MT become integrated into the service mix?
We started in 2014 with a statistical MT (SMT) project driven by an intern, but it never really developed into a seamless process. It was first applied to property descriptions, and we were clear that we wanted to get to translated content that we could publish without post-editing (PE), as this responded to the scale and growth of our platform. In total, we dedicated about two years to the SMT approach, with some improvements, but it wasn’t good enough for PE-free publication.
We decided to make the switch to neural MT (NMT) at the end of 2016 - more or less the same time that Amazon and Google also switched approach. We reckoned our data was clean enough, and our content predictable enough that we could do it well. So in 2017, we funded the first official MT team in Booking – just three (very passionate) Data Scientists and one (equally passionate) Developer, headed up by Sathish Chander as Product Owner. One of the most critical early steps was introducing the right feedback loops from our Language Specialists to measure quality - that allowed us to improve our models quickly. So, instead of using overly abstract BLEU metrics, we had our in-house Language Specialists provide classified and actionable feedback, typically within 48 hours. This allowed us to get 12 current languages into production, and today we’re hoping to have around 30 (of our total spread of 44) languages in production by the end of 2019. Over the last six months, we’ve also started to train MT models on other types of content (e.g. customer reviews of properties), which has meant using a more general-purpose model.
Yet a further step will be localizing content such as geodata, campaigns, and visual content, and we’ll have to decide when we use humans or machines. This is a sliding scale of course, but the key parameter as we shift towards the experience end of the content spectrum will be the quality of “warmth” - empathy - which machines will struggle to deliver. So a current challenge for Emmanuelle Dumas, our Language Services Manager, is how to deliver the wide range of localization and translation services that the business needs using an equally wide range of resources, from raw MT to specialists.
How are your Language Specialists and freelance translators being prepared for your next content moves?
We need our in-house Language Specialists, who typically sit in their local markets (so the Simplified Chinese team sits in Shanghai, while the Argentinian Spanish team sits in Buenos Aires), to be constantly on top of local product insights and the local consumer culture - the language must always remain fresh and relevant. We recently introduced the Senior Language Specialist role to take responsibility for these sorts of insights. Personally, I think the LS role is one of the most interesting in the business - it’s incredibly diverse, working on partner- and guest-facing product and marketing content.
It’s normal that automation such as MT makes people feel a bit uneasy at first, but in Booking people tend to be more curious or intrigued than fearful. Our Specialists are really interested in tech developments and are getting involved in quality assessment of MT, as I mentioned before, so we can leverage their expertise. I love the pride they take in the quality of our MT.
Our people are critical to Booking’s success - we try to create an environment of empowerment, creativity, and ownership. This January we hired a new Change and Development Manager and have created a new senior language role dedicated to how we can train people and keep them sharp and aware of our competitors, for example, across our 44 language teams. If I tried to predict the future, I’d guess that any content that needs to scale very fast will go mostly to MT, and everything that requires empathy, emotion, nuance or something beyond straightforward translation will go to in-house specialists.
All this suggests that one art Chris and his team are mastering at Booking.com is that of effectively combining human creativity and empathy with the reach of machine intelligence. Getting this balance right is increasingly critical to a successful translation and localization strategy when you need to be agile about new content and multimedia opportunities.
Chris Dell will be the Keynote Speaker at the TAUS Global Content Summit Amsterdam on 6 March 2019.
Long-time European language technology journalist, consultant, analyst and adviser.