15/08/2022

Updating and tuning the engines: what is new in machine translation?

In the ever-changing field of machine translation (MT), hardly a month goes by without news: new providers, new functionalities, new languages, new thoughts on how MT can be integrated even better into everyday translation. Since we do just that and use the technology continuously in a wide variety of applications, we are close to the developments in the industry and have taken a closer look at some of the innovations.

New languages

There were not many updates for a while after DeepL – one of the most widely used generic engines in Germany – released 13 new languages in one fell swoop last year. In mid-May, however, Turkish and Indonesian were added, making the tool interesting for another 300 million native speakers worldwide. We have already completed our first tests in both languages and the results are promising, as we have come to expect from DeepL.

Rhaeto-Romanic, Switzerland’s fourth official language, has far fewer native speakers than Turkish and Indonesian. Nevertheless, Textshuttle from Zurich has added this language to its portfolio (only available in German), making it the first provider able to translate the language automatically. In multilingual Switzerland in particular, machine translation has a firm place in everyday life for quickly transferring documents from one language to another.
Meta, with its “No Language Left Behind” project, also demonstrates that it is not about how many native speakers a language has. On the contrary: it is precisely the languages that have been significantly under-represented until now that are seeing progress in the MT world. The project involves a translation model that aims to reproduce 200 languages in high quality, including 55 African languages.

New functionalities

While new languages naturally always open up new projects, research also continues into optimising the interfaces between the translation environment, translation memory system (TMS) and MT engine. It is currently standard practice to take 100 per cent matches from the TMS and give them priority over MT. The reasoning: existing (“remembered”) translations were approved in a human translation or a previous post-editing job and have already been used in corporate texts. In the best case, they are therefore the “gold standard” to be used for every new translation.

Fuzzy matches, on the other hand, i.e. segments that deviate between one and 15 per cent from an existing translation, are controversial: although they often only require minor adjustments, they definitely require manual intervention by the post-editor. While human translators compare each segment with the TMS and adopt and adapt fuzzy matches, machines do not yet perform this comparison. The deviating segments can therefore either be left to the post-editor to adapt, or be translated from scratch by the machine, which is likely to produce output quite different from the previous translation, even though the source sentences were similar.
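To illustrate the thresholds involved, here is a minimal sketch of how a segment might be routed between TMS and MT, assuming a simple character-level similarity score. Real TMS tools use their own, more sophisticated match metrics, so the function and cut-offs here are purely illustrative:

```python
from difflib import SequenceMatcher

def classify_match(source: str, tm_source: str) -> str:
    """Classify a new source segment against a TM entry by surface similarity.
    Thresholds are illustrative: 100% matches are reused, segments deviating
    by up to 15% count as fuzzy matches, everything else goes to the MT engine."""
    score = SequenceMatcher(None, source, tm_source).ratio() * 100
    if score == 100:
        return "100% match: reuse the TM target as-is"
    if score >= 85:  # deviates by 15 per cent or less
        return "fuzzy match: post-editor adapts the TM target"
    return "no match: send the segment to the MT engine"

print(classify_match("Press the red button.", "Press the red button."))
print(classify_match("Press the blue button.", "Press the red button."))
print(classify_match("Restart the device now.", "Press the red button."))
```

The first segment is identical to the TM entry, the second deviates only slightly and would be adapted by a post-editor, and the third has no usable counterpart and is handed to the machine.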

This is where the promising approach of the MT provider Systran comes in. Its machines compare each segment with the TMS, transfer everything that matches and only have the MT translate the new part of the sentence. According to the supplier, this functionality influences the quality and accuracy of the output in a way that can otherwise only be achieved through time-consuming and cost-intensive MT specialisation.

News from the world of terminology

However, it is not only fuzzy matches that offer a relevant area for optimising MT. Terminology specifications are particularly important. Analyses of our MTPE projects and feedback from our post-editors show time and again that terminology is and remains the biggest source of error and the most time-consuming aspect of post-editing. Generic machines usually cannot follow terminology specifications at all and, even in specialised machines, the specifications must have been included extensively in the training for the machine to be able to implement them. A direct connection between machines and terminology databases is still pure wishful thinking.

However, things are also happening in this area, and several providers have included the first glossary functions in their portfolios. DeepL, for example, now supports glossaries in seven language pairs, even though the function is still not integrated into CAT tools but limited to Windows and online applications.
Textshuttle is next up: it provides terminology support based on TBX data, which is included in the translation in the CAT tool. In both cases, however, terms are simply substituted, i.e. the specified term replaces the machine’s choice without any consideration of context. For the terminology data to be used meaningfully, it must be prepared appropriately and, in the best case, only relevant terminology should be provided. So there is still room for improvement in this field, which is so important for optimising post-editing.
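The context problem can be seen in a minimal sketch of such context-free substitution. The term mapping and function name are purely illustrative and do not reflect any provider’s actual implementation:

```python
import re

# Hypothetical term mapping, e.g. exported from a TBX termbase:
# deprecated target term -> approved corporate term.
TERMS = {"screen": "display", "turn off": "switch off"}

def enforce_terminology(segment: str) -> str:
    """Replace whole-word occurrences of deprecated terms in MT output.
    Context-free: it cannot distinguish senses of a word, and inflected
    forms such as plurals slip through entirely."""
    for old, new in TERMS.items():
        segment = re.sub(rf"\b{re.escape(old)}\b", new, segment)
    return segment

print(enforce_terminology("Please turn off the screen."))
# The plural form is not matched and stays untouched:
print(enforce_terminology("The screens stay dark."))
```

The first call produces the intended corporate wording, while the second shows how a simple inflection already defeats the substitution, which is why the terminology data has to be prepared so carefully.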

Post-editing: soon to be superfluous or here to stay?

Finally, let us take a separate look at post-editing itself. Even though a lot has happened since the early days of MT, when inconsistencies, gross grammatical errors and unnatural sentence order were among the typical, and so very obvious, mistakes, even today no MT translates a text entirely without errors. The mistakes, however, are becoming more and more subtle, arrive cloaked in linguistic elegance and are sometimes difficult to recognise. After all, what reads fluently and coherently still all too often contains terminological and semantic errors, omissions or arbitrary additions. According to many experts, post-editing will therefore remain just as necessary in the long term as optimal support for post-editors in their day-to-day work.

Consumers, however, take a different view in surveys: 65 per cent of them prefer a translation that has clearly been produced by a machine, for example one with linguistic errors, to product texts that are not available in their native language at all. 40 per cent even say they will not buy anything that is only advertised in a foreign language. MT providers are therefore seeing more and more demand for pure machine translation and a “no human in the loop” approach, which is interesting wherever speed, cost, direct availability and volume are more important than linguistic quality. For highly visible content such as headlines or landing pages, the output is then only checked retrospectively and on a random basis (P3: post-publish post-editing). Instead of post-editing, the text is checked for linguistic style, non-discriminatory language and brand message, i.e. there is a clear focus on cultural rather than purely linguistic aspects.

To sum up: As always, there is a lot happening in the MT sector and there is still room for optimisation. As usual, we are keeping our finger on the pulse and are happy to support and advise our customers on how to make the best use of the new functionalities.

Would you like to be among the first to hear about industry news and receive interesting information about current topics and technical innovations in the areas of translation, terminology and localisation? Then register for the oneword newsletter.

