17/05/2023

Domain-specific MT systems: specialist machine translation

Are subject- or domain-specific machine translation (MT) systems more effective than generic ones? Jasmin Nesbigall, our Head of MTPE and Terminology Management, explores this question using concrete comparative analyses.

Domain-specific MT systems are engines that the provider has trained with data from a specific subject area, for example law, software or mechanical engineering. They promise better results for specialised terminology and text-type conventions than generic MT systems generally deliver. The more specific a subject area and its text types, the greater the advantage of domain-specific engines could be. But can they deliver what many users expect?

Our latest open Q&A session was about increasing quality when using machine translation. One focus was the use of subject-specific engines. Some MT providers specialise in this type of system and cover a range of subject areas and language combinations. Often, domain-specific engines also form the basis for company-specific MT training. In these scenarios, for example with a medical technology company, an existing domain-specific engine is taken as the starting point and further trained or ‘fine-tuned’ using the company’s own material.

Depending on the system used, it is also possible to integrate terminology as a glossary so that specialised terms are implemented according to defined specifications. To compare the two types of system (generic vs. domain-specific), the first aspects to consider are cost, set-up time and the quality that can realistically be expected.

Expectation: Effort and costs vs. quality

Generic systems are available at low cost as monthly subscriptions or even free of charge, at least if you leave data security aside. They can also be used within just a few minutes: usually, only the T&Cs need to be confirmed or an account created. Although the quality varies depending on the provider and language combination, the well-known tools achieve good to very good results overall.

Domain-specific engines are often somewhat more expensive in comparison, as providers have specifically trained them with certain texts and the data must be curated in greater detail in advance than is the case with generic systems. Finding the right engine that matches both the subject area and the language combination can also take a little longer. Once an account has been set up, however, the engine is ready to use quickly.

Given the additional time and cost involved, the quality expectations for domain-specific engines are naturally higher than for generic systems.

MT system comparison: domain-specific vs. generic

Since domain-specific MT systems have been trained with data from a specific subject area, there is an expectation that specialised terminology will be implemented correctly and consistently. The same applies to an appropriate and customary style, especially if the texts are highly standardised, such as for the legal field or in medical reports. In addition, subject-specific features should be implemented correctly. For example, in software translation, it is expected that the sentence structure in a software text will remain correct even if it contains many references to the interface, because the MT system can handle these insertions appropriately.

The fundamental challenge is finding the right engine for a subject area and company. After all, domain-specific engines do contain specialised terms, but the question is whether these are also the terms that users want to use. As a general rule, companies use their own specialised terms as brand-defining proper names and their own corporate language to make their unique position clear and to distinguish themselves from the competition. If, however, a company does not provide specifications for this, as is often the case, or if these cannot be explicitly transmitted to the MT system, the engine translates even specialised terms the way it has learnt them. This shifts the focus towards how standardised the terms are: Does a subject area have specialised terms that are universally applicable, meaning that correct, consistent implementation of the terms can be expected? Or does it perhaps say “Schraubendreher” at the beginning of the text and “Schraubenzieher” at the end (both German words for “screwdriver”)?

We looked at all these and similar aspects in a test project and created a series of analyses and key figures for them. To do this, we used an example text from the automotive sector, from which 20 terms such as Radarsensor and Spurhalteassistent were extracted and their English equivalents (preferred terms) were identified.

In total, these 20 fixed terms appeared 59 times in the text, i.e. there were 59 places where the MT system could implement the terminology correctly or disregard it. Three different engines were used: a generic engine, a domain-specific engine for the automotive sector and, for comparison, a domain-specific engine for the technology sector.

All three translation results were professionally post-edited, following the specifications of the ISO 18587 standard. According to this standard, all errors, whether linguistic or content-related, must be corrected, but no unnecessary changes should be made. The amount of change and adjustment required differed between the engines, but not massively:

Comparison of results (source: oneword GmbH)

Orange: no changes necessary after machine translation
Petrol: minor adjustments up to 15% of the segment
Light grey: major adjustments, up to 50% of the segment
Dark grey: over 50% adjustments, almost a retranslation

At 18 per cent, the proportion of segments that did not need to be corrected at all was slightly higher for the automotive engine than for the generic engine (16 per cent) and the technology engine (13 per cent). In the area of minor adjustments, the automotive engine and the generic engine were about equal. The proportion of segments that required major adjustments was the same for all three engines, at around 50 per cent. The proportion of complete revisions or retranslations varied between 6 and 13 per cent and was somewhat higher for the automotive engine than for the generic engine. This means that the automotive engine was ahead in segments that did not have to be adapted at all, but also in those that were completely retranslated.
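To illustrate how segments could be sorted into the four categories from the legend, here is a minimal Python sketch. The article does not say how the share of adjustments per segment was measured, so the word-level similarity from Python's standard difflib module is an assumption, as are the function names and the category labels.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_share(mt_output: str, post_edited: str) -> float:
    """Approximate share of a segment changed in post-editing.

    Assumption: 1 minus the word-level similarity ratio stands in for
    the percentage of the segment that was adjusted.
    """
    matcher = SequenceMatcher(None, mt_output.split(), post_edited.split())
    return 1.0 - matcher.ratio()


def bucket(share: float) -> str:
    """Map an edit share to one of the four categories in the legend."""
    if share == 0.0:
        return "no changes"
    if share <= 0.15:
        return "minor adjustments"
    if share <= 0.50:
        return "major adjustments"
    return "near retranslation"


def distribution(pairs):
    """Share of segments per category for (MT output, post-edited) pairs."""
    counts = Counter(bucket(edit_share(mt, pe)) for mt, pe in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Running distribution over the segment pairs of each engine would give one set of proportions per engine, which is the kind of comparison described above.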

Another question concerned the general number of differences between the domain-specific engines and between the domain-specific engine and the generic engine. So, more specifically: How different is the output when using a domain-specific engine? Is it really only the specialised terminology that changes or, for example, the entire sentence structure?

Differences in the results (source: oneword GmbH)

When comparing the automotive engine with the generic engine, 38 per cent of the translated segments were identical, but 62 per cent differed. The difference was even more pronounced when comparing the technology engine with the generic engine. Here, only 28 per cent of the translated segments were identical, and 72 per cent differed. The discrepancies therefore clearly go beyond individual terms and reflect fundamentally different translation results.
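A comparison of this kind can be sketched in a few lines of Python. The sketch assumes that the outputs of two engines are aligned segment by segment and that "identical" means an exact string match after trimming whitespace. The article does not state the matching criterion, so both points and the function name are assumptions.

```python
def identical_share(output_a, output_b):
    """Fraction of aligned segments that two engines translated identically."""
    if len(output_a) != len(output_b):
        raise ValueError("Both outputs must contain the same number of segments")
    if not output_a:
        return 0.0
    # Assumption: exact match after trimming surrounding whitespace.
    same = sum(a.strip() == b.strip() for a, b in zip(output_a, output_b))
    return same / len(output_a)
```

The share of differing segments is then simply 1 minus the returned value.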

Interim assessment: a lot changes, but not only for the better

As mentioned, the main expectation when using subject- or domain-specific MT systems is that specialised terms will be implemented correctly and consistently. However, no usable glossary could be integrated into any of the three systems used, so no clear guidelines could be provided for implementing specialised terminology. It was therefore all the more important for us to find out to what extent the raw translation result matches what a company would expect in this field. This involved a detailed examination and evaluation of the term occurrences and their implementation. Across the 59 places where the terms we specified occurred, the generic engine made 12 errors or deviations from our specifications, the automotive engine 14 and the technology engine 17. Proportionally, almost 80 per cent of the specialised terms are correct with the generic engine, 76 per cent with the automotive engine and 71 per cent with the technology engine.
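The percentages follow directly from the counts given above. A short Python check, in which the variable and function names are only illustrative:

```python
OCCURRENCES = 59  # places where one of the 20 specified terms occurs

# Errors or deviations from the specified terms, per engine (from the text)
deviations = {"generic": 12, "automotive": 14, "technology": 17}


def correct_share(deviation_count, occurrences=OCCURRENCES):
    """Share of term occurrences implemented as specified."""
    return (occurrences - deviation_count) / occurrences


shares = {
    engine: round(100 * correct_share(count), 1)
    for engine, count in deviations.items()
}
print(shares)  # {'generic': 79.7, 'automotive': 76.3, 'technology': 71.2}
```

The gap between the best and the weakest engine is therefore five term occurrences out of 59.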

To our own surprise, the hope that the subject- or domain-specific MT systems would perform better than the generic engine was not fulfilled at this stage. It was therefore important for us to look at the details, and specifically at how standardised the specified terms are: the more standardised a term is, the more likely generic engines are to implement it correctly. Consistency was also of interest, because consistent implementation of terminology is expected when using domain-specific engines. Of course, a term may be translated consistently but incorrectly. Even then, the error takes less time to fix, because the correction can be made across the whole document at once.

For verification purposes, we therefore looked at the terms Fahrerassistenzsystem (specified term: driver assistance system) and Spurhalteassistent (specified term: lane tracking assistant). Our result: all three systems translated Fahrerassistenzsystem with the specified term and did so consistently throughout the document. Spurhalteassistent, however, was not translated with the specified term and was rendered in up to two different ways. For other specified terms, there were as many as four different translations.

Comparing terminology (source: oneword GmbH)

This means that domain-specific engines were not found to implement the terms more consistently. They, too, translate specialised terms depending on the context, as learnt by the machine, meaning that the translation can differ from segment to segment.
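A consistency check of this kind can be sketched as follows: collect every rendering an engine produced for each source term and flag the terms with more than one variant. The function names are assumptions, and the renderings in the usage example are invented for illustration only. They are not the actual outputs from the test project.

```python
from collections import defaultdict


def variants_per_term(observations):
    """Collect the distinct renderings seen for each source term.

    observations: iterable of (source_term, rendering) pairs, one per
    occurrence of the term in the translated text.
    """
    variants = defaultdict(set)
    for source_term, rendering in observations:
        variants[source_term].add(rendering.strip().lower())
    return dict(variants)


def inconsistent_terms(observations):
    """Return only the terms that were rendered in more than one way."""
    return {
        term: sorted(renderings)
        for term, renderings in variants_per_term(observations).items()
        if len(renderings) > 1
    }


# Hypothetical occurrences, invented for illustration
sample = [
    ("Fahrerassistenzsystem", "driver assistance system"),
    ("Fahrerassistenzsystem", "Driver assistance system"),
    ("Spurhalteassistent", "lane keeping assistant"),
    ("Spurhalteassistent", "lane departure warning"),
]
print(inconsistent_terms(sample))
```

Comparing the collected variants with the specified term then separates the two cases described above: consistent but wrong, and inconsistent.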

Conclusion: Domain-specific AI needs human expertise

In our test project, the domain-specific engines did not meet expectations: they did not translate the specialised terms more correctly or more consistently. Nor was the proportion of segments that came out of the machine requiring no changes significantly higher than with the generic system. The text did not have a particularly specific structure, so the main focus of our test was on the terminology and the segments that required changes. All three engines required a large number of major adaptations, so the overall post-editing effort was considered high.

Would you like to learn more about using machine translation, comparative analyses, evaluating errors or training company-specific engines? Then get in touch with us at mtpe@oneword.de.
