Cleaning up language data with oneCleanup


Cleaning up language data

Cleaning up data made easy with oneCleanup



Data growth in translation memory and terminology database

Language data accumulates quickly during the translation process. The translation memory for each language direction grows with each translation and, depending on the process, the terminology database also grows at the same time. The data collected should ensure that translations are consistent, guarantee the correct use of specialised language and save time and money. Language data is definitely valuable and can also be used for many other company processes, such as knowledge management and customer support.

However, large volumes of data can quickly become confusing and therefore harder to manage. Multiple TM hits for an identical source segment, for example, take longer to check during translation, and the segment incurs a higher charge even though a translation already exists. Uncontrolled database growth can also result from incorrect segmentation, from importing old data without checking it, and from merging different data sources.
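To make the "multiple TM hits" problem concrete, here is a minimal sketch of how such entries could be found in a TMX export. It is an illustration only, not the oneCleanup implementation: the function name `find_conflicting_segments`, the sample data and the language codes `en` and `de` are assumptions made for this example, and language codes are matched exactly (no handling of variants such as `en-GB`).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample data in TMX structure (illustrative only).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the button.</seg></tuv>
      <tuv xml:lang="de"><seg>Klicken Sie auf die Taste.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Press the button.</seg></tuv>
      <tuv xml:lang="de"><seg>Klicken Sie auf den Knopf.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Remove the cover.</seg></tuv>
      <tuv xml:lang="de"><seg>Die Abdeckung entfernen.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def find_conflicting_segments(tmx_text, src_lang="en", tgt_lang="de"):
    """Return source segments that have more than one distinct translation."""
    root = ET.fromstring(tmx_text)
    targets_by_source = defaultdict(set)
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            lang = tuv.get(XML_LANG)
            if seg is not None and lang:
                # itertext() also collects text nested inside inline tags.
                texts[lang.lower()] = "".join(seg.itertext()).strip()
        src, tgt = texts.get(src_lang), texts.get(tgt_lang)
        if src and tgt:
            targets_by_source[src].add(tgt)
    # Keep only sources with two or more different target texts.
    return {
        src: sorted(tgts)
        for src, tgts in targets_by_source.items()
        if len(tgts) > 1
    }


conflicts = find_conflicting_segments(SAMPLE_TMX)
for source, translations in conflicts.items():
    print(source, "->", translations)
```

A report like this only flags candidates. Whether two competing translations are both legitimate (for example, because they depend on context) remains a linguistic decision.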

Our blog post on translation-oriented writing explains how your source texts can contribute to cleaner language data.

Areas of application for language data

The usability of language data is no longer limited to the translation process. Data generated during translation also plays an important role in knowledge management, technical editing and when using artificial intelligence (AI). For example, the data can be used to fine-tune large language models (LLMs) in order to add company-specific content to a model pre-trained on general data. Meanwhile, terminology data is important both when creating the source text and during machine translation to ensure that the correct specialised language is used in the texts.


The broader the use of language data, the more important it is that the data is clean and usable. Quality comes before quantity: in neural machine translation, irrelevant training data can reduce the quality of the translation output. With LLMs, large amounts of data generate costs and may also dilute the translation output with input that the engines cannot use to learn anything or from which they can only glean ambiguous information. This turns databases into a data burden.

Jasmin Nesbigall

Head of MTPE and Terminology Management

j.nesbigall@oneword.de +49 (0)7031 714-9552

oneCleanup: language data clean-up made easy

With oneCleanup, we help companies check, manage and clean up their language data. We combine decades of language and technology expertise into a smart, complete service. Your data is analysed directly in the databases or via exchange formats, and it can be cleaned up both linguistically and formally. oneCleanup is suitable for databases of any size. As every database may have its own special features, every step of the process can be customised to start exactly where your company needs it most.


Unlock the potential of your data!

Data is the new gold, but it needs to be uncovered first, because large databases quickly become inefficient and difficult to manage. Whether you want to clean up TM data that has grown over the years or fill gaps in your terminology data, scripting and automation allow oneCleanup to analyse large volumes of data quickly and effectively. The clear results of these analyses show the potential for clean-up and, if required, can be taken straight to the next step.

The aim of oneCleanup is to obtain a reduced and clean database of TM and terminology data that is precisely tailored to your use scenario. No more oversized and messy databases that create more work than benefit!
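As an illustration of what a scripted check for missing terminology data might look like, the following sketch lists concepts that lack a term in a required language. It is a simplified example under stated assumptions: the `Concept` class, the `missing_languages` function and the sample entries are hypothetical and do not describe the data model of any particular terminology system or of oneCleanup itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Hypothetical, simplified terminology entry: one concept, terms per language."""
    concept_id: str
    terms: Dict[str, List[str]] = field(default_factory=dict)


def missing_languages(concepts, required_langs):
    """Map each concept ID to the required languages that have no non-empty term."""
    gaps = {}
    for concept in concepts:
        missing = [
            lang
            for lang in required_langs
            if not any(term.strip() for term in concept.terms.get(lang, []))
        ]
        if missing:
            gaps[concept.concept_id] = missing
    return gaps


# Made-up sample entries (illustrative only).
concepts = [
    Concept("C1", {"en": ["measuring probe"], "de": ["Messtaster"]}),
    Concept("C2", {"en": ["roughness"], "de": []}),
    Concept("C3", {"de": ["Welligkeit"]}),
]

gaps = missing_languages(concepts, ["en", "de"])
for concept_id, langs in gaps.items():
    print(concept_id, "is missing terms in:", ", ".join(langs))
```

The output of such a check is a worklist of gaps. Deciding which term actually belongs in each gap still requires terminology expertise.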

oneCleanup offers:

Analysis of the potential to clean up translation memories and terminology databases
Quick assessment of the actual effort required to clean the data up
Formal and linguistic clean-up
Customised adaptation to company specifications and priorities
Comprehensive advice on objectives, formats and schedules

The results of the analyses are evaluated by our experienced team to identify and implement the appropriate corrective measures. This reflects our high quality standards: we only clean up data where there is genuine potential for improvement. Our detailed analyses also make it possible to implement the clean-up steps one at a time so that the data remains usable throughout.


oneCleanup blog


From a tangled mess to a treasure trove: learn what oneCleanup is and what it does.

HOMMEL ETAMIC case study

How Jenoptik cleaned up its terminology data and implemented effective processes.

Terminology for MT blog


We explain how terminology can be integrated into MT and what needs to be taken into account.

8 good reasons to choose oneword.

Learn more about what we do, what sets us apart from traditional translation agencies, and why oneword is a good choice for a successful partnership.
