Machine Translation Going Neural

  By: Sofiane Madani

Turnaround time, cost and productivity are real concerns for the translation industry, and computer-aided translation tools alone are not enough to meet these challenges. State-of-the-art machine translation can help the industry go the extra mile needed to reduce turnaround time and cost and to increase productivity. Traditional machine translation systems were rule-based and unable to match the fluidity of human language. Because accurate translation requires background knowledge, these classical rule-based systems are inefficient at resolving language ambiguity. A rule-based system uses dictionaries, grammar rules and exceptions to convert a source text into a target text. Quality has always been an issue with this approach, and good output is rare. For example, "manger un avocat" can be translated as "eating a lawyer". Exceptions can be added to the system, for instance by specifying that avocat is translated as avocado when it is preceded by the verbs manger or déguster, but the number of rules can become vast without adding much value to the system.
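The avocat example can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the dictionary entries, the FOOD_VERBS set and the translate function are hypothetical names invented here, not part of any real rule-based engine.

```python
# Toy rule-based translator: a bilingual dictionary plus one hand-written
# exception rule. All entries and names are illustrative assumptions.

DICTIONARY = {
    "manger": "eat",
    "déguster": "savor",
    "un": "a",
    "avocat": "lawyer",  # default sense of the word
}

# Exception rule: after one of these verbs, "avocat" means the fruit.
FOOD_VERBS = {"manger", "déguster"}


def translate(sentence: str, use_exceptions: bool = True) -> str:
    """Translate word by word, optionally applying the exception rule."""
    words = sentence.lower().split()
    output = []
    for i, word in enumerate(words):
        target = DICTIONARY.get(word, word)  # unknown words pass through
        if use_exceptions and word == "avocat":
            # Look back at the preceding words for a food verb.
            if any(prev in FOOD_VERBS for prev in words[:i]):
                target = "avocado"
        output.append(target)
    return " ".join(output)


print(translate("manger un avocat", use_exceptions=False))  # eat a lawyer
print(translate("manger un avocat"))                        # eat a avocado
```

The second output is still wrong ("a avocado"), so yet another rule would be needed for the article. This is the rule explosion described above: each fix covers one case and invites the next.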

With the evolution of technology, Statistical Machine Translation and hybrid systems produced much better results. Hybrid MT engines use a statistical core enriched with linguistic rules. They leverage the client's translation memories and glossaries and, by applying these rules, produce output that requires only light editing. In addition to using perfect matches in the pretranslation process, the engine learns from existing translations and chooses words and phrases by applying statistics. For example, a large bilingual corpus may contain both lawyer and avocado as translations of the French word avocat. The engine scans the occurrences and finds that, in 100% of the cases where avocat follows the verbs manger, déguster or a synonym, it is translated as avocado. The engine also applies reordering, spelling and formatting rules to ensure the best possible results. The whole process includes two steps: engine pre-training and post-training. Pre-training draws on all the resources mentioned above; post-training is the feedback provided by linguists. The post-training step helps make corrections, update rules and avoid frequent and systematic errors. Linguists also need to assess the post-editing distance, that is, the percentage of the output that had to be edited. If the post-editing distance is below 20%, the engine will help translate about 1,000 words per hour, which is a very good result and an important increase in productivity.
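Two ideas from this paragraph can be sketched in a few lines of Python: choosing a translation from corpus counts, and measuring post-editing distance. Everything here is an assumption made for illustration. The aligned examples are invented, real engines use far richer models than a verb lookup, and the article does not say how post-editing distance is computed. The sketch assumes a word-level edit distance divided by the length of the longer segment.

```python
from collections import Counter, defaultdict

# Hypothetical aligned examples: (French verb preceding "avocat",
# English word the human translator chose for "avocat").
ALIGNED_EXAMPLES = [
    ("manger", "avocado"),
    ("déguster", "avocado"),
    ("manger", "avocado"),
    ("consulter", "lawyer"),
    ("appeler", "lawyer"),
]


def build_counts(examples):
    """Count how often each translation was chosen after each verb."""
    counts = defaultdict(Counter)
    for verb, translation in examples:
        counts[verb][translation] += 1
    return counts


def choose_translation(counts, verb, default="lawyer"):
    """Pick the most frequent translation seen after this verb."""
    if verb not in counts:
        return default
    return counts[verb].most_common(1)[0][0]


def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def post_editing_distance(mt_output, post_edited):
    """Percentage of words edited (assumed definition, see lead-in)."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    longest = max(len(mt_words), len(pe_words))
    if longest == 0:
        return 0.0
    return 100.0 * word_edit_distance(mt_words, pe_words) / longest


counts = build_counts(ALIGNED_EXAMPLES)
print(choose_translation(counts, "manger"))  # avocado

ped = post_editing_distance("he wants to eat a lawyer today",
                            "he wants to eat an avocado today")
print(round(ped, 1))  # 28.6
```

In this toy case two of seven words were changed, so the segment sits above the 20% threshold mentioned above.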

The most modern approach to machine translation uses artificial intelligence, specifically deep learning. Thanks to neural language models, the system processes and learns from natural language directly, instead of using phrase-based translation, which processes and tunes many sub-components separately. A large neural network trained on big data reads a sentence and produces a translation. These models, which rely on complex algorithms, are able to handle context issues more effectively. Although the system uses a fixed-size representation to capture the semantic details of even a very long sentence, the output often requires light editing when the sentence is too complex. The system is not perfect; it sometimes fails to translate all the words of the source sentence, especially in long sentences. Neural translation models, when properly trained and combined with other resources, can yield high-quality translations and will completely change the way we translate.
