Similarly, Sharif and Tse reported an overall 50% error rate for medicine labels translated from English to Spanish by computer programs. Google Translate has also exhibited a high rate of errors when translating content on state and national public health websites from English to Chinese. However, to date, we are unaware of any studies evaluating the output of a machine translation tool when translating English health education material on diabetes into multiple languages.
Therefore, it is critical to identify and evaluate available translation tools for helping LEP speakers of different languages understand English health education material. Google Translate was launched in April 2006 as a statistical machine translation service that used United Nations and European Parliament documents and transcripts to gather linguistic data. Rather than translating between languages directly, it first translated text into English and then pivoted to the target language for most of the language combinations in its grid, with a few exceptions such as Catalan-Spanish. During a translation, it looked for patterns in millions of documents to help decide which words to choose and how to arrange them in the target language. Its accuracy, which has been criticized and ridiculed on several occasions, has been measured to vary greatly across languages.
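To make the pivoting strategy concrete, here is a toy Python sketch that routes a French-to-Spanish request through English. The phrase tables and helper functions are invented for illustration only and say nothing about Google's actual internal implementation.

```python
# Toy illustration of pivot translation: source -> English -> target.
# The phrase tables below are invented stand-ins, not real Google data.
PHRASE_TABLES = {
    ("fr", "en"): {"arrêtez de fumer": "stop smoking"},
    ("en", "es"): {"stop smoking": "deje de fumar"},
}

def translate_direct(text, src, tgt):
    """Look up a phrase in a direct src->tgt table."""
    table = PHRASE_TABLES.get((src, tgt), {})
    if text not in table:
        raise KeyError(f"no direct model for {src}->{tgt}")
    return table[text]

def translate_via_pivot(text, src, tgt, pivot="en"):
    """Translate src->pivot, then pivot->tgt, mirroring the pivot strategy."""
    if src == pivot or tgt == pivot:
        return translate_direct(text, src, tgt)
    return translate_direct(translate_direct(text, src, pivot), pivot, tgt)

print(translate_via_pivot("arrêtez de fumer", "fr", "es"))  # "deje de fumar"
```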
Originally enabled for only a few languages in 2016, GNMT is now used in all 109 languages in the Google Translate roster as of September 2021, except for translations between English and Latin. The pros and cons of Google Translate affect not only professional translators in the language service industry but anyone who uses it as a translation tool. Certainly, free public access to a quick and relatively accurate translation method represents significant progress in translation technology. But when one directly compares the quality and accuracy of Google Translate with that of an experienced human translator, there is no real comparison.
Google Translate, like other automatic translation tools, has its limitations. Grammatically, for example, it struggles to differentiate between imperfect and perfect aspects in Romance languages, so habitual and continuous acts in the past often become single historical events. Although seemingly pedantic, this can lead to incorrect results that a human translator would have avoided. Knowledge of the subjunctive mood is virtually nonexistent. Moreover, the formal second person is often chosen, whatever the context or accepted usage. Since its English reference material contains only "you" forms, it has difficulty translating languages with "you all" or formal "you" variations.

In addition to ensuring human translation accuracy, improvements to machine translation tools are also necessary before they are used by patients and health care providers. Health educators should work toward higher translation accuracy for machine tools, ultimately ensuring that health education information is not misinterpreted and that necessary care is not delayed. Mismatches between the vocabulary bank in machine translation systems and the terminology used in the original-language texts are a common source of machine translation errors. Developing a universal code system for machine translation could improve translation accuracy.
Therefore, we call for collaboration between computer science engineers and public health/health education professionals on this language translation technology, which could help LEP populations better understand health information. This pilot study evaluated the accuracy of Google Translate when translating diabetes patient education materials from English to Spanish and from English to Chinese. We found that Google provided accurate translations for simple sentences, but the likelihood of incorrect translation increased when the original English sentences required higher grade levels to comprehend.
For example, the simplest sentence in our study ("Stop smoking") received full scores in every domain for both languages when translated by Google, while Google received lower scores on more difficult sentences in both languages. The Chinese human translator provided much more accurate translations than Google did. The Spanish human translator, on the other hand, did not provide a significantly better translation than Google. Additionally, we identified some sentences translated by Google from English to Chinese that might lead to delayed patient care. Similarly, one sentence translated by the professional human translator from English to Spanish could also have a negative impact on patients. The results demonstrate that Google is capable of producing a more accurate translation from English to Spanish than from English to Chinese.
Before October 2007, for languages other than Arabic, Chinese, and Russian, Google Translate was based on SYSTRAN, a software engine still used by several other online translation services such as Babel Fish. From October 2007, Google Translate used proprietary, in-house technology based on statistical machine translation instead, before transitioning to neural machine translation. As noted above, the Chinese human translator provided much more accurate translations than Google, whereas the Spanish human translator did not provide a significantly better translation than Google. In contrast to our findings, Khanna et al reported that Google made more errors than human translators when translating patient education materials from English to Spanish. Zeng-Treitler et al concluded that Babelfish was not a good machine translation tool because of its high percentage of inaccuracies.
Due to differences among languages in investment, research, and the extent of digital resources, the accuracy of Google Translate varies greatly among languages. Most languages from Africa, Asia, and the Pacific tend to score poorly relative to many well-financed European languages, with Afrikaans and Chinese being the high-scoring exceptions from their continents. No languages indigenous to Australia or the Americas are included in Google Translate. Higher scores for European languages can be partially attributed to the Europarl Corpus, a trove of documents from the European Parliament that have been professionally translated, by mandate of the European Union, into as many as 21 languages.
A 2010 analysis indicated that French-to-English translation is relatively accurate, and 2011 and 2012 analyses showed that Italian-to-English translation is relatively accurate as well. However, if the source text is shorter, rule-based machine translations often perform better; this effect is particularly evident in Chinese-to-English translations. While edits of translations may be submitted, in Chinese specifically one cannot edit sentences as a whole; instead, one must edit sometimes arbitrary sets of characters, leading to incorrect edits. Formerly, one would use Google Translate to make a draft and then use a dictionary and common sense to correct the numerous mistakes. As of early 2018, Google Translate was sufficiently accurate to make the Russian Wikipedia accessible to those who can read English.
The quality of Google Translate can be checked by adding it as an extension to Chrome or Firefox and applying it to the left-hand language links of any Wikipedia article. One can translate from a book by using a scanner and an OCR tool such as the one built into Google Drive, but this takes about five minutes per page. In November 2016, Google transitioned its translating method to a system called neural machine translation.
It uses deep learning techniques to translate whole sentences at a time, which has been measured to be more accurate between English and French, German, Spanish, and Chinese. Google researchers have not provided measurement results for GNMT from English to other languages, from other languages to English, or between language pairs that do not include English. We also wish to highlight that, in some cases, professional human translators might make severe errors that negatively impact patients' health, just as machine translation tools can.
Flores et al contend that the most common types of mistakes by human interpreters that could potentially cause medical accidents include omission, false fluency, substitution, editorialization, and addition. For this reason, we recommend continuous training and credentialing practice standards for professional medical translators to enhance patient safety. For example, Michael et al developed a translation standard with 10 key components to guide the language-translating process for health education information (p. 550). When used as a dictionary to translate single words, Google Translate is highly inaccurate because it must guess between polysemic words. Most common English words have at least two senses, which produces 50/50 odds in the likely case that the target language uses different words for those different senses. The accuracy of single-word predictions has not been measured for any language.
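A toy example makes the polysemy problem concrete. The mini-dictionary below is invented for illustration, although the English-Spanish sense pairs themselves are standard; without sentence context, a word-level lookup can only guess among the senses.

```python
# Toy illustration of why single-word translation is ambiguous:
# many English words map to different target-language words per sense.
SENSES = {
    "bank": {"financial institution": "banco", "river edge": "orilla"},
    "spring": {"season": "primavera", "coiled metal": "resorte"},
}

word = "bank"
candidates = SENSES[word]
# With no surrounding sentence, a word-level system can only guess:
print(f"{word!r} has {len(candidates)} senses -> {sorted(candidates.values())}")
```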
When Google Translate does not have a word in its vocabulary, it makes up a result as part of its algorithm. Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents, and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, and an application programming interface that helps developers build browser extensions and software applications. As of September 2021, Google Translate supports 109 languages at various levels, and as of April 2016 it claimed over 500 million total users, with more than 100 billion words translated daily. Bridging the language barrier is not an easy feat, and while Google Translate is making progress, it is not ready to handle the intricacies found in business translations, nor is it intended to replace professional translators. Although Google deployed a new system called neural machine translation for better-quality translation, some languages still use the traditional method, statistical machine translation.
Statistical machine translation uses predictive algorithms to guess likely translations of foreign-language texts. It aims to translate whole phrases rather than single words and then gathers overlapping phrases for translation. Moreover, it analyzes bilingual text corpora to generate a statistical model that translates texts from one language to another. Google Translate produces approximations across languages of multiple forms of text and media, including text, speech, websites, and text in still or live video images. For some languages, Google Translate can synthesize speech from text, and in certain pairs it is possible to highlight specific corresponding words and phrases between the source and target text. Results are sometimes shown with dictional information below the translation box, but it is not a dictionary and has been shown to invent translations in all languages for words it does not recognize.
If "Detect language" is selected, text in an unknown language can be automatically identified. In the web interface, users can suggest alternate translations, such as for technical terms, or correct mistakes. These suggestions may be included in future updates to the translation process.
If a user enters a URL in the source text, Google Translate will produce a hyperlink to a machine translation of the website. Users can save translation proposals in a "phrasebook" for later use. For some languages, text can be entered via an on-screen keyboard, through handwriting recognition, or through speech recognition. It is also possible to enter searches in a source language that are first translated to a destination language, allowing one to browse and interpret results from the selected destination language in the source language. Several features of our study design merit comment. First, we recruited ATA-certified translators as evaluators who, because of their professional training, had more credibility for scientifically evaluating translation accuracy than nonprofessional bilinguals such as graduate students. Translators also have different translation styles and knowledge of second-language audiences.
The selection of certified translators might cause measurement bias because these professional translators are different from general LEP patients. For instance, compared to LEP patients, certified translators are bilingual, well-educated, and have higher literacy levels. Thus, sentences that are understandable to them might not be understandable to LEP patients. Future research might recruit LEP participants to evaluate these translation products, and researchers might conduct cognitive interviews while participants read these sentences.
Second, our study mainly focused on describing the translated products from a technical perspective instead of assessing message consumers' experience from a user perspective. Testing LEP diabetes patients' knowledge and behavior change after using Google Translate to process health education messages is another direction for future study. We evaluated only six original English sentences and recruited six evaluators, which limits the generalizability of our findings. Future studies should include a larger sample of original sentences and evaluators.
We chose a freely accessible diabetes patient education pamphlet as a heuristic example for evaluating the accuracy of machine translation devices. The pamphlet, "You are the heart of your family…take care of it," is published by the National Institutes of Health and the Centers for Disease Control and Prevention and distributed by the National Diabetes Education Program. This pamphlet includes six written sentences as behavior change suggestions for managing diabetes and three recommended questions for patients to ask their clinicians.
This paper examines the accuracy of Google Translate when translating the six written diabetes prevention and management strategies, to determine the differences between machine and human translators and to direct further research. This study was approved by the Texas A&M University Institutional Review Board. Google produced a more accurate translation from English to Spanish than from English to Chinese. Some sentences translated by Google from English to Chinese could result in delayed patient care. We recommend continuous training and credentialing practice standards for professional medical translators to enhance patient safety, as well as providing health education information in multiple languages.
Thankfully, Google Translate now uses neural machine translation for a total of eight language pairs, and this will eventually expand to all 108 language pairs already available in Google Translate. Neural machine translation is much more sophisticated: it interprets whole sentences at a time rather than translating phrase by phrase or word by word. Compared with Google's previous algorithm, it cuts errors by 80%. The previous algorithm cut a sentence into pieces and matched each word or phrase against a large dictionary of words. The new system takes that same dictionary and uses two different neural networks to translate the text.
One network breaks down the sentence to determine the context of the phrase, while the other network generates the text in the target language; a minimal sketch of this encoder-decoder design follows below.

The human translations contained substantially fewer errors than those provided by Google Translate, and the reviewers overwhelmingly preferred the human-translated instructions to those translated using Google Translate. However, there were some discrepancies in inter-rater reliability between the two reviewers, so the data obtained in this phase of the study were not statistically significant for every error type.
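As referenced above, the two-network design is essentially an encoder-decoder architecture. The following minimal PyTorch sketch illustrates the idea under simplifying assumptions (a single-layer GRU encoder and decoder, invented vocabulary and dimension sizes); it is an illustration, not Google's GNMT model, which used deep LSTM stacks with attention.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder sketch: one network reads the source sentence
# into a context vector, the other generates target-language tokens.
VOCAB, EMB, HID = 1000, 32, 64

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src_ids):
        # Compress the whole source sentence into a context vector.
        _, context = self.rnn(self.embed(src_ids))
        return context

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)
    def forward(self, tgt_ids, context):
        # Generate target tokens conditioned on the source context.
        hidden, _ = self.rnn(self.embed(tgt_ids), context)
        return self.out(hidden)

src = torch.randint(0, VOCAB, (1, 7))   # a 7-token "sentence"
tgt = torch.randint(0, VOCAB, (1, 5))
logits = Decoder()(tgt, Encoder()(src))
print(logits.shape)  # torch.Size([1, 5, 1000]): one distribution per token
```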
Google has crowdsourcing features for volunteers to be a part of its "Translate Community", intended to help improve Google Translate's accuracy. Volunteers can select up to five languages to help improve translation; users can verify translated phrases and translate phrases in their languages to and from English, helping to improve the accuracy of translating more rare and complex phrases. In August 2016, a Google Crowdsource app was released for Android users, in which translation tasks are offered. First, Google will show a phrase that one should type in the translated version. Second, Google will show a proposed translation for a user to agree, disagree, or skip.
Third, users can suggest translations for phrases where they think they can improve on Google's results. Tests in 44 languages showed that the "suggest an edit" feature led to an improvement in at most 40% of cases over four years, while across-the-board analysis shows that Google's crowd procedures often reduce erroneous translations. Notwithstanding these limitations, this investigation provides important contributions to the ever-growing literature examining the effectiveness of machine translation tools. In particular, our findings highlight that as sentences in health information become more complex and require higher levels of reading ability, the likelihood of machine translation tools making errors increases. As shown in this paper, these errors have the potential to negatively impact patient health behaviors. At its launch, Google Translate was based on millions of human translations produced by the European Parliament and the United Nations.
But the translation algorithm was built on a statistical model; simply put, Google translated meaning at the word level. Accuracy decreases when conditions are less favorable, for example when sentence length increases or the text uses familiar or literary language. For many other languages vis-à-vis English, Google Translate can produce the gist of a text under those formal circumstances. Human evaluation from English into all 102 languages shows that the main idea of a text is conveyed more than 50% of the time for 35 languages; for 67 languages, a minimally comprehensible result is not achieved 50% of the time or more.
A few studies have evaluated Chinese, French, German, and Spanish to English, but no systematic human evaluation has been conducted from most Google Translate languages to English. Originally, Google Translate was released as a statistical machine translation service. Input text was translated into English before being translated into the selected language. Since SMT uses predictive algorithms to translate text, it had poor grammatical accuracy.
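The statistical approach just described is commonly formalized as a noisy-channel model: choose the target sentence e that maximizes P(e) x P(f|e), balancing target-language fluency against fidelity to the source f. The probabilities in the sketch below are invented purely for illustration; Google's actual phrase tables and weights were never public.

```python
# Toy noisy-channel scoring for SMT: choose the English sentence e that
# maximizes P(e) * P(f | e). All probabilities here are invented.
language_model = {          # P(e): fluency of candidate outputs
    "stop smoking": 0.030,
    "cease to smoke": 0.002,
}
translation_model = {       # P(f | e): fidelity to the source f
    ("deje de fumar", "stop smoking"): 0.40,
    ("deje de fumar", "cease to smoke"): 0.55,
}

source = "deje de fumar"
best = max(
    language_model,
    key=lambda e: language_model[e] * translation_model.get((source, e), 0.0),
)
print(best)  # "stop smoking": the fluent output wins despite lower fidelity
```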
Despite this poor grammatical accuracy, Google initially did not hire experts to resolve the limitation, citing the ever-evolving nature of language. As shown in Figure 2, when sentences were translated from English to Spanish, S2 and S3 showed a considerable difference between Google and the human translator in the Fluency domain, with the human translator doing much better than Google. For the relatively easy sentences, there was not much difference between Google and the human translator in any of the four domains. Interestingly, there was not much difference for the most difficult sentence either.
We also noticed some obvious gaps for S5 in the Adequacy, Meaning, and Severity domains, where Google received higher translation accuracy scores than the human translator did. As shown in Figure 3, when sentences were translated from English to Chinese, S5, S2, S3, and S6 showed a considerable difference between Google and the human translator in all four domains, with the human translator doing much better than Google. Similar to what we found in the Spanish set, there was not much difference between Google and the human translator in any domain for the easier sentences. Comparing Figures 2 and 3, the general distance between Google and the human translator was larger for Chinese than for Spanish, indicating that Google provided a more accurate translation service in Spanish than in Chinese. Two professional medical translators translated the original English pamphlet into Spanish and Chinese, respectively. Both were American Translators Association (ATA) certified translators.
The ATA website lists all the certified translators' contact information. We approached both translators as regular customers seeking translation services. We did not inform them that their translation product would be evaluated.
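Although this paper does not publish analysis code, the sketch below illustrates the kind of comparison its evaluation design implies: mean scores per domain for Google versus human output, plus Cohen's kappa as one common inter-rater agreement statistic. The scores, the 1-to-5 scale, and the use of pandas and scikit-learn are all assumptions made for illustration, not the study's actual data or methods.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Invented example scores (1-5) from two evaluators across four domains;
# the real study's data are not reproduced here.
scores = pd.DataFrame({
    "translator": ["google"] * 4 + ["human"] * 4,
    "domain": ["Fluency", "Adequacy", "Meaning", "Severity"] * 2,
    "rater1": [3, 4, 4, 3, 5, 5, 4, 5],
    "rater2": [3, 3, 4, 4, 5, 4, 4, 5],
})

# Mean score per translator and domain (averaging the two raters).
scores["mean"] = scores[["rater1", "rater2"]].mean(axis=1)
print(scores.pivot(index="domain", columns="translator", values="mean"))

# Inter-rater agreement over all judgments.
print("kappa:", cohen_kappa_score(scores["rater1"], scores["rater2"]))
```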