By 2022, Google had added 24 new languages using "zero-shot" machine translation, where a machine learning model learns to translate into another language without ever seeing an example, and announced "the 1,000 Languages Initiative, a commitment to build AI [artificial intelligence] models that will support the 1,000 most spoken languages in the world", Google recalls.
"Now, we're using AI to expand the range of languages supported" and, "thanks to our PaLM 2 large language model, we're starting to roll out 110 new languages to Google Translate, our biggest expansion ever, including Portuguese from Portugal", the company says in an online publication.
In other words, Google Translate will now distinguish between variants of Portuguese (Portugal versus Brazil).
"From Cantonese to Q'eqchi', these new languages represent more than 614 million speakers, allowing translations for around 8% of the world's population", says Google.
Around a quarter of the new languages "are from Africa and represent our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof", the company adds.
Among the languages now supported in Google Translate is Afar, a tonal language spoken in Djibouti, Eritrea and Ethiopia. "Of all the languages in this launch, Afar had the highest number of voluntary contributions from the community", Google notes.
Then there is Cantonese, which had long been "one of the most requested languages on Google Translate", the company continues.
Other examples are Manx, the Celtic language of the Isle of Man, which nearly went extinct with the death of its last native speaker in 1974, but "thanks to an island-wide revival movement, there are now thousands of speakers", and N'Ko, a standardized form of the Manding languages of West Africa that unifies many dialects into a common language.
"Its unique alphabet was invented in 1949 and has an active research community that today develops resources and technology for it", says Google, in its publication.
There is also Punjabi (Shahmukhi), a variety of Punjabi written in the Perso-Arabic script (Shahmukhi) and the most spoken language in Pakistan; Tamazight, a Berber language spoken in North Africa; and Tok Pisin, an "English-based creole and the lingua franca of Papua New Guinea".
Languages "have immense variation: regional varieties, dialects, different spelling patterns" and, in fact, "many languages do not have a single standard form, so it is impossible to choose the 'right' variety."
But "our approach has been to prioritize the most commonly used varieties in each language", Google adds.
"PaLM 2 was a key piece in this puzzle, helping Translate more efficiently learn languages that are closely related to each other, including languages close to Hindi, such as Awadhi and Marwadi, and French creoles, such as Seychellois Creole and Mauritian Creole", the company explains.
And as technology evolves "and we continue to partner with expert linguists and native speakers, we will, over time, support even more language varieties and spelling conventions."