Martech Learning Technology

‘Multilingual doesn’t mean multicultural’: AI and the Anglocentric web

By Jack Stacey, Content Writer

Earnest

The Drum Network article

This content is produced by The Drum Network, a paid-for membership club for CEOs and their agencies who want to share their expertise and grow their business.

May 5, 2023 | 8 min read

Jack Stacey and David Gyertson of Earnest wax lyrical about the art of translation, and question the linguistic diversity of language-based tools like ChatGPT.

“I learned to love the web through Latin” - Earnest’s Jack Stacey / Gabriella Clare Marino

Throughout school, I had a great talent for understanding languages. Not so much the speaking of them; the wires connecting brain to mouth have never been that robust.

But the art of translating and context was something I loved. Connecting words to meaning, putting them into an order that made sense. Using rules, cultural understanding and context to make sense of words you didn’t know, and then turning them into something entirely new.

That ‘new-ness’ is a key part of the art of translation. Gabriel García Márquez always said that the English translations of his work were often superior art forms. He saw them as entirely new pieces of art in themselves. That idea of translation creating something new is what introduced me to coding.

Coding is its own kind of translation: taking copy, fragmenting it and marking it up until it takes on a life of its own (this was the early 00s, so we’re still talking basic stuff; copy and a few images were all we had to work with).

The power of mnemonics

So, after a first degree in European languages, I went off on a tangent and found a way to make the worlds connect.

At the School of Library and Information Studies at UCL, as part of my master’s degree, I began to digitize some of the works of García Lorca as part of the Text Encoding Initiative.

This soon came to seem like a preposterous task once I realized that all code is inherently English. In HTML, 'p' means paragraph, and 'h1' means heading 1. The art of marking up Spanish literature in English code felt very muddy.

Mnemonics like this are an incredibly powerful tool for learning rapidly. But the foundations of the web are built upon these English mnemonic devices.

Just think how much harder it must be for a developer to remember why they are using p (as in paragraph) or a data-src attribute when those letters in that sequence have no logical meaning in their native language.

AI and the American-English-centric web

For 20 years now, I have lived with a healthy cynicism about this ‘world-wide’ web. It’s an Anglocentric construct, designed and built for, and by, the West, but gladly used by others. And yet, we act surprised when other nations have their own dominant social networks, or block information they view as unhealthy for their citizens.

ChatGPT and Bard (among hundreds of other niche tools) are of course the new darlings of the web. They offer the power to generate content at scale, and to answer sophisticated queries with articulate and detailed responses.

Yet, while ChatGPT is nominally multilingual and can respond in Mandarin, it isn’t actually available in China. And, when responding in Chinese, how many Chinese sources is it actually calling upon, or is it just translating the same data set to another language?

Multilingual does not equal multicultural

ChatGPT (and no doubt others, though not all the facts are clear yet) does a very good job of answering questions asked in other languages in those same languages.

But that is language alone. The actual context of its responses is formed upon the content that it is fed - content from the web, selected initially by a handful of developers working on the project.

Research has found that earlier versions of the tools had 51% of their content coming from US-hosted webpages. Other English-speaking countries with sizable populations made little contribution to the data set.

The fiction used to train the natural-language models was drawn from a set of around 700 English books. Wikipedia, in English, was one of the first data sets used.

And where content in other languages was included, it can safely be assumed that much of it took the form of translations of English-first content. Google Translate, the first AI junk content bot of the web, churns out millions of webpages of content - translations of words, without the cultural context.

Why does it matter?

Well, firstly, this illustrates that there is undoubtedly a huge bias underlying AI: bias in the information it works with, and in how it learns to speak to you about that information.

Search has always had its intrinsic bias through algorithms, but at least there was a choice. You were served with a range of views and could select from them.

The use of AI as research for content generation feeds a vicious circle - where the research is based on limited data, in language formed from English literature, and is then fed back into the content engine of the web in whatever language.

As marketers, we rely on content. And we research to produce that content. Some will use AI tools for this; the worst will just use AI to write that content.

The majority of us will no doubt come to rely on AI bots to feed us knowledge and help with that research. But it is crucial that we do so with our eyes wide open and dig deeper, learn and discover for ourselves, and keep providing value and context to the information we find.

Content by The Drum Network member:

Earnest

Earnest is the award-winning B2B marketing agency that’s chasing out the humdrum in London and New York.
