‘Multilingual doesn’t mean multicultural’: AI and the Anglocentric web
Jack Stacey and David Gyertson of Earnest wax lyrical about the art of translation, and question the linguistic diversity of language-based tools like ChatGPT.
“I learned to love the web through Latin” - Earnest’s Jack Stacey / Gabriella Clare Marino
Throughout school, I had a great talent for understanding languages. Not so much the speaking of them; the wires connecting the brain to mouth have never been that robust.
But the art of translating and context was something I loved. Connecting words to meaning, putting them into an order that made sense. Using rules, cultural understanding and context to make sense of words you didn’t know, and then turning them into something entirely new.
That ‘new-ness’ is a key part of the art of translation. Gabriel García Márquez always said that the English translations of his work were often superior art forms. He saw them as entirely new pieces of art in themselves. That idea of translation creating something new is what introduced me to coding.
Coding is its own kind of translation: taking copy, fragmenting it and marking it up until it takes on a life of its own (this was the early 00s, so we’re still talking basic stuff; copy and a few images were all we had to work with).
The power of mnemonics
So, after a first degree in European languages, I went off on a tangent and found a way to make the worlds connect.
At the School of Library and Information Studies at UCL, for my master’s degree, I began to digitize some of the works of García Lorca as part of the Text Encoding Initiative.
This soon came to seem a preposterous task once I realized that all code is inherently English. In HTML, 'p' means paragraph and 'h1' means heading 1. Marking up Spanish literature in English code felt very muddy.
Mnemonics like this are an incredibly powerful tool for learning rapidly. But the foundations of the web are built upon these English mnemonic devices.
Just think how much harder it must be for a developer to remember why they are using p (as in paragraph) or a data-src attribute when those letters in that sequence have no logical meaning in their native language.
AI and the American-English-centric web
For 20 years now, I have lived with a healthy cynicism about this ‘world-wide’ web. It’s an Anglocentric construct, designed and built for, and by, the West, but gladly used by others. And yet we act surprised when other nations have their own dominant social networks, or block information they view as unhealthy for their citizens.
ChatGPT and Bard (among hundreds of other niche tools) are of course the new darlings of the web. They offer the power to generate content at scale and to answer sophisticated queries with articulate and detailed responses.
Yet, while ChatGPT is nominally multilingual and can respond in Mandarin, it isn’t actually available in China. And, when responding in Chinese, how many Chinese sources is it actually calling upon, or is it just translating the same data set to another language?
Multilingual does not equal multicultural
ChatGPT (and no doubt others, though not all the facts are clear yet) does a very good job of answering questions asked in other languages, in those same languages.
But that is language alone. The actual context of its responses is formed upon the content that it is fed - content from the web, selected initially by a handful of developers working on the project.
Research has found that earlier versions of the tools drew 51% of their content from US-hosted webpages. Other English-speaking countries with sizable populations made little contribution to the data set.
The fiction books used to train the natural-speaking language models were founded on a set of around 700 English books. Wikipedia, in English, was used as one of the first data sets to work with.
And where other-language content was included, it can reasonably be assumed that much of it took the form of translations of English-first content. Google Translate, the first AI junk content bot of the web, churns out millions of webpages of content - translations of words, without the cultural context.
Why does it matter?
Well, firstly, this illustrates that there is undoubtedly a huge bias underlying AI: bias in the information it works with, and in how it learns to speak to you about that information.
Search has always had its intrinsic bias through algorithms, but at least there was a choice. You were served with a range of views and could select from them.
The use of AI as research for content generation feeds a vicious circle - where the research is based on limited data, in language formed from English literature, and is then fed back into the content engine of the web in whatever language.
As marketers, we rely on content. And we research to produce that content. Some will use AI tools for this; the worst will just use AI to write that content.
The majority of us will no doubt come to rely on AI bots to feed us knowledge and help with that research. But it is crucial that we do so with our eyes wide open and dig deeper, learn and discover for ourselves, and keep providing value and context to the information we find.
Content by The Drum Network member:
Earnest is the award-winning B2B marketing agency that’s chasing out the humdrum in London and New York.