Richard Wallis on his involvement with Google and Schema.org

The marketing sector can be a complicated place as new marketing tools and techniques are launched, almost on a weekly basis. Powered by The Drum Network, this regular column invites The Drum Network's members to demystify the marketing trade and offer expert insight and opinion on what is happening in the marketing industry today that can help your business tomorrow.

Click Consult caught up with thought leader Richard Wallis ahead of his session at the Benchmark Search and Digital conference.

While Richard Wallis is one of the most recognisable names in the world to those working in the field of linked data and the semantic web, your average SEO agent could be forgiven for not recognising the name without a quick Google. Nevertheless, with more than 20 years under his belt working in and advising on information architecture, Wallis is a leading voice in what is one of the most important web developments to have emerged in the last decade.

Thanks to his involvement in Click Consult’s upcoming sixth annual Benchmark Search Conference, I was delighted to be able to ask him a few questions to help highlight both his personal contribution and the contribution of structured data to the modern web.

A little context on structured data and Schema.org:

Rather than being a subset or variety of data, ‘structured data’ refers to the way data is organised. Prose conveys information in a looser, more conversational manner. However, if you were to study the prose and distil its meaning into, for example, a table, this would represent ‘structured’ data: essentially the same information in an easier-to-digest format.

Where search engines are concerned, it is easier for an algorithm to parse information if it is offered within a scaffold or framework of structural information. This tends to be done with markup embedded in a page's HTML, such as microdata or JSON-LD, which gives the search engine additional pointers it can use to determine the nature of the data it is processing.
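
As a rough illustration of such cues, the Python sketch below builds a small JSON-LD description of an article using Schema.org terms. The function name and the sample values are hypothetical and chosen only for illustration; `Article`, `Person`, `headline`, `author` and `datePublished` are terms from the Schema.org vocabulary.

```python
import json


def article_jsonld(headline, author_name, date_published):
    """Return a Schema.org Article description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",  # says which vocabulary the terms come from
        "@type": "Article",                # what kind of thing is being described
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,   # ISO 8601 date string
    }
    return json.dumps(data, indent=2)


# Hypothetical sample values, for illustration only.
print(article_jsonld("Example headline", "Jane Doe", "2019-01-01"))
```

The same facts could be written as a sentence of prose; the difference is that here each fact is labelled with a term a machine can look up.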

Schema.org is an open-source project established by Google, Microsoft and Yahoo that aimed to establish a uniform vocabulary for this structured data – essentially creating a schema for data anywhere on the web.

Firstly, Richard, how did you become involved in Schema.org and what is Schema.org?

Wallis: My involvement with Schema.org is the result of a natural progression, following a combination of interests, passions, and business needs. I have been involved in computing for longer than I'd like to admit, with a focus on data, metadata, and discovery, coming from the world of libraries. The emergence of the Semantic Web, followed by its more practical offspring, Linked Data, created opportunities for practical solutions to age-old problems.

Linked Data held the promise of globally open, understandable and consumable data – a promise that, unfortunately, has only been fulfilled for those willing and able to immerse themselves in that environment and the multiple vocabularies it uses.

By 2011, we were in the frustrating position of having proven, established (web-based) techniques and technologies that, in individual cases, delivered significant benefit for openly describing and discovering things, but with each implementation operating in its own fairly closed world, with its own selection of vocabulary terms (ontologies).

To me, it felt analogous to humans inventing speech but then creating a multiplicity of languages, whereas computers much prefer, at least at the core, a single vocabulary. 2011 was the year when Google, Bing, and Yahoo! launched Schema.org and brought hope that those frustrations would dissipate, and these technologies would become beneficial for all, backed by that powerful driver of the potential for commercial advantage.

With some simple, yet widely supported, steps (an agreed, shared, general-purpose vocabulary and an easy route to sharing data by embedding it in HTML), backed by a good reason to use them (the potential of rich results), Google and the other search engines provided a de facto route towards a Semantic (enabled) Web.

Realising the potential, yet being aware that a general-purpose vocabulary was initially limited in its ability to describe things in sufficient detail for useful discovery, I took advantage of the open, participative nature of Schema.org development to form and chair a W3C Community Group to create proposals for improving Schema.org for the benefit of bibliographic data (my then commercial focus).

Since that time, with my broadening scope across all sectors as an independent consultant, and participation with and chairing of several other W3C Community Groups, I have played a significant part in the evolution of the vocabulary from the initial few hundred terms to the now approaching two thousand terms, and have gained much implementation experience along the way.

In support of this, I have combined my consultancy in, and evangelism for, Schema.org with active involvement with Google (one of my consultancy clients) and others in the development of the Schema.org site, supporting documentation, and active engagement in the supporting communities.

So, to answer the first part of the question, how did I become involved? It made sense given the trends in the future direction and development of the web, while also chiming well with my personal and commercial interests.

How important do you feel the Schema.org project is to search?

Wallis: The first 20 years or so of the web taught us that the only way to find things was to search for them - a message emphasised by the capabilities and business models of the search engine organisations. Humans, however, use a combination of search and discovery (of relationships between things) to find them. For example, how often does a library or bookshop user, having searched for the right location/shelf for the book they were looking for, come away with a different but related item that they discovered?

Search engines have always striven to achieve this, but until the broad acceptance and implementation of Schema.org structured data on a large number of sites, they did not have sufficient reliable and connected data to produce dependable, useful features. As they amass more and more of this structured data, we are seeing the capabilities it is powering – Knowledge Panels, position zero results, answer boxes, and various other rich results. More and more Knowledge Graph-powered features are being announced at regular intervals. One of these features is Dataset Search – which probably would not exist without the Knowledge Graph and the structured data (Schema.org) which feeds it.

Does this mean that it's the end of traditional search? Far from it. Search is a key component of the ability of humans, and of our supporting technologies, to find things. However, of equal importance are the Knowledge Graphs the search engines are building as they evolve from simple search engines towards discovery and answer tools and services. This move is nowhere more apparent than with voice-enabled devices.

In very simple terms, if you want your article, product, corporate identity, or anything else, really, to be relevant in this new world, you also need to be in the Knowledge Graphs. To get into those Knowledge Graphs you can do no better than to be sharing structured data (using Schema.org) embedded in your websites for crawling by the search engines.
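
A minimal Python sketch of what "embedded in your websites" can look like in practice, assuming the JSON-LD form of structured data: the description is wrapped in a `script` element of type `application/ld+json`, which can then be placed in the page's HTML. The helper name, the organisation name and the example.com address are hypothetical.

```python
import json


def jsonld_script_tag(data):
    """Wrap a JSON-LD dict in a script element for embedding in an HTML page."""
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a legal JSON escape, so parsers still read the original text.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical organisation, for illustration only.
organisation = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Ltd",
    "url": "https://www.example.com",
}

print(jsonld_script_tag(organisation))
```

The visible page is unchanged; the element simply travels with the HTML for crawlers to read.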

Finally, how do you see schema developing in the future?

Wallis: There are two aspects to this. Firstly, by building on the participative open community model that has elevated Schema.org to where it is today, proposals will continue to be put forward, and often adopted, to address areas of limited capability. For example, currently under discussion is a proposal to introduce terms to describe product return processes. The result, hopefully, will be a community-agreed set of enhancements released in one of the regular (currently monthly) update releases of the vocabulary's definition.

Secondly, through the efforts of people such as myself, self-taught enthusiastic developers, and continued expansion of documentation released by Schema.org, Google & others, the general structured data skill levels will increase and improve across all sectors, and especially in SEO associated industries.

Many in those sectors do not have a background in structured, linked or semantic data and, therefore, see Schema.org as either "for rich snippets only" or "keywords on steroids", or often both. This produces what looks like valid Schema.org code, validated by search engine test tools, but it is often of little or no use to a search engine trying to place your article, product or organisation in context with the millions of others it is aware of, in a way that might lead it to direct its users to your page or site.
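
To make that contrast concrete, the Python sketch below compares two hypothetical product descriptions: one that treats properties as a place to put keywords, and one that identifies the product and its brand as distinct, linked entities. The `count_connections` helper is an invented, deliberately crude heuristic used only to show the difference; it is not how any search engine or test tool evaluates markup, and all names and URLs are made up.

```python
LINK_KEYS = {"@id", "sameAs", "url"}


def count_connections(node, is_root=True):
    """Count nested typed entities and URL-style identifiers in a JSON-LD value."""
    if isinstance(node, list):
        return sum(count_connections(item, is_root) for item in node)
    if not isinstance(node, dict):
        return 0  # plain strings and numbers connect to nothing
    # A nested object with its own @type is an entity in its own right.
    total = 0 if is_root else int("@type" in node)
    for key, value in node.items():
        if key in LINK_KEYS:
            total += len(value) if isinstance(value, list) else 1
        else:
            total += count_connections(value, is_root=False)
    return total


# "Keywords on steroids": well-formed, but every value is an isolated string.
keyword_style = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Blue widget",
    "description": "blue widget, cheap widget, best widget",
    "brand": "Example Ltd",
}

# Connected: the product and its brand are identified and linked.
connected = {
    "@context": "https://schema.org",
    "@type": "Product",
    "@id": "https://www.example.com/widgets/blue#product",
    "name": "Blue widget",
    "brand": {
        "@type": "Organization",
        "@id": "https://www.example.com/#organisation",
        "name": "Example Ltd",
        "sameAs": ["https://www.example.org/profiles/example-ltd"],
    },
}

print(count_connections(keyword_style), count_connections(connected))
```

Both descriptions are well-formed; only the second gives a consumer anything to join up with what it already knows.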

Schema.org is nearly eight years old. In many eyes, though, it's still a very recent arrival. Over time it will become just another thing that web developers will naturally take account of, becoming a bedfellow with JavaScript, CSS, and the rest. What new adopters need to be aware of is that it is not rocket science but, like most technologies, you can "just do it" or you can "do it right to get the most benefit".

Richard Wallis is an independent innovative thought leader, evangelist and consultant for Linked, Open, Structured and Actionable Data, Schema.org and its pragmatic application to the real world on the web and in the enterprise. He will be speaking at Benchmark Search and Digital Conference on the 11th of September.


John Warner is the senior SEO marketing and content executive at Click Consult.
