BERT, DeepRank and Passage Indexing… the Holy Grail of Search?

The BERT update has been the biggest ranking algorithm update since RankBrain.

It is said to impact roughly 1 in 10 search queries.

It was rolled out about a year ago in October 2019 and has been running with fantastic results since. 

There is chatter that BERT will now be called DeepRank, and Google has made significant advances in understanding naturally spoken language (through Natural Language Processing, or NLP) that could lead to the holy grail of search.

In this post I will talk about what all this means and touch briefly on how it will impact the way SEO is done.

So, what exactly is BERT and how does it make things different in search?

Definition: BERT stands for Bidirectional Encoder Representations from Transformers. It is a neural network-based technique for natural language processing in which a deep learning model is pre-trained (machine learning) on large amounts of text so it can interpret language in context.

In plain English… BERT helps Google better discern the context of words in search queries so it can accurately understand the meaning of the query and show the user the best information. Google says that about 10% of all search queries are affected by this change.
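To make “discerning the context of words” concrete, here is a minimal sketch. It is illustrative only, using the publicly released bert-base-uncased model from the Hugging Face transformers library (not Google’s production systems), and it shows that BERT gives the same word a different vector depending on the sentence around it:

```python
# Illustrative only: a public BERT model via Hugging Face transformers,
# not Google's production search stack.
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "He deposited the cheque at the bank.",   # financial sense
    "They had a picnic on the river bank.",   # geographic sense
]

vectors = []
with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state[0]       # one vector per token
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        vectors.append(hidden[tokens.index("bank")])        # the vector for "bank"

# The two "bank" vectors differ because BERT reads the whole sentence.
similarity = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.2f}")
```

The similarity between the two “bank” vectors comes out well below 1.0, which is exactly the property that lets the model separate the financial sense from the riverside sense.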

Google hired Geoffrey Hinton of Canada, often called the father of modern AI, and he worked with Jeff Dean to initiate the AI / machine learning research into handling search queries with deep learning.

 

Understanding the Meaning and Context Behind Complex Search Queries

Google’s BERT update tackles the problem of understanding what the user is searching for in complex search queries, and giving them results that closely match that query.

The important part here is “understanding complex queries.”

While simple queries are easy for Google to decipher, longer, more complex queries can cause problems: they can often be construed to have multiple meanings if each word is not understood in the context of the other words in the sentence.

For example, simple queries like “SEO course” or “best movies of 2020” are easy for Google to grasp: it looks up its index and serves the highest-quality (most popular) result for the query. There is no ambiguity, and the result is found with little or no confusion for the algorithm.
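To picture that index lookup, here is a toy sketch of a classic inverted index (the pages and query are made up for illustration; Google’s real index is vastly more sophisticated): every word maps to the pages containing it, and an unambiguous query is answered by intersecting those lists.

```python
# Toy inverted index: maps each word to the set of pages containing it.
# The pages and query here are made-up illustrations.
pages = {
    "page-a": "the best movies of 2020 ranked by critics",
    "page-b": "a complete seo course for absolute beginners",
    "page-c": "seo tips and movie reviews",
}

index: dict[str, set[str]] = {}
for page_id, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(page_id)

# An unambiguous query is answered by intersecting the posting lists.
query = "seo course"
matches = set.intersection(*(index.get(word, set()) for word in query.split()))
print(matches)  # {'page-b'}
```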

However, it is the longer search queries that cause the ambiguity and issues. These make up a sizable fraction of searches, perhaps about 10% to 15% of queries.

Many of these are “never seen before” queries, submitted to Google for the first time on any given day, which is a startling statistic in itself.

Google’s goal here is that eventually we should be able to ask our question in a naturally spoken way and get back a correct result.

 

A few examples of such queries would be…

Who is playing the soccer game tonight?

Who do I call for a tow truck around here?

Does anyone make a nail polish that is safe for dogs?

What are the best places for a vegan from Canada to eat in Paris?

 

Google has given us a few examples of “before and after BERT” on this:

In one example, the query “math practice books for adults” surfaced a listing for a book for Grades 6 to 8. With BERT applied, a listing for a book titled “Math for Grownups” ranked at #1.

 

You can see for yourself below that the number one listing is indeed a book for adults and the lower listings are the ones for kids.

(image courtesy: SearchEngineLand.com)

This shows us that Google does understand the searcher’s intent, and the higher result is the one that matches it better.

 

Another example of a Search Query – Before and After BERT

Google says that in the past, for the query “parking on a hill with no curb,” it would strip out the stop word “no” and process the query without realizing that “no” was actually significant to the context and meaning of the query.

 

Before BERT, Google would take these complex queries, remove all the stop words, keep the main keywords, and then look up the best match in its index of stored pages containing the same or similar words, based on brute-force matching (no understanding or AI / deep learning applied).

With BERT, Google keeps all the words and uses every word in the sentence to try to understand what the query is about.

Google now looks at the words and the relationships among them as they appear in the sentence, stop words included, and at the probability of what each word refers to, removing the ambiguity about each word’s “context” within the meaning of the entire sentence.

The algorithm now uses the stop words (which it would previously strip out of the search query before sending it to the core algorithm to look up the index) to find the context and meaning of the sentence and how the words relate to one another.
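A minimal sketch of that difference, using NLTK’s public English stop-word list as a stand-in (Google never published its actual list): stripping stop words from the “no curb” query discards the one word that flips its meaning.

```python
# Sketch of pre-BERT-style keyword reduction vs. keeping the full query.
# Uses NLTK's public stop-word list as a stand-in; Google's real list
# was never published. Run nltk.download("stopwords") once beforehand.
from nltk.corpus import stopwords

query = "parking on a hill with no curb"
stop_words = set(stopwords.words("english"))

keywords_only = [w for w in query.split() if w not in stop_words]
print(keywords_only)  # ['parking', 'hill', 'curb'] -- "no" is gone,
                      # so the query now reads like a *with-curb* question

# With BERT, every word is kept and interpreted in context:
print(query.split())  # ['parking', 'on', 'a', 'hill', 'with', 'no', 'curb']
```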

Google wants you to ask your question in the most natural way you can; it will take that, understand what you had in mind, and then give you a search result that satisfies your query with the information it has available.

 

BERT using Deep Learning for Natural Language Processing

Understanding naturally spoken language is termed NLP (Natural Language Processing).

For Google, this is where the holy grail of search really lies.

Google should be able to understand the long query and match up the best and most relevant pages for you in the search results.  

And this is now going to be called DeepRank, not BERT.

The name DeepRank is taken from Deep Learning (a type of machine learning / AI) and the ranking aspect of search.

DeepRank is live as of October 2020 and is used to process 100% of search queries, not just the 10% that BERT processed. The name DeepRank was referenced as far back as this 2017 research paper.

 

“BERT operates in a completely different manner,” said Eric Enge, general manager at Perficient Digital. “Traditional algorithms do try to look at the content on a page to understand what it’s about and what it may be relevant to. However, traditional NLP algorithms typically are only able to look at the content before a word OR the content after a word for additional context to help it better understand the meaning of that word. The bidirectional component of BERT is what makes it different.” As mentioned above, BERT looks at the content before and after a word to inform its understanding of the meaning and relevance of that word. “This is a critical enhancement in natural language processing as human communication is naturally layered and complex.”
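The bidirectional point is easy to see with masked-word prediction, the task BERT is pre-trained on. In this illustrative snippet (again with the public bert-base-uncased model, not Google’s ranking stack), it is the words after the blank that steer the prediction; a strictly left-to-right model could not use them:

```python
# Masked-word prediction with a public BERT model (illustrative only).
# pip install transformers torch
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand context ("to withdraw some cash") is what
# disambiguates [MASK]; a left-to-right model could not see it.
for prediction in fill_mask("He went to the [MASK] to withdraw some cash.")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
# Expect bank-like tokens near the top of the list.
```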

 

Trillions of Questions, No Easy Answers: A (home) movie about how Google Search works

For those interested in some geeky stuff… Google released a movie on Search in October 2020, which is an interesting watch for all you SEOs and techies out there. (If you want to skip to the cool stuff about DeepRank, jump to about 38 minutes in… but overall, the entire movie is a good watch.)

 

Passage Indexing – A Significant Update Is Coming…

Google has informed us about an update called “passage indexing” that is coming soon. With this, Google will be able to pull out paragraphs and passages within a page and serve them directly to the user as a result for their search query.

Instead of ranking entire pages, Google will now rank and show specific passages from within pages and documents in response to some queries.  

At the 2020 Search On event, Google’s Prabhakar Raghavan, when announcing “passage indexing,” also said, “We’ve recently made a breakthrough in ranking.”

He goes on to mention DeepCT, an exciting method that Google will be using in addition to the TF-IDF scoring it has used in the past, and one that has significant implications for understanding the meaning and context behind sentences.
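As a rough picture of what scoring passages rather than whole pages could look like, here is a toy sketch using plain TF-IDF from scikit-learn (the older scoring mentioned above; the passages are invented for illustration). DeepCT goes further by using BERT to learn context-aware term weights, which this sketch does not attempt:

```python
# Toy passage scoring with plain TF-IDF (scikit-learn), illustrative only.
# DeepCT improves on this by using BERT to learn context-aware term weights.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Imagine one page split into its individual passages (made-up text).
passages = [
    "Our garage has been family-owned and operated since 1985.",
    "We stock tires, batteries, and wiper blades at fair prices.",
    "For a tow truck, call 555-0100 and we will dispatch one to you.",
]
query = "who do I call for a tow truck around here"

vectorizer = TfidfVectorizer()
passage_matrix = vectorizer.fit_transform(passages)
scores = cosine_similarity(vectorizer.transform([query]), passage_matrix)[0]

best = scores.argmax()
print(f"best passage ({scores[best]:.2f}): {passages[best]}")
```

Each passage gets its own score against the query, so the tow-truck sentence can win even though the page as a whole is about a garage.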

If you want an in-depth understanding of these recent changes, you should read passage indexing and how it will leverage BERT, and a Deep Dive into BERT.

 

Google’s Raghavan explains:

“Very specific searches can be the hardest to get right, since sometimes the single sentence that answers your question might be buried deep in a web page. We’ve recently made a breakthrough in ranking and are now able to not just index web pages, but individual passages from the pages. By better understanding the relevancy of specific passages, not just the overall page, we can find that needle-in-a-haystack information you’re looking for. This technology will improve 7 percent of search queries across all languages as we roll it out globally.” (Raghavan, 2020)

 

An amazing example of a passage indexing result in the SERPs is given below…

“With our new technology, we’ll be able to better identify and understand key passages on a web page. This will help us surface content that might otherwise not be seen as relevant when considering a page only as a whole…,” Google said at the 2020 Search On event.

John Mueller went on to say… “So, kind of taking a step back and just guessing at this with my internal information: usually what happens with these things is we will roll them out in one particular place, experiment a bit to find out how to best implement these, how they best work, and then find ways to roll that out a little bit more broadly.

So it might be that we start showing these in the featured snippets first, because, I don’t know, we showed that example, or maybe that’s the clearest way we can check this. And then at some point we start showing them more in the normal search results as well.

But again, as with all of these newer changes in search, usually we try them on a small scale and then roll them out a little bit more broadly over time.”

What this means is that a good answer may well be found inside a passage in an otherwise broad-topic document… or perhaps in a random blurb with no particular content focus.

Such instances are found all over the web in blog posts and countless opinion pieces, with unrelated content or mixed topics. 

I do believe this matches up with BERT because now Google will understand the other side of the equation… the content of the pages themselves (and not just the query). 

 

 

The Holy Grail of Search, Are We Finally There?

Here’s my brief take on how DeepRank will match up with passage indexing, and thus finally open the doors to the holy grail of search.

Google will use deep learning to understand each sentence and paragraph and the meaning behind it, and then match the meaning of your search query with the paragraph that gives the best answer.

Once Google understands what each paragraph on the web is saying, it will be able to show you just the extracted paragraph from the full page that has the answer or information you seek.

This will be like a two-way match: the algorithm will have to process every sentence, paragraph, and page with DeepRank (the deep learning algorithm) to understand its context, and then store it not just in a simple word-mapped index but in some kind of database that captures what each sentence is about, so it can be served in response to a query that has been processed and understood the same way.
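One plausible way to picture such a “database that understands what each sentence is about” is a vector index: every passage is embedded by a deep model, the query is embedded the same way, and matching happens on meaning rather than shared keywords. A minimal sketch using the open-source sentence-transformers library as a stand-in (Google’s internal systems are not public):

```python
# Semantic two-way matching sketch with sentence-transformers
# (an open-source stand-in; Google's internal systems are not public).
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Offline: embed every passage once and store the vectors.
passages = [
    "Le Potager du Marais serves classic French dishes made fully plant-based.",
    "The Louvre is the world's most visited art museum.",
    "Paris metro tickets can be bought at any station.",
]
passage_vectors = model.encode(passages, convert_to_tensor=True)

# Online: embed the query and match by meaning, not shared keywords.
query = "best places for a vegan to eat in Paris"
query_vector = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_vector, passage_vectors)[0]
print(passages[int(scores.argmax())])  # the plant-based restaurant passage
```

Notice that the vegan query matches the first passage even though it never contains the word “vegan”; the model links it to “plant-based,” which is exactly the kind of match a word-mapped index would miss.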

There is no doubt that this kind of processing will require tremendous computing resources. What other company is set up for this kind of computing power besides Google?

 

So, What Does All This Mean for SEO?

How should we as SEOs be optimizing for this? Have any studies been done, and what is the initial feedback on this?

For one, tests and results have already shown that authoritative, trusted sites rank higher for these NLP-driven queries.

You should be focusing on creating high-value content. Not every page needs to be like that, but you do need a handful of core authority content pages that establish your site’s standing.

Moving forward, the way to create content for higher organic rankings should be to write fewer articles, each of very high value, so that each one is basically a link magnet (or link bait).

They should naturally attract links from other relevant pages on authority websites covering the same topic, or it should at least be easy for you to do outreach and earn these backlinks.

You can then let your link juice flow from these prominent pages with powerful, relevant backlinks to the other pages on your site, and Google will increase the overall authority of your site since your internal linking is strong (results also showed sites with more internal linking ranking higher after the BERT update).

The above is a quick strategy that is currently working and helping sites with high-quality content rank higher in the SERPs after the BERT update.

 
