What is BERT?

BERT, short for Bidirectional Encoder Representations from Transformers, is a machine learning language model that Google uses to understand the intent of a search query.

Google released BERT as an open-source project in 2018. Before BERT, Google inferred search intent mainly by analyzing the keywords in the search query. With BERT, Google applies a more advanced Natural Language Processing (NLP) model that considers how the words in a query relate to one another.

Instead of just matching keywords in the search query with keywords in the results displayed on search results pages, BERT identifies the crucial words in the search query and uses them to uncover the context in which they are used. This, in turn, allows Google to understand the search intent and deliver more relevant results in return.

BERT is not a generative AI like ChatGPT or Gemini. Instead, BERT is a language understanding model designed for tasks like text classification, question answering, and named entity recognition. It is trained to understand and process text by predicting masked words within sentences rather than generating new text.

Generative AI models, like GPT, produce new content by generating sequences of words, sentences, or entire documents. In contrast, BERT’s primary strength lies in analyzing and comprehending existing text rather than creating new content.

How is BERT Different From Other AI Models?

BERT stood out among the language models available when it was released because it uses the words on both sides of a word to understand the context in which that word is used.

Many other language models at the time were unidirectional. That is, they read text in one direction, typically from left to right, and could only use the words that came before a position to predict the word that comes next.

BERT, however, is bidirectional. When it predicts a word, it considers the words that come before it and the words that come after it at the same time, rather than reading only from left to right or only from right to left.
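
As a rough illustration of the difference, the sketch below lists which words each kind of model can look at when it makes a prediction for one position in a sentence. The function names are hypothetical, and the code only models the visible context, not the models themselves.

```python
def left_context(tokens, position):
    """Words visible to a left-to-right (unidirectional) model."""
    return tokens[:position]


def full_context(tokens, position):
    """Words visible to a bidirectional model: everything except the target."""
    return tokens[:position] + tokens[position + 1:]


tokens = "Tom will chair the party".split()
target = tokens.index("chair")

print(left_context(tokens, target))  # ['Tom', 'will']
print(full_context(tokens, target))  # ['Tom', 'will', 'the', 'party']
```

The unidirectional view never sees ‘the party’, which is the part of the sentence that hints that ‘chair’ is being used as a verb.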

How BERT Works

BERT was trained to understand the context in which a word is used. For example, let us consider two sentences:

This is a chair
Tom will chair the party

‘Chair’ has a different meaning in each sentence. In the first sentence, it refers to the furniture we sit on, while in the second, it indicates that Tom will be in charge of the party.

Machine learning models convert words into vectors, which are lists of numbers. An older model like word2vec assigns each word a single fixed vector, so it uses the same vector for ‘chair’ in both sentences. That is, it does not differentiate between ‘chair’ as an item of furniture and ‘chair’ as being in charge of an event.

However, BERT produces a different vector for the word in each sentence because it takes the surrounding words into account. So, it treats the two uses of ‘chair’ as if they were different words even though they have the exact same spelling.
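
To make the contrast concrete, here is a minimal toy sketch. The vectors are made-up numbers, and the averaging step is only a crude stand-in for what BERT's layers actually compute; the point is simply that a static lookup returns one vector per word, while a context-dependent vector changes with the sentence.

```python
# Toy 2-dimensional vectors; the numbers are invented for illustration.
STATIC_VECTORS = {
    "this": [0.1, 0.1],
    "is": [0.1, 0.2],
    "a": [0.2, 0.1],
    "chair": [0.5, 0.5],
    "tom": [0.9, 0.1],
    "will": [0.8, 0.3],
    "the": [0.2, 0.2],
    "party": [0.7, 0.9],
}


def static_vector(word):
    """Static lookup: the same word always gets the same vector."""
    return STATIC_VECTORS[word]


def contextual_vector(tokens, position):
    """Toy contextual vector: blend the word's vector with its sentence.

    This is NOT how BERT works internally; it only shows that the
    result depends on the surrounding words.
    """
    sentence_mean = [
        sum(STATIC_VECTORS[t][d] for t in tokens) / len(tokens)
        for d in range(2)
    ]
    word = STATIC_VECTORS[tokens[position]]
    return [(w + m) / 2 for w, m in zip(word, sentence_mean)]


s1 = "this is a chair".split()
s2 = "tom will chair the party".split()

static1 = static_vector(s1[s1.index("chair")])
static2 = static_vector(s2[s2.index("chair")])
v1 = contextual_vector(s1, s1.index("chair"))
v2 = contextual_vector(s2, s2.index("chair"))

print(static1 == static2)  # True: one vector regardless of sentence
print(v1 != v2)            # True: the vector changes with the context
```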

This becomes helpful during training as it allows BERT to predict the words that should appear in a sentence more accurately.

For instance, one common training technique for masked language models like BERT involves using masked words. A masked word is a word that is intentionally hidden or replaced with a placeholder during training.

For example, “Every weekend, I enjoy taking my dog to the [MASK], where we can relax and enjoy the fresh air.”

BERT would have to analyze the words that appear before and after the [MASK] placeholder and use that to predict the masked word. In this case, the masked word could be park, beach, or countryside since they are the sorts of places a person could take their dog to on a weekend to relax and get some fresh air.
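
The masking step described above can be sketched as follows. This is a simplified illustration, not BERT's actual training code: the function name and the seed parameter are assumptions, and the real recipe sometimes substitutes a random word or leaves the chosen word unchanged instead of always using [MASK]. The 15% default mirrors the masking rate reported for the original BERT training.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens behind the [MASK] placeholder.

    Returns the masked sentence and a dict mapping each masked position
    to the original word the model would be asked to predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = "[MASK]"
            targets[i] = token
    return masked, targets


sentence = "Every weekend I enjoy taking my dog to the park".split()
masked, targets = mask_tokens(sentence, mask_prob=0.15)

print(" ".join(masked))
print(targets)
```

During training, the model sees only the masked sentence and is scored on how well it recovers the words stored in the targets.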

This training method is different from that of unidirectional models, which typically use only the words that come before a position to predict the word that belongs there.

When applied to search queries, BERT identifies the most important words in a search query. It then analyzes the words on either side of those important words and uses them to understand the context in which the important words are used. This is why BERT can uncover the intent of a search query.

How BERT Improves Google Search Results

Google uses BERT to understand the intent of a search query. This is especially helpful for identifying the intent of conversational queries, long-tail queries, and queries that contain important prepositions like ‘for’ and ‘to.’

For example, the image below compares the results Google displayed for the same search query before and after BERT was applied.

Sample of a search results page powered by BERT — Image source: Google Blog

In this example, before BERT was applied, Google misunderstood the search query and returned results about US citizens traveling to Brazil. However, after BERT was applied, it returned results about Brazilian travelers going to the US.

In another example, Google matched ‘stand-alone’ in the top-ranking result with ‘stand’ in the search query. This produced an unhelpful result that did not answer the query. However, with BERT, Google understood the query and returned a more relevant result.

