



PageRank was once at the heart of search and what made Google the empire it is today.

Even if you believe search has migrated away from PageRank, there’s no denying that it’s a concept that has long permeated the industry.

Every SEO pro should have a good understanding of what PageRank was and what it is today.

This article covers:

What is PageRank?
A history of how PageRank evolved.
How PageRank revolutionized search.
Toolbar PageRank and PageRank.
How PageRank works.
How PageRank flows between pages.
Is PageRank still in use?

Let’s dive in.

What is PageRank?

Created by Google founders Larry Page and Sergey Brin, PageRank is an algorithm based on the combined relative strength of all hyperlinks on the Internet.

Most claim that the name is based on Larry Page’s surname, but others argue that “page” refers to a web page. Both positions are probably true, and the overlap is probably intentional.

While Page and Brin were at Stanford University, they wrote a paper titled “The PageRank Citation Ranking: Bringing Order to the Web.”

This paper, published in January 1999, presents a relatively simple algorithm for evaluating the strength of web pages.

Image from patents.google.com, April 2023

The method was patented in the United States (but not in Europe, where mathematical formulas are not patentable).

Image from patents.google.com, April 2023

Stanford University owns this patent and has assigned it to Google. This patent is currently scheduled to expire in 2027.

Image from patents.google.com, April 2023

A history of how PageRank evolved

While at Stanford in the late 1990s, Brin and Page both studied information retrieval methods.

At the time, using links to determine how “important” each page was compared to others was an innovative way to order pages. It was computationally difficult, but by no means impossible.

The idea soon morphed into Google, which at the time was a small fry in the search world.

Some backers believed in Google’s approach so strongly that the company initially launched its search engine with no way of generating revenue.

And while Google (then known as “BackRub”) was the search engine, PageRank was the algorithm used to rank pages within search engine result pages (SERPs).

Google Dance

One of PageRank’s challenges was that the calculation, while simple, had to be iterative. The calculation is performed multiple times for every page and every link on the internet. At the turn of the 2000s, this calculation took several days to process.

During that time, Google’s SERPs moved up and down, often erratically, as a new PageRank was calculated for each page.

Known as the “Google Dance,” the phenomenon was notorious for stopping SEO professionals of the day in their tracks whenever Google began its monthly update.

(Google Dance later became the name of an annual party Google hosts for SEO professionals at its Mountain View headquarters.)

Trusted seed

Subsequent iterations of PageRank introduced the idea of a “trusted seed” set to kick off the algorithm, rather than giving every page on the Internet the same initial value.

Reasonable surfer

Another iteration of the model introduced the idea of a “reasonable surfer.”

This model suggests that a page’s PageRank need not be shared evenly among the pages it links to; instead, the share passed through each link can be weighted.
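The idea of weighted sharing can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_shares`, the link labels, and the weights 3 and 1 are invented for the example, and the article does not say how such weights would actually be chosen.

```python
def weighted_shares(pagerank, link_weights):
    """Split a page's PageRank among its links in proportion to each link's weight."""
    total = sum(link_weights.values())
    return {target: pagerank * weight / total
            for target, weight in link_weights.items()}


# Hypothetical page with PageRank 5.0 and two links of unequal weight.
shares = weighted_shares(5.0, {"main content link": 3.0, "footer link": 1.0})
print(shares)
```

With equal weights, the function reduces to the even split used in the basic model.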

The retreat of PageRank

Internally, Google’s algorithm was initially considered resistant to spam, because the importance of a page was determined not only by its content, but also by a kind of “voting system” generated by links to the page.

But Google’s confidence didn’t last long.

PageRank started to become a problem as the backlink industry grew. As a result, Google withdrew it from public view, but continued to rely on it in its ranking algorithms.

The PageRank toolbar was retired by 2016, and eventually all public access to PageRank was cut off. But by this point Majestic (an SEO tool) in particular was able to correlate its own calculations very well with PageRank.

Google discouraged SEO professionals from manipulating links, both through its “Google Guidelines” documentation and through advice from its spam team, which Matt Cutts led until January 2017.

Google’s algorithm was also changing during this time.

The company was relying less on PageRank and, after acquiring MetaWeb and its proprietary knowledge graph (called “Freebase” in 2014), Google began indexing the world’s information in different ways.

Toolbar PageRank and PageRank

Google was initially so proud of its algorithm that it was happy to share the results of its calculation publicly with anyone who wanted to see them.

The most notable representation was a toolbar extension for browsers such as Firefox, which scored from 0 to 10 for every page on the Internet.

In reality, PageRank has a much wider range of scores, but the 0-10 scale gave SEO professionals and consumers an instant way to assess the importance of a page on the Internet.

The PageRank toolbar made the algorithm highly visible, but that visibility came with complications. In particular, it made clear that links were the easiest way to “game” Google.

The more links a page had (or, more precisely, the better its links), the better it could rank in Google’s SERPs for its target keywords.

As a result, a secondary market formed to buy and sell links, priced according to the PageRank of the URL on which the link was sold.

The problem got worse when Yahoo released a free tool called Yahoo Site Explorer. With this tool, anyone could look up the links pointing to a specific page.

Two other tools, Moz and Majestic, later built on the free option by compiling their own indexes of the Internet and evaluating links independently.

How PageRank revolutionized search

Other search engines relied heavily on analyzing the content of each page individually. These methods could hardly discern the difference between influential pages and pages simply written with random (or manipulated) text.

This meant that the methods of other search engines were very easy for SEO professionals to manipulate.

Google’s PageRank algorithm was revolutionary.

Combined with the relatively simple concept of “nGrams” to help establish relevance, Google found a winning formula.

It quickly overtook the major incumbents of the time, including AltaVista and Inktomi (which powered MSN, among others).

Also, by operating at the page level, Google found a much more scalable solution than the “directory” based approach taken by Yahoo and later by DMOZ. However, DMOZ (also known as the Open Directory Project) was initially able to provide Google with an open-source directory of its own.

How PageRank works

The PageRank formula takes many forms, but it can be explained in a few sentences.

First, every page on the Internet is given an estimated PageRank score. This can be any number. Historically, PageRank was commonly presented as a score between 0 and 10, but in practice, the initial estimates need not fall in this range.

The PageRank of each page is then divided by the number of links leading out of the page, which yields a smaller fraction.

That fraction is then distributed to each of the linked pages, and all other pages on the Internet do the same.

Then, in the next iteration of the algorithm, the new estimate of PageRank for each page is the sum of all the fractions passed on by the pages that link to it.

The formula also includes a “decay factor” (more commonly called the damping factor), which describes the likelihood that a person who is surfing the web will stop surfing altogether.

The proposed new PageRank is reduced by the decay factor before each subsequent iteration of the algorithm begins.

This process is repeated until the PageRank scores reach a stable equilibrium. The resulting numbers were usually rescaled to the more recognizable 0-10 range for convenience.
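The steps above can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not Google's implementation: the function name `pagerank`, the three-page graph, the value d = 0.85, the stopping tolerance, and the (1 - d) / n baseline given to every page are all choices made for the example. Here d is treated as the fraction of each passed-on share that survives an iteration.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=200):
    """Iteratively estimate PageRank.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # initial estimate: any starting value works

    for _ in range(max_iter):
        # Assumed baseline for every page before shares are added.
        new = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            out = links.get(page, [])
            if not out:
                continue  # pages without outgoing links are not handled in this sketch
            share = pr[page] / len(out)   # divide by the number of outgoing links
            for target in out:
                new[target] += d * share  # pass the reduced share to each linked page
        change = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if change < tol:  # scores have stopped moving: equilibrium reached
            break
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

In this toy graph, C ends up with the highest score because it receives shares from both A and B, while B receives only half of A's share.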

One way to express this mathematically is:

Image from the author, April 2023

where:

PR = PageRank in the next iteration of the algorithm.
d = decay rate.
j = the number of a page on the Internet (assuming every page has a unique number).
n = total number of pages on the Internet.
i = iteration of the algorithm (initially set to 0).
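For readers without access to the formula image, a commonly used textbook form of the update, written with the symbols defined above, is given here as a hedged reconstruction. The set B_j (the pages that link to page j), the count L_k (the number of outgoing links on page k), and the (1 - d)/n term are assumptions that the article does not spell out, and d is read as the fraction of PageRank that survives each step:

```latex
PR_{i+1}(j) = \frac{1 - d}{n} + d \sum_{k \in B_j} \frac{PR_i(k)}{L_k}
```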

The formula can also be expressed in matrix form.

Problems with the formula and later iterations

There are some challenges with this formula.

If a page does not link to any other page, the formula will not reach equilibrium.

Therefore, in this event, the page’s PageRank is distributed among all pages on the Internet. This way, even pages with no incoming links can receive some PageRank, but it will not accumulate to a significant value.
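A single iteration with this redistribution can be sketched as follows. The function name `step`, the three-page graph, d = 0.85, and the (1 - d) / n baseline are assumptions made for the illustration. Page B has no outgoing links, so its score is spread evenly over every page.

```python
def step(pr, links, d=0.85):
    """One PageRank iteration that spreads the score of link-less pages over all pages."""
    pages = list(pr)
    n = len(pages)
    # Total PageRank held by pages that link to nothing.
    dangling = sum(pr[p] for p in pages if not links.get(p))
    # Every page gets the assumed baseline plus an equal slice of the dangling total.
    new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
    for p in pages:
        out = links.get(p, [])
        for target in out:
            new[target] += d * pr[p] / len(out)
    return new


# Hypothetical graph: A and C both link to B; B links to nothing.
start = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
after = step(start, {"A": ["B"], "B": [], "C": ["B"]})
print(after)
```

A and C have no incoming links, yet both still receive a small score from the baseline and from B's redistributed PageRank, and the total across all pages is preserved.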

Another, less documented issue is that newer pages can be more important than older pages, yet have a lower PageRank. This means that over time, the PageRank of older content can grow disproportionately high.

The time the page has been published is not taken into account by the algorithm.

How PageRank flows across pages

If a page starts with a value of 5 and has 10 outgoing links, each linked page will be given 0.5 PageRank (before the decay factor is applied).
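The arithmetic can be checked with a short Python sketch. The helper name `share_per_link` is invented for the example, and the decay value of 0.85 is an assumed, conventional figure rather than one given in this article.

```python
def share_per_link(pagerank, outgoing_links, d=1.0):
    """PageRank passed through each link; d=1.0 means no decay is applied."""
    return d * pagerank / outgoing_links


no_decay = share_per_link(5, 10)            # the example from the text
with_decay = share_per_link(5, 10, d=0.85)  # same page with an assumed decay of 0.85
print(no_decay)    # 0.5
print(with_decay)
```

With the assumed decay applied, each link passes on roughly 0.425 instead of 0.5.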

This way PageRank flows on the internet between iterations.

When new pages appear on the Internet, they start with a tiny amount of PageRank. However, as other pages start linking to them, their PageRank increases over time.

Is PageRank still in use?

Public access to PageRank was removed in 2016, but the scores are still believed to be available to search engineers within Google.

A leak of the ranking factors used by Yandex showed that PageRank remained available as a factor it could use.

Google engineers have suggested that the original form of PageRank was replaced with a new approximation that requires less processing power to compute. Although the formula matters less to how Google ranks pages, it remains a constant for each web page.

And regardless of what other algorithms Google chooses to use, it’s likely that PageRank continues to be built into many of the search giant’s systems to this day.

Dixon details how PageRank works in this video.


