Prakhar Pratyush

Sitemaps

Jun 16, 2021

This post is about a very cool concept in Search Engine Optimization (SEO) that I worked on last week.

First, I will give a little background on how Google Search works, then explain what this weird term “Sitemap” actually means and how it helps with SEO.

How Google Search Works

We search for something on Google and the results come up within a fraction of a second. But Google has actually been working on your query since before you even thought of searching for it.

Google has a large number of crawlers whose job is to visit billions of pages on the Internet, crawl each page and index it (if the content is good) in Google's database. Whenever a user searches for something, Google shows results from what it indexed earlier.

OK, that's a lot of technical terms, right?

Let’s take an example: suppose you have a blogging website. Let’s walk through the crawling, indexing and results process.

Google’s crawler finds the URL of your homepage (or any other page). It crawls (goes through) the homepage, checks the title, images, text etc. on the page and indexes the homepage URL along with metadata related to that page (if the page is worth indexing). Think of it like a Python dictionary (a map in C++) where the URL is the key and the metadata is the value.
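To make the dictionary analogy concrete, here is a minimal sketch. The URL, the field names and the `index_page` helper are all made up for illustration; Google's real index is far more sophisticated than this.

```python
# Toy "index": URL -> metadata, mirroring the dictionary analogy.
# Every URL and field name here is hypothetical.
index = {}

def index_page(url, title, images, text):
    """Store a page's metadata under its URL."""
    index[url] = {
        "title": title,
        "image_count": len(images),
        # Unique lowercase words, so a query word can be looked up later.
        "words": sorted(set(text.lower().split())),
    }

index_page(
    "https://example.com/",
    title="My Blog",
    images=["logo.png"],
    text="Welcome to my blog about SEO",
)

print(index["https://example.com/"]["title"])
```

Looking up a URL in `index` is then just a dictionary access, which is the whole point of the analogy.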

Next, the crawler looks for other URLs on the homepage, like your About page and individual blog links, and visits those pages. For a blog page, say, it goes through the post to determine what it is about, what the title, images, contents etc. are, and indexes that URL in Google’s database.

Now if a user searches for a topic that matches your blog, Google will take your blog into consideration when showing the search results. The results and their ranking depend on various factors like your location, the content, page loading time, responsiveness of the site and a lot more. (Indexing and ranking follow a very complex algorithm, but this is the broad process.)
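The "follow the links" step can be sketched as a small breadth-first crawl. This is only an illustration: the `SITE` dictionary stands in for a real website (a real crawler would fetch pages over HTTP), and the class and function names are my own.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical site: path -> HTML. No network access is involved.
SITE = {
    "/": '<title>Home</title><a href="/about">About</a><a href="/blog/1">Post</a>',
    "/about": '<title>About</title><a href="/">Home</a>',
    "/blog/1": '<title>First post</title><a href="/blog/2">Next</a>',
    "/blog/2": '<title>Second post</title>',
}

class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every <a href="..."> on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def crawl(start):
    """Visit pages breadth-first, indexing each one exactly once."""
    index = {}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in index or path not in SITE:
            continue  # already indexed, or not a page on this site
        parser = LinkAndTitleParser()
        parser.feed(SITE[path])
        parser.close()
        index[path] = {"title": parser.title}
        queue.extend(parser.links)  # newly discovered URLs to visit later
    return index

crawled = crawl("/")
print(list(crawled))
```

Notice that `/blog/2` is only discovered because `/blog/1` links to it. A page that nothing links to would never be found this way, which is part of why sitemaps are useful.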

What is a Sitemap

Google crawls billions of sites on the internet by itself, but this is a time-consuming process. Suppose it crawled your site today; there is no telling when it will crawl it next (maybe months, maybe years). However, it tends to favor sites that are highly ranked.

Why should it crawl again? Isn't crawling once enough?

It is fine if your site's contents don't change much. But sites like Quora and Stack Overflow get thousands of new questions and answers every day, so it is necessary for them to get their newly created content indexed so that it shows up in search results. Otherwise it might not appear for months.

So Google provides a way to submit a list of URLs on your site, including newly created and updated content. This list is known as a SITEMAP.

Google will then go through the submitted list of URLs and index them if the contents are worth indexing.
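For a feel of what such a list looks like, here is a rough sketch that builds a small XML sitemap in the format defined by the sitemaps.org protocol (`urlset`, `url`, `loc`, `lastmod`). The URLs and dates are made up, and `build_sitemap` is just an illustrative helper, not an official tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: list of (url, last_modified) tuples, dates as YYYY-MM-DD strings."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages on a hypothetical blog.
sitemap_xml = build_sitemap([
    ("https://example.com/blog/1", "2021-06-10"),
    ("https://example.com/blog/2", "2021-06-15"),
])
print(sitemap_xml)
```

The `lastmod` value is what tells a crawler that a page has changed since its last visit.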

This speeds up the crawling and indexing process because:

  • Google knows that something has changed on the site, so it can crawl it as soon as possible.
  • We are providing a filtered list of URLs that should be crawled. (By default, there is a chance that Google misses the newly created URLs and only crawls the older URLs of the site.)

Note:

It is not guaranteed that Google will index all the URLs in the list. A sitemap is just a way of asking Google to crawl.

Google's algorithms decide whether to index a page or not. The sitemap only shortens the gap between crawls.

That was the basic concept behind sitemaps. There is a lot more to explore, like how to actually submit a sitemap and ping Google, and learning about the robots.txt file.

Try exploring Amazon’s robots.txt file and find out what it is!
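If you want a head start, here is a small sketch that reads a robots.txt file with Python's standard `urllib.robotparser` module. The rules below are invented for illustration (they are not Amazon's), and they are parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for a hypothetical site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# May a crawler visit these URLs?
blog_allowed = parser.can_fetch("*", "https://example.com/blog/1")
private_allowed = parser.can_fetch("*", "https://example.com/private/notes")

# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
declared_sitemaps = parser.site_maps()

print(blog_allowed, private_allowed, declared_sitemaps)
```

So robots.txt tells crawlers where they should not go, and it can also point them to your sitemap.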

You can also learn more about sitemaps and the crawling-indexing process from Google’s official documentation.

Try registering your site on Google Search Console and exploring it. (It’s free.)

Thank you!

© 2025 Prakhar Pratyush