
How to Create an XML Sitemap (and Submit it to Google)

Just as it's hard to find your way around a new place without a map, it can be hard for Google to find all the pages on your website without a sitemap.

Fortunately, it's quick and easy to create an XML sitemap and submit it to Google.

We'll go through how to do this step by step below.

But let's go over some basics first.

(Do you already know the basics? Click here to jump directly to creating a sitemap.)

A sitemap is an XML file that lists all of the important content on your website. Any page or file that you want to appear on search engines should be in your sitemap.

Sitemaps can't list more than 50,000 URLs and can't be larger than 50MB (uncompressed). If your sitemap exceeds either limit, you'll need to split it into multiple sitemaps.
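If you do need to split a large sitemap, the job can be scripted. Below is a minimal Python sketch that assumes you already have a flat list of URLs. The sitemap index format (<sitemapindex>) comes from the sitemaps.org protocol. The function names and the example.com file names are made up for illustration.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000             # per-sitemap URL limit
MAX_BYTES = 50 * 1024 * 1024  # 50MB (52,428,800 bytes), measured uncompressed


def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def fits_size_limit(xml_bytes):
    """True if an encoded (uncompressed) sitemap file is within the size limit."""
    return len(xml_bytes) <= MAX_BYTES


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index file that points to each child sitemap."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
```

With this approach, 120,000 URLs become three sitemaps (50,000 + 50,000 + 20,000). All three are listed in one index file, and you can submit that index instead of each sitemap separately.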

What does an XML sitemap look like?

XML sitemaps are made for search engines, not people. This is why they can look a bit overwhelming if you've never seen one before.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://ahrefs.com/</loc>
    <lastmod>2019-08-21T16:12:20+03:00</lastmod>
  </url>
  <url>
    <loc>https://ahrefs.com/blog/</loc>
    <lastmod>2019-07-31T07:56:12+03:00</lastmod>
  </url>
</urlset>

Let's break this down.

XML declaration

<?xml version="1.0" encoding="UTF-8"?>

This tells search engines that they're reading an XML file. It also specifies the XML version and the character encoding. For sitemaps, the version should be 1.0 and the encoding should be UTF-8.

URL set

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

This is a container for all the URLs in the sitemap. It also tells crawlers which protocol standard is used. Most sitemaps specify the Sitemap 0.90 standard, which is supported by Google, Yahoo!, and Microsoft.

URL

<url> <loc>https://ahrefs.com/</loc> <lastmod>2019-08-21T16:12:20+03:00</lastmod> </url>

This is the parent tag for every URL. You need to specify the location of each URL in a nested <loc> tag. These must be absolute, canonical URLs, not relative ones.

While <loc> is the only required tag, there are a few optional properties:

  • <lastmod>: Specifies the date when the file was last modified. This must be in W3C Datetime format. For example, if you updated an article on September 25, 2019, the value would be 2019-09-25. You can also add the time, but that's optional.
  • <priority>: Specifies the priority of the URL relative to all other URLs on your site. Values range from 0.0 to 1.0; higher is more important.
  • <changefreq>: Specifies how often the page is likely to change. Its purpose is to give search engines an idea of how often they should recrawl the URL. Valid values are always, hourly, daily, weekly, monthly, yearly, and never.

None of these optional tags are really important for SEO.

Regarding <lastmod>, Google's Gary Illyes says that for the most part they ignore it because "webmasters do a terrible job of keeping it correct." It's easy to see why: many sitemap generators set <lastmod> to the current date for every page instead of the date each page was last edited.

Regarding <priority>, Google says it ignores this tag because it's just a "big mess."

Regarding <changefreq>, John Mueller says: "Priority and changefreq don't play such a big role in sitemaps anymore."
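If you generate the XML yourself, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes each page is described by a dict with a required "loc" key and optional "lastmod", "changefreq", and "priority" keys. That input shape and the function names are illustrative, not part of any tool mentioned in this article. The sketch rejects relative <loc> values and writes <lastmod> in W3C Datetime format via isoformat(). Given the comments from Google above, leaving the optional keys out is a reasonable choice.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def is_absolute(url):
    """A <loc> needs a scheme and a host; '/blog/' or 'example.com/blog/' fail."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_sitemap(entries):
    """entries: dicts with a required 'loc' plus optional 'lastmod'
    (date, or timezone-aware datetime), 'changefreq' and 'priority'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        if not is_absolute(entry["loc"]):
            raise ValueError(f"relative URL not allowed: {entry['loc']}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # the only required child
        lastmod = entry.get("lastmod")
        if lastmod is not None:
            if isinstance(lastmod, dt.datetime) and lastmod.tzinfo is None:
                raise ValueError("a datetime lastmod needs a UTC offset")
            # isoformat() gives 2019-09-25 or 2019-08-21T16:12:20+03:00 (W3C Datetime)
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
        changefreq = entry.get("changefreq")
        if changefreq is not None:
            if changefreq not in VALID_CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {changefreq}")
            ET.SubElement(url, "changefreq").text = changefreq
        priority = entry.get("priority")
        if priority is not None:
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority must be 0.0-1.0, got {priority}")
            ET.SubElement(url, "priority").text = str(float(priority))
    body = ET.tostring(urlset, encoding="unicode")  # escapes & and < for us
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

ElementTree escapes characters such as & in URLs for you. It can't tell whether a URL is canonical, though; that still requires checking the page itself.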

Why do I need a sitemap?

Google discovers new content by crawling the web. When it crawls a page, it follows both the internal and external links on that page. If a newly discovered URL isn't in its index yet, Google can parse the content and index it where that makes sense.

But Google can't find all content this way. If a page isn't linked from any other known page, Google won't find it.

That's where sitemaps come into play.

Sitemaps show Google (and other search engines) where to find the most important pages on your website so they can crawl and index them. This is important because search engines cannot rank content without indexing it first.
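To make the discovery process above more concrete, here is a toy Python sketch of the link-extraction step, using only the standard library (html.parser and urllib.parse). It is not how Google's crawler works internally. It only illustrates why an unlinked page never turns up: a crawler learns about a URL only if that URL appears in an href on a page it has already fetched. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if urlparse(absolute).scheme in ("http", "https"):  # skip mailto: etc.
                self.links.append(absolute)


def split_links(base_url, html):
    """Return (internal, external) links found on the page at base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    internal = [u for u in parser.links if urlparse(u).netloc == host]
    external = [u for u in parser.links if urlparse(u).netloc != host]
    return internal, external
```

A page that no crawled page links to never appears in either list. A sitemap fills that gap.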

How to create a sitemap

Some CMSs generate a sitemap for you and update it automatically as you add or remove pages and posts. If your CMS can't do this, there's usually a plugin that can.

Create a sitemap in WordPress

Even though WordPress is behind 34.5% of all websites, it doesn't generate a sitemap for you. To create one, you'll need to use a plugin like Yoast SEO.

To install Yoast, log into your WordPress dashboard.

Go to Plugins > Add New.

Search for "Yoast SEO."

Click “Install Now” on the first result and then “Activate”.

Go to SEO > General > Features and make sure the “XML sitemaps” feature is turned on.

You should now see your sitemap (or index sitemap) at yourdomain.com/sitemap.xml or yourdomain.com/sitemap_index.xml.

If you want to explicitly include or exclude some page types (tags, category pages, etc.) in your sitemap, then go to the “Search Appearance” settings.

You can also exclude individual posts or pages using the “Advanced” meta box in the editor.

IMPORTANT: Only exclude pages from your sitemap that you don't want to appear in search results.

Learn more about it in our WordPress SEO Guide.

Create a sitemap in Wix

Wix automatically creates a sitemap for you. You can find it at yourwixsite.com/sitemap.xml.

Unfortunately, you don't have much control over which pages go into it. If you want to exclude a page, go to the “SEO (Google)” settings tab for that page and turn off the “Show this page in search results” toggle.

Note that this also adds a noindex meta tag to the page, which prevents it from appearing in search results.

Create a sitemap with Squarespace

Squarespace also automatically creates a sitemap for you. You can usually find it at yoursquarespacesite.com/sitemap.xml.

There's no way to edit your sitemap manually in Squarespace, but you can noindex pages (hide them from search engines) using the “SEO” tab.

This will also remove the pages from your sitemap.

Create a sitemap in Shopify

Shopify automatically creates a sitemap for you. You can find it at yourstore.com/sitemap.xml.

Unfortunately, there is no easy way to set a page to noindex in Shopify. You have to edit the code in the .liquid files directly.
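Whichever platform generates your sitemap, it's worth checking which URLs it actually lists. The minimal Python sketch below assumes you've already downloaded the file. It handles both a regular <urlset> sitemap and a sitemap index, such as the index sitemap Yoast can create. list_locs is a hypothetical helper name. Gzipped sitemaps (.xml.gz) would need gzip.decompress() first.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locs(xml_data):
    """Return (root_type, locs); root_type is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_data)
    root_type = root.tag.split("}")[-1]  # drop the '{namespace}' prefix
    # <loc> sits under <url> in a sitemap and under <sitemap> in an index
    locs = [(el.text or "").strip() for el in root.findall(".//sm:loc", NS)]
    return root_type, locs


# Hypothetical usage (replace yourdomain.com with your own site):
# from urllib.request import urlopen
# with urlopen("https://yourdomain.com/sitemap.xml") as resp:
#     kind, locs = list_locs(resp.read())
```

If the result is a sitemap index, run the same function on each listed child sitemap to see the actual page URLs.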

Create a sitemap without a CMS

If you have fewer than ~300 pages on your website, install the free version of Screaming Frog.

Once it's installed, go to Mode > Spider.

Paste your homepage URL into the box labeled "Enter URL to spider."

Click "Start."

Once the crawl is done, look at the bottom-right corner.

It will say something like this:

If the number is 499 or less, go to Sitemaps > XML Sitemap.

Since Google doesn't pay much attention to <lastmod>, <priority>, and <changefreq>, we recommend excluding them from the sitemap file.

Click “Next” and save the sitemap on your computer. Done.

If the number shows “500 out of 500”, then there is no point in exporting the sitemap. Why? Because it means you reached the crawl limit before all of the pages on your site could be crawled. As a result, hundreds of pages could be missing from the exported sitemap — which makes it pretty pointless.

One way to solve this is to find a free sitemap generator. There are a lot of them.

Unfortunately, most of them are not reliable.

We tested some of the most popular generators and found that quite a few included non-canonical URLs, non-indexable pages, and redirects. That's bad SEO practice.

Generator | Includes non-canonical URLs? | Includes non-indexable URLs? | Includes redirects?
xml-sitemaps.com | Yes ❌ | No ✅ | No ✅
web-site-map.com | Yes ❌ | No ✅ | No ✅
xmlsitemapgenerator.org | Yes ❌ | No ✅ | No ✅
smallseotools.com/xml-sitemap-generator | Yes ❌ | Yes ❌ | Yes ❌
freesitemapgenerator.com | Yes ❌ | Yes ❌ | Yes ❌
duplichecker.com/xml-sitemap-generator.php | Yes ❌ | Yes ❌ | Yes ❌
xsitemap.com | Yes ❌ | Yes ❌ | Yes ❌

So what's the solution?

If Screaming Frog couldn't crawl your entire site, then use Ahrefs Site Audit to crawl your site.

https://www.youtube.com/watch?v=LjinWqfGyVE

Once the crawl is complete, you should go to the Page Explorer and add these filters.

Click Export > Current table view.

Open the CSV file, then copy all the URLs from the URL column and paste them into this tool.

Click "Add to queue" then "Export queue as sitemap.xml."

This file is your finished sitemap.
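If you'd rather not paste URLs into a third-party tool, you can script the same conversion. This Python sketch reads a CSV export and writes one <url><loc> entry per unique URL. It assumes the URL column is literally named "URL", so check the header in your own export and adjust the column argument if it differs. The file names in the commented usage are placeholders.

```python
import csv
import io
from xml.sax.saxutils import escape


def csv_to_sitemap(csv_text, column="URL"):
    """Build sitemap XML from one column of a crawl export (column name is an assumption)."""
    seen, urls = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url and url not in seen:  # skip blanks and duplicates, keep crawl order
            seen.add(url)
            urls.append(url)
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Hypothetical usage ("utf-8-sig" also copes with a byte-order mark):
# with open("export.csv", newline="", encoding="utf-8-sig") as f:
#     xml = csv_to_sitemap(f.read())
# with open("sitemap.xml", "w", encoding="utf-8") as f:
#     f.write(xml)
```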

How to submit a sitemap to Google

First, you need to know where your sitemap is.

If you're using a plugin, chances are the URL is domain.com/sitemap.xml.

If you created it manually, name it something like sitemap.xml and upload it to the root directory of your website. It should then be reachable at domain.com/sitemap.xml.

Go to Google Search Console > Sitemaps, paste in your sitemap location, and click “Submit.”

That's it.

It's also worth adding your sitemap URL(s) to your robots.txt file.

You can find this file in the root directory of your web server. To add your sitemap, open the file and add the following line:

Sitemap: https://www.yourdomain.com/sitemap.xml

You'll need to swap the sample URL with the location of your sitemap.

If you have multiple sitemaps, add multiple lines.

Sitemap: https://www.asos.com/sitemap_1.xml

Sitemap: https://www.asos.com/sitemap_2.xml
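You can confirm that your robots.txt exposes the Sitemap lines you added with Python's standard urllib.robotparser module. RobotFileParser.site_maps() (Python 3.8+) returns the listed sitemap URLs. The robots.txt content below is a made-up example; the Disallow rule is hypothetical, and the Sitemap lines reuse the ASOS URLs from above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.asos.com/sitemap_1.xml
Sitemap: https://www.asos.com/sitemap_2.xml
"""


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in robots.txt content (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


print(sitemaps_from_robots(ROBOTS_TXT))
# prints ['https://www.asos.com/sitemap_1.xml', 'https://www.asos.com/sitemap_2.xml']
```

Sitemap lines apply regardless of the User-agent groups around them, so their position in the file doesn't matter.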

Common website errors affecting your sitemap

Google Search Console shows you most of the technical errors related to your sitemap.

For example, here is a warning that one of our submitted URLs is being blocked by robots.txt:

You can find out more about this topic and how to solve it here.
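You can also spot this problem before Google does by testing your sitemap URLs against your robots.txt rules. Below is a rough Python sketch, again using urllib.robotparser; the blocked_urls helper and the sample rules are hypothetical. One caveat: the standard-library parser uses simple prefix matching and applies the first matching rule. Google supports * and $ wildcards and uses the most specific rule, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt content disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical rules and sitemap URLs:
rules = "User-agent: *\nDisallow: /checkout/\n"
sitemap_urls = ["https://example.com/", "https://example.com/checkout/cart"]
print(blocked_urls(rules, sitemap_urls))
# prints ['https://example.com/checkout/cart']
```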

Still, there are some issues that Google won't tell you about.

Below are some of the most common errors and how to find and fix them.

Useless, poor quality pages in your sitemap

Every page in your sitemap should now be indexable and canonical.

Unfortunately, that doesn't mean that all pages are high quality. If you have a large amount of content, chances are poor quality pages made it into the sitemap.

For example, look at the following two pages on an e-commerce site:

Neither of these are valuable to searchers, yet they are in the site's sitemap and Google has indexed both of them.

To find these pages, go to Site Audit > Duplicate content.

Look for clusters of duplicate and near-duplicate pages without canonicals. These are the orange squares. Click on one to see the pages in that group.

Take a look at the pages and see if they have any value.
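For a rough, tool-independent way to flag near-duplicates from your own crawl data, one common technique is to compare word shingles using Jaccard similarity. This Python sketch assumes you already have each page's visible text keyed by URL. The 3-word shingles and the 0.9 threshold are arbitrary assumptions, and this is not how Ahrefs Site Audit builds its duplicate groups.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Share of shingles two pages have in common (1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicate_pairs(pages, threshold=0.9):
    """pages maps URL -> visible text; returns URL pairs at or above threshold."""
    sigs = {url: shingles(text) for url, text in pages.items()}
    return [
        (u1, u2)
        for u1, u2 in combinations(sigs, 2)
        if jaccard(sigs[u1], sigs[u2]) >= threshold
    ]
```

Comparing every pair grows quadratically, which is fine for a few thousand pages. Larger sites usually switch to an approximation such as MinHash.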

Having low-quality pages on your website is bad for three reasons:

  • You're wasting crawl budget. Making Google spend time and resources crawling useless, low-quality pages isn't ideal; that time could go toward crawling more important content instead. (For the record, Google says that crawl budget "is not something most publishers should worry about.")
  • They “steal” link authority from more important pages. There is a clear correlation between a page's authority and its rankings, and internal links to bad pages only dilute the authority that could flow to more important ones. (Interestingly, we saw an increase in traffic rather than a decrease when we deleted almost half of the articles on the Ahrefs blog.)
  • They lead to a poorer user experience. These pages offer visitors no value. Landing on them is annoying, and visitors might leave if they think your site is low quality or neglected.

All in all, the best thing to do is to remove low-quality content from your website and your sitemap. If you do, remember to also remove the internal links to those pages. Otherwise, you'll just swap one problem (low-quality content) for another (broken links).

In addition to duplicates and near-duplicates, you can also look for pages with thin content.

Just check the “On-page” report in Site Audit for pages with a “Low word count” warning.
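If you've extracted page text yourself, a quick thin-content check takes only a few lines. This minimal Python sketch assumes visible text per URL is already available. The 100-word cutoff is a hypothetical threshold, not the one Site Audit uses.

```python
import re


def thin_pages(pages, min_words=100):
    """Return URLs (sorted) whose visible text has fewer than min_words words."""
    return sorted(
        url for url, text in pages.items() if len(re.findall(r"\w+", text)) < min_words
    )
```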

Pages accidentally excluded from the sitemap

If you followed one of the recommended methods above to create your sitemap, pages with noindex tags or non-self-referencing canonical tags won't be included.

That's a good thing. You shouldn't include non-canonical URLs or non-indexable pages in your sitemap.

However, it also means that if you have erroneous noindex tags on your site, pages can be wrongly excluded.

To check for errors, go to the "Indexability" report in Site Audit and click the "Noindex page" warning. This shows all noindexed pages.

Most of these will be non-indexable on purpose, but it's worth double-checking. Erroneous noindex tags are usually easy to spot because they tend to affect entire subsections of your website.

If you come across any pages that shouldn't be set to noindex, remove the noindex tag from the page and add them to your sitemap. If you're using a CMS, it should happen automatically.

It also makes sense to check for incorrect canonicals and redirects. To do that, go to Page Explorer and add the following filters:

Remove any false canonicals and redirects, then add the relevant pages to your sitemap.
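The inclusion rule described in this section (indexable, canonical, no redirects) can be written as a simple filter over crawl data. This Python sketch assumes a hypothetical CrawledPage record holding the status code, a noindex flag, and the canonical target. It treats a missing canonical as acceptable and compares URLs as exact strings. If your crawler doesn't normalize trailing slashes and similar differences, do that first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    url: str
    status: int               # HTTP status code returned for `url`
    noindex: bool             # meta robots or X-Robots-Tag noindex present
    canonical: Optional[str]  # rel=canonical target, if any


def sitemap_eligible(page: CrawledPage) -> bool:
    """200 OK, not noindexed, canonical missing or pointing at itself."""
    self_canonical = page.canonical is None or page.canonical == page.url
    return page.status == 200 and not page.noindex and self_canonical
```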

FAQs

Here are some answers to frequently asked questions about sitemaps. Let us know if you have any other questions that aren't answered here and we'll add them.

Do you need a sitemap for AMP pages?

Nope.

@Kfowler325 No need for sitemaps for AMP pages - the rel=amphtml link is enough for us.

- 🍌 John 🍌 (@JohnMu) 13 October 2016

How do I create a sitemap for an e-commerce website?

You can create a sitemap for an e-commerce site the same way you would for any other site. In particular, check whether there is duplicate or near-duplicate content on your site, as this often slips through the cracks at scale, for example through faceted navigation.
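One quick way to spot faceted-navigation URLs that slipped into a sitemap is to look for filter parameters in the query string. Below is a minimal Python sketch. The parameter names in FACET_PARAMS are hypothetical placeholders; replace them with the filter and sort parameters your store actually uses.

```python
from urllib.parse import parse_qs, urlparse

FACET_PARAMS = {"color", "size", "sort", "price"}  # hypothetical facet names


def facet_urls(urls, facet_params=FACET_PARAMS):
    """Return the URLs whose query string uses any of the given facet parameters."""
    flagged = []
    for url in urls:
        params = parse_qs(urlparse(url).query)
        if any(name in facet_params for name in params):
            flagged.append(url)
    return flagged
```

Flagged URLs aren't automatically bad. Check whether each one is canonicalized to the unfiltered page or deserves to rank on its own.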

Final thoughts

Building a sitemap isn't rocket science, especially if you're using a plugin that does most of the work for you. It's not difficult to create one by hand either: just crawl your site and format the resulting list of URLs.

Don't forget that Google isn't obliged to index the pages in your sitemap. Sitemaps also have nothing to do with rankings.

If you want to rank better on Google, read this.

Do you have any questions? Let me know in the comments or on Twitter.

Translated by: Sebastian Simon. Sebastian Simon has been involved with SEO since 2009, currently at seven-bytes.de and heine.de.