What is a Sitemap?

How to Use the XML Sitemap Protocol to Create Web Sitemaps

What is an XML Sitemap?

An XML sitemap is a fairly simple XML file that lists one or more URLs on your Web site, along with optional information about each one. Search engines use that information to spider your site more effectively. At a minimum it is just a list of URLs, but to get more out of it, you can include other information as well:

  • Last modified time
    This is a date or date and time when the file was last modified.
  • How often the file is modified
    This allows you to define how often the content is modified. It doesn't require the search engines to respider it that often.
  • The priority of a page
    With priority you can indicate if a page is more or less important than other pages in the sitemap. This will not increase or decrease your page's priority against other Web sites, only against pages within the current site.

How To Build a Sitemap

The beauty of this protocol is how easy it makes building a sitemap. Only one field is required for each URL: its location. All additional information is optional.

The head of the document
This is an XML document, so you need to start it with an XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

The container element
The container element in a sitemap is the <urlset> element. If you want to write a sitemap without validating it against the sitemap schema, write the following:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 </urlset>
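
If you do want to validate the file against the schema, the sitemaps.org documentation describes adding the XML Schema instance namespace and a schemaLocation attribute to the container. A sketch of that form, which is worth checking against the current sitemaps.org documentation before you rely on it:

```xml
<!-- Container element that also points validators at the sitemap schema -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
</urlset>
```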

The URL
This is the container that holds all the information about each URL in your document. Place it inside the <urlset>:

<url>
 </url>

The location
This is the only required element inside the <url> element. It should contain the full URI, including the protocol, of a page you want the search engines to spider.

<loc>http://webdesign.about.com/od/sitemaps/a/aa010807.htm</loc>

If the URL contains special characters, you need to encode them: write characters such as the ampersand as XML entities (&amp;) and URL-encode any non-ASCII characters.
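
As a small illustration of that encoding, here is how a URL with a query string and a non-ASCII character might look. The example.com addresses are placeholders, not pages from this article:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- The raw URL has "&" between parameters; in XML it becomes "&amp;" -->
    <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
  </url>
  <url>
    <!-- "ü" is URL-encoded as its UTF-8 bytes, %C3%BC -->
    <loc>http://www.example.com/%C3%BCmlat.html</loc>
  </url>
</urlset>
```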

The Modified Date
The <lastmod> value should be in W3C Datetime format, either a date alone (YYYY-MM-DD) or a date and time with a time zone (for example, YYYY-MM-DDThh:mmTZD):

<lastmod>2007-01-08</lastmod>
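
If you want to record the time as well, the longer forms look like the following. The times shown are made up for illustration; TZD is either Z for UTC or an offset such as +01:00:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page-one.htm</loc>
    <!-- Date, hours and minutes, with a numeric offset from UTC -->
    <lastmod>2007-01-08T14:30+00:00</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/page-two.htm</loc>
    <!-- Date, hours, minutes and seconds, with Z meaning UTC -->
    <lastmod>2007-01-08T14:30:15Z</lastmod>
  </url>
</urlset>
```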

Modification frequency
This field is a suggestion rather than a command to search engines. They may crawl the pages more frequently than you indicate or less. Don't rely on the option "never" to tell the search engine never to spider it. Use your robots.txt file for that. Valid values for this field are:

  • always
    Use this for documents that change every time they are accessed.
  • never
    Use this for URLs that have been archived.
  • hourly
  • daily
  • weekly
  • monthly
  • yearly

<changefreq>yearly</changefreq>

The page priority
This is the priority of the page relative to other documents on the site. It is a number from 0.0 to 1.0, and the default value is 0.5. Assigning a high priority to all your pages is unlikely to help: the priority only compares pages within the same site, so if every page has the same priority, they will all be treated equally. I recommend leaving the priority alone except for pages like your home page (priority 1.0) or pages that aren't ready for full promotion (priority 0.1).

<priority>0.8</priority>
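
To make that recommendation concrete, here is a rough sketch of a sitemap that sets the priority only where it matters and leaves the rest at the default. The example.com URLs are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Home page: highest priority relative to the rest of this site -->
  <url>
    <loc>http://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <!-- Ordinary page: no priority element, so the default of 0.5 applies -->
  <url>
    <loc>http://www.example.com/articles/sitemaps.htm</loc>
  </url>
  <!-- Page that isn't ready for full promotion -->
  <url>
    <loc>http://www.example.com/drafts/new-feature.htm</loc>
    <priority>0.1</priority>
  </url>
</urlset>
```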

A Simple Sitemap

Apart from the XML declaration, a sitemap requires only three elements: <urlset>, <url>, and <loc>. So you could have a sitemap with only one URL in it that looked like this:

<?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://webdesign.about.com/od/sitemaps/a/aa010807.htm</loc>
   </url>
 </urlset>

And a more complex one with all the fields looks like:

<?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://webdesign.about.com/od/sitemaps/a/aa010807.htm</loc>
     <lastmod>2007-01-08</lastmod>
     <changefreq>yearly</changefreq>
     <priority>0.8</priority>
   </url>
 </urlset>

What if I have a Lot of Files on My Site?

Each sitemap file is limited to 50,000 URLs and 50MB uncompressed (the original version of the protocol set the size limit at 10MB). This keeps your server from being overloaded by serving huge files to search engines. If your site is larger than that, you'll need multiple sitemap files.

The sitemap protocol allows you to create multiple sitemaps for one site and then group them all in a sitemap index file. This file has the following elements:

  • <sitemapindex>
    The container element - similar to the <urlset> element.
  • <sitemap>
    The container for the sitemap information - similar to the <url> element.
  • <loc>
    The location of the sitemap.
  • <lastmod>
    The date the sitemap file was last modified. This lets search engine crawlers crawl only a subset of your pages: if a sitemap hasn't been updated recently, there's no reason to recrawl the pages it lists.

A simple sitemap index file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
 <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
     <loc>http://webdesign.about.com/od/sitemaps/l/mpreviss07sitemap.xml</loc>
     <lastmod>2007-01-08</lastmod>
   </sitemap>
 </sitemapindex>
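
An index file becomes useful once it lists more than one sitemap. A minimal sketch with two entries follows; the file names and dates are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Recently updated: crawlers have a reason to fetch this sitemap again -->
  <sitemap>
    <loc>http://www.example.com/sitemap-articles.xml</loc>
    <lastmod>2007-01-08</lastmod>
  </sitemap>
  <!-- Unchanged for months: crawlers can skip the pages it lists -->
  <sitemap>
    <loc>http://www.example.com/sitemap-archive.xml</loc>
    <lastmod>2006-06-15</lastmod>
  </sitemap>
</sitemapindex>
```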

The Sitemap File Location

Where you put your sitemap file on your Web server determines which files can be included in that sitemap. Sitemaps.org recommends that you place all your sitemaps in the root of your Web server, so that the files are most inclusive.

A sitemap placed in

http://www.yoursite.com/articles/

can only include files in the "articles/" directory and below.
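
As a sketch of what that rule means in practice, suppose the file is saved as http://www.yoursite.com/articles/sitemap.xml (the file name and page names are assumptions for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Allowed: inside the articles/ directory -->
  <url>
    <loc>http://www.yoursite.com/articles/sitemaps.htm</loc>
  </url>
  <!-- Allowed: in a subdirectory below articles/ -->
  <url>
    <loc>http://www.yoursite.com/articles/2007/january.htm</loc>
  </url>
  <!-- Not allowed here: http://www.yoursite.com/about.htm sits above
       articles/, so it belongs in a sitemap placed at the server root -->
</urlset>
```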

What to Do With Your Sitemap Files

Once you have sitemap files for your Web site, you should submit them to search engines so that they can spider them. For example, if you have a Google Webmaster Tools account, you can use it to submit a sitemap to Google. Yahoo! also accepts sitemaps when you submit a site feed.

©2014 About.com. All rights reserved.