An XML sitemap is a file that lists all the important pages of your website in a structured format that search engines can easily read. Think of it as a table of contents for your site. While search engine crawlers will eventually discover most of your pages by following links, a sitemap speeds up the process and ensures nothing important gets overlooked, especially on large sites with hundreds or thousands of pages.
What a Sitemap Actually Contains
At its core, an XML sitemap is a list of URLs with optional metadata for each entry. Here is what each field means:
- `<loc>`: The full URL of the page, including the protocol (https://). This is the only required field.
- `<lastmod>`: The date the page was last modified, in W3C Datetime format (for example 2025-03-15). Google uses this value as a signal for when to re-crawl the page, but only if it is consistently accurate. If you update a blog post, the lastmod date should reflect that change.
- `<changefreq>`: How often the page is likely to change (always, hourly, daily, weekly, monthly, yearly, never). Google ignores this field and relies on its own crawl data instead.
- `<priority>`: A value between 0.0 and 1.0 indicating the relative importance of the page within your site. Like changefreq, this is ignored by Google today. It was more relevant in the early days of sitemaps.
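The Python sketch below shows how these fields fit together. It builds a `<urlset>` document using the standard library's `xml.etree.ElementTree`. The function name `build_urlset` and its `(url, lastmod)` input format are illustrative assumptions, not part of any WordPress or plugin API. The sketch writes only `<loc>` and `<lastmod>`, because Google ignores the other two fields.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build a <urlset> sitemap from (url, lastmod) pairs.

    `lastmod` may be None, in which case the optional tag is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # the only required field
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod  # e.g. "2025-03-15"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_urlset([("https://example.com/", "2025-03-15"),
                    ("https://example.com/about/", None)]))
```

ElementTree escapes characters such as `&` in URLs automatically. The sitemap protocol requires this escaping, because a raw `&` makes the file invalid XML.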
Sitemap Structure and Sitemap Index Files
A single sitemap file can contain up to 50,000 URLs and must not exceed 50 MB when uncompressed. For most small to medium WordPress sites, one file is plenty. But larger sites (WooCommerce stores with thousands of products, news sites with years of archives) quickly hit that limit.
The solution is a sitemap index file. Instead of listing URLs directly, the index file points to multiple smaller sitemaps:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2025-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2025-02-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/product-sitemap.xml</loc>
    <lastmod>2025-03-18</lastmod>
  </sitemap>
</sitemapindex>
```

Both Yoast SEO and Rank Math automatically split your sitemap into smaller files organized by content type (posts, pages, categories, products, etc.).
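Here is a minimal Python sketch of the splitting that plugins do for you. It chunks a URL list at the 50,000-entry limit and builds an index that points at the resulting files. The file names (`product-sitemap1.xml` and so on) and the helper names are hypothetical. The sketch does not check the 50 MB size limit, which a real generator would also need to enforce.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per file; the 50 MB cap is not checked here


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemaps):
    """Build a <sitemapindex> document from (sitemap_url, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in the index, too
            ET.SubElement(entry, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


# Usage: 120,000 hypothetical product URLs become three child sitemaps.
products = [f"https://example.com/product/{n}/" for n in range(120_000)]
parts = chunk_urls(products)
print([len(part) for part in parts])  # [50000, 50000, 20000]
index_xml = build_sitemap_index(
    (f"https://example.com/product-sitemap{i}.xml", "2025-03-18")
    for i in range(1, len(parts) + 1)
)
```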
A Basic Sitemap Example
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-03-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```

How Google Search Console Uses Your Sitemap
Submitting your sitemap to Google Search Console is one of the first things you should do after launching a WordPress site. Once it is submitted, Search Console reports how many URLs Google discovered in the sitemap. The Page indexing report, filtered by that sitemap, then shows how many of those URLs are actually indexed. This comparison is very useful for diagnosing problems. If your sitemap lists 500 pages but only 300 are indexed, you know there is an issue worth investigating: some pages may be thin, duplicated, or returning errors.
Google Search Console also shows you when the sitemap was last read by Googlebot, so you can confirm that Google is regularly checking for updates.
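You can approximate the same comparison locally. The sketch below extracts every `<loc>` from a sitemap and lists the URLs that are missing from a set of indexed URLs. It assumes you already have that set, for example exported from Search Console and loaded into a Python list. The export format itself is not modeled here, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes):
    """Return every <loc> value listed in a <urlset> sitemap, in order."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def not_indexed(xml_bytes, indexed_urls):
    """Return URLs that appear in the sitemap but not in `indexed_urls`."""
    indexed = set(indexed_urls)
    return [u for u in sitemap_urls(xml_bytes) if u not in indexed]


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

print(not_indexed(sample, ["https://example.com/"]))  # ['https://example.com/about/']
```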
WordPress and Sitemaps
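Since version 5.5, WordPress automatically generates a basic XML sitemap at /wp-sitemap.xml. This built-in sitemap is functional but fairly basic. It covers posts, pages, public custom post types, taxonomy archives, and author archives. However, it offers few configuration options unless you write custom code, and it lacks extras that SEO plugins provide, such as image entries.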
Most site owners use an SEO plugin instead, because the plugins offer more control:
- Yoast SEO: Generates sitemaps at /sitemap_index.xml, splits them by post type, includes image references within each URL entry, and automatically excludes noindexed content.
- Rank Math: Offers similar functionality, also at /sitemap_index.xml. It also supports news sitemaps and video sitemaps for sites with that type of content.
When you activate an SEO plugin that generates sitemaps, it typically disables the WordPress core sitemap to avoid conflicts.
What to Include and What to Exclude
Your sitemap should be a curated list of pages you actually want search engines to index. That means being selective:
- Include: Published blog posts, important pages (about, contact, services), product pages, category pages that have meaningful content.
- Exclude: Pages set to noindex, thin content pages (tag archives with only one or two posts), paginated archive pages (/page/2/, /page/3/), internal search result pages, login or registration pages, thank-you pages after form submissions.
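If you generate or audit a sitemap yourself, you can express the exclusion rules above as a simple URL filter. This Python sketch assumes WordPress's default URL patterns: `/page/N/` for pagination and `?s=` for internal search. The `EXCLUDED_PATHS` set and the `noindexed` argument are hypothetical inputs that you would fill in for your own site. Thin tag archives cannot be detected from the URL alone, so the filter does not cover them.

```python
import re
from urllib.parse import urlparse, parse_qs

PAGINATION = re.compile(r"/page/\d+/?$")           # /page/2/, /page/3/, ...
EXCLUDED_PATHS = {"/wp-login.php", "/thank-you/"}  # hypothetical: adjust per site


def keep_in_sitemap(url, noindexed=frozenset()):
    """Return True if `url` passes the include/exclude rules above."""
    if url in noindexed:               # pages set to noindex never belong in a sitemap
        return False
    parts = urlparse(url)
    if PAGINATION.search(parts.path):  # paginated archive pages
        return False
    if "s" in parse_qs(parts.query):   # internal search results (?s=term)
        return False
    return parts.path not in EXCLUDED_PATHS


urls = [
    "https://example.com/blog/my-post/",
    "https://example.com/blog/page/2/",
    "https://example.com/?s=shoes",
    "https://example.com/thank-you/",
]
print([u for u in urls if keep_in_sitemap(u)])  # ['https://example.com/blog/my-post/']
```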
A bloated sitemap full of low-quality URLs can actually hurt your SEO. Google allocates a limited crawl budget to each site. If Googlebot spends that budget on pages that do not deserve to be indexed, your important pages may get crawled less frequently. This matters most on large sites, since small sites rarely run into crawl budget limits.
The Sitemap Directive in robots.txt
Your robots.txt file should include a line pointing to your sitemap:
```
Sitemap: https://example.com/sitemap_index.xml
```

This helps search engines find your sitemap even if you have not submitted it through Search Console. Most SEO plugins add this line automatically. If you have a custom robots.txt file, make sure the Sitemap directive is present and points to the correct URL.
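To check which sitemap a robots.txt file advertises, you can use Python's standard `urllib.robotparser` module, which reads the `Sitemap` lines. Its `site_maps()` method requires Python 3.8 or later and returns `None` when no directive is present. The robots.txt content below is only an example, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; replace with the text of your own file.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['https://example.com/sitemap_index.xml']
```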
Common Sitemap Mistakes
A few pitfalls come up regularly with WordPress sitemaps:
- Including noindexed URLs: If a page has a noindex meta tag but appears in the sitemap, you are sending Google mixed signals. The page says "do not index me" while the sitemap says "please index me." SEO plugins usually handle this correctly, but manually created sitemaps can have this issue.
- Stale lastmod dates: Some setups never update the lastmod timestamp. If every page in your sitemap shows the same date from three years ago, Google stops trusting the lastmod data and falls back to its own crawl schedule.
- Forgetting to update after migration: After moving to a new domain or changing your URL structure, the sitemap often still contains old URLs. This leads to a flood of 404 errors in Search Console.
- Multiple conflicting sitemaps: Running both the WordPress core sitemap and a plugin sitemap at the same time. This is not harmful in itself, but it makes indexing issues harder to debug, because you first have to work out which sitemap Google actually read.
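A quick script can catch several of these mistakes. The Python sketch below flags three problems: URLs on an unexpected host (a common leftover after a migration), URLs you have marked noindex, and a sitemap in which every entry shares one old lastmod date. The function name, the `(loc, lastmod)` input format, and the one-year `stale_days` threshold are arbitrary assumptions for illustration, not values Google publishes.

```python
from datetime import date
from urllib.parse import urlparse


def audit_entries(entries, expected_host, today, noindexed=frozenset(), stale_days=365):
    """Return human-readable warnings for a list of (loc, lastmod) pairs."""
    issues = []
    for loc, _ in entries:
        if urlparse(loc).hostname != expected_host:
            issues.append(f"wrong host (leftover from a migration?): {loc}")
        if loc in noindexed:
            issues.append(f"noindexed URL listed: {loc}")
    # Compare only the date part, so full timestamps such as
    # 2025-03-15T10:00:00+00:00 are handled as well.
    dates = {lastmod[:10] for _, lastmod in entries if lastmod}
    if len(entries) > 1 and len(dates) == 1:
        only = next(iter(dates))
        age = (today - date.fromisoformat(only)).days
        if age > stale_days:
            issues.append(f"every entry shares lastmod {only} ({age} days old)")
    return issues


entries = [
    ("https://old-example.com/about/", "2022-01-01"),
    ("https://example.com/", "2022-01-01"),
]
for issue in audit_entries(entries, "example.com", today=date(2025, 3, 15)):
    print(issue)
```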
What InspectWP Checks
InspectWP checks whether your WordPress site has an accessible XML sitemap. It looks for the sitemap URL in your robots.txt file and tries common paths like /sitemap.xml, /sitemap_index.xml, and /wp-sitemap.xml. If a sitemap is found, InspectWP confirms it is reachable and valid, helping you catch issues like broken sitemap URLs or missing sitemap directives before they affect your search visibility.
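The following is a hypothetical Python sketch of that discovery order, not InspectWP's actual code. It tries sitemap URLs taken from robots.txt first, then the common paths, and accepts the first response whose root element is `<urlset>` or `<sitemapindex>`. The `fetch` parameter is an assumption added so that the network call can be swapped out.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml")


def http_fetch(url):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return None


def is_sitemap(body):
    """True if `body` parses as XML whose root is <urlset> or <sitemapindex>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag in (f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex")


def find_sitemap(site_url, robots_sitemaps=(), fetch=http_fetch):
    """Return the first reachable, valid sitemap URL, or None."""
    candidates = list(robots_sitemaps) + [urljoin(site_url, p) for p in COMMON_PATHS]
    for url in dict.fromkeys(candidates):  # drop duplicates, keep order
        body = fetch(url)
        if body is not None and is_sitemap(body):
            return url
    return None
```

Passing a dictionary's `get` method as `fetch` lets you test the logic offline without making any network calls.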