A sitemap tells Google every URL on your site that you want indexed. Without one, Google still finds most of your pages eventually, through internal links and external backlinks. With one, it typically finds them in days instead of weeks, and the number of pages actually in Google stays close to the number you published instead of quietly drifting apart. For a company your size, that gap is where money leaks. You are paying a generalist or an agency to write service pages, location pages, and blog posts. If Google never indexes half of them, you paid for pages that cannot rank, cannot get found, and cannot bring in a lead. This is the sitemap and indexing process a small marketing team can set up once and run in ten minutes a week, without buying enterprise tooling you do not need yet.
What a sitemap is, and what it is not
A sitemap is an XML file that lists every URL on your site that you want crawlers to know about, with a little optional metadata per URL: when it last changed, how often it changes, and a rough priority. Of those, the last-modified date is the one worth keeping accurate; Google's documentation says it ignores the change-frequency and priority values. Google, Bing, and other crawlers read the file to discover URLs they have not seen yet and to re-check URLs that recently changed. The open standard for the format lives at sitemaps.org, and Google's own documentation on how it uses sitemaps is the Search Console sitemaps overview.
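To make the format concrete, here is a minimal Python sketch that holds a two-page sitemap as a string and reads the entries back out. The example.com URLs and the date are placeholders, and `list_sitemap_entries` is a name made up for this illustration; only the namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-page sitemap; URLs and the date are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/services/</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/first-post/</loc>
  </url>
</urlset>
"""


def list_sitemap_entries(xml_text):
    """Return a (url, lastmod or None) pair for each <url> entry."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries
```

Calling `list_sitemap_entries(SAMPLE_SITEMAP)` returns one pair per page, with `None` in place of the date where the entry has no last-modified value.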
Here is the part most people get wrong. A sitemap is not a ranking signal. A URL in your sitemap does not rank better than a URL that is not. The sitemap only affects discovery and crawl priority. Whether a page ranks is decided by on-page SEO, content quality, and links. So if a vendor pitches you "we will optimize your sitemap to boost rankings," that is a red flag. The sitemap's whole job is to make sure Google knows your pages exist and re-checks them when they change. That is valuable, but it is a plumbing job, not a growth lever.
Do you even have a sitemap already?
Before you build anything, check. Most modern content platforms generate one for you. Open a browser and go to yoursite.com/sitemap.xml. If you see a page of XML, you have one. If you see a 404, you do not, and that is your first fix. Common cases by platform:
- WordPress: Yoast SEO, Rank Math, or the built-in WordPress sitemap all generate one automatically. You usually do not build it by hand.
- Shopify, Squarespace, Wix, Webflow: all generate a sitemap for you at /sitemap.xml. You do not configure it, you just submit it.
- A custom-built site (React, Next.js, Rails, Laravel, or a static site generator): the sitemap is generated in code from your page data. Your developer or agency owns keeping it current. Ask them to confirm it lists every page type, not just the blog.
- A hand-maintained XML file uploaded once and never touched: this is the trap. It goes stale the moment someone adds a page and forgets to edit it. If this is you, get it moved to auto-generation. That is the single most valuable change here.
The mechanics of how a codebase generates one are covered in the sitemaps pattern, the source pattern this article is based on. You do not need to read the code. You need to know the principle it teaches: the sitemap should be generated from the same data your site renders from, so it can never drift out of sync with the pages that actually exist. Whatever platform you are on, that is the standard to hold your team or vendor to.
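If you want to see what that principle looks like in practice, the sketch below builds the sitemap from a list of page records. It is not the code from the source pattern. It assumes a hypothetical data model (dicts with `path`, `updated`, and `published` keys) and a made-up function name, `build_sitemap`; your platform's real data will look different.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, base_url):
    """Build sitemap XML from the same page records the site renders from.

    `pages` is a hypothetical list of dicts with "path", "updated"
    (ISO date string) and "published" (bool) keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for page in pages:
        if not page.get("published", False):
            continue  # drafts and removed pages never reach the sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + page["path"]
        if page.get("updated"):
            ET.SubElement(url, "lastmod").text = page["updated"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because the function reads the page list on every call, a page that is unpublished or deleted drops out of the sitemap on the next build with nobody having to remember to edit a file.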
The repeatable setup, in five steps
This is a one-time setup a marketing generalist can do without a developer, as long as a sitemap already exists. If it does not, step 1 is the only one that needs your dev or agency.
- Confirm the sitemap exists and is auto-generated. Visit yoursite.com/sitemap.xml. If it 404s or is a stale hand-edited file, get it generated from your page data first.
- Verify your site in Google Search Console. Add a Domain property and verify it with the DNS record method so it covers every subdomain and both http and https. You now have Google's own view of your site. This is free and it is the instrument you will use for everything below.
- Submit the sitemap in Search Console. Go to the Sitemaps section, paste in the sitemap URL, and submit. Google usually fetches it soon after submission, though the timing varies. Once it has been read, the status should show Success.
- Reference the sitemap from your robots.txt. Your robots.txt file (at yoursite.com/robots.txt) should include a line pointing to the sitemap so other crawlers, including Bing and AI answer engines, can find it too. Most platforms do this automatically; confirm it.
- Set a weekly reminder to check index coverage. Ten minutes, same day each week. The check itself is below.
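The robots.txt step above can be confirmed without reading the file by eye. This sketch uses Python's standard-library robots.txt parser (its `site_maps()` method needs Python 3.8 or later) on a hypothetical robots.txt body. The function name `declared_sitemaps` and the sample content are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```

An empty list means the file has no Sitemap line, which is the case to raise with whoever owns the site.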
The weekly index check, in ten minutes
Setup is one afternoon. The value is in the weekly habit. The whole point of a small internal process is that it is repeatable and cheap, so it survives a busy quarter. Here is the check your generalist runs every week.
- Open Search Console and go to the Pages report under Indexing. Note the Indexed count and the Not indexed count.
- Compare Indexed against roughly how many pages you have published. If Indexed is close to your published count, you are healthy. If it is well below, you have an indexing gap worth investigating.
- Click Not indexed and read the reason categories. Google groups the excluded URLs by why they are not indexed. Some reasons are fine; some are problems. The next section decodes them.
- Note any new errors that appeared in the last 7 days. A sudden jump in a category is the early warning that something changed on your site.
Most weeks the answer is "nothing changed, we are still healthy." The week something does break, you catch it in the same review cycle instead of three weeks later when the traffic drop shows up in a report and your owner asks why. That early-warning value is the entire argument for the habit.
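If your generalist records the counts in a spreadsheet each week, spotting a sudden jump is a simple comparison. The sketch below assumes you copy the per-reason counts out of the Pages report by hand; the function name `growing_reasons` and the default cutoff of 5 are arbitrary choices for illustration, not anything Google defines.

```python
def growing_reasons(last_week, this_week, min_increase=5):
    """Return {reason: increase} for reasons that grew by at least min_increase.

    Both arguments map a Not indexed reason to the URL count noted from
    the Pages report. min_increase is an illustrative cutoff; pick one
    that suits the size of your site.
    """
    increases = {}
    for reason, count in this_week.items():
        delta = count - last_week.get(reason, 0)  # new reasons count from zero
        if delta >= min_increase:
            increases[reason] = delta
    return increases
```

An empty result is the "nothing changed" week. Anything else is the list of categories to open and read.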
Reading the Not indexed reasons without panicking
The Not indexed list scares people because red-looking labels appear next to your URLs. Most of them are harmless. You need to know which two or three actually cost you money. The reasons that matter:
- Discovered, currently not indexed: Google knows the URL exists (probably from your sitemap) but has not crawled it yet. Common for new pages on a site without much authority. Usually fine and resolves on its own. If an important page sits here for weeks, add internal links to it from pages that are already indexed.
- Crawled, currently not indexed: Google looked at the page and chose not to index it. This is the one to care about. It usually means thin content, or the page is too similar to another page. The fix is content, not technical: make the page more specific and more useful.
- Alternate page with proper canonical tag: your page points at another page as the canonical version, and Google indexed that one instead. This is healthy. It is your site correctly avoiding duplicate content. Leave it alone.
- Server error (5xx) or Not found (404): Google tried to crawl a URL in your sitemap and hit an error. This is urgent. Either the page is broken and needs fixing, or it genuinely does not exist and should be removed from the sitemap.
- Blocked by robots.txt or Excluded by noindex tag: your own settings are telling Google not to index the page. Fine if intentional (a checkout page, an admin area). A problem if a real marketing page got caught by an over-broad rule. Check whether the blocked page is one you actually want in Google.
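The triage above can be written down as a small lookup so the weekly review always reads the urgent items first. This is a sketch of the list in this section, nothing more: the severity labels are invented here, and the reason wording follows this article, so match it to the exact labels shown in your own Pages report.

```python
# Severity order, most urgent first. These levels are this sketch's own labels.
SEVERITY = ["urgent", "investigate", "check intent", "usually fine", "healthy"]

# Reason wording follows the list above, not necessarily Google's exact labels.
TRIAGE = {
    "Server error (5xx)": ("urgent", "fix the page"),
    "Not found (404)": ("urgent", "restore the page or remove it from the sitemap"),
    "Crawled, currently not indexed": ("investigate", "make the page more specific and useful"),
    "Blocked by robots.txt": ("check intent", "confirm the block is deliberate"),
    "Excluded by noindex tag": ("check intent", "confirm the noindex is deliberate"),
    "Discovered, currently not indexed": ("usually fine", "add internal links if it lasts for weeks"),
    "Alternate page with proper canonical tag": ("healthy", "leave it alone"),
}


def triage(reason_counts):
    """Return (reason, count, severity, action) rows, most urgent first."""
    rows = []
    for reason, count in reason_counts.items():
        severity, action = TRIAGE.get(
            reason, ("investigate", "unrecognized reason, review manually")
        )
        rows.append((reason, count, severity, action))
    rows.sort(key=lambda row: SEVERITY.index(row[2]))
    return rows
```

Reasons the table does not recognize fall back to "investigate" so nothing new gets silently ignored.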
The mistakes that quietly break indexing
These are the sitemap and indexing errors that show up again and again in audits of small companies. Hand this list to whoever owns your site so they do not create them in the first place.
- The sitemap includes pages that return 404s. A stale sitemap points at pages that no longer exist. Google reports the errors, and a sitemap full of dead URLs is a less reliable guide to what is worth crawling. Auto-generation prevents this.
- The sitemap includes pages you have set to noindex. Google sees the contradiction and ignores the sitemap entry. Pick one intent per page.
- The same page is blocked in robots.txt and listed in the sitemap. Block it or list it, never both. This confuses the crawler.
- The sitemap was never resubmitted after a site rebuild. A redesign often changes URL patterns, and the old sitemap points at the old URLs. Resubmit after any rebuild.
- The sitemap lists non-canonical URLs (versions with tracking parameters, print versions, filtered category pages). Only list the clean canonical URL of each page.
None of these need enterprise software to fix. They need a sitemap generated from your real page data and a person who checks Search Console weekly. That is the whole discipline.
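Several of the mistakes above can be caught mechanically before Google ever sees them. The sketch below checks a list of sitemap URLs for duplicates, query parameters, and robots.txt blocks using only the Python standard library. The function name `lint_sitemap_urls` and the sample inputs in the usage note are assumptions, and Python's parser matches robots.txt rules more simply than Google does (no wildcard support), so treat the result as a first pass.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def lint_sitemap_urls(urls, robots_txt, user_agent="Googlebot"):
    """Return (url, problem) pairs for sitemap entries that look wrong."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    seen = set()
    for url in urls:
        if url in seen:
            problems.append((url, "listed more than once"))
        seen.add(url)
        if urlsplit(url).query:
            # Tracking or filter parameters usually mean a non-canonical URL.
            problems.append((url, "has query parameters"))
        if not parser.can_fetch(user_agent, url):
            problems.append((url, "blocked by robots.txt but listed in sitemap"))
    return problems
```

For example, with a robots.txt that disallows `/admin/`, a sitemap containing `https://www.example.com/pricing/?utm_source=newsletter` and `https://www.example.com/admin/login` would produce one problem for each of those two URLs and none for a clean page URL.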
In-house or outsource: the honest call at your size
Here is the decision most SME marketing leads actually face. Do you keep indexing in-house or hand it to an agency? The honest answer splits it.
- Keep in-house: verifying Search Console, submitting the sitemap, the weekly index check, and fixing thin-content pages. These are process and content tasks your generalist can own after one afternoon of learning. Owning them keeps the knowledge in your building and keeps you from paying an agency for fifteen-second actions.
- Outsource the one-time technical setup: getting a proper auto-generated sitemap wired into a custom-built site, fixing a robots.txt that is blocking the wrong things, or untangling a legacy indexing mess. This is bounded, one-off developer work. Pay for it once, then bring the ongoing check back in-house.
The trap to avoid is paying a monthly retainer for "technical SEO" where the deliverable is mostly submitting your sitemap and pasting a Search Console screenshot into a report. You can do that yourself. If a vendor's technical SEO line item is really an indexing-monitoring line item, you are paying a retainer for a ten-minute weekly task. We built this thinking into how we structure engagements in our solutions, where the ongoing work is the content and strategy that actually compounds, not the plumbing checks a client can run themselves.
Defending the spend to your owner or finance
At your size, someone has to justify the marketing spend to an owner or a finance person who is skeptical of SEO. Indexing is one of the easiest wins to defend, because the numbers are concrete and free to pull. Frame it as coverage, not vanity metrics.
- Report the coverage ratio. "We have published 137 pages. Google has indexed 132 of them. That is 96 percent coverage." That is a number an owner understands: we are getting credit for almost everything we paid to produce.
- Report the gap you closed. "Three months ago we were at 60 percent coverage. We were paying for pages Google could not see. We fixed the sitemap and the thin pages, and we are now at 96 percent." That is a defensible before-and-after.
- Tie it to lead pages specifically. "All 12 of our service pages are indexed and eligible to be found." The owner cares about the pages that produce leads, so lead with those.
- Keep the tool cost line at zero. "This runs on free tools and ten minutes of internal time a week." That is the sentence that ends the budget conversation in your favor.
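The coverage figure is simple arithmetic: indexed pages divided by published pages. A small sketch, using the example numbers from the list above; `coverage_line` is a hypothetical helper name, and rounding to a whole percent is a presentation choice.

```python
def coverage_line(published, indexed):
    """Format the coverage ratio as a sentence for an owner or finance report."""
    if published <= 0:
        raise ValueError("published must be a positive page count")
    percent = round(100 * indexed / published)
    return (
        f"We have published {published} pages. "
        f"Google has indexed {indexed} of them. "
        f"That is {percent} percent coverage."
    )
```

With 137 published and 132 indexed, the ratio is 132 / 137, which rounds to 96 percent.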
What to skip until you are bigger
Part of running a defensible process is not over-buying. There are real indexing features that exist for larger sites and are wasted effort at 10 to 99 people. Skip these until you clearly need them:
- Sitemap index files and paginated sitemaps. These matter once a single file would exceed the protocol's limit of 50,000 URLs (or 50 MB uncompressed). If you have a few hundred pages, one sitemap file covers you.
- Splitting sitemaps by content type (one for blog, one for products, one for locations). Nice for very large sites so the reports read cleanly. Overkill for you.
- News sitemaps and video sitemaps. Only relevant if you publish news or host a lot of video. Most SMEs need neither.
- The Search Console API and bulk URL analysis. The built-in interface shows plenty for a few hundred pages. The API is for sites analyzing tens of thousands of queries.
- Paid indexing services and daily URL-submission tools. These sell a fifteen-second free action as a subscription. Do not buy them.
The rule of thumb: if a feature exists to manage tens of thousands of pages, and you have a few hundred, it is not for you yet. Buying it early is exactly the enterprise-tooling mistake that makes an owner rightly question the marketing budget.
How this scales as you grow
The process here holds until you cross a few thousand pages, at which point the checks change shape. If you are heading toward that, the mid-market version of this playbook covers governance and index coverage across many pages. The same core discipline is retold for other reader sizes in the sibling versions of this piece, so you can hand the right one to the right person: the micro businesses version for an owner-operator doing it solo, the mid-market teams version for a larger site with governance and a stack to integrate with, and the agencies version if you are the one delivering this across a book of client sites.
If you serve considered-purchase buyers, the indexing work sits underneath everything else you do to get found. It is worth doing well precisely because it is cheap and repeatable. We work with companies your size on exactly this kind of durable, low-cost foundation, and you can see how we approach it for professional services firms specifically.
The short version
Confirm you have an auto-generated sitemap. Verify Search Console. Submit the sitemap. Check the Pages report for ten minutes a week and fix the two reasons that actually cost you money: thin pages Google crawled but did not index, and real pages caught by an over-broad block. Report coverage as a percentage to your owner. Skip the enterprise tooling until you are far bigger. That is a repeatable internal process a small team can own for the cost of a habit, and it stops you paying for pages Google never sees.
If you want the setup done once and handed back to your team to run, run the estimator and we will scope the one-time indexing fix and show you the weekly check we would leave you with. Or talk to us about getting your sitemap and coverage sorted, then owned in-house.