A sitemap tells Google every URL on your site that should be indexed. Without one, Google still finds most of your pages eventually — via internal links and external backlinks. With one, Google finds them in days instead of weeks, and the indexed-page count stays close to your published-page count instead of drifting. Every FH client site ships with an auto-generated sitemap. Here’s the pattern.
What a sitemap is (and isn’t)
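It’s an XML file listing every URL on your site, with optional metadata: last modified, change frequency, priority. Crawlers (Google, Bing, anyone) read it to discover URLs they haven’t crawled yet and to re-prioritize URLs that recently changed. Google’s documentation says it ignores the change frequency and priority values, so for Google the field that matters is last modified.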
It is not a ranking signal. URLs in a sitemap don’t rank better than URLs not in a sitemap. The sitemap only affects discovery and crawl prioritization. The ranking work is on-page SEO, content, and links.
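To make the file format concrete, here is a minimal sketch of the XML a sitemap boils down to, following the sitemaps.org `urlset` schema. The `SitemapEntry` type and the `toSitemapXml` and `escapeXml` helpers are hypothetical names for illustration; on a Next.js site you would not write this yourself, since the framework does the serialization.

```typescript
// Shape of one sitemap entry; fields map onto the sitemaps.org tags.
type SitemapEntry = {
  url: string;
  lastModified?: Date;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
  priority?: number;
};

// Escape the characters XML reserves so a URL containing "&" stays valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: render entries as a <urlset> document.
function toSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const tags = [`<loc>${escapeXml(e.url)}</loc>`];
    if (e.lastModified) tags.push(`<lastmod>${e.lastModified.toISOString()}</lastmod>`);
    if (e.changeFrequency) tags.push(`<changefreq>${e.changeFrequency}</changefreq>`);
    if (e.priority !== undefined) tags.push(`<priority>${e.priority.toFixed(1)}</priority>`);
    return `  <url>${tags.join("")}</url>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    ...urls,
    `</urlset>`,
  ].join("\n");
}

const xml = toSitemapXml([
  { url: "https://example.com/", lastModified: new Date("2024-01-15T00:00:00Z"), priority: 0.8 },
  { url: "https://example.com/r&d" },
]);
```

Only `<loc>` is required per entry; everything else is optional metadata, which is why the sketch adds the other tags only when a value is present.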
Next.js App Router: app/sitemap.ts
Next.js generates a sitemap from a TypeScript file at `app/sitemap.ts`. Export a default function that returns an array of URL objects. Next handles the XML serialization, the routing (`/sitemap.xml`), and the caching.

    // app/sitemap.ts
    import type { MetadataRoute } from "next";
    import { POSTS } from "@/lib/blog/posts";
    import { LOCATIONS } from "@/lib/locations";
    import { SERVICES } from "@/app/components/home/data";
    import { slugify } from "@/lib/slug";

    const BASE = "https://frontendhorizon.com";

    export default function sitemap(): MetadataRoute.Sitemap {
      // Build time. Routes with no timestamp of their own fall back to this;
      // swap in a real modification date wherever one exists.
      const now = new Date();

      // Top-level pages ("" is the homepage).
      const staticRoutes = [
        "",
        "/solutions",
        "/who-we-serve",
        "/portfolio",
        "/blog",
        "/contact",
      ].map((path) => ({
        url: `${BASE}${path}`,
        lastModified: now,
        changeFrequency: "monthly" as const,
        priority: 0.8,
      }));

      // Posts carry real dates, so lastModified reflects actual edits.
      const posts = POSTS.map((p) => ({
        url: `${BASE}/blog/${p.slug}`,
        lastModified: new Date(p.updatedAt ?? p.publishedAt),
        changeFrequency: "monthly" as const,
        priority: 0.6,
      }));

      const locations = LOCATIONS.map((l) => ({
        url: `${BASE}/locations/${l.slug}`,
        lastModified: now,
        changeFrequency: "monthly" as const,
        priority: 0.7,
      }));

      // Service pages are keyed by a slug derived from the service name.
      const solutions = SERVICES.map((s) => ({
        url: `${BASE}/solutions/${slugify(s.name)}`,
        lastModified: now,
        changeFrequency: "monthly" as const,
        priority: 0.7,
      }));

      return [...staticRoutes, ...solutions, ...locations, ...posts];
    }

Submitting to Search Console
GSC → Sitemaps → enter the URL → Submit. Google fetches it within an hour or so and starts crawling. Status: ‘Success’ means it parsed cleanly. ‘Couldn’t fetch’ usually means a 404 or 5xx; check the URL is live.
What lastModified actually does
Google uses lastModified to prioritize re-crawls. A URL with a recent lastModified gets crawled sooner. Don’t lie about it — Google has signals to detect when you’re claiming a recent modification on a page that hasn’t actually changed, and it stops trusting your sitemap.
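One way to keep `lastModified` honest is to derive it only from timestamps the content actually carries, and to leave the field out when there is none. The sketch below assumes records with `publishedAt` and an optional `updatedAt`; the `ContentRecord` type and the `toEntry` and `latestModified` helpers are hypothetical names for illustration.

```typescript
// Hypothetical content record with the timestamps the CMS or data file stores.
type ContentRecord = { slug: string; publishedAt: string; updatedAt?: string };

type Entry = { url: string; lastModified?: Date };

// Build an entry whose lastModified reflects a real edit, never the build time.
function toEntry(base: string, record: ContentRecord): Entry {
  const url = `${base}/${record.slug}`;
  const stamp = new Date(record.updatedAt ?? record.publishedAt);
  // An unparseable date yields an Invalid Date; omit the field rather than guess.
  if (Number.isNaN(stamp.getTime())) return { url };
  return { url, lastModified: stamp };
}

// An index page (such as /blog) changes whenever its newest child changes,
// so its lastModified can be the most recent date among its children.
function latestModified(entries: Entry[]): Date | undefined {
  const times = entries
    .map((e) => e.lastModified?.getTime())
    .filter((t): t is number => t !== undefined);
  return times.length > 0 ? new Date(Math.max(...times)) : undefined;
}

const records: ContentRecord[] = [
  { slug: "first-post", publishedAt: "2024-01-01", updatedAt: "2024-03-05" },
  { slug: "second-post", publishedAt: "2024-02-10" },
  { slug: "imported-post", publishedAt: "unknown" },
];
const entries = records.map((r) => toEntry("https://example.com/blog", r));
const blogIndexModified = latestModified(entries);
```

Here the imported post gets no `lastModified` at all, and the blog index inherits the date of the most recently edited post instead of the deploy time.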
Sitemap size limits
Each sitemap file can hold up to 50,000 URLs or 50MB uncompressed. Above that, use a sitemap index — a master sitemap that lists multiple sub-sitemaps. Most SMB sites are nowhere near these limits. We hit them once on a retail client with 80,000 product pages; the fix was a paginated sitemap.
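The arithmetic and the index file itself are simple. The sketch below computes how many child sitemaps a URL count needs and renders a sitemap index in the sitemaps.org `sitemapindex` format. The helper names and the `https://example.com/sitemap/0.xml` URL pattern are assumptions for illustration, and the index builder assumes child URLs contain no XML-reserved characters.

```typescript
const MAX_URLS_PER_FILE = 50000; // protocol limit for a single sitemap file

// Number of child sitemaps needed for a URL count at a given chunk size.
function sitemapCount(totalUrls: number, chunkSize: number): number {
  if (chunkSize < 1 || chunkSize > MAX_URLS_PER_FILE) {
    throw new RangeError(`chunkSize must be between 1 and ${MAX_URLS_PER_FILE}`);
  }
  return Math.ceil(totalUrls / chunkSize);
}

// Hypothetical URL pattern for the child files: <base>/sitemap/<id>.xml
function childSitemapUrls(base: string, count: number): string[] {
  return Array.from({ length: count }, (_, id) => `${base}/sitemap/${id}.xml`);
}

// Render the index that points crawlers at every child sitemap.
function toSitemapIndexXml(childUrls: string[]): string {
  const items = childUrls.map((url) => `  <sitemap><loc>${url}</loc></sitemap>`);
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    ...items,
    `</sitemapindex>`,
  ].join("\n");
}

const count = sitemapCount(80000, 5000); // 16 files of 5,000 URLs each
const indexXml = toSitemapIndexXml(childSitemapUrls("https://example.com", count));
```

Chunks well below the 50,000 limit keep each file small and quick to regenerate. In Next.js, the same split is expressed with `generateSitemaps`: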
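
    // app/sitemap.ts (paginated)
    import type { MetadataRoute } from "next";
    import { PRODUCTS } from "@/lib/products"; // hypothetical path; point it at your product data

    const URLS_PER_SITEMAP = 5000;

    export function generateSitemaps() {
      // One sitemap per chunk of 5,000 URLs, derived from the data instead of hard-coded
      const count = Math.ceil(PRODUCTS.length / URLS_PER_SITEMAP);
      return Array.from({ length: count }, (_, i) => ({ id: i }));
    }

    // Next.js serves each chunk at /sitemap/[id].xml
    export default function sitemap({ id }: { id: number }): MetadataRoute.Sitemap {
      const start = id * URLS_PER_SITEMAP;
      const end = start + URLS_PER_SITEMAP;
      return PRODUCTS.slice(start, end).map((p) => ({
        url: `https://example.com/products/${p.slug}`,
        lastModified: new Date(p.updatedAt),
      }));
    }

Sitemap and robots.txt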
Reference the sitemap from robots.txt so other crawlers find it. Next.js has `app/robots.ts` for this:

    // app/robots.ts
    import type { MetadataRoute } from "next";

    export default function robots(): MetadataRoute.Robots {
      return {
        rules: { userAgent: "*", allow: "/", disallow: ["/admin/", "/api/"] },
        sitemap: "https://frontendhorizon.com/sitemap.xml",
      };
    }

Common mistakes that hurt indexing
- Including non-canonical URLs (e.g., URLs with query strings, paginated URLs). Only include canonical URLs in the sitemap.
- Including pages with `noindex` directives. Google will see the contradiction and ignore the sitemap entry.
- Including pages that return 404s. The sitemap’s job is to tell Google about pages that exist.
- Including pages blocked in robots.txt. Pick one or the other — block in robots OR include in sitemap, not both.
- Forgetting to update the sitemap after a site rebuild. We’ve seen this three times: the rebuild changes URL patterns, and the old sitemap keeps listing URLs that no longer exist.
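Most of these mistakes can be caught with a filter that runs before URLs reach the sitemap. The sketch below is one possible shape, assuming you already know each page's canonical URL, HTTP status, and `noindex` state; the `PageCandidate` type, its field names, and `isSitemapEligible` are hypothetical, and the disallowed prefixes are assumed to mirror your robots.txt rules.

```typescript
// Hypothetical page record; none of these field names come from a library.
type PageCandidate = {
  url: string;
  canonicalUrl: string;
  status: number;
  noindex: boolean;
};

// Path prefixes disallowed in robots.txt (assumed to match the site's rules).
const DISALLOWED_PREFIXES = ["/admin/", "/api/"];

function isSitemapEligible(page: PageCandidate): boolean {
  const parsed = new URL(page.url);
  if (page.url !== page.canonicalUrl) return false; // non-canonical variant
  if (parsed.search !== "") return false; // query strings, ?page=2 pagination
  if (page.noindex) return false; // would contradict the sitemap entry
  if (page.status !== 200) return false; // 404s, redirects, server errors
  if (DISALLOWED_PREFIXES.some((p) => parsed.pathname.startsWith(p))) return false; // blocked in robots.txt
  return true;
}

const candidates: PageCandidate[] = [
  { url: "https://example.com/blog/a", canonicalUrl: "https://example.com/blog/a", status: 200, noindex: false },
  { url: "https://example.com/blog?page=2", canonicalUrl: "https://example.com/blog", status: 200, noindex: false },
  { url: "https://example.com/thanks", canonicalUrl: "https://example.com/thanks", status: 200, noindex: true },
  { url: "https://example.com/old", canonicalUrl: "https://example.com/old", status: 404, noindex: false },
  { url: "https://example.com/admin/login", canonicalUrl: "https://example.com/admin/login", status: 200, noindex: false },
];
const eligibleUrls = candidates.filter(isSitemapEligible).map((p) => p.url);
```

Generating the sitemap from the same data the site renders from removes most of these cases at the source; a filter like this is a safety net for the rest.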
Multiple sitemaps for different content types
Some teams split sitemaps by content type: `sitemap-blog.xml`, `sitemap-products.xml`, `sitemap-locations.xml`. This is fine; it makes GSC’s page indexing report (formerly Coverage) easier to read, because you can filter the report by sitemap and see indexed counts per content type. We do it for sites with more than 5,000 URLs.
Dynamic generation and ISR
Next.js generates the sitemap at build time by default. If you publish blog posts via static generation, the sitemap rebuilds on every deploy. If you publish content out-of-band (a CMS, a database write), the sitemap needs ISR to pick up new entries without a redeploy.

    // Force ISR on the sitemap
    export const revalidate = 3600; // re-generate hourly

News sitemaps and video sitemaps
If you publish news content, Google supports a special news sitemap format with extra metadata (publication date, title). If you publish video, a video sitemap helps Google index thumbnails and durations. Most SMB sites need neither. We’ve added a news sitemap exactly once for a hyperlocal news client.
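For reference, a news sitemap is an ordinary `urlset` with an extra `news:` namespace on each entry. The sketch below renders one using the tags from Google's news sitemap format; the `NewsArticle` type and the helper names are hypothetical, and the publication name and URL are placeholders.

```typescript
// Hypothetical article record for illustration.
type NewsArticle = {
  url: string;
  title: string;
  publishedAt: Date;
  publicationName: string;
  language: string; // ISO 639 code such as "en"
};

// Escape the characters XML reserves (titles often contain "&" or quotes).
function escapeXmlText(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render articles as a urlset that carries the news: extension tags.
function toNewsSitemapXml(articles: NewsArticle[]): string {
  const urls = articles.map((a) =>
    [
      `  <url>`,
      `    <loc>${escapeXmlText(a.url)}</loc>`,
      `    <news:news>`,
      `      <news:publication>`,
      `        <news:name>${escapeXmlText(a.publicationName)}</news:name>`,
      `        <news:language>${escapeXmlText(a.language)}</news:language>`,
      `      </news:publication>`,
      `      <news:publication_date>${a.publishedAt.toISOString()}</news:publication_date>`,
      `      <news:title>${escapeXmlText(a.title)}</news:title>`,
      `    </news:news>`,
      `  </url>`,
    ].join("\n")
  );
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`,
    `        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">`,
    ...urls,
    `</urlset>`,
  ].join("\n");
}

const newsXml = toNewsSitemapXml([
  {
    url: "https://example.com/news/council-vote",
    title: "Council votes on parks & trails budget",
    publishedAt: new Date("2024-05-01T14:30:00Z"),
    publicationName: "Example Local News",
    language: "en",
  },
]);
```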
Verifying it’s working
- Fetch your sitemap URL in a browser — should return XML.
- GSC → Sitemaps → status should be ‘Success’ and ‘Discovered URLs’ should match your expected page count.
- GSC → Pages → Indexed should grow toward your sitemap URL count over the next week or two.
- If the indexed count stays well below the sitemap count, check the page indexing report (formerly Coverage) in GSC, which lists the reasons pages were not indexed.
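The first two checks can be scripted. The sketch below pulls every `<loc>` out of a sitemap document and compares the result with the URLs you expect to be published. `extractLocs` and `diffSitemap` are hypothetical helpers, the regex assumes a simple, well-formed sitemap, and it does not decode XML entities such as `&amp;`.

```typescript
// Pull every <loc> value out of a sitemap document.
function extractLocs(xml: string): string[] {
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const locs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Compare what the sitemap lists against what should be published.
function diffSitemap(xml: string, expected: string[]) {
  const listed = extractLocs(xml);
  const listedSet = new Set(listed);
  const expectedSet = new Set(expected);
  return {
    listedCount: listed.length,
    missing: expected.filter((u) => !listedSet.has(u)), // published but not listed
    unexpected: listed.filter((u) => !expectedSet.has(u)), // listed but not published
  };
}

const sampleXml = [
  `<?xml version="1.0" encoding="UTF-8"?>`,
  `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
  `  <url><loc>https://example.com/</loc></url>`,
  `  <url><loc> https://example.com/blog </loc></url>`,
  `  <url><loc>https://example.com/old-page</loc></url>`,
  `</urlset>`,
].join("\n");

const report = diffSitemap(sampleXml, [
  "https://example.com/",
  "https://example.com/blog",
  "https://example.com/contact",
]);
```

In practice the XML would come from fetching the live sitemap URL, and the expected list from the same data source the site renders from. An empty `missing` and `unexpected` means the sitemap and the published pages agree.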
How this lands across FH client work
Every FH client site ships with an auto-generated `app/sitemap.ts` that pulls from the same typed data sources the site renders from. No drift between published pages and sitemap. No manual maintenance. Submitted to GSC on day one of every launch. If your site’s sitemap is hand-maintained or missing entirely, book a consultation — we’ll add the auto-generation pattern in a half-day engagement.