
How to Control the OpenAI Crawlers With robots.txt: A Micro Business Guide

One tiny file controls how AI sees your site. Here is how to read it, what each line means, and the copy-paste block that keeps you findable.

John Cravey with Elevi · Founder · 9 min read

There is a single small file that acts as the control panel for how OpenAI's crawlers treat your website. It is called robots.txt, it lives at your web address, and most business owners have never opened it. You do not need to become technical to understand it. You need to know what it is, how to check yours, and the handful of lines that keep you findable in ChatGPT while letting you make a simple choice about whether AI can learn from your site. This guide covers exactly that, in plain language.

The short version

robots.txt is a plain text file that sits at the root of your website and tells crawlers what they are allowed to look at. You can see yours right now: type your web address into a browser and add /robots.txt at the end, like yoursite.com/robots.txt. OpenAI runs a few different crawlers, and this one file is where you tell each of them what to do (OpenAI's crawler docs). The good news is that for a small business, getting this right is a matter of adding a few clear lines, and the most important thing it does is keep you findable in ChatGPT.

The reason it matters is that this file can accidentally lock AI out of your site without you ever deciding to. Website builders, security plugins, and old settings sometimes add a rule that blocks crawlers, and if that rule catches OpenAI's search crawler, you quietly disappear from ChatGPT search. Nothing on your site looks broken. You just stop showing up. So the goal of reading this is partly to make sure you are not accidentally blocking yourself, and partly to give you simple control over the one real choice you have: whether to let AI learn from your content.

You only need to recognize a few words to read this file safely: User-agent (which crawler a rule is for), Allow, and Disallow.

How to read your file without panicking

robots.txt looks technical because it is written for machines, but you only need to look for two things. First, is there a line that says Disallow: / sitting under User-agent: * (the star means all crawlers)? That blocks every crawler that does not have its own named rule, and it is worth fixing. Second, is OpenAI's search crawler, OAI-SearchBot, named anywhere with a Disallow? If neither is true, you are probably in good shape and your site is open to OpenAI's crawlers. If you see a not-found page when you visit your robots.txt, that usually means nothing is being blocked, which is also fine, since crawlers are allowed by default when there is no rule against them.

You do not need to understand every line in the file. Plenty of robots.txt files have rules for other services, sitemaps, and technical odds and ends that are none of your concern. You are looking for exactly two things: a Disallow: / under User-agent: * that blocks everyone, and any rule that names OAI-SearchBot and shuts it out. If you find either, the next section is your fix. If you find neither, you can still add the explicit allow below to be safe, and to protect yourself against a future setting that might block you.
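If you are comfortable running a short script, the two checks above can be sketched in Python with the standard-library `urllib.robotparser` module. This is a minimal sketch, assuming you paste in your file's text: `audit_robots` and `AnyOtherCrawler` are made-up names for illustration, and Python's parser follows the common robots.txt rules closely but is not the code OpenAI's crawlers run, so treat its answer as a second opinion rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_text: str) -> dict:
    """Answer this guide's two questions for the pasted text of a robots.txt file.

    Python's built-in parser approximates how crawlers read the file;
    it is not OpenAI's own implementation.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        # Question 1: is a crawler the file never names blocked from the whole
        # site? "AnyOtherCrawler" is a made-up name standing in for such a crawler.
        "everyone_blocked": not parser.can_fetch("AnyOtherCrawler", "/"),
        # Question 2: is OpenAI's search crawler shut out of the home page?
        "search_crawler_blocked": not parser.can_fetch("OAI-SearchBot", "/"),
    }


# Paste your own file's text here; this example is the "blocks everything" case.
example = """\
User-agent: *
Disallow: /
"""
print(audit_robots(example))
```

With this example both answers come back True: nothing names OAI-SearchBot, so the star rule catches it too. An empty file gives False for both, matching the point that crawlers are allowed when no rule says otherwise.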

The exact lines to use

Here is the block that keeps you findable in ChatGPT. It names OpenAI's search crawler and welcomes it, and because a named rule beats a general one, it protects you even if some broad block exists elsewhere in the file.

User-agent: OAI-SearchBot
Allow: /
Add this to keep ChatGPT able to find you. The named allow protects you even against a stray Disallow elsewhere.
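Here is a small sketch of the "named rule beats a general one" idea, using Python's standard-library `urllib.robotparser`. Like the robots.txt standard, it has a crawler follow the group that names it and fall back to the User-agent: * group only when nothing does. The file contents are an assumed example, `SomeOtherBot` is a made-up crawler name, and this is not OpenAI's own parser.

```python
from urllib.robotparser import RobotFileParser

# An assumed example file: a broad block for everyone, plus the named allow.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# OAI-SearchBot matches its own group, so the broad Disallow does not apply.
print(parser.can_fetch("OAI-SearchBot", "/services"))  # True
# A crawler with no group of its own falls back to the * group and is blocked.
print(parser.can_fetch("SomeOtherBot", "/services"))  # False
```

In this parser the order of the two groups does not matter: moving the OAI-SearchBot group above the star group gives the same two answers.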

That is the essential one. If you also want to make a decision about whether AI can learn from your content for training, you add a second group for the training crawler, GPTBot. To let it learn, use Allow; to keep your content out of training, use Disallow. For most small businesses, letting it learn is fine, and the whole training decision is covered plainly in the micro-business training guide. Here is the version that keeps you findable and blocks training, if that is what you want.

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
Findable, but AI training blocked. Use this only if you have paid or proprietary content to protect; otherwise the allow version is fine.
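Python's standard-library `urllib.robotparser` can confirm that these two groups act independently. This is a sketch, assuming the file holds only the findable-but-no-training version above; `Googlebot` appears only as an example of a crawler the file never mentions, and Python's parser is an approximation of, not a substitute for, how OpenAI's crawlers read the file.

```python
from urllib.robotparser import RobotFileParser

# The "findable, but AI training blocked" version from this guide.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for crawler in ("OAI-SearchBot", "GPTBot", "Googlebot"):
    allowed = parser.can_fetch(crawler, "/")
    print(f"{crawler}: {'allowed' if allowed else 'blocked'}")
# Prints, one per line: OAI-SearchBot: allowed, GPTBot: blocked, Googlebot: allowed
```

Googlebot comes back allowed because there is no star group here, so a crawler the file does not name has no rule against it.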

Where you put these lines depends on how your site was built. On WordPress, an SEO plugin has a robots.txt editor. On Squarespace or Wix, crawling is handled in settings. On Shopify, you edit a robots file in your theme. If you are not sure, the two lines are simple enough that whoever built your site can add them in a few minutes, so forward them this guide. After you make the change, wait about a day for OpenAI to process it, then you are done.

This is the whole task. Most small businesses do it once and never touch it again.

What each OpenAI crawler is for

You do not need to memorize this, but a quick tour of the crawler names helps the file make sense, because each name in the file is a different job. OpenAI runs a few crawlers, and they do genuinely different things. OAI-SearchBot is the one that decides whether you show up in ChatGPT search, so it is the one you most want to keep allowed. GPTBot is the one that learns from your content to train the models, so it is the one you have an optional decision about. There is also an ads crawler, OAI-AdsBot, that only matters if you run ads in ChatGPT, which almost no small business does yet, so you can ignore it for now. And there is ChatGPT-User, the live-read agent, which does not obey this file anyway.

The practical takeaway is that when you see a name after User-agent: in your file, you can now roughly place it. If it says OAI-SearchBot, those lines control your findability, and you want them to allow. If it says GPTBot, those lines control training, and the choice is yours. If it says something you do not recognize, like a Google or Bing crawler, that is a different company's crawler and not part of this decision. Knowing that the names map to different jobs is most of what it takes to read the file without anxiety, and it is why the safe move is a specific, named allow for the search crawler rather than a broad rule you are not sure about.
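Placing names can be sketched in Python too. This minimal, hypothetical helper lists the name after each User-agent: line and labels it with the job this section describes. The role descriptions are this guide's summaries rather than an official OpenAI list, and the helper only reads names; it does not judge whether the rules under them are right.

```python
# Roles as this guide describes them; any other name is left alone.
OPENAI_CRAWLER_ROLES = {
    "oai-searchbot": "ChatGPT search: keep allowed",
    "gptbot": "AI training: your choice",
    "oai-adsbot": "ChatGPT ads: ignore unless you advertise there",
    "chatgpt-user": "live reads: does not follow this file for user-triggered reads",
}


def label_user_agents(robots_text: str) -> list[tuple[str, str]]:
    """List each User-agent name in the file with a rough description of its job."""
    labels = []
    for raw_line in robots_text.splitlines():
        # Drop comments and surrounding spaces before looking at the line.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue  # not a User-agent line (Allow, Disallow, Sitemap, ...)
        name = value.strip()
        if name == "*":
            role = "every crawler without its own group"
        else:
            role = OPENAI_CRAWLER_ROLES.get(name.lower(), "other crawler: not part of this decision")
        labels.append((name, role))
    return labels


sample = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""
for name, role in label_user_agents(sample):
    print(f"{name}: {role}")
```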

This also explains why AI is not a single on-off switch, which trips a lot of people up. Because each crawler is named separately, you can be findable and allow training, or findable and block training, or in theory other combinations, all in the same file. There is no single line that means allow AI or block AI. There are separate lines for separate crawlers doing separate things. For a small business the only two that matter are the search crawler, which you keep allowed, and the training crawler, which you decide on, and everything else in the file you can leave alone.

What this file cannot do

Before the limits, one reassurance: getting this wrong in a way that hurts you is easy to avoid, because the only truly harmful mistake is blocking the search crawler, and the explicit allow prevents exactly that. Short of deleting lines you do not understand, there is very little you can do here that damages your business, so you can approach the file with curiosity rather than dread.

It is worth knowing the limits so you do not expect the wrong things. This file controls whether OpenAI's crawlers can find and learn from your site, but it does not control the live read that happens when someone asks ChatGPT about you in the moment. That agent, ChatGPT-User, does not obey robots.txt for those user-triggered reads, and you would not want to block it anyway, because it means a potential customer is looking at you right now. Making sure that live read goes well is about a fast, clear, current page, which we cover in the micro-business live-fetch guide, not about this file.

The other limit is that robots.txt only matters if the crawler's request actually reaches your site. Some hosting setups and security services sit in front of your website and can block crawlers before your robots.txt is ever read. On ordinary hosting this is rarely an issue, but if you recently added a security service and suddenly worry you are invisible, that is a place to check. For the typical small business, the file is the control that matters, and the two lines above are the important part.

If you cannot find or edit the file yourself

Plenty of small business owners get to this point and hit a wall: they looked at their robots.txt, maybe it has a block, but they have no idea how to change it. That is completely normal, and it does not mean you have to learn to code. The file is controlled differently depending on how your site was built, and in most cases the change is a few minutes for whoever set up your site. On WordPress, it is usually an SEO plugin like Yoast or Rank Math that has a robots.txt editor buried in its settings. On Squarespace or Wix, crawling is handled in the built-in settings rather than a file you edit directly. On Shopify, it is a file in your theme. If your site was built by a freelancer or agency, this is a five-minute request to send them, along with the two lines to add.

The important thing is not to let uncertainty about the mechanics stop you from getting the outcome. The outcome is simple: make sure OpenAI's search crawler is allowed. If you can do it yourself through your builder's settings, do it. If you cannot, forward this guide and the two OAI-SearchBot lines to whoever manages your site, and ask them to confirm the search crawler is not blocked and to add the explicit allow. Then check for yourself a day later by visiting your robots.txt in a browser, so you know it actually got done. You do not need to understand the file to make sure it is right; you just need to know what right looks like, which you now do, and to make sure someone put it there.
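The day-later check can also be scripted. This is a hedged sketch that assumes your site is served over https, with `yoursite.com` standing in for your real address; `search_crawler_allowed` is a hypothetical helper built on Python's standard-library `urllib.robotparser`, which downloads the file and applies its rules the way a well-behaved crawler would, though not necessarily exactly as OpenAI's crawlers do.

```python
from urllib.robotparser import RobotFileParser


def search_crawler_allowed(site: str) -> bool:
    """Fetch https://<site>/robots.txt and ask whether OAI-SearchBot may read the home page.

    `site` is a bare domain such as "yoursite.com" (a placeholder).
    Python's parser treats a 401 or 403 answer for robots.txt as "block
    everything" and any other 4xx (such as 404 not found) as "allow everything".
    """
    parser = RobotFileParser()
    parser.set_url(f"https://{site}/robots.txt")
    parser.read()  # network request; raises URLError if the site cannot be reached
    return parser.can_fetch("OAI-SearchBot", f"https://{site}/")
```

Running `print(search_crawler_allowed("yoursite.com"))` should print True once the OAI-SearchBot allow is live. A script like this only reads the file, so it cannot tell you whether a security service is turning OpenAI's crawlers away before they reach it.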

One caution if you do edit it yourself: change only what you understand, and do not delete lines you are unsure about. Many robots.txt files contain rules for other services, admin areas, and technical details that should stay. Your job is narrow: make sure there is no block on the search crawler, and add the explicit allow. Leave the rest alone. If you are ever unsure whether an existing line is a problem, that is exactly the kind of thing worth a quick question to a web person rather than a guess, because the downside of deleting the wrong line is worse than the downside of leaving a harmless one in place.
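The "add, don't delete" rule can be sketched as a small Python helper. `add_search_allow` is a hypothetical name, and the sketch assumes you paste in your current file's text and copy the result back through your site builder. It never edits or removes an existing rule, and if any group already names OAI-SearchBot it hands the text back unchanged, because that group may hold a Disallow that deserves a human look rather than a script's guess.

```python
def add_search_allow(existing: str) -> str:
    """Append an OAI-SearchBot allow group without editing or deleting any existing rule."""
    for raw_line in existing.splitlines():
        # Ignore comments, then look for a User-agent line that names OAI-SearchBot.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent" and value.strip().lower() == "oai-searchbot":
            return existing  # already named: leave it for a person to review
    block = "User-agent: OAI-SearchBot\nAllow: /\n"
    if not existing.strip():
        return block  # empty or missing file: the allow group is the whole file
    # Keep every existing line, then separate the new group with one blank line.
    return existing.rstrip("\n") + "\n\n" + block


current = "User-agent: *\nDisallow: /wp-admin/\n"
print(add_search_allow(current))
```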

If the whole topic still feels intimidating, hold on to the one idea that actually matters: this file can quietly lock you out of ChatGPT, and the two-line allow is how you make sure it does not. Everything else in this guide (the crawler names, the precedence rule, the training decision) is useful context, but the essential action is small and specific. Confirm the search crawler is not blocked, add the explicit allow so it stays that way, and you have handled the part that affects your business most. You do not need to become fluent in robots.txt. You need to make sure the one door that matters is open, and to know how to check that it stayed open after your site changes.

The bottom line for a micro business

One small file controls how OpenAI's crawlers treat your site. Check yours, make sure the search crawler is not blocked, add the explicit allow to be safe, and decide whether to let AI learn from your content. That is the whole job, and for most small businesses it is a ten-minute task you do once. The single most important outcome is staying findable in ChatGPT, so if you remember only one thing, remember the OAI-SearchBot allow lines, which keep the door open no matter what else is in the file.

If you help other small businesses, the way an agency standardizes this is in the agency playbook. And once you are sure the file is right, the next step is making sure you actually show up, which is covered in the micro-business guide to showing up in ChatGPT search. Want us to check your robots.txt for you? Run a free discovery and we will read the file and tell you if anything is quietly blocking you.

Written by
John Cravey
Founder

Founder of Frontend Horizon. Writes most of the long-form work on the FH blog.
