Brand Visibility in LLMs

If you want your web content to be crawled, make sure nothing is in place that blocks the relevant bots and crawlers. Sounds obvious, right? Well...

1. When ChatGPT first burst onto the scene, many companies, erring on the side of caution, blocked various bots and crawlers. With employee turnover, it is worth checking whether such blocks are still in place.

2. AI bots can also be blocked by a range of security tools:
   - Web Application Firewalls (WAFs)
   - Bot Management / DDoS Protection tools
   - CDN security features
   So you may want to ask your technical website and security folks to double-check that key pages are accessible to known crawlers, and that those crawlers are allowlisted on these security tools.

3. Also, not all bots and crawlers are equally capable of crawling, understanding, and assimilating information on a website. Each crawler is tuned differently:
   - HTML: All bots handle this fairly well.
   - PDFs & Office files: Mostly only partially read; Google's bots lead here, and other bots lag.
   - JavaScript content: Search-engine-grade bots (Google, Bing) succeed, whereas simpler crawlers (GPTBot, ClaudeBot, Mistral) don't read this content successfully.
   - Media: All bots require appropriate metadata, since the metadata is what they can read.

JavaScript-rendered web content is not advised. Anything that requires user interaction to reveal the content is better rendered as key content in plain HTML on initial page load. Few crawlers have the ability to "click" on tabs, open accordions, or advance carousels, for example.

I hope this is helpful.
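For anyone who wants a rough way to test the JavaScript point above, here is a minimal Python sketch. It assumes that a crawler which does not run JavaScript sees only the text present in the raw HTML, and it ignores everything inside script and style tags. The names VisibleTextExtractor and missing_phrases and the sample page are hypothetical, made up for illustration; this is not how any particular crawler actually works.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping <script> and <style> bodies.

    Assumption: this approximates what a crawler that does not execute
    JavaScript can read on initial page load.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def missing_phrases(raw_html, phrases):
    """Return the phrases that do NOT appear in the plain-HTML text."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in phrases if p.lower() not in text]


# Hypothetical page: the pricing text only exists after JavaScript runs.
SAMPLE_HTML = """
<html><body>
  <h1>Acme Widgets</h1>
  <p>Industrial widgets since 1999.</p>
  <div id="pricing"></div>
  <script>
    document.getElementById("pricing").textContent = "Plans from $10";
  </script>
</body></html>
"""

if __name__ == "__main__":
    print(missing_phrases(SAMPLE_HTML, ["Acme Widgets", "Plans from $10"]))
    # → ['Plans from $10']
```

To use it on a real page, save the page source (not the rendered DOM from browser dev tools) and pass it in along with the phrases you most want LLMs to pick up. Anything reported as missing is a candidate for moving into plain HTML.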
Cloudflare and Squarespace both offer options, similar to robots.txt, for allowing AI crawling. The JS stuff you mentioned, though, I had no idea about. This is great for understanding how LLMs find companies. Seems like the next frontier of finding qualified folks, but how relevant will they be?
David M. - From my research, Cloudflare and Squarespace do have some LLM-oriented bots allowlisted, but many are not. Obviously, this is an ever-changing list. You will want to confirm what is and isn't allowlisted for your site's pages. It starts with knowing which LLMs use which bots and crawlers, and for what. I hope this helps.
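One way to start that confirmation is at the robots.txt level. Below is a minimal sketch using Python's standard urllib.robotparser. The crawler list is only an illustrative assumption (check each vendor's documentation for its current user-agent tokens), and the function name crawler_access and the sample rules are hypothetical. Note that this covers robots.txt only; it cannot detect blocks applied by a WAF, bot management tool, or CDN.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list only; user-agent tokens change, so verify with each vendor.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "Googlebot", "Bingbot")


def crawler_access(robots_txt, page_url, user_agents=AI_CRAWLERS):
    """Map each user agent to True/False: may it fetch page_url per robots.txt?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: parser.can_fetch(ua, page_url) for ua in user_agents}


# Hypothetical robots.txt left over from an early "block the AI bots" decision.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    result = crawler_access(SAMPLE_ROBOTS, "https://example.com/products")
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Paste in the contents of your own robots.txt and a few key page URLs to see which crawlers are turned away before they ever reach your security tools.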
