LightSite AI, an agentic platform focused on Generative Engine Optimization and AI search visibility, announced the release of new internal research based on approximately 6.5 million datapoints related to LLM bot behavior across customer websites.
The research was developed from observed interactions between websites and large language models, with the goal of better understanding how AI systems discover, crawl, and extract content across the web. The findings are intended to help marketing, SEO, and digital teams improve AI SEO, LLM discoverability, and machine-readable website infrastructure.
According to the analysis, question-shaped URLs and page structures often outperformed generic content paths. LightSite AI reported that pages framed around direct, user-style questions were indexed more often than broader or less specific content formats.
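The research does not define "question-shaped" precisely, so the following Python sketch is only one possible reading: it treats a URL as question-shaped when the last path segment begins with a common question word. The word list, the function name `is_question_shaped`, and the example URLs are illustrative assumptions, not part of LightSite AI's methodology.

```python
import re
from urllib.parse import urlparse

# Assumed list of leading words that signal a question-style slug.
QUESTION_WORDS = {
    "what", "why", "how", "when", "where", "who", "which",
    "can", "does", "do", "is", "are", "should",
}

def is_question_shaped(url: str) -> bool:
    """Heuristic: does the final path segment start with a question word?"""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return False
    # Drop a trailing file extension such as ".html" before tokenizing.
    slug = segments[-1].lower().rsplit(".", 1)[0]
    first_token = re.split(r"[-_]", slug)[0]
    return first_token in QUESTION_WORDS
```

A team could run a check like this over a sitemap to estimate what share of its URLs already follow a question-style pattern before deciding whether to restructure any paths.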
The research also found that websites with richer structured data and clearer machine-readable signals tended to be crawled more deeply and to receive more repeat bot visits. This suggests that structured content may play an important role in helping AI systems interpret websites more efficiently.
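The announcement does not say which structured data formats were studied. As a minimal sketch, assuming schema.org `FAQPage` markup serialized as JSON-LD is a representative example, the Python below builds such a block from question and answer pairs. The helper names `build_faq_jsonld` and `wrap_in_script_tag` are hypothetical.

```python
import json

def build_faq_jsonld(qa_pairs):
    """Return a schema.org FAQPage JSON-LD string for (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)

def wrap_in_script_tag(jsonld: str) -> str:
    """Embed JSON-LD in a script tag suitable for an HTML page."""
    # Escape "</" so the payload cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'
```

The resulting tag can be placed in a page's HTML so that crawlers that read JSON-LD receive the questions and answers in a machine-readable form alongside the visible content.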
Another finding from the dataset was that LLM bots often extract a limited amount of data from the first page they access. In LightSite AI’s analysis, this averaged roughly 25 KB to 30 KB per page. This may increase the importance of clarity in the opening section of a page, especially for brands trying to improve visibility in AI-driven search environments.
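One way to act on this figure is to check whether a page's key statement appears within the first 30 KB of its HTML. The Python sketch below does that under stated assumptions: the budget of 30 × 1024 bytes is taken from the upper end of the reported range, sizes are measured on the UTF-8 encoded HTML, and the function names are hypothetical. How any particular bot actually truncates content is not described in the research.

```python
from typing import Optional

# Assumed budget: upper end of the reported 25 KB to 30 KB range.
DEFAULT_BUDGET_BYTES = 30 * 1024

def byte_offset(html: str, phrase: str) -> Optional[int]:
    """Return the UTF-8 byte offset where `phrase` starts, or None if absent."""
    raw = html.encode("utf-8")
    idx = raw.find(phrase.encode("utf-8"))
    return None if idx == -1 else idx

def within_budget(html: str, phrase: str,
                  budget_bytes: int = DEFAULT_BUDGET_BYTES) -> bool:
    """True if `phrase` ends inside the first `budget_bytes` bytes of `html`."""
    offset = byte_offset(html, phrase)
    if offset is None:
        return False
    return offset + len(phrase.encode("utf-8")) <= budget_bytes
```

If a page's central answer falls outside the budget, for example because large inline scripts or navigation markup precede it, moving that content earlier in the document is a change consistent with the finding.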
The report also notes that, based on the observed data, there was no clear evidence that content becomes more visible in AI search simply because it was written to “sound” optimized for language models. Instead, the research pointed more consistently toward clarity, directness, and structured presentation of information.
“Many companies are still treating AI search like a variation of traditional SEO, but the underlying behavior is different,” said Stas Levitan, CEO of LightSite AI. “This research suggests that LLM discoverability is shaped less by content tricks and more by clarity, structure, and the ability of machines to confidently interpret what a website is about.”
The company said some of the findings have already been shared publicly, while additional research is expected to be released in future publications.
LightSite AI provides software, automation, and AI agents for Generative Engine Optimization. Its platform helps brands improve performance in AI search by strengthening structured, machine-readable website signals, analyzing AI search visibility, supporting content creation, and identifying backlink opportunities.
The new research is part of the company’s broader effort to bring more evidence and transparency to a market that is often shaped by assumptions rather than direct observation of AI crawler behavior.
Additional information about the research and LightSite AI’s work in AI SEO, structured data for AI, and LLM discoverability is available on the company’s website.
