Latest news with #TollBit


Fast Company
23-06-2025
- Business
- Fast Company
The internet of agents is rising fast, and publishers are nowhere near ready
Imagine you owned a bookstore. Most of your revenue depends on customers coming in and buying books, so you set up different aspects of the business around that activity. You might put low-cost 'impulse buy' items near the checkout or start selling coffee as a convenience. You might even partner with publishers to put displays of popular bestsellers in high-visibility locations in the store to drive sales.

Now imagine one day a robot comes in to buy books on behalf of someone. It ignores the displays, the coffee kiosk, and the tchotchkes near the till. It just grabs the book the person ordered, pays for it, and walks out. The next day 4 robots come in, then 12 the day after that. Soon robots outnumber the humans in your store, whose numbers dwindle by the day. You soon see very few sales from nonbook items, publishers stop bothering with those displays, and the coffee goes cold. Revenue plummets. In response, you might start charging robots a fee to enter your store, and if they don't pay it, you deny them entry. But then one day a robot that looks just like a human comes in—to the point that you can't tell the difference. What do you do then?

This analogy is basically what the publishing world is going through right now, with bot traffic to media websites skyrocketing over the past three months. That's according to new data from TollBit, which recently published its State of the Bots report for the first quarter of 2025. Even more concerning, however, is that the most popular AI search engines are choosing to ignore long-respected standards for blocking bots, in some cases arguing that when a search 'agent' acts on behalf of an individual user, the bot should be treated as human.

The robot revolution

TollBit's report paints a fast-changing picture of what's happening with AI search. Over the past several months, AI companies have either introduced search abilities or greatly increased their search activity. Bot scraping focused on retrieval-augmented generation (RAG), which is distinct from training data, increased 49% over the previous quarter. Anthropic's Claude notably introduced search, and in the same period ChatGPT (the world's most popular chatbot by far) had a spike in users, plus deep research tools from all the major providers began to take hold.

At the same time, publishers increased their defenses. The report reveals that in January, media websites were using various methods to block AI bots four times as much as they were a year before. The first line of defense is to adjust a website's robots.txt file, which tells specific bots whether they are welcome or forbidden from accessing the content. The thing is, adhering to robots.txt is ultimately an honor system and not really enforceable. And the report indicates more AI companies are treating it as such: Among sites in TollBit's network, bot scrapes that ignore robots.txt increased from 3.3% to 12.9% in just one quarter.
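Because robots.txt is only a set of published preferences, the whole mechanism rests on the crawler choosing to check it. The sketch below, a minimal example in Python using the standard library's urllib.robotparser, shows the check a well-behaved crawler is supposed to run before fetching a page; the rules, bot names, and URL are illustrative assumptions rather than any real publisher's file.

```python
# Minimal sketch of how a well-behaved crawler consults robots.txt.
# The rules, bot names, and URL below are illustrative examples only.
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    article = "https://news.example.com/some-article"
    for bot in ("GPTBot", "SomeOtherBot"):
        print(bot, "allowed:", may_fetch(bot, article))
    # Nothing enforces this: a crawler that never calls may_fetch() can still
    # request the page, which is the honor-system gap the report describes.
```

The publisher's only lever here is publishing the rules; whether a given scraper runs anything like may_fetch() before requesting the page is entirely up to the scraper.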
Part of that increase is due to a relatively new stance the AI companies have taken, and it's subtle but important. Broadly speaking, there are three different kinds of bots that scrape or crawl content:

- Training bots: Bots that crawl the internet to scrape content that provides training data for AI models.
- Search indexing bots: Bots that crawl the web to ensure the model has fast access to important information outside its training set (which is usually out of date). This is a form of RAG.
- User agent bots: Also a form of RAG, these are crawlers that go out to the web in real time to find information directly in response to a user query, regardless of whether the content they find has been previously indexed.

Because No. 3 is an agent acting on behalf of a human, AI companies argue that it's an extension of that user's behavior and have essentially given themselves permission to ignore robots.txt settings for that use case. This isn't guesswork—Google, Meta, and Perplexity have made it explicit in their developer notes. This is how you get human-looking robots in the bookstore.

When humans go to websites, they see ads. Humans can be intrigued or enticed by other content, such as a link to a podcast about the same topic as an article they're reading. Humans can decide whether or not to pay for a subscription. Humans sometimes choose to make a transaction based on the information in front of them. Bots don't really do any of that (not yet, anyway). Large parts of the internet economy depend on human attention to websites, but as the report shows, that attention drops off massively when someone uses AI to search the web—AI search engines provide very little in the way of referral traffic compared to traditional search.

This, of course, is what's behind many of the lawsuits now in play between media companies and AI companies. How that gets resolved in the legal realm is still TBD, but in the meantime, some media sites are choosing to block bots—or at least are attempting to—from accessing their content at all. For user agent bots, however, that ability has been taken away. The AI companies have always framed data harvesting in the way that's most favorable to their insatiable demand for it, famously claiming that data only needs to be 'publicly available' to qualify as training data. Even when they claim to respect robots.txt for their search engines, it's an open secret that they sometimes use third-party scrapers to bypass it.

Unmasking the bots

So apart from suing and hoping for the best, how can publishers regain some, well, agency in the emerging world of agent traffic? If you believe AI substitution threatens your growth, there are additional defenses to consider. Hard paywalls are easier to defend, both technically and legally, and several companies (including TollBit, but there are others, such as ScalePost) specialize in redirecting bot traffic to paywalled endpoints built specifically for bots. If the robot doesn't pay, it's denied the content, at least in theory.
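The 'bot paywall' approach can be illustrated with a small server-side sketch. What follows is a minimal example, not TollBit's or ScalePost's actual implementation: a tiny WSGI app that serves ordinary visitors normally and answers known AI-crawler user agents with HTTP 402 Payment Required unless the request carries a payment token. The user-agent signatures, the X-Payment-Token header, and the token check are all illustrative assumptions.

```python
# Minimal sketch of a "bot paywall": known AI crawlers get HTTP 402 unless
# they present a payment token. The bot signatures, the X-Payment-Token
# header, and the token check are illustrative assumptions, not a real API.
from wsgiref.simple_server import make_server

AI_BOT_SIGNATURES = ("GPTBot", "ClaudeBot", "PerplexityBot")  # example names
VALID_TOKENS = {"demo-token-123"}  # stand-in for a real payment/licensing check


def looks_like_ai_bot(user_agent: str) -> bool:
    return any(sig.lower() in user_agent.lower() for sig in AI_BOT_SIGNATURES)


def app(environ, start_response):
    user_agent = environ.get("HTTP_USER_AGENT", "")
    token = environ.get("HTTP_X_PAYMENT_TOKEN", "")

    if looks_like_ai_bot(user_agent) and token not in VALID_TOKENS:
        # Deny unpaid bot access and point the bot at a paid endpoint instead.
        start_response("402 Payment Required", [
            ("Content-Type", "text/plain"),
            ("Link", '<https://paywall.example.com/quote>; rel="payment"'),
        ])
        return [b"Automated access requires payment. Follow the payment link."]

    # Humans (and paying bots) get the article as usual.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Full article text here.</body></html>"]


if __name__ == "__main__":
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

The obvious limitation, and the point of the bookstore analogy, is that an agent presenting a browser-like user agent sails straight through a check like this, which is exactly the 'human-looking robot' problem the report describes.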
Collective action is another possibility. I doubt publishers would launch a class action around this specific relabeling of user agents, but it does provide more ammunition in broader copyright lawsuits. Besides going to court, industry associations could come out against the move. The News/Media Alliance in particular has been very vocal about AI companies' alleged transgressions of copyright.

The idea of treating agentic activity as the equivalent of human activity has consequences that go beyond the media. Any content or tool that has traditionally been available for free will need to reevaluate that access now that robots are destined to be a growing part of the mix. If there was any doubt that simply updating robots.txt instructions was adequate, the TollBit report blew it out of the water. The stance that 'AI is just doing what humans do' is often used as a defense when AI systems ingest large amounts of information and then produce new content based on it. Now the makers of those systems are quietly extending that idea, allowing their agents to effectively impersonate humans while shopping the web for data. Until it's clear how to build profitable stores for robots, there should be a way to force their masks off.
Yahoo
20-06-2025
- Business
- Yahoo
How this startup is helping publishers profit from AI scraping
"Artificial intelligence (AI) scraping" describes the practice of using AI to extract data from websites, oftentimes without the publishers' permission. AI scraping has been a key discussion amid the AI revolution, as the practice raises questions about ethical usage of the technology. Reddit (RDDT) is suing Anthropic ( alleging the AI company used its content without authorization. TollBit CEO and co-founder Toshit Panigrahi joins Catalysts to discuss how TollBit helps publishers protect their content from AI scraping. To watch more expert insights and analysis on the latest market action, check out more Catalysts here. This month, Reddit sued AI giant Anthropic, claiming that the OpenAI rival had accessed its platform of more than 100,000 times it accessed the platform since July of 2022 after Anthropic allegedly said it had blocked its bots from doing so. We sat down with the COO of Reddit to discuss the suit. What's important to us is that um That we are able to protect our users privacy, their deletion rights, like we have policies um that ensure that, you know, when users take down a post like the post is taken down. And so it's really important and as we said in our terms of service that, you know, we have a conversation with folks who have access to our data because that's a commitment that we have in terms of our policies. Our next guest has crunched the numbers on the size and scope of AI scraping by bots and helps publishers profit from the scraping. Joining us now we've got Tosit Panigrahi, who is the Tolbit CEO and co-founder. Tolbit is a New York-based startup that helps news publishers monitor and make money when AI companies scrape their content. Uh, by acting almost like a toll booth of the internet, did we get that right, Tosa? Yes, that's correct. Thanks for having me. Absolutely. So take us into your business and and what you're seeing more broadly here, especially as we're knowing and knowledge knowledgeable of how much scraping these AI engines need to do in order to get either knowledge sets that can then be used for generative AI efforts and they reach to basically every source part of the web that they can essentially get their hands on for free. Absolutely. So Tobit is a platform that helps, uh, websites of all sizes monitor, manage, and monetize their AI bot traffic, which essentially means we give them tools to to get an idea of how rampant the scraping might be on their site. We give them tools to block it and enforce content access rights, and then we give them, uh, I think a real innovation is our bot paywall, a tool that allows these AI bots to come in and actually pay for sanctioned access to that content. Um, and, and data, right? And I think one of the things that we're seeing, right, especially in the last quarter is the demand for not for uh uh content for training, content for retrieval at inference time when people ask the question, the bots have to go out and read and answer your question. Is AI training as it stands right now ethical from the standard practices that users expect when they're on the internet? I think this is a question that's bigger than all of us. I think we're definitely looking to some of the course to set, to decide and set some precedent as to whether or not training is fair use, but I think, uh, us as a business, right? And I think where some of the conversation should be going should be around, you know, these bots who are, who have to go out when you and I ask a question. 
finding these platforms to go read that content. They don't know what happened today. They don't know what the price of the ticket was if you wanted to go to France. They actually have to go out and access those sites to get that information. That will be a far bigger use case than just training as these tools continue to evolve.

And so what is the revenue model like? How does the business make money?

So, essentially, with the technology that we built, we have a gateway that any AI application, agent, or bot can come in through and actually pay, in the form of micropayments, for access to content and data. It could be anything from 'I want to read what happened in the news today' to 'I want to know what the price of a hotel is today in New York, and I want you to go book it for me.' We are able to let the website set those rules, set the access protocols for what that content and data should cost, and then we take a transaction fee on top of that for enabling this faster, cleaner, licensed access to the content.

It's a really fascinating business, and I'm sure there's a very large total addressable market that just continues to grow at this juncture. Thanks so much for breaking this down. We appreciate it.

Thank you.
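To make the tollbooth flow Panigrahi describes more concrete, here is a minimal client-side sketch of how an agent might pay for access through such a gateway. The endpoint, the X-Payment-Token header, and the 402-then-retry flow are illustrative assumptions, not TollBit's actual API; they mirror the hypothetical bot paywall sketched earlier in this digest.

```python
# Client-side sketch of an agent paying a "toll" for content access.
# The URL, header, and 402-then-retry flow are illustrative assumptions;
# this is not TollBit's actual API.
import requests

ARTICLE_URL = "https://news.example.com/todays-story"  # hypothetical page
PAYMENT_TOKEN = "demo-token-123"  # stand-in for a token bought via the gateway


def fetch_with_toll(url: str) -> str:
    # First attempt: identify honestly as a bot, with no payment attached.
    headers = {"User-Agent": "ExampleAgent/1.0"}
    resp = requests.get(url, headers=headers)

    if resp.status_code == 402:
        # The site wants to be paid. A real agent would request a price quote
        # and settle a micropayment here; we simply retry with a token that
        # represents the completed payment.
        headers["X-Payment-Token"] = PAYMENT_TOKEN
        resp = requests.get(url, headers=headers)

    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    print(fetch_with_toll(ARTICLE_URL)[:200])
```

In this model the publisher sets the price; the gateway's job is to quote it, collect it, and take a transaction fee, which matches the revenue model described in the interview.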
Yahoo
17-06-2025
- Business
- Yahoo
'This is coming for everyone': A new kind of AI bot takes over the web
People are replacing Google search with artificial intelligence tools like ChatGPT, a major shift that has set a new kind of bot loose on the web. To offer users a tidy AI summary instead of Google's '10 blue links,' companies such as OpenAI and Anthropic have started sending out bots to retrieve and recap content in real time. They are scraping webpages, loading relevant content into the AI's memory, and 'reading' far more content than a human ever would.

According to data shared exclusively with The Washington Post, traffic from retrieval bots grew 49 percent in the first quarter of 2025 from the fourth quarter of 2024. The data is from TollBit, a New York-based start-up that helps news publishers monitor and make money when AI companies use their content. TollBit's report, based on data from 266 websites - half of which are run by national and local news organizations - suggests that the growth of bots that retrieve information when a user prompts an AI model is on an exponential curve.

'It starts with publishers, but this is coming for everyone,' Toshit Panigrahi, CEO and co-founder of TollBit, said in an interview. Panigrahi said that this kind of bot traffic, which can be hard for websites to detect, reflects growing demand for content, even as AI tools devastate traffic to news sites and other online platforms. 'Human eyeballs to your site decreased. But the net amount of content access, we believe, fundamentally is going to explode,' he said. A spokesperson for OpenAI said that referral traffic to publishers from ChatGPT searches may be lower in quantity but that it reflects a stronger user intent compared with casual web browsing.

To capitalize on this shift, websites will need to reorient themselves to AI visitors rather than human ones, Panigrahi said. But he also acknowledged that squeezing payment for content when AI companies argue that scraping online data is fair use will be an uphill climb, especially as leading players make their newest AI visitors even harder to identify.

Debate around the AI industry's use of online content has centered on the gargantuan amounts of text needed to train the AI models that power tools like ChatGPT. To obtain that data, tech companies use bots that scrape the open web for free, which has led to a raft of lawsuits alleging copyright theft from book authors and media companies, including a New York Times lawsuit against OpenAI. Other news publishers have opted for licensing deals. (In April, The Washington Post inked a deal with OpenAI.)

In the past eight months, as chatbots have evolved to incorporate features like web search and 'reasoning' to answer more complex queries, traffic for retrieval bots has skyrocketed. It grew 2.5 times as fast as traffic for bots that scrape data for training between the fourth quarter of 2024 and the first quarter of 2025, according to TollBit's report. Panigrahi said TollBit's data may underestimate the magnitude of this change because it doesn't reflect bots that AI companies send out on behalf of AI 'agents' that can complete tasks for a user, like ordering takeout from DoorDash. The start-up's findings also add a new dimension to mounting evidence that the modern internet - optimized for Google search results and social media algorithms - will have to be restructured as the popularity of AI answers grows.
'To think of it as, "Well, I'm optimizing my search for humans," is missing out on a big opportunity,' he said.

Installing TollBit's analytics platform is free for news publishers, and the company has more than 2,000 clients, many of which are struggling with these seismic changes, according to data in the report. Although news publishers and other websites can implement blockers to prevent various AI bots from scraping their content, TollBit found that more than 26 million AI scrapes bypassed those blockers in March alone. Some AI companies claim bots for AI agents don't need to follow bot instructions because they are acting on behalf of a user.

Mark Howard, chief operating officer for the media company Time, a TollBit client, said the start-up's traffic data has helped Time negotiate content licensing deals with AI companies including OpenAI and the search engine Perplexity. But the market to fairly compensate publishers is far from established, Howard said. 'The vast majority of the AI bots out there absolutely are not sourcing the content through any kind of paid mechanism. … There is a very, very long way to go.'
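For readers who want to see what the retrieval step itself looks like, here is a minimal sketch of retrieval-augmented generation at query time: fetch a live page on demand, reduce it to text, and place it in a model's prompt as context. The URL, the crude HTML stripping, and the prompt format are all illustrative assumptions; production retrieval bots use full crawling, parsing, and ranking pipelines.

```python
# Minimal sketch of retrieval-augmented generation (RAG) at query time:
# fetch a live page, reduce it to text, and place it in a model prompt.
# The URL and the naive HTML stripping are illustrative assumptions only.
import re
import requests


def fetch_page_text(url: str, max_chars: int = 4000) -> str:
    resp = requests.get(url, headers={"User-Agent": "ExampleRetrievalBot/1.0"}, timeout=10)
    resp.raise_for_status()
    # Crude cleanup: drop scripts/styles and tags, then collapse whitespace.
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", resp.text)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()[:max_chars]


def build_prompt(question: str, url: str) -> str:
    # The retrieved text becomes context the model reads before answering,
    # which is why every user question can trigger fresh scraping traffic.
    context = fetch_page_text(url)
    return (
        "Answer the question using only the source text.\n"
        f"Source ({url}):\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What did the report find?",
                       "https://news.example.com/state-of-the-bots")[:500])
```

Because a fetch like this happens per question rather than per training run, retrieval traffic scales with usage, which is consistent with the quarter-over-quarter growth TollBit measured.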