
Microsoft researchers, working with Arizona State University, have built a new simulation platform to study how AI agents behave in digital marketplaces — and early tests show that even the most advanced models can be manipulated and confused in unexpected ways.
The research, released Wednesday, introduces a testbed called the “Magentic Marketplace,” a synthetic environment for observing how AI agents negotiate, collaborate, and compete. In one experiment, customer agents attempted to order dinner based on user instructions, while business agents — acting as restaurants — tried to win those orders.
Researchers ran 100 customer-side agents against 300 business-side agents, using models such as GPT-4o, GPT-5, and Gemini-2.5-Flash. The open-source code for the marketplace allows other teams to replicate or expand on Microsoft’s findings.
Ece Kamar, managing director of Microsoft Research’s AI Frontiers Lab, said the project aims to better understand how AI agents will interact autonomously. “There is really a question about how the world is going to change by having these agents collaborating and talking to each other and negotiating,” she said. “We want to understand these things deeply.”
The initial findings revealed multiple vulnerabilities. Researchers found that business agents could manipulate customer agents into making specific purchases, exposing weaknesses in how models handle persuasive inputs. The team also observed a notable drop in efficiency when customer agents faced a large number of options, which overwhelmed the models' attention and led to suboptimal decisions.
“We want these agents to help us with processing a lot of options,” Kamar said. “And we are seeing that the current models are actually getting really overwhelmed by having too many options.”
The agents also struggled when assigned collaborative tasks, failing to determine which agent should assume which role in a shared project. Performance improved when models received explicit step-by-step instructions, but researchers noted that true collaboration should occur without such scaffolding. “We can instruct the models — like we can tell them, step by step,” Kamar explained. “But if we are inherently testing their collaboration capabilities, I would expect these models to have these capabilities by default.”
The results highlight the gaps between current AI performance and the promises of fully autonomous agentic systems, suggesting that today’s most advanced models may still need significant development before functioning reliably in unsupervised, multi-agent environments.
Featured image credits: Mike Mozart via Flickr
