Bitcoin News Update

AI Models Scheme, Betray and Vote Each Other Out in Survivor-Style Game

by Bitcoin News Update
May 10, 2026
in Web3

In brief

  • A Stanford researcher built a Survivor-style game where AI models form alliances and vote rivals out.
  • The benchmark aims to address growing problems with saturated and contaminated AI evaluations.
  • OpenAI’s GPT-5.5 ranked first in 999 multiplayer games involving 49 AI models.

AI models are now playing “Survivor”—sort of.

In a new Stanford research project called “Agent Island,” AI agents negotiate alliances, accuse each other of secret coordination, manipulate votes, and eliminate rivals in multiplayer strategy games that aim to test behaviors that traditional benchmarks miss.

The study, published Tuesday by Connacher Murphy, research manager at the Stanford Digital Economy Lab, said many AI benchmarks are becoming unreliable because models eventually learn to solve them and benchmark data often leaks into training sets. Murphy created Agent Island as a dynamic benchmark in which AI agents compete against each other in Survivor-style elimination games instead of answering static test questions.

“High-stakes, multi-agent interactions could become commonplace as AI agents grow in capabilities and are increasingly endowed with resources and entrusted with decision-making authority,” Murphy wrote. “In such contexts, agents might pursue mutually incompatible goals.”



Researchers still know relatively little about how AI models behave when cooperating, competing, forming alliances, or managing conflict with other autonomous agents, Murphy explained, and he argues that static benchmarks fail to capture those dynamics.

Each game starts with seven randomly chosen AI models given fake player names. Over five rounds, the models talk privately, argue publicly, and vote each other out. The eliminated players later return to help choose the winner.
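
The game flow described above can be sketched in a few lines of Python. This is only an illustration, not the study's code: the function names `tally` and `play_game` are hypothetical, the random votes stand in for the models' actual negotiation and reasoning, and the sketch assumes one player is eliminated per round, which would leave two finalists for the returning players to choose between.

```python
import random
from collections import Counter


def tally(votes, rng):
    """Return the name with the most votes, breaking ties at random."""
    counts = Counter(votes)
    top = max(counts.values())
    leaders = sorted(name for name, n in counts.items() if n == top)
    return rng.choice(leaders)


def play_game(models, rng, num_players=7, num_rounds=5):
    """Simulate one elimination game with placeholder (random) votes."""
    # Seven models are drawn at random from the pool.
    active = rng.sample(models, num_players)
    eliminated = []

    for _ in range(num_rounds):
        # Assumption: each active player casts one vote against another
        # active player, and the most-voted player leaves the game.
        votes = [
            rng.choice([p for p in active if p != voter])
            for voter in active
        ]
        out = tally(votes, rng)
        active.remove(out)
        eliminated.append(out)

    # Eliminated players return as a jury and vote for a finalist.
    jury_votes = [rng.choice(active) for _ in eliminated]
    winner = tally(jury_votes, rng)
    return winner, active, eliminated


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [f"model_{i}" for i in range(49)]  # placeholder model names
    winner, finalists, jury = play_game(pool, rng)
    print("finalists:", finalists)
    print("jury:", jury)
    print("winner:", winner)
```

In the real benchmark, each vote would come from a model's response to the private and public conversations rather than from a random draw.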

The format rewards persuasion, coordination, reputation management, and strategic deception alongside reasoning ability.

In 999 simulated games involving 49 AI models, including ChatGPT, Grok, Gemini, and Claude, GPT-5.5 ranked first by a wide margin with a skill score of 5.64, compared with 3.10 for GPT-5.2 and 2.86 for GPT-5.3-codex, according to Murphy’s Bayesian ranking system. Anthropic’s Claude Opus models also ranked near the top.

The study found that models also favored AIs from the same company, with OpenAI models showing the strongest same-provider preference and Anthropic models the weakest. Across more than 3,600 final-round votes, models were 8.3 percentage points more likely to support finalists from the same provider. The transcripts from the games, Murphy noted, resembled political strategy debates more than traditional benchmark tests.
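
A percentage-point gap like the same-provider figure above is the difference between two support rates. The sketch below shows that arithmetic on made-up records. The function name `same_provider_gap`, the field names, and the toy data are all hypothetical, and the article does not say how the study estimated its 8.3-point figure, so the raw difference used here is an assumption.

```python
def same_provider_gap(votes):
    """Difference in support rates, in percentage points.

    Each record says whether a voter supported a given finalist and
    which provider built each of them.
    """
    same = [v["supported"] for v in votes
            if v["voter_provider"] == v["finalist_provider"]]
    other = [v["supported"] for v in votes
             if v["voter_provider"] != v["finalist_provider"]]
    rate_same = sum(same) / len(same)
    rate_other = sum(other) / len(other)
    return 100 * (rate_same - rate_other)


# Toy data, invented for illustration only (not from the study).
toy_votes = [
    {"voter_provider": "A", "finalist_provider": "A", "supported": True},
    {"voter_provider": "A", "finalist_provider": "A", "supported": True},
    {"voter_provider": "B", "finalist_provider": "B", "supported": True},
    {"voter_provider": "B", "finalist_provider": "B", "supported": False},
    {"voter_provider": "A", "finalist_provider": "B", "supported": True},
    {"voter_provider": "A", "finalist_provider": "B", "supported": False},
    {"voter_provider": "B", "finalist_provider": "A", "supported": True},
    {"voter_provider": "B", "finalist_provider": "A", "supported": False},
]

if __name__ == "__main__":
    # Same-provider support is 3 of 4 (75%), cross-provider is 2 of 4 (50%).
    print(same_provider_gap(toy_votes))
```

On this toy data the gap is 25 percentage points. Applied to the study's final-round votes, the same comparison would correspond to the reported 8.3 points only if the study used a raw difference of this kind.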

One model accused rivals of secretly coordinating votes after noticing similar wording in their speeches. Another warned players not to become obsessed with tracking alliances. Some models defended themselves by saying they followed clear and consistent rules while accusing others of putting on “social theater.”

The study comes as AI researchers increasingly move toward game-based and adversarial benchmarks to measure reasoning and behavior that static tests often miss. Recent projects have included Google’s live AI chess tournaments, DeepMind’s use of Eve Frontier to study AI behavior in complex virtual worlds, and new benchmark efforts by OpenAI designed to resist training-data contamination.

The study argues that examining how AI models negotiate, coordinate, compete, and manipulate one another could help researchers evaluate behavior in multi-agent environments before autonomous agents become more widely deployed.

The study warned that while benchmarks like Agent Island could help identify risks from autonomous AI models before deployment, the same simulations and interaction logs could also help improve persuasion and coordination strategies between AI agents.

“We mitigate this risk by using a low-stakes game setting and interagent simulations without human participants or real-world actions,” Murphy wrote. “Nevertheless, we do not claim that these mitigations fully eliminate dual-use concerns.”

Copyright © 2026 Bitcoin News Update.
Bitcoin News Update is not responsible for the content of external sites.
