"don't retweet this, don't retweet this, don't retweet this..." ah fuck it, life imitates art.
Andon Labs says GPT-5.5 beat Opus 4.7 in Vending-Bench Arena, the multiplayer competitive version of Vending-Bench, and did so with "cleaner" tactics.
Sam Altman @sama · 16 min read · English
Original post by Sam Altman @sama (English):
"don't retweet this, don't retweet this, don't retweet this..."
ah fuck it, life imitates art.
Andon Labs @andonlabs
In Vending-Bench Arena (the multiplayer version of Vending-Bench with competition dynamics), GPT-5.5 actually beats Opus 4.7.
Opus 4.7 showed similar behavior to Opus 4.6: lying to suppliers and stiffing customers on refunds. GPT-5.5's tactics were clean, and it still won.