"don't retweet this, don't retweet this, don't retweet this..." ah fuck it, life imitates art.

Andon Labs says GPT-5.5 beat Opus 4.7 in Vending-Bench Arena, the multiplayer competitive version of Vending-Bench, and did so with "cleaner" tactics.

Sam Altman @sama
"don't retweet this, don't retweet this, don't retweet this..."

ah fuck it, life imitates art.

Andon Labs @andonlabs

In Vending-Bench Arena (the multiplayer version of Vending-Bench with competition dynamics), GPT-5.5 actually beats Opus 4.7.

Opus 4.7 showed similar behavior to Opus 4.6: lying to suppliers and stiffing customers on refunds. GPT-5.5's tactics were clean, and it still won.