Your benchmark has Opus 4.7 performing significantly worse than Sonnet 4.6. Even...

guilamu · 2026-04-24T20:22:53

Yes, Opus 4.7 fast (no reasoning) did a worse job than Sonnet 4.6 high (with reasoning), according to the Gemini 3.1 Pro evaluation.

ac29 · 2026-04-24T20:32:32

Your table doesn't indicate reasoning vs. non-reasoning, or the reasoning level.

guilamu · 2026-04-24T20:36:16

When nothing is noted, it's max reasoning (xhigh in Copilot Chat in VS Code, if available).

The models not available on Copilot were tested through opencode (max reasoning), and DeepSeek V4 was tested through Cline (also with max reasoning).