SWE-bench Pro
Complex multi-file reasoning tasks · Opus 4.5
Agent          Score
Altab          54.1%
Augment        51.4%
Cursor         49.8%
Claude Code    48.2%
Codex CLI      45.9%
Based on internal testing. All agents were tested with an identical evaluation harness.