SWE-bench Pro

Complex multi-file reasoning tasks · Opus 4.5

Altab: 54.1%
Augment: 51.4%
Cursor: 49.8%
Claude Code: 48.2%
Codex CLI: 45.9%

Based on internal testing. All agents were tested with the same evaluation harness.