Benchmarks from page 4 of the model card:

| Benchmark             | 3 Pro     | 2.5 Pro | Sonnet 4.5 | GPT-5.1   |
|-----------------------|-----------|---------|------------|-----------|
| Humanity's Last Exam  | 37.5%     | 21.6%   | 13.7%      | 26.5%     |
| ARC-AGI-2             | 31.1%     | 4.9%    | 13.6%      | 17.6%     |
| GPQA D