Anthropic’s Claude Opus 4 and Sonnet 4 Raise the Bar in AI Coding Performance


On Thursday, Anthropic unveiled two new AI models in the Claude 4 lineup: Claude Opus 4 and Claude Sonnet 4. According to Anthropic, Claude Opus 4 is the “world’s best coding model,” delivering sustained performance on long-running, complex workflows. Meanwhile, Claude Sonnet 4 offers improved coding and reasoning capabilities compared to its predecessor, Claude 3.7 Sonnet.

Let’s start with Claude Opus 4. On SWE-bench Verified, a benchmark built from real-world software engineering tasks, Claude Opus 4 scored 72.5%, edging out OpenAI’s top coding model, Codex-1, which scored 72.1%. With parallel test-time compute, an approach similar to the Deep Think mode in Gemini 2.5 Pro, Opus 4 reached 79.4%.

Interestingly, Claude Sonnet 4 scores 72.7% on SWE-bench Verified, and with parallel test-time compute it reaches 80.2%, slightly ahead of the larger Opus 4 model on this benchmark.
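In general terms, “parallel test-time compute” means running several independent attempts at the same task and keeping the best one. The Python sketch below illustrates that best-of-N idea under loudly stated assumptions: `generate_candidate`, `Candidate`, and `best_of_n` are hypothetical names, the model call is replaced by a seeded random simulation so the code runs offline, and the filter-then-rank selection rule is an illustrative assumption, not Anthropic’s published pipeline.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    seed: int
    passes_tests: bool  # hypothetical: did this attempt pass the visible tests?
    score: float        # hypothetical: quality score from some ranking step


def generate_candidate(seed: int) -> Candidate:
    # Hypothetical stand-in for one independent model attempt.
    # A seeded RNG simulates the outcome so the sketch runs offline.
    rng = random.Random(seed)
    return Candidate(seed=seed, passes_tests=rng.random() > 0.3, score=rng.random())


def best_of_n(n: int) -> Optional[Candidate]:
    # Run n independent attempts concurrently (the "parallel" part).
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate_candidate, range(n)))
    # Keep only attempts that pass the tests, then pick the highest-scoring one.
    viable = [c for c in candidates if c.passes_tests]
    return max(viable, key=lambda c: c.score, default=None)


if __name__ == "__main__":
    print(best_of_n(8))
```

The trade-off this illustrates is that spending more compute per task (more attempts) can raise the score without changing the underlying model, which is why such results are usually reported separately from single-attempt numbers.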

[Figure: Claude Opus 4 and Claude Sonnet 4 performance on SWE-bench]

Anthropic explains that Claude Sonnet 4 “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.”

Claude Opus 4 excels at complex, long-running tasks and agentic workflows, whereas Claude Sonnet 4 offers a solid balance of coding performance and efficiency. Both are hybrid reasoning models, capable of giving near-instant responses or taking extra time for deeper reasoning.

Anthropic also highlights that when Claude Opus 4 has access to local files, it can save key information to a memory file and refer back to it later. For instance, while playing Pokémon, the model created a navigation guide file to improve its gameplay.
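The memory-file behavior can be pictured as an agent appending notes to a local file and reading them back in later steps. The Python sketch below is a minimal illustration of that pattern only, assuming plain-text, one-note-per-line storage; `MemoryFile`, `remember`, `recall`, and the file name `navigation_guide.md` are hypothetical and do not describe how Claude or any Anthropic tool actually stores its notes.

```python
import tempfile
from pathlib import Path
from typing import List


class MemoryFile:
    """Hypothetical helper: notes an agent persists to a local file between steps."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def remember(self, note: str) -> None:
        # Append one note per line so earlier notes are kept, not overwritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(note.rstrip("\n") + "\n")

    def recall(self) -> List[str]:
        # Read back every saved note; empty list if the file does not exist yet.
        if not self.path.exists():
            return []
        return self.path.read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = MemoryFile(Path(tmp) / "navigation_guide.md")  # hypothetical file name
        memory.remember("Example note: exit found on the north side of the map")
        memory.remember("Example note: check saved notes before exploring again")
        print(memory.recall())
```

Because the notes live on disk rather than in the model’s context, a fresh `MemoryFile` pointing at the same path sees everything written earlier, which is the property that makes this kind of file useful across long tasks.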

Lastly, on safety, Anthropic has activated AI Safety Level 3 (ASL-3) protections for Claude Opus 4, the first time it has done so under its Responsible Scaling Policy (RSP). These safeguards include Constitutional Classifiers and other measures designed to block jailbreak attempts.

The Claude 4 models are rolling out to all paid users on the Pro, Max, Team, and Enterprise plans. Claude Sonnet 4 is also available to free users, though without the extended thinking feature.
