Anthropic’s Claude Opus 4 and Sonnet 4 Raise the Bar in AI

On Thursday, Anthropic unveiled two new AI models in the Claude 4 lineup: Claude Opus 4 and Claude Sonnet 4. According to Anthropic, Claude Opus 4 is the “world’s best coding model,” delivering consistent performance on extended, complex workflows. Meanwhile, Claude Sonnet 4 offers improved coding and reasoning capabilities compared to its predecessor, Claude 3.7 Sonnet.

Let’s start with Claude Opus 4. On the SWE-bench Verified benchmark, which evaluates real-world software engineering tasks, Claude Opus 4 scored 72.5%, edging out OpenAI’s top coding model, Codex-1, which scored 72.1%. Even more impressive, when using parallel test-time compute, similar to Deep Think mode in Gemini 2.5 Pro, Opus 4 reached 79.4%.

Interestingly, Claude Sonnet 4 scores 72.7% on SWE-bench Verified, and with parallel test-time compute it reaches 80.2%, surpassing the coding performance of the larger Opus 4 model.
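The article does not spell out how parallel test-time compute works, so the following is only a minimal sketch of one common form of it, best-of-n sampling: draw several independent candidate solutions and keep the one a scoring function rates highest. The names `best_of_n`, `toy_generate`, `toy_score`, and the pass-rate table are hypothetical stand-ins, not Anthropic's method or API.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidates independently and return the highest-scoring one.

    The candidates do not depend on each other, so a real system could
    generate them in parallel; this sketch generates them in a simple loop.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates: List[str] = [generate(rng) for _ in range(n)]
    scored = [(score(candidate), candidate) for candidate in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-ins: a "model" that proposes one of three patches at
# random, and a "scorer" that reports the fraction of tests each patch passes.
TOY_PASS_RATE = {"patch_a": 0.2, "patch_b": 0.9, "patch_c": 0.5}


def toy_generate(rng: random.Random) -> str:
    return rng.choice(sorted(TOY_PASS_RATE))


def toy_score(candidate: str) -> float:
    return TOY_PASS_RATE[candidate]
```

Calling `best_of_n(toy_generate, toy_score, n=8)` returns a `(patch, score)` pair. The more candidates are sampled, the more likely the strongest patch is among them, which is the intuition behind the higher scores reported with extra test-time compute.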

[Figure: Claude Opus 4 and Claude Sonnet 4 performance on SWE-bench]

Anthropic explains that the Claude Sonnet 4 model “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.”

Claude Opus 4 shines in handling complex, long-duration tasks and agentic workflows, whereas Claude Sonnet 4 delivers a solid blend of coding performance and efficiency. Both are hybrid reasoning models, capable of providing quick responses as well as taking extra time for more in-depth reasoning.

Anthropic also highlights that when Claude Opus 4 has access to local files, it can store important information in a memory file. For instance, while playing Pokémon, the model generated a navigation guide file to enhance its gameplay.
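To make the memory-file idea concrete, here is a minimal sketch of an agent keeping notes in a local JSON file and reading them back on a later step. The article does not describe the actual file format or tooling, so the file name, the keys, and the helpers `load_memory`, `remember`, and `demo` are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_memory(path: Path) -> Dict[str, str]:
    """Return the saved notes, or an empty dict if no memory file exists yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def remember(path: Path, key: str, note: str) -> None:
    """Add or overwrite one note and write the whole memory back to disk."""
    memory = load_memory(path)
    memory[key] = note
    path.write_text(
        json.dumps(memory, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def demo() -> Dict[str, str]:
    """Write two hypothetical navigation notes, then reload them from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        memory_file = Path(tmp) / "navigation_guide.json"  # hypothetical name
        remember(memory_file, "current_goal", "reach the next town")
        remember(memory_file, "dead_end", "the cave's east tunnel loops back")
        return load_memory(memory_file)
```

Because the notes live on disk rather than in the conversation, a later step can call `load_memory` and recover them even after earlier context has been dropped.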

Lastly, regarding safety, Anthropic has activated AI Safety Level 3 (ASL-3) protections for Claude Opus 4, a first under its Responsible Scaling Policy (RSP). The company has put in place Constitutional Classifiers and other safeguards to block jailbreaking attempts.

The Claude 4 models are being rolled out to all paid users across Pro, Max, Team, and Enterprise plans. Fortunately, Claude Sonnet 4 is also available to free users, though without the extended thinking feature.
