DeepSeek V3.1 Debuts, Yet Still Trails Behind GPT-5 and Claude Opus


DeepSeek, a Chinese AI startup, has unveiled its latest hybrid reasoning model, DeepSeek V3.1, built for agentic tasks and tool usage. The model offers two modes, Think and Non-Think, with the ability to automatically extend its reasoning process if a query requires more time. Users can toggle between the modes using the “DeepThink” button.

In Non-Think mode, the system runs on deepseek-chat, while Think mode leverages deepseek-reasoner. Both support a 128K-token context length and use 37B active parameters out of a total of 671B. DeepSeek V3.1 Base, trained on an additional 840B tokens as an upgrade to V3, demonstrates impressive performance, especially in multi-step reasoning tasks.
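
Since the two modes map to distinct model names, switching between them from the API is a one-line change. Below is a minimal sketch assuming DeepSeek’s OpenAI-compatible endpoint at https://api.deepseek.com; the environment variable and the prompt are illustrative placeholders.

```python
import os
from openai import OpenAI

# The DeepSeek API follows the OpenAI chat-completions format (assumed setup).
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # hypothetical env var name
    base_url="https://api.deepseek.com",
)

prompt = "How many prime numbers are there below 100?"

# Non-Think mode: fast responses without an extended reasoning phase.
chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": prompt}],
)
print("Non-Think:", chat.choices[0].message.content)

# Think mode: the reasoner model, which may spend extra time reasoning
# before answering when the query requires it.
reasoned = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
)
print("Think:", reasoned.choices[0].message.content)
```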

For example, on SWE-bench Verified, a benchmark that measures coding performance on real-world software engineering tasks, DeepSeek V3.1 scored 66.0%, a big jump from DeepSeek R1-0528’s 44.6%. For comparison, OpenAI’s GPT-5 Thinking reached 74.9%, while Anthropic’s Claude Opus 4.1 scored 74.5%.

On Humanity’s Last Exam (HLE), DeepSeek V3.1 achieved 29.8% with tool calling, and on GPQA Diamond, it scored 81%. Overall, the new DeepSeek V3.1 surpasses the older R1-0528 model, but it still lags behind GPT-5 and Claude Opus 4.1 in performance. As for pricing, the DeepSeek V3.1 API is set at $0.56 (input) / $1.68 (output) per 1M tokens.
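
To put those rates in concrete terms, here is a small sketch that estimates the cost of a single API call from token counts; the 20K-input/2K-output workload is a made-up example, and the rates are the per-1M-token prices quoted above.

```python
# DeepSeek V3.1 API rates quoted above, in USD per 1M tokens.
INPUT_RATE = 0.56
OUTPUT_RATE = 1.68

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at the quoted rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Hypothetical workload: a 20K-token prompt producing a 2K-token answer.
print(f"${request_cost(20_000, 2_000):.4f}")  # -> $0.0146
```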
