Elon Musk's Grok-2 Beta Debuts: Surpassing ChatGPT, Claude, and Gemini

Elon Musk’s AI venture, xAI, has unveiled an early preview of the Grok-2 model, which has outperformed Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o-mini on the LMSYS leaderboard. While the previous Grok-1.5 model received mixed reviews, Grok-2 has performed strongly there. xAI has launched two models: Grok-2 and a smaller Grok-2 mini.

According to xAI, Grok-2 represents significant advancements in reasoning, instruction following, and delivering accurate information. In traditional AI benchmarks, Grok-2 achieved an impressive 87.5% in MMLU and 88.4% in HumanEval, with the MMLU score derived from 0-shot CoT evaluation.

Grok-2 was evaluated on LMSYS under the name “sus-column-r,” where it currently holds the third position with around 12,000 votes, behind ChatGPT-4o-latest and Gemini-1.5-Pro-Experimental and level with GPT-4o-2024-05-13. It surpasses GPT-4o-mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B in performance.

Woah, another exciting update from Chatbot Arena❤️‍🔥

The results for @xAI’s sus-column-r (Grok 2 early version) are now public!

With over 12,000 community votes, sus-column-r has secured the #3 spot on the overall leaderboard, even matching GPT-4o! It excels in Coding (#2),… https://t.co/gqSWSwYN0z pic.twitter.com/j9UYDBYNt4
— lmarena.ai (formerly lmsys.org) (@lmarena_ai) August 14, 2024

In coding and math-related tasks, Grok-2 ranks second, and for hard prompts, it is in the fourth spot. xAI has announced that a multimodal version of Grok-2 will be released soon, though the company has not disclosed the parameter sizes for either model. You can start using the new Grok-2 model on x.com, and developers can access the API as well.
