
Elon Musk’s AI venture, xAI, has unveiled an early preview of the Grok-2 model, which outperforms several versions of Claude, Gemini, and even GPT-4o. While the previous Grok-1.5 model received mixed reviews, Grok-2 has performed strongly on the LMSYS leaderboard. xAI has launched two variants: Grok-2 and a smaller Grok-2 mini.
According to xAI, Grok-2 brings significant advances in reasoning, instruction following, and factual accuracy. On traditional AI benchmarks, Grok-2 scored 87.5% on MMLU and 88.4% on HumanEval; the MMLU score comes from a 0-shot CoT (chain-of-thought) evaluation.

Grok-2 was evaluated on LMSYS under the name “sus-column-r,” where it currently holds the third position with around 12,000 votes, just behind ChatGPT-4o-latest, Gemini-1.5-Pro-Experimental, and GPT-4o-2024-05-13. It surpasses GPT-4o-mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B in performance.
In coding and math-related tasks, Grok-2 ranks second, and on hard prompts it ranks fourth. xAI says a multimodal version of Grok-2 will be released soon, though the company has not disclosed parameter counts for either model. You can start using the new Grok-2 model on x.com, and developers can access the API as well.