ChatGPT Got a Secret Update Last Week and Is Now Performing at Its Best

AI companies are increasingly testing new and experimental models under obscure names on the LMSYS Chatbot Arena, often without publishing official release notes. Recently, users on X have reported improved performance from ChatGPT, noting gains in both coding and creative tasks. Many speculate that the improvement comes from a new OpenAI model, possibly linked to Project Strawberry, a rumored advanced reasoning engine.

Something might be going on w/ GPT-4o

For the first time in a long time, it provided better "vibes" on an output than 3.5 Sonnet

Really surprised… will keep using it today to see if it continues
— Matt Shumer (@mattshumer_) August 12, 2024

OpenAI has since confirmed that ChatGPT is running a new model. It is not an entirely new frontier-class model but an updated version of GPT-4o, designated chatgpt-4o-latest. According to the release notes, the updated model has been optimized specifically for chat, drawing on qualitative feedback and experimental results to improve performance.

there's a new GPT-4o model out in ChatGPT since last week. hope you all are enjoying it and check it out if you haven't! we think you'll like it 😃
— ChatGPT (@ChatGPTapp) August 12, 2024

OpenAI also says it is actively removing low-quality data from the training dataset while adding high-quality data and “experimenting with new research methods.” This raises an intriguing question: is the new ChatGPT model already using the Project Strawberry engine, which is expected to introduce a novel post-training method to improve reasoning?

Wow, GPT-4o now uses multi-step reasoning. impressive to see this in action. Turns out the update wasn’t a new model, but a new method. pic.twitter.com/kVF0ndA21T
— Ra (@misaligned_agi) August 13, 2024

Although this is unconfirmed, many X users have observed that ChatGPT now appears to use multi-step reasoning to reach more accurate answers. With this approach, the model generates step-by-step rationales and works through them before arriving at its conclusion.

OpenAI also tested the new ChatGPT model on LMSYS under the name “anonymous-chatbot,” where it received more than 11,000 votes. The updated “chatgpt-4o-latest” model has once again claimed the top spot, surpassing models from Google, Anthropic, and Meta with a score of 1314 points, the highest any model had reached in the LMSYS Arena at the time.

Exciting Update from Chatbot Arena!

The latest @OpenAI ChatGPT-4o (20240808) API has been tested under "anonymous-chatbot" for the past week with over 11,000 community votes.

OpenAI has now successfully re-claimed the #1 position, surpassing Google's Gemini-1.5-Pro-Exp with an… https://t.co/9lJlASI9UW pic.twitter.com/gxCDuBOi9N
— lmarena.ai (formerly lmsys.org) (@lmarena_ai) August 14, 2024

Does the New ChatGPT Model Meet the Vibe Test?

To evaluate the updated ChatGPT model, I tested it with a few reasoning prompts and found no significant difference between the older and latest versions. For instance, when asked which is larger, 9.11 or 9.9, it gave the correct answer (9.9), just as the previous model did. Other commonsense reasoning questions also produced results consistent with the older model.
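For reference, the correct answer is easy to verify mechanically. The short Python sketch below is not part of the original test; it compares the two values as decimal numbers and, for contrast, as version-style tuples. The version-number reading is only an assumption about why this question is often used as a trick prompt, not something OpenAI has stated.

```python
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")

# As numbers, 9.9 equals 9.90, which is larger than 9.11
larger = max(a, b)
print(larger)  # prints 9.9

# Read like software version numbers, the order flips: (9, 11) comes after (9, 9)
version_a = tuple(int(part) for part in "9.11".split("."))
version_b = tuple(int(part) for part in "9.9".split("."))
print(version_a > version_b)  # prints True
```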

However, some issues remain. For example, in response to the prompt below, the model incorrectly suggested stacking the 9 eggs on top of a bottle, which is not feasible.

Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.

In another test, the model incorrectly stated that there are only two “R”s in the word “strawberry.”

how many Rs are in strawberry?
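The correct answer is three. As a quick reference check (not part of the original test), this Python snippet counts the letter directly and lists where each “r” appears:

```python
word = "strawberry"

# Count occurrences of the letter r, ignoring case
r_count = word.lower().count("r")

# Record the 1-based position of each r in the word
positions = [i + 1 for i, ch in enumerate(word) if ch.lower() == "r"]

print(r_count, positions)  # prints 3 [3, 8, 9]
```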

It’s possible that the new ChatGPT model hasn’t been widely deployed yet. Even so, we can expect improvements in other key areas from OpenAI’s latest model. If you have any questions or doubts, let us know in the comments below.
