ChatGPT Got a Secret Update Last Week and Is Now Performing at Its Best


AI companies are increasingly testing new and experimental models under obscure names on the LMSYS Chatbot Arena, often deploying them without official release notes. Recently, X users have reported improved performance with ChatGPT, noting enhancements in both coding and creative tasks. Many speculate that this improvement is due to a new OpenAI model, potentially linked to Project Strawberry — an advanced reasoning engine.

OpenAI has now confirmed that ChatGPT is running a new model. It is not a new frontier-class model but an enhanced version of GPT-4o, named chatgpt-4o-latest. According to the release notes, this updated GPT-4o model has been optimized specifically for chat, with improvements based on qualitative feedback and experimental results.

OpenAI also mentions that it is actively removing poor-quality data from the training dataset while incorporating high-quality data and “experimenting with new research methods”. This raises an intriguing question: Is the new ChatGPT model already utilizing the Project Strawberry engine, which is expected to introduce a novel post-training method to enhance reasoning?

While it’s not certain, many X users have observed that ChatGPT now employs multi-step reasoning to provide accurate answers. This method involves the model generating various step-by-step rationales, ultimately arriving at a correct conclusion.

Additionally, OpenAI tested the new ChatGPT model on LMSYS under the name “anonymous-chatbot,” where it received over 11,000 votes. The updated “chatgpt-4o-latest” model has once again claimed the top spot, surpassing other AI models from Google, Anthropic, and Meta. It has become the first model to achieve a score of 1314 points in the LMSYS Arena.

Does the New ChatGPT Model Meet the Vibe Test?

To evaluate the updated ChatGPT model, I tested it with a few reasoning prompts and found that there wasn’t a significant difference between the older and the latest versions. For instance, when asked to identify the larger number between 9.11 and 9.9, it provided the correct answer, similar to the previous model. Other commonsense reasoning questions also yielded consistent results with the older model.
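The expected answer to the number comparison is easy to verify. The snippet below is a minimal sketch, not part of the original test; the helper name `larger` is hypothetical, and the `decimal` module is used here only to sidestep any floating-point doubt.

```python
from decimal import Decimal


def larger(a: str, b: str) -> str:
    """Return whichever decimal string represents the larger number."""
    return a if Decimal(a) > Decimal(b) else b


# 9.9 equals 9.90, which is greater than 9.11
print(larger("9.11", "9.9"))  # prints 9.9
```

Models that get this wrong tend to treat the digits after the decimal point as whole numbers (11 versus 9), which is why the prompt is a popular sanity check.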

However, there are still some issues. For example, in response to the stacking prompt below, the model incorrectly suggested placing the 9 eggs on top of the bottle, which is not feasible.

Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.

In another test, the model incorrectly stated that there are only two “R”s in the word “strawberry,” when there are actually three.

how many Rs are in strawberry?
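For reference, the correct count can be checked in a couple of lines. This is a minimal sketch; the helper name `count_letter` is hypothetical and introduced only for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# "strawberry" contains r at positions 3, 8 and 9 (1-indexed)
print(count_letter("strawberry", "R"))  # prints 3
```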

It’s possible that the new ChatGPT model hasn’t been widely deployed yet. Nonetheless, we can anticipate improvements in other key areas with OpenAI’s latest model. If you have any questions or doubts, let us know in the comments below.

