GPT-4 dethroned by Claude-3 in LMSYS benchmark #AIbenchmark

The Large Model Systems Organization (LMSYS) was formed by researchers from UC Berkeley, UC San Diego, and Carnegie Mellon University to benchmark large language models and chatbots. It runs the Chatbot Arena, a leaderboard that ranks models by human judgments of their responses in head-to-head matches. GPT-4 had long held the top spot, but Anthropic’s Claude 3 Opus recently edged ahead of it by a slim margin, leaving the two effectively tied for first place. Anthropic’s smaller model, Claude 3 Haiku, also broke into the top ten. OpenAI, meanwhile, is reportedly preparing to launch GPT-5, which is expected to surpass GPT-4 by using multiple “external AI agents” for faster problem-solving. Competition among large language models remains fierce, with new releases regularly reshuffling the rankings.
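
To make the idea of ranking by head-to-head human votes more concrete, here is a minimal Python sketch of an Elo-style rating update, one common way to turn pairwise outcomes into a leaderboard. The article does not describe the exact scoring method the Chatbot Arena uses, so the Elo approach, the function names, the starting rating of 1000, and the K-factor of 32 are all assumptions made for illustration.

```python
# Illustrative Elo-style ranking from pairwise votes.
# Assumption: this is NOT taken from the article; the starting rating,
# the K-factor, and all function names are hypothetical.

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, outcome_a, k=32.0):
    """Return new (rating_a, rating_b) after one match.

    outcome_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rank_models(votes, initial=1000.0, k=32.0):
    """Process (model_a, model_b, outcome_a) votes in order and
    return (model, rating) pairs sorted from best to worst."""
    ratings = {}
    for model_a, model_b, outcome_a in votes:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ratings[model_a], ratings[model_b] = update_ratings(ra, rb, outcome_a, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up votes, purely to show the call shape.
    sample_votes = [
        ("model-x", "model-y", 1.0),
        ("model-y", "model-z", 0.5),
        ("model-x", "model-z", 1.0),
    ]
    for name, rating in rank_models(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each vote only nudges the two ratings involved, two models with similar win rates end up with close scores, which is why a narrow lead on such a leaderboard can amount to a practical tie.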

Source link: https://www.techspot.com/news/102415-gpt-4-loses-position-best-llm-claude-3.html