Date: 2023-06-23



Since ChatGPT’s surge in popularity in November, the AI chatbot space has been saturated with ChatGPT alternatives. These chatbots differ in terms of LLM, pricing, UI, internet access, and more, making it difficult to decide which one to use.

To make comparison easier, the Large Model Systems Organization (LMSYS Org), an open research organization founded by UC Berkeley students and faculty, created Chatbot Arena.

Chatbot Arena is a benchmarking platform for LLMs in which users put two randomly chosen models to the test by entering a prompt and selecting the better answer.

Once a user has picked a winner, the site reveals which LLM generated each output.

According to LMSYS Org, users’ votes are used to rank the LLMs on a leaderboard based on the Elo rating system, a rating system widely used in chess.
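To make the idea concrete, here is a minimal sketch of how a standard Elo update could turn head-to-head votes into a ranking. It illustrates the textbook Elo formula only and is not LMSYS Org’s actual code: the starting rating of 1000, the K-factor of 32, the function names, and the model names are all assumptions chosen for the example.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k (assumed value) controls how far one result moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rank_models(battles, initial=1000.0, k=32.0):
    """Replay (model_a, model_b, winner) votes and return a leaderboard.

    winner is "a", "b", or "tie"; initial (assumed value) is the rating
    given to a model the first time it appears.
    """
    outcome_to_score = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings = {}
    for model_a, model_b, winner in battles:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(
            ra, rb, outcome_to_score[winner], k
        )
    # Highest rating first, like a leaderboard.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes between two placeholder models.
votes = [
    ("model_x", "model_y", "a"),
    ("model_x", "model_y", "tie"),
    ("model_y", "model_x", "b"),
]
leaderboard = rank_models(votes)
```

In this sketch a win against an equally rated opponent moves each rating by half of K, while a win against a much weaker opponent moves it very little, which is why a model has to keep beating strong competitors to climb the leaderboard.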

When I tried the arena myself, I used the prompt, “Can you write me an email telling my boss that I will be absent because I am going on vacation that I planned a few months ago?”

The two responses were very different, with one providing a much more suitable email in terms of content, length, and blanks to fill in.

Screenshot by Sabrina Ortiz/ZDNET

After picking ‘Model B’ as the winner, I found out that it was ‘vicuna-7b’, an LLM created by LMSYS Org based on Meta’s LLaMA model. The losing LLM was ‘gpt4all-13b-snoozy’, a fine-tuned LLM developed by Nomic AI from LLaMA 13B.

Unsurprisingly, OpenAI’s most advanced LLM, GPT-4, currently sits first on the leaderboard with an Arena Elo rating of 1227. In second place is Claude-v1, an LLM developed by Anthropic, with a rating of 1227.

LMSYS Org

GPT-4 is available in both Bing Chat and ChatGPT Plus, making these two chatbots the best available today, which matches ZDNET’s own AI chatbot rankings.

Anthropic’s Claude, ranked No. 2, isn’t open to the public yet, but there is a waitlist where users can sign up for early access.

Ranked 8th on the leaderboard is PaLM-Chat-Bison-001, a submodel of PaLM 2, the LLM behind Google Bard. This ranking is in line with the general sentiment about Bard: not the worst, but not one of the best.

The Chatbot Arena site also has an option that lets you select the two specific models you would like to compare. This feature is useful if you want to try out a particular LLM.

