If there's Intelligent Life out There
Alica Scarborough edited this page 6 months ago


Optimizing LLMs to be proficient at specific tests backfires on Meta, Stability.



Hugging Face has launched its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba’s Qwen models appear dominant in the leaderboard’s inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent … June 26, 2024

Hugging Face’s second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman’s terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face’s blog.

The frontrunner of the new leaderboard is Qwen, Alibaba’s LLM, which takes first, third, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta’s LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT