
If there's Intelligent Life out There

Optimizing LLMs to be good at particular tests backfires on Meta, Stability.


Hugging Face has launched its second LLM leaderboard to rank the best language models it has tested. The new leaderboard seeks to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba’s Qwen models appear dominant in the leaderboard’s inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent … June 26, 2024

Hugging Face’s second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman’s terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face’s blog.

The frontrunner of the new leaderboard is Qwen, Alibaba’s LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta’s LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT