If there's Intelligent Life out There
alanpiesse9873 edited this page 9 months ago


Optimizing LLMs to be excellent at specific tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba’s Qwen models appear dominant in the leaderboard’s inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent … June 26, 2024

Hugging Face’s second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman’s terms, and most daunting of all: high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face’s blog.

The frontrunner of the new leaderboard is Qwen, Alibaba’s LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta’s LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT