If there's Intelligent Life out There
armandold6211 edited this page 6 months ago


Optimizing LLMs to be good at specific tests backfires on Meta, Stability.

Hugging Face has launched its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for evaluating open large language model (LLM) performance across a range of tasks. Alibaba’s Qwen models dominate the leaderboard’s inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent … June 26, 2024

Hugging Face’s second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layperson’s terms, and most challenging of all: high-school math equations. A complete breakdown of the benchmarks used can be found on Hugging Face’s blog.

The frontrunner of the new leaderboard is Qwen, Alibaba’s LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta’s LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably missing is any sign of ChatGPT