If There's Intelligent Life Out There
Adolph Cruickshank edited this page 3 months ago


Optimizing LLMs to be good at particular tests backfires on Meta, Stability.



Hugging Face has launched its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for evaluating the performance of open large language models (LLMs) across a variety of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

"Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learnings: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent …" (June 26, 2024)

Hugging Face's second leaderboard tests language models across four areas: knowledge, reasoning over extremely long contexts, complex math, and instruction following. Six benchmarks are used to test these qualities, with tasks including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes first, third, and tenth place with its handful of variants. Also appearing are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT