Simon Willison's Weblog


That model was trained in part using their unreleased R1 “reasoning” model. Today they’ve released R1 itself, along with a whole family of new models derived from that base.

There’s a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It’s over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that “DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.” … so they also released:

DeepSeek-R1, which “incorporates cold-start data before RL” and “achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks”. That one is also MIT licensed, and is a similar size.

I don’t have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That’s where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I’m not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say “Copyright © 2023 DeepSeek” so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I’ve tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it’s really fun to play with.

I’m running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
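A sketch of that step, using Ollama’s ability to pull GGUF builds directly from Hugging Face. The exact quantization tag is my assumption from Unsloth’s GGUF repo naming; pick whichever quant fits your hardware:

```bash
# Pull and run the Unsloth GGUF build via Ollama's Hugging Face integration
# (the :Q8_0 tag is an assumption; other quants are available in the same repo)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```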

This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
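If you want to dig into those logs afterwards, the LLM CLI has a logs subcommand for exactly this:

```bash
# Locate the SQLite database that LLM logs to
llm logs path
# Show the most recent logged prompt and response
llm logs -n 1
```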

I installed the plugin and ran the model like this:
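A sketch of those two steps, with the prompt left as a placeholder; the model ID assumes the same Hugging Face reference used with Ollama above:

```bash
# Install the Ollama plugin for LLM
llm install llm-ollama
# Prompt the model through LLM (model ID mirrors the Ollama reference above)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' 'your prompt here'
```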