
DeepSeek R1 Distilled Models in Ollama: Not What You Think

DeepSeek R1’s distilled models in Ollama sound like smaller versions of the original, but are they really?

Kshitij Darwhekar
Towards AI
5 min read · Jan 30, 2025
AI-generated using ChatGPT by Author


Introduction

DeepSeek recently introduced its models, including DeepSeek V3 and DeepSeek R1. These models have gained significant popularity in the AI community and on social media due to their impressive performance compared to models like OpenAI’s o1. Unlike OpenAI’s models, DeepSeek’s models are fully open-source and free to use.

DeepSeek Models in comparison with OpenAI’s models.

Since DeepSeek models are open-source and licensed under MIT, they are free to use for both personal and commercial purposes, and you can even run them locally. However, unless you have an insanely powerful machine, you won’t be able to run DeepSeek R1 on your local setup.

That’s where the smaller distilled models come in. DeepSeek has released not only the full 671B-parameter R1 model but also several smaller distilled checkpoints built on dense models that are widely used in the research community.

All of these models are available on Hugging Face and Ollama, so you can choose whichever platform you prefer. In the next section, we’ll dive deeper into these distilled models and their performance.

What Are DeepSeek R1 Distilled Models?

In simple terms, distillation is the process of transferring knowledge from a larger model (the teacher) to a smaller model (the student) while losing as little performance as possible. In other words, the student model learns to mimic the larger teacher model, either at the final prediction layer or in the underlying hidden layers.
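To make the idea concrete, here is a minimal sketch of classic knowledge distillation at the prediction layer, where the student is trained against the teacher’s temperature-softened output distribution. This is a generic illustration of the technique, not DeepSeek’s exact recipe (as described below, their distilled models are produced by supervised fine-tuning on reasoning data generated by R1):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: the student mimics the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to compensate for the temperature scaling
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```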

Using reasoning data generated by DeepSeek-R1, DeepSeek fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that these distilled smaller dense models perform exceptionally well on benchmarks. DeepSeek has open-sourced distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series.
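Because these checkpoints are ordinary Qwen/Llama causal language models, you can load them from Hugging Face like any other model. Here is a minimal sketch, assuming the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint, a machine with enough memory, and transformers plus accelerate installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # swap in the 1.5B/8B/14B/32B/70B variants as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The distilled models emit their chain of thought inside <think>...</think> tags before the final answer.
messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```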

Note: You can refer to the original GitHub repo for more details.

Running DeepSeek R1 Locally?

A lot of people on the internet claim that DeepSeek R1 is an open-source model that can be run locally. While this is true, the computing power required to run the full model is substantial.
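As a rough back-of-the-envelope estimate (my own arithmetic, not an official figure), just holding the full model’s weights in memory, assuming its native FP8 precision, already calls for server-class hardware:

```python
params = 671e9        # full DeepSeek R1 parameter count
bytes_per_param = 1   # assuming FP8 weights (1 byte per parameter); FP16 would double this
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone, before KV cache and activations")  # ~671 GB
```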

In most cases, those claiming to run the model locally are actually using distilled versions. They believe they are running the smaller versions of the original DeepSeek model, but in reality, they are running either Qwen or Llama models that have been distilled with the reasoning capabilities of DeepSeek R1.

Why the Naming Might Be Confusing

Ollama seems to be a popular choice for running this model locally, and to be honest, I use it too because of its ease of use. You only need to run a single command, and voilà—the model is ready to use on your local machine.

However, one concern I have with Ollama is how the models are named on its website. For example, to run the DeepSeek-R1 distilled Qwen 7B model, you type the following command:

ollama run deepseek-r1
DeepSeek R1 models on Ollama

This can confuse a lot of non-technical, and even some technical, users: it sounds like these are just smaller versions of the original DeepSeek R1 model, when in reality they are Qwen and Llama models fine-tuned with DeepSeek R1’s reasoning capability.
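If you want to be explicit about which variant you are pulling, the size-tagged names make the distinction visible. Here is a small sketch using the official Ollama Python client (pip install ollama), assuming a local Ollama server is running; deepseek-r1:7b is the distilled Qwen 7B model, and at the time of writing the bare deepseek-r1 tag resolves to the same distilled default rather than to the full 671B model:

```python
import ollama

# Size-tagged variants (1.5b, 7b, 8b, 14b, 32b, 70b) are the Qwen/Llama distills;
# only the 671b tag is the full DeepSeek R1 model.
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```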

Is This a Bad Thing?

Not necessarily. At the end of the day, these are the official models launched by DeepSeek, and they perform as expected when compared to similar models—sometimes even outperforming them.

Distilled Model Evaluation

As you can see in the evaluation table above, the DeepSeek team reports:

Simply distilling DeepSeek-R1’s outputs enables the efficient DeepSeek-R1-Distill-Qwen-7B to outperform non-reasoning models like GPT-4o-0513 across the board. DeepSeek-R1-14B surpasses QwQ-32B-Preview on all evaluation metrics, while DeepSeek-R1-32B and DeepSeek-R1-70B significantly exceed o1-mini on most benchmarks. These results demonstrate the strong potential of distillation. Additionally, we found that applying RL to these distilled models yields significant further gains. We believe this warrants further exploration and therefore present only the results of the simple SFT-distilled models here.

If these models perform so well with the reasoning capabilities of DeepSeek-R1, why did I write this article?

The reason is simple: people running the distilled models locally, rather than using the web UI (or the application), should understand the difference between these models and the full R1 so they can set realistic performance expectations.

Final Thoughts: What Should Users Know?

As we saw, these distilled models outperform their competition, and that is a big deal: you can run models with this much capability on a local machine with only a modest drop in performance. It is also true, however, that you will get better results if you use the full model through the web UI or the official application. If you have privacy and security concerns, the Perplexity Pro version lets you use a DeepSeek R1 model hosted in the USA.

But if you are like me, think privacy in today’s world is a myth, don’t fully trust any of these companies, and still want to use the models, the distilled models offer the best deal.

Note: You can explore the model in detail here

If you want to support me, you can Buy Me a Coffee.

