Learn how to download a model from Hugging Face via the Terminal, load it locally, and run it in Python.
This tutorial will teach you the following:
- How to download a Hugging Face model using the terminal.
- How to run your model locally in Python.
This is useful when you need the model weights for deployment, e.g. to a Docker container or similar.
Prerequisites
This tutorial assumes you have Python installed. If not, you can download Python here:
Get the latest stable release, and you’ll be ready to complete this tutorial!
Another way to get Python
There are many ways to download Python. A direct download from python.org is probably the simplest, but my favourite, in 2025, is uv.
This route is slightly more involved, so only go down it if you have a decent amount of prior development experience.
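If you’d like to try uv, here’s a minimal sketch; the installer script and command below come from uv’s documentation, but check astral.sh for the current instructions:
# Install uv via the official installer script (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Use uv to install a recent Python version
uv python install 3.12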
Step 1 — Download the Hugging Face CLI
Before downloading a model, you need to install the Hugging Face CLI. You can do this with the following command in the terminal:
pip install -U "huggingface_hub"
Once installed, you can check that this has worked by running this command in the terminal:
huggingface-cli
You should see output like this:
usage: huggingface-cli <command> [<args>]

positional arguments:
  {download,upload,repo-files,env,login,whoami,logout,repo,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag}
                        huggingface-cli command helpers
    download            Download files from the Hub
    upload              Upload a file or a folder to a repo on the Hub
    repo-files          Manage files in a repo on the Hub
    env                 Print information about the environment.
    login               Log in using a token from huggingface.co/settings/tokens
    whoami              Find out which huggingface.co account you are logged in as.
    logout              Log out
    repo                {create} Commands to interact with your huggingface.co repos.
    lfs-enable-largefiles
                        Configure your repository to enable upload of files > 5GB.
    scan-cache          Scan cache directory.
    delete-cache        Delete revisions from the cache directory.
    tag                 (create, list, delete) tags for a repo in the hub

options:
  -h, --help            show this help message and exit
If you see this output or something similar, move on to the next step.
Step 2 — Downloading the Model
Visit the page for the model you want to download. For example, I’m interested in this Qwen embedding model:
Copy the name of the model by clicking the copy button at the very top of this page:
Paste it into this command on the terminal:
huggingface-cli download <your-copied-model> --local-dir ./path/to/your/dir
As an example, here’s what my command looks like:
huggingface-cli download Qwen/Qwen3-Embedding-4B --local-dir ./qwen-embedding-model-4b
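If you only need part of the repository, the download command also accepts glob patterns via the --include (and --exclude) flags. As a sketch, this fetches just the weights and JSON config files:
huggingface-cli download Qwen/Qwen3-Embedding-4B --local-dir ./qwen-embedding-model-4b --include "*.safetensors" "*.json"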
You should see output like the following:
Fetching 14 files:   0%|          | 0/14 [00:00<?, ?it/s]
Downloading 'config.json' to 'qwen-embedding-model-4b/.cache/huggingface/download/config.json.8b4b87fc69023e7a224eb6563753aaf3223d8b98.incomplete'
config.json: 100%|██████████| 727/727 [00:00<00:00, 92.2kB/s]
Download complete. Moving file to qwen-embedding-model-4b/config.json
Downloading 'model-00001-of-00002.safetensors' to 'qwen-embedding-model-4b/.cache/huggingface/download/model-00001-of-00002.safetensors.e70bfe3c970523fb7ef4eddffed2254ce3f1e7150c3de2af4342de129dd756f8.incomplete'
Downloading 'README.md' to 'qwen-embedding-model-4b/.cache/huggingface/download/README.md.81d922bc72353348a181473b9cc0ee53571ae13b.incomplete'
README.md: 17.3kB [00:00, 2.08MB/s]
Download complete. Moving file to qwen-embedding-model-4b/README.md
...
tokenizer.json: 100%|██████████| 11.4M/11.4M [00:04<00:00, 2.67MB/s]
Download complete. Moving file to qwen-embedding-model-4b/tokenizer.json
model-00002-of-00002.safetensors: 100%|██████████| 3.08G/3.08G [01:42<00:00, 30.0MB/s]
Download complete. Moving file to qwen-embedding-model-4b/model-00002-of-00002.safetensors
Wait for the various downloads to complete and move on to the next step.
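Once everything has finished, it’s worth sanity-checking the local directory. For the Qwen model above, listing it shows the files from the download log:
ls ./qwen-embedding-model-4b
# 1_Pooling
# README.md
# config.json
# config_sentence_transformers.json
# generation_config.json
# merges.txt
# model-00001-of-00002.safetensors
# model-00002-of-00002.safetensors
# model.safetensors.index.json
# modules.json
# tokenizer.json
# tokenizer_config.json
# vocab.json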
Step 3 — Loading the Model
If you don’t have them already, you’ll need to install torch and transformers (plus sentence-transformers if you want to follow the embedding example below). You can do this with the following terminal command:
pip install torch transformers sentence-transformers
How you load the model depends on the model itself. An easy way to see the correct method is to scroll down on the model card:
You will typically see something like this, which explains how to use the model effectively. However, for clarity, I will show two standard examples: one for an embedding model and one for a text generation model.
Embedding Model
The easiest way to load one of these models is with the sentence-transformers library, which provides an API that wraps most state-of-the-art embedding models and simplifies their usage.
To load your model, you can do this:
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("path/to/your/model/directory")
Here’s an example of me loading the Qwen 4b embedding model:
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("./qwen-embedding-model-4b")
Text Generation Model
To load text generation models, you use the standard Hugging Face transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Path to the folder where you downloaded the model
model_path = "path/to/your/model/directory"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)
As an example, here’s me loading the Qwen 0.6b text generation model:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Path to the folder where you downloaded the model
model_path = "./qwen-text-generation-model-0_6b"
# Load tokenizer and model from local files
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
This gives you a tokenizer and model that you can use together for chat-style text generation.
Step 4 — Using the Model
Just as in the previous step, the model card will typically show you how to use the loaded model. However, here are two examples for embedding and text generation:
Embedding with Sentence Transformers
As we’ve loaded the model with sentence_transformers, we have access to the standard API, which means we can embed queries and documents and compute their similarity:
# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
# Encode the queries and documents. Note that queries benefit from using a
# prompt. Here we use the prompt called "query" stored under `model.prompts`,
# but you can also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7534, 0.1147],
# [0.0320, 0.6258]])
This is straight from Qwen’s model documentation; the only potentially unique part is prompt_name="query" on the query_embeddings line. This prompt is stored inside the model directory and is used to control how the text is embedded.
For Qwen 4b, you can find it under config_sentence_transformers.json:
{
    "prompts": {
        "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:",
        "document": ""
    },
    "default_prompt_name": null,
    "similarity_fn_name": "cosine"
}
If the standard prompt doesn’t apply to your use case, you can add another one and use that instead. Make sure you keep the Instruct: and \nQuery: format when you do.
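For example, here’s a sketch of passing a custom prompt via the `prompt` argument of encode; the instruction text itself is hypothetical, but it keeps Qwen’s required format:
# A hypothetical custom instruction, keeping the "Instruct: ...\nQuery:" format
custom_prompt = (
    "Instruct: Given a question about physics, retrieve passages that answer it\n"
    "Query:"
)
# Pass it via `prompt` instead of `prompt_name`
query_embeddings = model.encode(queries, prompt=custom_prompt)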
Text Generation with Transformers
To generate text, you can use the standard transformers API:
# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content)
print("content:", content)
Again, this is taken directly from the Qwen 0.6b text generation model’s documentation, but the same methodology can be applied to other thinking models.
Final Words
That’s it! I hope this helps you work with models without internet access and ship the weights and configuration directly into your deployments. Happy transforming!
Subscribe if you like regular batteries-included tutorials that help you become a master Full Stack Developer!