NVIDIA’s DGX Spark: Mini AI Supercomputer Overview and Review

My NVIDIA DGX Spark

An overview and hands-on review of NVIDIA’s new mini Grace Blackwell AI Supercomputer

In this article, I provide some background on NVIDIA’s new DGX Spark system, cover the specs, setup, benchmarking, and price/performance comparisons to other systems, and offer some guidance on who I think this system is best suited for.

Project DIGITS

NVIDIA announced “Project DIGITS” at CES 2025 in Las Vegas on January 5, 2025, claiming they were going to “put a Grace Blackwell powered AI supercomputer on every desk and at every AI developer’s fingertips.” It was billed as a personal AI supercomputer that would provide AI researchers, data scientists and students worldwide with access to the power of the NVIDIA Grace Blackwell platform. Project DIGITS was to feature the new NVIDIA GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running large AI models. It sounded very exciting and I was looking forward to learning more.

Renamed to DGX Spark ⚡️

A few months later, NVIDIA announced that Project DIGITS had been renamed the “DGX Spark,” and unveiled it at GTC (GPU Technology Conference) in March 2025. It wasn’t available at the time, but you could reserve one, which I promptly did that same day. The price for the NVIDIA DGX Spark was advertised as $2,999, with a projected release date of May or June.

Founder’s Edition and Versions from Other Vendors

NVIDIA is doing an initial release of their own DGX Spark system, called the “Founder’s Edition,” which comes with a 4TB SSD and is blinged out with a gold metal case. After the Founder’s Editions are gone, that’s reportedly it. NVIDIA has partnered with other vendors to create the standard DGX Spark systems. Here are just some (there are more) of the versions available from other vendors:

DGX Spark Specs 📊

The system specifications of the DGX Spark Founder’s Edition are shown in the table below provided by NVIDIA:

Official DGX Spark Specifications from NVIDIA

Since my use case for this system is as a learning, development, prototyping and testing system, I don’t need the 200Gbps (RDMA capable) NIC; the 10GbE NIC or WiFi7 are good enough for my needs. I wonder how much lower the price could have been without the inclusion of that expensive networking option that I can’t use.

Even without the Blackwell GPU cores, this is a very powerful workstation with 20 ARM cores, 10 performance (Cortex-X925 operating at 4GHz) and 10 efficiency cores (Cortex-A725 operating at 2.8GHz). I did some quick single core performance tests, and I think the Cortex CPU performance cores in the GB10 are very similar in performance to Apple’s M4 CPU performance cores (very fast).

As mentioned before, the memory bandwidth (very important for inference) of 273GB/s was my main concern and hesitation about the system.

Release and Purchase Decision ⚖️

There were some delays, but on October 15th, I received an email from NVIDIA indicating that my Founder’s Edition unit was ready for purchase. I had seen the specs of the system during the delay (20 Arm CPU cores, 4TB SSD, GB10 GPU, with 128GB of unified LPDDR5 RAM). LPDDR5 is considerably slower than the dedicated VRAM on NVIDIA’s dedicated GPUs. I was also disappointed to see that the memory bandwidth was still listed at 273GB/s and that the price had increased to $3,999. I thought about passing on it for a couple of days… but the FOMO was too great, and I caved in and purchased it 🤪.

It’s here! 📦

It arrived two days later via FedEx. It required a signature so I needed to stay home and wait for it. Even if it didn’t need a signature, I’d still have stayed home as I wouldn’t want it sitting outside my front door for several hours while I was at work.

Setup Options 🛠️

There are two ways to set up and use the DGX Spark. If you plug in a keyboard, mouse and monitor before you first power it on, it will boot into a desktop experience and let you use it like a Linux desktop system (they call it DGX OS, but it’s just Ubuntu 24.04 with their software and drivers pre-installed). If you power it up without any peripherals attached, it will boot into headless mode. The box provides information, including the SSID and password, for remotely connecting and setting it up in headless mode. You can switch modes later if you like. I didn’t know how I wanted to use it yet, so I decided to check it out in desktop mode first.

Desktop Mode 🖥️

My NVIDIA DGX Spark running in desktop mode

I later decided I’d rather use it like a server that I could remotely access and offload work to from my laptop, rather than use it like a desktop. I’ll talk more about that setup in the next section.

Headless Mode 🎃

To switch to headless mode, I simply removed the keyboard, mouse and monitor, moved it near my router/switch and connected it to the network with a 1GbE wired connection. I then SSH’d into the system and joined it to my Tailscale network (a network overlay that lets you connect your devices together remotely, privately and securely). After the system had successfully joined my Tailscale network, I installed the NVIDIA Sync client on my laptop and then connected to my DGX via its Tailscale endpoint. With this setup, I can access my DGX system via SSH, or with the NVIDIA Sync client, from anywhere.
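For anyone wanting to replicate this, the Tailscale part of the setup looks roughly like the commands below. This is a sketch, not NVIDIA-provided instructions; the hostname and username are placeholders for your own.

```shell
# On the Spark (over the initial LAN SSH session):
curl -fsSL https://tailscale.com/install.sh | sh   # install Tailscale
sudo tailscale up                                  # prints an auth URL; approve it in a browser
tailscale ip -4                                    # note the Spark's tailnet IP

# From the laptop, once both devices are on the tailnet:
ssh user@dgx-spark   # MagicDNS hostname (placeholder), or use the tailnet IP
```

Once the device shows up in your Tailscale admin console, anything bound to the Spark’s tailnet address is reachable from your other devices.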

With the NVIDIA Sync client, I can fire up the DGX Dashboard on my laptop, and from there open a JupyterLab notebook that’s executing on the remote DGX, open a terminal, or open VS Code in remote mode and have my IDE connected to the Spark. It also allows you to set up custom connections, as you can see in the image below. I’ve set up connections to Ollama and llama.cpp running on the Spark:

NVIDIA Sync client running on my laptop

When you open the DGX Dashboard, you can remotely monitor your Spark’s RAM and GPU Utilization, as well as open up a JupyterLab notebook session on the Spark:

The DGX Dashboard

Tailscale also works on smartphones (iPhone and Android), so you can connect to web applications running on your Spark from your phone when you’re on the go.

The DGX Spark Playbooks 📚

NVIDIA has created a collection of “playbooks” specifically for the DGX Spark to help users get up to speed with their system and become productive. They cover running inference, generating images, training and fine-tuning language and vision models, and other tasks to help you get the most out of your Spark. If you are a beginner and have purchased the DGX Spark as a learning and training tool, then the playbooks are the place to start. They are designed specifically for the DGX Spark, so you shouldn’t run into any surprises; just follow the provided instructions.

I’m going to work through all of the playbooks as time allows. The system also came with a voucher code that can be redeemed to sign up for one of NVIDIA’s self-paced training courses.

You can browse the DGX Spark Playbooks here: https://build.nvidia.com/spark

GPU Performance Benchmarks ⏱

NVIDIA has published DGX Spark benchmarks for various tasks, including fine-tuning, image generation, inference and more.

NVIDIA’s Fine-tuning Benchmarks

DGX Spark Fine-tuning performance benchmarks provided by NVIDIA

NVIDIA’s Inference Benchmarks

DGX Spark Inference performance benchmarks provided by NVIDIA

My Benchmarks to Verify NVIDIA’s Results

To confirm NVIDIA’s benchmarks, at least for inference, I grabbed the latest version of llama.cpp and downloaded both the 20B and 120B versions of OpenAI’s GPT-OSS models from Hugging Face.

Note: there’s currently no binary release of llama.cpp with both ARM64 and CUDA support for Linux, so you’ll have to compile it yourself (requires libcurl). Here’s how to accomplish that:

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
sudo apt install libcurl4-gnutls-dev   # libcurl dev headers needed by the build
cmake -B build-cuda -DGGML_CUDA=ON     # configure a CUDA-enabled build
cmake --build build-cuda -j            # compile using all available cores

After it’s been successfully built, you should have all the llama.cpp binaries, including the llama-bench command, in the build-cuda/bin directory. Then download both models from Hugging Face.
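One way to fetch the GGUF files is with the Hugging Face CLI. The repo and file names below are illustrative placeholders; check Hugging Face for the exact GGUF repo and quantization you want.

```shell
pip install -U "huggingface_hub[cli]"     # provides the huggingface-cli tool

# <repo> is a placeholder for whichever GGUF repo you choose:
huggingface-cli download <repo>/gpt-oss-20b-GGUF \
    gpt-oss-20b-Q4_K_M.gguf --local-dir models/
```

The `--local-dir` flag puts the file where you want it instead of in the Hugging Face cache.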

After the models are in place, you can run the benchmarks.

GPT-OSS-20B Benchmark

./llama-bench -m ../../models/gpt-oss-20b-Q4_K_M.gguf -fa 1 -d 0,4096,8192,16384,32768 -p 2048 -n 32 -ub 2048

Here are the results, showing ‘pp’ (‘prefill’, also called ‘prompt processing’ by some, which is the initial, compute-intensive stage of an LLM request) and ‘tg’ (the number of response tokens generated per second) for the gpt-oss-20b model:

  • PP (prefill) = 3685 tokens/second
  • TG (response tokens) = 85 tokens/second
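To put those two numbers together: total request latency is roughly prompt_tokens ÷ pp plus response_tokens ÷ tg. A quick sketch using the measured rates (the prompt and response sizes are hypothetical examples):

```python
# Rough request-latency estimate from llama-bench throughput numbers.
# pp = prefill throughput (tokens/s), tg = generation throughput (tokens/s).

def request_seconds(prompt_tokens: int, response_tokens: int,
                    pp: float, tg: float) -> float:
    """Time to prefill the prompt plus generate the response."""
    return prompt_tokens / pp + response_tokens / tg

# gpt-oss-20b on the Spark, with a hypothetical 2048-token prompt
# and 512-token response:
print(f"{request_seconds(2048, 512, 3685, 85):.1f} s")  # -> 6.6 s
```

Even with a 2K-token prompt, generation dominates the total time, which is why the bandwidth-bound ‘tg’ number matters so much in practice.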

Here is a screen capture of the gpt-oss-20b benchmark output:

Benchmarking GPT-OSS-20B on my DGX Spark with llama-bench (llama.cpp)

GPT-OSS-120B Benchmark

Next, I ran the same benchmark with the gpt-oss-120b model:

./llama-bench -m ../../models/gpt-oss-120b-Q4_K_M.gguf -fa 1 -d 0,4096,8192,16384,32768 -p 2048 -n 32 -ub 2048

Here are the results, showing ‘pp’ (prefill/prompt-processing) and ‘tg’ (the number of response tokens generated per second) for the gpt-oss-120b model:

  • PP (prefill/prompt-processing) = 1821 tokens/second
  • TG (response tokens) = 50 tokens/second

Here is the screen capture of the output:

Benchmarking GPT-OSS-120B on my DGX Spark with llama-bench (llama.cpp)

My benchmark results for the gpt-oss 20B and 120B models are in agreement with the metrics provided by NVIDIA, which leads me to believe that the other benchmarks NVIDIA has provided are also likely accurate.

I haven’t had time to perform any fine-tuning or image generation yet, but I will update this article or include the results in a new article when I do.

How Does the DGX Spark Compare to an Apple Mac Studio?

Like the DGX Spark, the Apple MacBook Pro and Mac Studio also provide fast ARM64 CPU cores and GPU cores with unified memory, so this is a fair comparison.

Price Comparison

An equally spec’d-out Mac Studio with an M4 Max, 128GB of RAM, and a 4TB SSD is about $700 more expensive than the DGX Spark:

– NVIDIA DGX Spark: $3,999
– Apple Mac Studio (M4 Max): $4,699

Now let’s compare the computational power and memory bandwidth of the two systems:

NVIDIA DGX Spark:
– Computational Capacity: 1,000 TOPS
– Memory Bandwidth: 273 GB/s
– Memory Size: 128 GB unified memory
– Storage: 4TB SSD

Apple Mac Studio with M4 Max:
– Computational Capacity: 38 TOPS
– Memory Bandwidth: 546 GB/s
– Memory Size: 128 GB unified memory
– Storage: 4TB SSD

So while the M4 Max has twice the memory bandwidth, the DGX Spark has about 26 times the computational capacity. I don’t have an M4 Max to directly compare the two, but I’d guess that the DGX Spark would be a much better system than the Mac for fine-tuning models; I’m not sure about inference speed, though, as memory bandwidth is critical for inference. If someone has an M4 Max system and can provide inference speed benchmarks for it with the gpt-oss-20b and/or gpt-oss-120b models, please provide the results in the comments or message me privately and I’ll update the article to include them.
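A back-of-envelope roofline sketch shows why bandwidth matters so much for generation: each generated token has to stream the active model weights from memory at least once, so tokens/second is capped at roughly bandwidth ÷ active-weight bytes. The 12 GB figure below is a hypothetical dense-model example, not a measured number:

```python
def tg_upper_bound(bandwidth_gb_s: float, active_weight_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s: every generated token must
    stream the active weights from memory at least once."""
    return bandwidth_gb_s / active_weight_gb

# Hypothetical dense model whose quantized weights occupy ~12 GB:
print(tg_upper_bound(273, 12))  # DGX Spark ceiling: 22.75 t/s
print(tg_upper_bound(546, 12))  # M4 Max ceiling:    45.5 t/s
```

Note that mixture-of-experts models like gpt-oss only stream the active experts’ weights per token, which raises the ceiling well above what a dense model of the same total size could achieve.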

The CUDA Advantage

One really big advantage that the DGX Spark has over the Apple Metal, AMD, and Intel GPU solutions is its NVIDIA GPU, which opens up access to the entire CUDA ecosystem. I use a MacBook Pro (M1 Max with 64GB of RAM), and there are a lot of things that expect CUDA that I can’t work around with device=mps. Some things just require CUDA.

What About Other NVIDIA Options Like the RTX 5090?

The RTX 5090 is certainly faster than the DGX Spark (much faster at both compute and memory bandwidth), and a little cheaper ($2,500 to $3,000), but it only has 32GB of VRAM. 32GB of VRAM is very limiting; you can’t fine-tune or run inference on large models like the DGX Spark can. There’s the NVIDIA RTX 6000 Pro (96GB VRAM), but that’s $9,000, if you can find one. And both the RTX 5090 and 6000 Pro are just GPU cards; you’ll still need a full tower desktop system with CPU, RAM, SSD and a large power supply to host them, so they’re certainly not portable like the DGX Spark.
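A quick calculation shows why 32GB is limiting: the weights alone for a ~120B-parameter model at roughly 4 bits per weight need about 60 GB, before counting KV cache or activations. A small sketch (the bits-per-weight figure is a rough assumption for a 4-bit quant):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB; ignores KV cache and activations.
    params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB."""
    return params_billion * bits_per_weight / 8

need = weight_gb(120, 4)   # ~60 GB of weights alone
print(need <= 32)    # RTX 5090 (32 GB VRAM): doesn't fit
print(need <= 128)   # DGX Spark (128 GB unified): fits with headroom
```

The same arithmetic shows the 96GB RTX 6000 Pro would hold the weights, but with much less headroom for context.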

The DGX Spark isn’t cheap, but there simply isn’t an equivalent option that gets you both CUDA and 128GB of GPU accessible memory at this price point.

To get an idea of what the DGX Spark is like, I’d have to compare it to an NVIDIA RTX 5070 with 128GB of VRAM… but that’s just an imaginary system, as the 5070 only has 12GB of VRAM. Both the GB10 Grace Blackwell and the RTX 5070 have 6,144 CUDA cores.

Who is the DGX Spark for? 🧑‍💻

If you want a general purpose system that has some chops to occasionally run inference on LLMs, then stick with the excellent options from Apple, such as the Mac Mini, MacBook Pro, or Mac Studio.

If, on the other hand, you are a data/computer scientist, researcher, AI developer or ML/AI ops engineer fine-tuning and prototyping AI solutions, or trying to gain the experience required to enter those career fields, then the NVIDIA DGX Spark (or a variation of it from a partner vendor) might be a good option.

My Verdict 🧑‍⚖️

While I’ve only had my system for 7 days and still need to use it a lot more and work through the DGX Spark Playbooks, I’m happy with it. It’s a nice piece of kit that will let me offload AI workloads from my laptop (an M4 MacBook Air most days) and access the full CUDA ecosystem, while letting me fully control my setup without having to worry about cloud costs.

Thanks for reading,

-Robert
