NVIDIA shrinks Mistral Nemo 12B to create Mistral-NeMo-Minitron 8B “Small Language Model”

NVIDIA has released a “small language model” that compresses Mistral NeMo 12B, the giant model the company developed with Mistral AI, far enough to run on a workstation with an RTX graphics card—and if eight billion parameters are still too many for you, there’s a four-billion-parameter version tailored for devices like PCs and laptops.

“We combined two different AI (artificial intelligence) optimization methods – pruning to reduce Mistral NeMo’s 12 billion parameters to eight billion and distillation to improve accuracy,” explains Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, about his team’s work. “In this way, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model with less computational effort.”

Large language models are all the rage right now, powering everything from interactive fiction to AI assistants on your phone—but they all have the same problem: They’re big and require high-performance servers that preclude on-device use. Small language models, on the other hand, are designed to run on-device—although in the case of the still eight-billion-parameter Mistral-NeMo-Minitron 8B, those devices are high-end workstations equipped with NVIDIA’s RTX graphics accelerators.

Despite its smaller size, this version of the Mistral NeMo model delivers comparable accuracy, largely by pruning the model weights known to contribute the least to overall accuracy, then distilling on a much smaller dataset than the original—retraining requires only one-fortieth of the compute of NeMo’s original training. The same techniques were used to shrink the model even further into the application-specific Nemotron-4 4B Instruct, designed for use on consumer PCs and laptops to deliver what NVIDIA calls “cutting-edge digital human technology” in gaming.
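To make the two techniques concrete, here is a minimal, illustrative Python sketch—not NVIDIA’s actual pipeline—of the core ideas: magnitude-based pruning (zeroing the weights with the smallest absolute values, a common stand-in for “weights that contribute the least”) and a knowledge-distillation loss (the KL divergence between the teacher’s and student’s temperature-softened output distributions). The function names and the choice of magnitude pruning are assumptions for illustration only.

```python
import math

def prune_by_magnitude(weights, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights, keeping roughly `keep_ratio` of them."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted(abs(w) for w in weights)[-k]  # k-th largest magnitude
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by `temperature`."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    the quantity a student model minimizes to mimic its teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy usage: keep the two largest-magnitude weights of four,
# then score how far a student's outputs are from a teacher's.
pruned = prune_by_magnitude([0.1, -2.0, 0.05, 1.5], keep_ratio=0.5)
loss = distillation_loss(student_logits=[0.2, 1.1, -0.3],
                         teacher_logits=[0.4, 1.0, -0.5])
```

In a real pipeline both steps operate on billions of parameters inside a training framework, and the pruned student is retrained against the teacher’s outputs rather than scored once—but the loss being minimized has this shape.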

Mistral-NeMo-Minitron 8B is available as an NVIDIA NIM microservice or for download on Hugging Face now; “a downloadable NVIDIA NIM that can be deployed on any GPU-accelerated system within minutes will be available soon,” the company promises. A technical report has also been published for those who want to delve deeper into the details.
