NVIDIA shrinks Mistral Nemo 12B to create Mistral-NeMo-Minitron 8B “Small Language Model”

NVIDIA has released a “small language model” that compresses Mistral NeMo 12B, the giant model the company developed with Mistral AI, far enough to run on a workstation with an RTX graphics card—and if eight billion parameters are still too many for you, there’s a four-billion-parameter version tailored for devices like PCs and laptops.

“We combined two different AI (artificial intelligence) optimization methods – pruning to reduce Mistral NeMo’s 12 billion parameters to eight billion and distillation to improve accuracy,” explains Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, about his team’s work. “In this way, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model with less computational effort.”

Large language models are all the rage right now, powering everything from interactive fiction to AI assistants on your phone—but they all have the same problem: They’re big and require high-performance servers that preclude on-device use. Small language models, on the other hand, are designed to run on-device—although in the case of the still eight-billion-parameter Mistral-NeMo-Minitron 8B, those devices are high-end workstations equipped with NVIDIA’s RTX graphics accelerators.

Despite its smaller size, this version of the Mistral NeMo model delivers comparable accuracy, largely by pruning the model weights known to contribute the least to overall accuracy, then distilling on a much smaller dataset than the original—retraining requires only one-fortieth of the compute of NeMo’s original training. The same techniques were used to shrink the model even further into the application-specific Nemotron-4 4B Instruct, designed for use on consumer PCs and laptops to deliver what NVIDIA calls “cutting-edge digital human technology” in gaming.
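To make the two techniques concrete, here is a minimal, illustrative Python sketch—not NVIDIA’s actual pipeline—of the core ideas: magnitude-based pruning (zeroing the weights with the smallest absolute values, a common stand-in for “weights that contribute the least”) and a knowledge-distillation loss (the KL divergence between the teacher’s and student’s temperature-softened output distributions). The function names and the choice of magnitude pruning are assumptions for illustration only.

```python
import math

def prune_by_magnitude(weights, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights, keeping roughly `keep_ratio` of them."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted(abs(w) for w in weights)[-k]  # k-th largest magnitude
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by `temperature`."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    the quantity a student model minimizes to mimic its teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy usage: keep the two largest-magnitude weights of four,
# then score how far a student's outputs are from a teacher's.
pruned = prune_by_magnitude([0.1, -2.0, 0.05, 1.5], keep_ratio=0.5)
loss = distillation_loss(student_logits=[0.2, 1.1, -0.3],
                         teacher_logits=[0.4, 1.0, -0.5])
```

In a real pipeline both steps operate on billions of parameters inside a training framework, and the pruned student is retrained against the teacher’s outputs rather than scored once—but the loss being minimized has this shape.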

Mistral-NeMo-Minitron 8B is available as an NVIDIA NIM microservice or for download on Hugging Face now; “a downloadable NVIDIA NIM that can be deployed on any GPU-accelerated system within minutes will be available soon,” the company promises. A technical report has also been published for those who want to delve deeper into the details.
