Nvidia Plans to Invest $26 Billion in Developing Open-Weight AI Models, According to Filings

Nvidia plans to invest $26 billion over the next five years to develop open source artificial intelligence models, according to a 2025 financial filing. Executives confirmed the previously unreported plans in interviews with WIRED.

This significant investment could transform Nvidia from a chip manufacturer with a strong software ecosystem into a genuine frontier lab that competes with OpenAI and DeepSeek. It’s a strategic initiative that may solidify Nvidia’s position as the premier chip maker in the AI sector, given that the models are optimized for the company’s hardware.

Open source models are defined by the public release of their weights, the parameters that determine a model's behavior, often accompanied by details about its architecture and training. That openness lets anyone download the models and run them on their own hardware or in the cloud. In Nvidia's case, the company also publishes the technical advances behind the creation and training of its models, making it easier for startups and researchers to adapt and build on Nvidia's work.
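To make that concrete, here is a minimal sketch of what downloading and running an open-weight model looks like in practice, using the Hugging Face transformers library; the model identifier below is a hypothetical placeholder, not an actual Nemotron release.

```python
# Minimal sketch: open-weight models can be downloaded and run locally.
# The model ID below is a hypothetical placeholder.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="example-org/open-weight-model",  # hypothetical identifier
)

result = generator("Open-weight models let anyone", max_new_tokens=40)
print(result[0]["generated_text"])
```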

On Wednesday, Nvidia unveiled Nemotron 3 Super, its most capable open-weight AI model yet. At 128 billion parameters (a rough measure of a model's size and sophistication), it is broadly comparable to the largest version of OpenAI's GPT-OSS, though Nvidia says it beats GPT-OSS and other models on several benchmarks.

Specifically, Nvidia says Nemotron 3 Super scored 37 on the Artificial Intelligence Index, which aggregates performance across ten distinct benchmarks. GPT-OSS scored 33, though several Chinese models scored higher still. Nvidia also says Nemotron 3 Super took the top spot in private testing on PinchBench, a new benchmark that measures how well a model can control OpenClaw.

Nvidia also disclosed several techniques used to train Nemotron 3, including architectural and training methods that improve the model's reasoning, its handling of long context, and its responsiveness to reinforcement learning.

“Nvidia is taking open model development much more seriously,” remarks Bryan Catanzaro, VP of applied deep learning research at Nvidia. “And we are making significant strides.”

Open Frontier

Meta became the first major AI company to release an open model when it launched Llama in 2023. But CEO Mark Zuckerberg recently shook up the company's AI efforts and has suggested that future models may not be fully open. OpenAI offers an open-weight model called GPT-OSS, but it lags behind the company's best proprietary models and is not well suited to modification.

The leading US models from OpenAI, Anthropic, and Google can only be accessed via the cloud or through a chatbot interface. In contrast, the weights for several leading Chinese models, from DeepSeek, Alibaba, Moonshot AI, Z.ai, and MiniMax, are made available openly and free of charge. Consequently, numerous startups and researchers globally are currently building upon these Chinese models.

“It’s in our interest to foster ecosystem development,” says Catanzaro, who joined Nvidia in 2011 and helped steer the company’s shift from making graphics cards for gaming to building silicon for AI. Nvidia released the first Nemotron model in November 2023, and Catanzaro says the company recently finished pretraining a 550-billion-parameter model. (Pretraining involves feeding vast quantities of data through a model spread across many specialized chips running in parallel.) Nvidia has since rolled out a range of models tailored for fields such as robotics, climate modeling, and protein folding.
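As a rough illustration of the next-token-prediction objective at the heart of pretraining, a toy version in PyTorch looks like this; the multi-chip parallelism is omitted, and nothing here reflects Nvidia's actual training code.

```python
# Toy sketch of pretraining: the model learns to predict the next token
# across a corpus. Real runs shard this loop over thousands of GPUs;
# the tiny model and random tokens below are purely illustrative.
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 129))  # stand-in for corpus batches
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict next token

# One optimization step of next-token prediction.
logits = model(inputs)                            # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```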

Kari Briski, Nvidia’s VP of generative AI software for enterprise, says the company’s forthcoming AI models will be used to improve not just its chips but also the supercomputer-scale data centers it builds. “We design it to extend our systems and evaluate not just the compute but also the storage and networking, helping to shape our hardware architecture roadmap,” she explains.
