NVIDIA (NASDAQ:NVDA) on Tuesday unveiled Nemotron 3 Nano Omni, an open multimodal AI model that integrates vision, audio and language processing into a single system intended to power interactive AI agents.
Rather than relying on separate perception models, Nemotron 3 Nano Omni incorporates both vision and audio encoders within a 30B-A3B hybrid mixture-of-experts architecture. NVIDIA says this combined design can deliver up to 9x greater throughput compared with other open omni models that offer similar levels of interactivity.
The model accepts a wide range of inputs - including text, still images, audio clips, video, documents, charts and graphical user interfaces - and produces text as its output. Nemotron 3 Nano Omni supports a 256K context window and implements Conv3D and EVS technologies as part of its architecture.
According to NVIDIA, the model has reached the top positions on six leaderboards that measure document intelligence as well as video and audio understanding.
Adoption and evaluation
NVIDIA listed a number of companies that have begun adopting the model, including Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir and Pyler. Several other firms are evaluating the model, among them Dell Technologies, DocuSign, Infosys, K-Dense, Lila, Oracle and Zefr.
Gautier Cloix, CEO of H Company, said the model allows agents to quickly interpret full HD screen recordings. In preliminary tests on the OSWorld benchmark, H Company’s computer usage agent powered by Nemotron 3 Nano Omni processed visual reasoning using a native input resolution of 1920×1080 pixels.
Workflows, compatibility and openness
NVIDIA designed Nemotron 3 Nano Omni to function alongside other models in the Nemotron 3 family, such as Nemotron 3 Super and Nemotron 3 Ultra, and to interoperate with proprietary models from other providers. The company highlights the model’s suitability for agentic workflows that include computer use automation, document intelligence tasks and audio-video reasoning.
The release includes open weights, datasets and descriptions of training techniques. Organizations that need to customize the model can use NVIDIA NeMo, and deploy the result in environments that satisfy regulatory or data localization requirements, NVIDIA said.
Availability and distribution
Nemotron 3 Nano Omni was made available on Tuesday through several channels: Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice. It is also accessible via NVIDIA Cloud Partners, a range of inference platforms and cloud service providers.
NVIDIA also reported that the broader Nemotron 3 family has surpassed 50 million downloads over the past year.
This article presents the technical and market details released by NVIDIA on the Nemotron 3 Nano Omni model, including architecture, supported input types, partner adoption, benchmark outcomes and distribution methods.