suhara committed · verified
Commit f6ed05d · 1 Parent(s): bb43b98

Update README.md

Files changed (1): README.md +4 -1
README.md CHANGED
@@ -122,7 +122,7 @@ Our models are designed and optimized to run on NVIDIA GPU-accelerated systems.
  ## Software Integration
 
  - Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- - Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100
+ - Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100, Jetson AGX Thor
  - Operating System(s): Linux
 
  ### **Use it with Transformers**
@@ -269,6 +269,9 @@ docker run --runtime nvidia --gpus all \
  --mamba_ssm_cache_dtype float32
  ```
 
+ For Jetson AGX Thor, please use [this vLLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3).
+
+
  #### Using Budget Control with a vLLM Server
 
  The thinking budget allows developers to keep accuracy high and meet response‑time targets \- which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.
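As a companion to the added note, here is a minimal sketch of launching that container, assuming the linked NGC catalog entry corresponds to the image `nvcr.io/nvidia/vllm:25.09-py3`. The `docker run --runtime nvidia --gpus all` and `--mamba_ssm_cache_dtype float32` flags mirror the vLLM example already in this README, while the published port and the `<model-id>` placeholder are illustrative assumptions.

```bash
# Sketch only: run the NGC vLLM container on Jetson AGX Thor.
# Image path is inferred from the catalog URL above (assumption).
docker pull nvcr.io/nvidia/vllm:25.09-py3

# --runtime nvidia --gpus all and --mamba_ssm_cache_dtype float32 mirror the
# existing vLLM example in this README; the port and <model-id> are placeholders.
docker run --runtime nvidia --gpus all \
    -p 8000:8000 \
    nvcr.io/nvidia/vllm:25.09-py3 \
    vllm serve <model-id> --mamba_ssm_cache_dtype float32
```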