suhara committed · verified
Commit f6ed05d · 1 Parent(s): bb43b98

Update README.md

Files changed (1): README.md +4 -1
README.md CHANGED
@@ -122,7 +122,7 @@ Our models are designed and optimized to run on NVIDIA GPU-accelerated systems.
  ## Software Integration
 
  - Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- - Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100
+ - Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100, Jetson AGX Thor
  - Operating System(s): Linux
 
  ### **Use it with Transformers**
@@ -269,6 +269,9 @@ docker run --runtime nvidia --gpus all \
  --mamba_ssm_cache_dtype float32
  ```
 
+ For Jetson AGX Thor, please use [this vLLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3).
+
+
  #### Using Budget Control with a vLLM Server
 
  The thinking budget allows developers to keep accuracy high and meet response‑time targets \- which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.
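As a companion to the added note, here is a minimal sketch of launching that container, assuming the linked NGC catalog entry corresponds to the image `nvcr.io/nvidia/vllm:25.09-py3`. The `docker run --runtime nvidia --gpus all` and `--mamba_ssm_cache_dtype float32` flags mirror the vLLM example already in this README, while the published port and the `<model-id>` placeholder are illustrative assumptions.

```bash
# Sketch only: run the NGC vLLM container on Jetson AGX Thor.
# Image path is inferred from the catalog URL above (assumption).
docker pull nvcr.io/nvidia/vllm:25.09-py3

# --runtime nvidia --gpus all and --mamba_ssm_cache_dtype float32 mirror the
# existing vLLM example in this README; the port and <model-id> are placeholders.
docker run --runtime nvidia --gpus all \
    -p 8000:8000 \
    nvcr.io/nvidia/vllm:25.09-py3 \
    vllm serve <model-id> --mamba_ssm_cache_dtype float32
```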