bkartal committed · verified · Commit c1e1094 · 1 Parent(s): 1507d58

Update README.md

Files changed (1): README.md +2 -3
README.md CHANGED
@@ -29,7 +29,7 @@ The pretraining data has a cutoff date of September 2024.
 
 ## Model Overview
 
-NVIDIA Nemotron-H-8B-Reasoning-128K is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
+NVIDIA Nemotron-H-8B-Reasoning-128K-FP8 is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
 
 The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. It is based on [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K).
 The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
@@ -55,8 +55,7 @@ This model has 8B of model parameters following [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K)
 
 ### Release Date: 06/06/2025
 
-Huggingface 04/09/2025 via [https://huggingface.co/](https://huggingface.co/)
-NGC 04/09/2025 via [https://catalog.ngc.nvidia.com/models](https://catalog.ngc.nvidia.com/models)
+Huggingface 06/06/2025 via [https://huggingface.co/](https://huggingface.co/)
 
 ## References
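
The updated card states that the model's reasoning behavior is toggled through the system prompt. Below is a minimal `transformers` usage sketch of that idea: the model id comes from the diff above, while the `/think` control string and the generation settings are assumptions for illustration only; the repo's chat template defines the actual convention.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Model id taken from the diff above; an FP8 checkpoint may require a
# recent GPU software stack, which this sketch does not cover.
model_id = "nvidia/Nemotron-H-8B-Reasoning-128K-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Per the card, the system prompt controls whether a reasoning trace is
# emitted before the final answer. The control string below is an assumed
# placeholder; check the repository's chat template for the real one.
messages = [
    {"role": "system", "content": "/think"},  # assumed reasoning-on toggle
    {"role": "user", "content": "What is 17 * 24?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping the system message for a reasoning-off variant would, per the card, skip the intermediate trace at some accuracy cost on harder prompts.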