bkartal committed · verified · Commit c1e1094 · 1 Parent(s): 1507d58

Update README.md

Files changed (1): README.md +2 -3
README.md CHANGED
@@ -29,7 +29,7 @@ The pretraining data has a cutoff date of September 2024.
 
 ## Model Overview
 
-NVIDIA Nemotron-H-8B-Reasoning-128K is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
+NVIDIA Nemotron-H-8B-Reasoning-128K-FP8 is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
 
 The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. It is based on [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K).
 The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
@@ -55,8 +55,7 @@ This model has 8B of model parameters following [Nemotron-H-8B-Base-8K](https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K)
 
 ### Release Date: 06/06/2025
 
-Huggingface 04/09/2025 via [https://huggingface.co/](https://huggingface.co/)
-NGC 04/09/2025 via [https://catalog.ngc.nvidia.com/models](https://catalog.ngc.nvidia.com/models)
+Huggingface 06/06/2025 via [https://huggingface.co/](https://huggingface.co/)
 
 ## References
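
The updated card states that the model's reasoning behavior is toggled through the system prompt. Below is a minimal `transformers` usage sketch of that idea: the model id comes from the diff above, while the `/think` control string and the generation settings are assumptions for illustration only; the repo's chat template defines the actual convention.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Model id taken from the diff above; an FP8 checkpoint may require a
# recent GPU software stack, which this sketch does not cover.
model_id = "nvidia/Nemotron-H-8B-Reasoning-128K-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Per the card, the system prompt controls whether a reasoning trace is
# emitted before the final answer. The control string below is an assumed
# placeholder; check the repository's chat template for the real one.
messages = [
    {"role": "system", "content": "/think"},  # assumed reasoning-on toggle
    {"role": "user", "content": "What is 17 * 24?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping the system message for a reasoning-off variant would, per the card, skip the intermediate trace at some accuracy cost on harder prompts.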