---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- chess
- neuron
- aws-trainium
- vllm
- optimum-neuron
- continuous-batching
base_model: karanps/ChessLM_Qwen3
---

# ChessLM Qwen3 - Neuron Traced (AWS Format Structure)

This is a Neuron-traced version of [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3), optimized for AWS Trainium2 (trn2) instances using vLLM.

The repository follows the AWS Neuron structure, with separate directories for compiled artifacts.

It is meant to be used from within the [Neuron Workshop](https://github.com/aws-neuron/neuron-workshops).

## Model Details

- **Base Model**: Qwen3-8B fine-tuned for chess
- **Compilation**: optimum-neuron[vllm]==0.3.0
- **Compiler Version**: neuronxcc 2.21.33363.0
- **Target Hardware**: AWS Trainium2 (trn2)
- **Precision**: BF16
- **Tensor Parallelism**: 2 cores
- **Batch Size**: 4 (continuous batching enabled)
- **Max Sequence Length**: 2048
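
As a rough sanity check on these settings, the BF16 weight footprint can be estimated from the parameter count. The sketch below assumes roughly 8.2 billion parameters for Qwen3-8B (an assumption, not a figure stated in this card) and an idealized even split of weights across the two tensor-parallel cores. It ignores the KV cache, activations, and compiler overhead.

```python
# Back-of-the-envelope weight memory estimate for the settings listed above.
# ASSUMPTION: ~8.2e9 parameters for Qwen3-8B; this count is not stated in this card.
PARAM_COUNT = 8.2e9
BYTES_PER_PARAM_BF16 = 2   # BF16 stores each value in 16 bits
TENSOR_PARALLEL_CORES = 2  # from "Tensor Parallelism: 2 cores"


def weight_footprint_gb(param_count: float, bytes_per_param: int) -> float:
    """Return the total weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9


total_gb = weight_footprint_gb(PARAM_COUNT, BYTES_PER_PARAM_BF16)
# Idealized even split; real sharding may replicate some tensors.
per_core_gb = total_gb / TENSOR_PARALLEL_CORES

print(f"total weights: ~{total_gb:.1f} GB")
print(f"per core: ~{per_core_gb:.1f} GB")
```

Under that assumption the estimate lands in the same range as the 17GB `model.pt` listed in this card, which also holds the compiled graphs.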

## Compilation Instructions

```
optimum-cli export neuron \
  --model karanps/ChessLM_Qwen3 \
  --task text-generation \
  --sequence_length 2048 \
  --batch_size 4 \
  /home/ubuntu/environment/ml/qwen-chess/karanps/ChessLM_Qwen3_compiled
```
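
When the export is scripted, it can help to keep the settings in one place so they stay in sync with the values under Model Details. The following is a minimal sketch that only assembles and prints the same command; the dict, the constant names, and the `build_export_command` helper are illustrative and not part of optimum-neuron. It uses only the flags that appear in the command above.

```python
import shlex

# Settings copied from the command above; the dict name and layout are illustrative.
EXPORT_SETTINGS = {
    "model": "karanps/ChessLM_Qwen3",
    "task": "text-generation",
    "sequence_length": 2048,
    "batch_size": 4,
}
OUTPUT_DIR = "/home/ubuntu/environment/ml/qwen-chess/karanps/ChessLM_Qwen3_compiled"


def build_export_command(settings: dict, output_dir: str) -> list[str]:
    """Assemble the optimum-cli argument list from the settings dict."""
    argv = ["optimum-cli", "export", "neuron"]
    for flag, value in settings.items():
        argv += [f"--{flag}", str(value)]
    argv.append(output_dir)
    return argv


command = build_export_command(EXPORT_SETTINGS, OUTPUT_DIR)
print(shlex.join(command))
# To actually run the export on a Neuron instance (not executed here):
# import subprocess; subprocess.run(command, check=True)
```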

### Key Files

- **context_encoding_model/**: Compiled NEFF files for processing initial prompt sequences (up to 2048 tokens)
- **token_generation_model/**: Compiled NEFF files for autoregressive token generation
- **layout_opt/**: Layout optimization artifacts from compilation
- **model.pt**: Main model file containing compiled graphs and embedded weights (17GB)
- **neuron_config.json**: Neuron compilation configuration

## Model Files

| File | Purpose |
|------|---------|
| model.pt | Main model with embedded weights (17GB) |
| config.json | Base model configuration |
| neuron_config.json | Neuron compilation settings |
| tokenizer* | Tokenizer files for text processing |
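
After downloading the repository, a quick check that the listed files and directories are present can catch an incomplete download before vLLM is started. This is a minimal sketch using only the Python standard library; the `missing_entries` helper and the example path are hypothetical, and the expected entries are taken from the listings in this card.

```python
from pathlib import Path

# Entries taken from the file listings in this card; directories end with "/".
EXPECTED_ENTRIES = [
    "context_encoding_model/",
    "token_generation_model/",
    "layout_opt/",
    "model.pt",
    "config.json",
    "neuron_config.json",
]


def missing_entries(model_dir: str) -> list[str]:
    """Return the expected files or directories that are absent from model_dir."""
    root = Path(model_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    # The table lists tokenizer files only as "tokenizer*", so match by prefix.
    if not any(root.glob("tokenizer*")):
        missing.append("tokenizer*")
    return missing


if __name__ == "__main__":
    # HYPOTHETICAL path: point this at wherever the repository was downloaded.
    print(missing_entries("./ChessLM_Qwen3_compiled"))
```

An empty list means every entry named in this card was found; the check does not validate file contents or sizes.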

## License

This model inherits the license from the base model [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3).