# DeepSeek-V3 Mini 50M Parameters
A compact version of DeepSeek-V3 with exactly 58,283,136 parameters (reduced from ~181M).
## Model Specifications

- Parameters: 58,283,136
- Hidden Size: 448
- Layers: 6
- Attention Heads: 8
- Intermediate Size: 1200
- Memory (FP16): ~111.2 MB (see the check below)
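
The FP16 memory figure follows directly from the parameter count, at 2 bytes per parameter. A quick arithmetic check:

```python
# FP16 stores each parameter in 2 bytes.
num_params = 58_283_136
fp16_bytes = num_params * 2             # 116,566,272 bytes
print(f"~{fp16_bytes / 2**20:.1f} MB")  # ~111.2, matching the spec above
```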
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./deepseek_v3_mini_50m")
tokenizer = AutoTokenizer.from_pretrained("./deepseek_v3_mini_50m")

# Quick test: generate a short continuation of a prompt
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```
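
To verify the advertised size against the loaded weights, you can count parameters directly (reusing `model` from the snippet above):

```python
# Sum parameter counts across all weight tensors.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # expected: 58,283,136
```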
## Reductions Applied

- Hidden Size: 448
- Layers: 6
- Attention Heads: 8
- Intermediate Size: 1200
- KV LoRA Rank: 96 (see the config sketch below)
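
A reduced model like this can be produced by overriding the relevant fields of a DeepSeek-V3 config and instantiating from it. A minimal sketch, assuming a base config at the hypothetical local path `./deepseek_v3_base` and the standard DeepSeek-V3 config field names:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical base path; DeepSeek-V3 checkpoints ship custom modeling code,
# hence trust_remote_code=True.
config = AutoConfig.from_pretrained("./deepseek_v3_base", trust_remote_code=True)

# Apply the reductions listed above.
config.hidden_size = 448
config.num_hidden_layers = 6
config.num_attention_heads = 8
config.intermediate_size = 1200
config.kv_lora_rank = 96  # MLA's compressed KV projection rank

# Build a randomly initialized model at the reduced size.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
print(sum(p.numel() for p in model.parameters()))
```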