
DeepSeek-V3 Mini 50M Parameters

DeepSeek-V3 Mini is a compact variant of the DeepSeek-V3 architecture with exactly 58,283,136 parameters (~58M, reduced from ~181M).

Model Specifications

  • Parameters: 58,283,136
  • Hidden Size: 448
  • Layers: 6
  • Attention Heads: 8
  • Intermediate Size: 1200
  • Memory (FP16): ~111.2 MB
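
The parameter count and FP16 footprint above can be verified directly once the weights are available locally (a quick sketch; it assumes the checkpoint sits in ./deepseek_v3_mini_50m, as in the usage example below):

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "./deepseek_v3_mini_50m",
    torch_dtype=torch.float16,  # load in FP16 to match the footprint quoted above
)
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}")                     # 58,283,136
print(f"FP16 size: {n_params * 2 / 1024**2:.1f} MiB")  # 58,283,136 * 2 bytes ≈ 111.2 MiB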

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the local checkpoint directory
model = AutoModelForCausalLM.from_pretrained("./deepseek_v3_mini_50m")
tokenizer = AutoTokenizer.from_pretrained("./deepseek_v3_mini_50m")

# Quick test: greedy generation up to 50 tokens (prompt tokens included)
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
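
Greedy decoding (the default above) can loop on a model this small; sampling often reads better. The arguments below are standard transformers generate options, nothing specific to this checkpoint:

# Sampled generation; max_new_tokens counts only newly generated tokens
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))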

Reductions Applied

Relative to the ~181M base configuration, the following dimensions were reduced to the values below:

  • Hidden Size: 448
  • Layers: 6
  • Attention Heads: 8
  • Intermediate Size: 1200
  • KV LoRA Rank: 96
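
The KV LoRA rank is the latent dimension of DeepSeek-V3's multi-head latent attention (MLA), which compresses the KV cache. The reduced values can be read back from the saved config; the field names below follow the usual DeepSeek-V3 config conventions and are only an assumption for this checkpoint, so each lookup is guarded with getattr:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("./deepseek_v3_mini_50m")
for field in ("hidden_size", "num_hidden_layers", "num_attention_heads",
              "intermediate_size", "kv_lora_rank"):
    print(field, getattr(config, field, "not present"))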