Upload folder using huggingface_hub

Files changed (6)

README.md +62 -0
config.json +62 -0
merges.txt +0 -0
pytorch_model.bin +3 -0
training.log +0 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,62 @@

+---
+license: apache-2.0
+base_model: "Qwen/Qwen3-0.6B"
+tags:
+- text-generation
+- deepspeed
+- fine-tuned
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+---
+# Qwen3-0.6B-v0.1
+A language model fine-tuned with DeepSpeed-Chat.
+## Model Details
+This model was fine-tuned using DeepSpeed-Chat.
+- **Base Model**: Add base model information here
+- **Fine-tuning Method**: DeepSpeed-Chat
+- **Training Data**: Add training data information here
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load the tokenizer and the fine-tuned weights from the Hub
+tokenizer = AutoTokenizer.from_pretrained("mncai/Qwen3-0.6B-v0.1")
+model = AutoModelForCausalLM.from_pretrained("mncai/Qwen3-0.6B-v0.1")
+# Generate text
+input_text = "Your prompt here"
+inputs = tokenizer(input_text, return_tensors="pt")
+# max_length counts the prompt tokens plus the generated tokens
+outputs = model.generate(**inputs, max_length=100)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+## Training Details
+- **Training Framework**: DeepSpeed
+- **Training Script**: DeepSpeed-Chat Step 1 Supervised Fine-tuning
+- **Upload Date**: N/A
+## Limitations and Biases
+Add information about this model's limitations and biases here.
+## Citation
+If you used DeepSpeed-Chat, please cite the following:
+```
+@misc{deepspeed-chat,
+  title={DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales},
+  author={Zhewei Yao et al.},
+  year={2023},
+  url={https://github.com/microsoft/DeepSpeed}
+}
+```

config.json ADDED

	@@ -0,0 +1,62 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "end_token_id": 151645,
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151645,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.53.0",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151672
+}
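
The shape fields in this config determine the model's parameter count. The sketch below estimates that count. It assumes the usual Qwen3 decoder layout, which the diff itself does not spell out: bias-free q/k/v/o projections, a per-head RMSNorm on queries and keys, a gated MLP with three projection matrices, and two RMSNorms per layer. The helper name `count_parameters` and the `cfg` dictionary are illustrative only.

```python
# Shape values copied from the config.json diff above.
# "attention_bias" is false in the config, so no bias terms are counted.
cfg = {
    "vocab_size": 151672,
    "hidden_size": 1024,
    "intermediate_size": 3072,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "tie_word_embeddings": True,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count (assumed Qwen3 layout, see lead-in)."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    # q_proj and o_proj use q_dim; k_proj and v_proj use the smaller kv_dim (GQA).
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h
    qk_norm = 2 * cfg["head_dim"]            # RMSNorm weights on queries and keys
    mlp = 3 * h * cfg["intermediate_size"]   # gate, up, and down projections
    layer_norms = 2 * h                      # input and post-attention RMSNorm
    per_layer = attention + qk_norm + mlp + layer_norms

    embeddings = cfg["vocab_size"] * h
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h

    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_parameters(cfg)
lfs_size_bytes = 1191623484  # "size" field of the pytorch_model.bin pointer
bytes_per_param = lfs_size_bytes / total

print(f"estimated parameters: {total:,}")
print(f"bytes per parameter:  {bytes_per_param:.2f}")
```

Under these assumptions the estimate is 595,779,584 parameters, or about 2 bytes per parameter for the uploaded `pytorch_model.bin`. That ratio would be consistent with 16-bit weights even though `torch_dtype` reads `float32`, but this is an inference from file size alone, not something the commit states.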

merges.txt ADDED

The diff for this file is too large to render.

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05568f391e8e2bb14dde542ba027cc2a37bbc80978ff922b7869b825e4d99358
+size 1191623484
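
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against those two fields. The sketch below is one way to do that with the standard library only. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative, and the local filename `pytorch_model.bin` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the pytorch_model.bin diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:05568f391e8e2bb14dde542ba027cc2a37bbc80978ff922b7869b825e4d99358
size 1191623484
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    # Cheap check first: a size mismatch means hashing is unnecessary.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a file over 1 GB is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)

# Hypothetical local path; the check only runs if the file is present.
local_file = Path("pytorch_model.bin")
if local_file.exists():
    print("download intact:", matches_pointer(local_file, pointer))
```

Comparing the size before hashing keeps the common failure case (a truncated download) fast.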

training.log ADDED

The diff for this file is too large to render.

vocab.json ADDED

The diff for this file is too large to render.