---
language: en
tags:
  - merge
  - mergekit
  - deepseek
  - opt
  - code-generation
datasets:
  - openai_humaneval
base_model: deepseek-ai/deepseek-coder-1.3b-base
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: DeepSeek-OPT-Merged-1.3B
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 0
            verified: false
license: apache-2.0
---
# DeepSeek-OPT-Merged-1.3B
A merged model combining DeepSeek Coder 1.3B and OPT-350M via linear weight interpolation.
## 🔍 Model Description
This model was created by merging two foundation models:
- Primary: DeepSeek Coder 1.3B (code generation capabilities)
- Secondary: OPT-350M (general language understanding)
## 🛠️ Training/Merging Process
1. **Base model selection**:
   - DeepSeek Coder 1.3B for code understanding
   - OPT-350M for general language capabilities
2. **Merge technique**:
   - Method: linear interpolation
   - Weight ratio: α = 0.5 (50% from each model)
   - No additional training; pure weight merging
3. **Technical process**:
   - Used PyTorch for model handling
   - Applied float16 precision
   - Implemented memory-efficient merging
   - Used automatic device mapping
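The interpolation step above can be sketched in PyTorch as follows. This is a minimal, hedged illustration, not the exact script used to build this model; `linear_merge` is a hypothetical helper operating on `state_dict()` mappings. Note that the two parent models have different architectures, so only tensors with matching names and shapes can actually be blended:

```python
import torch

def linear_merge(sd_a, sd_b, alpha=0.5):
    """Blend two state dicts as alpha * A + (1 - alpha) * B.

    Only tensors present in both dicts with identical shapes are
    interpolated; all other tensors are kept from the primary model.
    """
    merged = {}
    for name, a in sd_a.items():
        b = sd_b.get(name)
        if b is not None and b.shape == a.shape:
            # Interpolate in float32 for stability, then cast back to fp16
            merged[name] = (alpha * a.float() + (1 - alpha) * b.float()).to(torch.float16)
        else:
            merged[name] = a  # no shape-matched counterpart in model B
    return merged
```

In practice mergekit handles this bookkeeping (plus tied weights and sharded checkpoints) for you; the sketch only shows the arithmetic at the core of a linear merge.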
## 🧩 Configuration
```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # base model
  - model: facebook/opt-350m                     # target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```
## 💻 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## 🔧 Technical Details
- Architecture: Transformer-based language model
- Parameters: ~1.3B
- Precision: float16
- Merge Method: Linear interpolation (α=0.5)
- Device Support: CPU/GPU (automatic device mapping)
- Memory Requirements: ~4 GB of GPU VRAM, or ~8 GB of system RAM for CPU inference
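The ~4 GB figure can be sanity-checked against the raw fp16 weight footprint; the remainder is runtime overhead (activations, KV cache, framework buffers):

```python
params = 1.3e9          # ~1.3B parameters
bytes_per_param = 2     # float16 = 2 bytes per parameter
weights_gib = params * bytes_per_param / 2**30
print(f"fp16 weights: {weights_gib:.1f} GiB")  # fp16 weights: 2.4 GiB
```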
## 📊 Model Evaluation
- Dataset: HumanEval (Code Generation Benchmark)
- Metric: pass@1 (Functional Correctness)
- Status: Pending evaluation
- Expected Capabilities:
- Code completion
- Function generation
- Technical documentation
- General text generation
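Once HumanEval samples are available, pass@1 can be scored with the standard unbiased pass@k estimator (a sketch; `n` is the number of generations per problem and `c` how many of them pass the unit tests):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations (c correct) passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score averages pass@k over all problems, e.g.:
results = [(10, 4), (10, 0), (10, 10)]  # illustrative (n, c) per task
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
```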
## 📝 License
Apache 2.0
## 🚀 Intended Use
- Code generation and completion
- Technical documentation
- Programming assistance
- General text generation tasks
## ⚠️ Limitations
- Inherits limitations from both parent models
- May show inconsistencies in code generation
- Limited by context window of base models
- Performance varies by task type