---
language: en
tags:
- merge
- mergekit
- deepseek
- opt
- code-generation
datasets:
- openai_humaneval
base_model: deepseek-ai/deepseek-coder-1.3b-base
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: DeepSeek-OPT-Merged-1.3B
  results:
  - task:
      type: text-generation
      dataset:
        type: openai_humaneval
        name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0
      verified: false
license: apache-2.0
---

# DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M via linear interpolation of their weights.

## 🔍 Model Description
This model is created by merging two foundation models:
- Primary: DeepSeek Coder 1.3B (code generation capabilities)
- Secondary: OPT-350M (general language understanding)

## 🛠️ Training/Merging Process
1. **Base Models Selection**:
   - DeepSeek Coder 1.3B for code understanding
   - OPT-350M for general language capabilities

2. **Merge Technique**:
   - Method: Linear interpolation
   - Weight ratio: α=0.5 (50% each model)
   - No additional training, pure weight merging

3. **Technical Process**:
   - Used PyTorch for model handling
   - Applied float16 precision
   - Implemented memory-efficient merging
   - Used device map auto-detection
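
The linear interpolation step above can be sketched in plain Python. This is an illustrative toy, not the actual merge script: it assumes both checkpoints expose identically named, identically shaped weight vectors (in practice DeepSeek Coder 1.3B and OPT-350M have different architectures, so only compatible tensors can be combined this way).

```python
def linear_merge(weights_a, weights_b, alpha=0.5):
    # Element-wise interpolation: merged_i = alpha * a_i + (1 - alpha) * b_i
    return {name: [alpha * a + (1.0 - alpha) * b
                   for a, b in zip(vec, weights_b[name])]
            for name, vec in weights_a.items()}

# Toy example with a single hypothetical "layer"
a = {"layer.weight": [1.0, 2.0, 4.0]}
b = {"layer.weight": [3.0, 0.0, 0.0]}
print(linear_merge(a, b, alpha=0.5))  # {'layer.weight': [2.0, 1.0, 2.0]}
```

With α=0.5 both parents contribute equally; moving α toward 1.0 weights the result toward the first model.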

## 🧩 Configuration
```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # base model
  - model: facebook/opt-350m                     # secondary model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```


## 💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example: code-generation prompt
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Technical Details
- Architecture: Transformer-based language model
- Parameters: ~1.3B
- Precision: float16
- Merge Method: Linear interpolation (α=0.5)
- Device Support: CPU/GPU (Auto device mapping)
- Memory Requirements: ~4GB GPU RAM or 8GB CPU RAM
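
The memory figure follows from parameter count and precision; a quick back-of-envelope check (illustrative only — runtime overhead such as activations and the KV cache accounts for the rest of the ~4 GB budget):

```python
# Weights-only footprint of a 1.3B-parameter model stored in float16
params = 1.3e9
bytes_per_param = 2  # float16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GB")  # ~2.4 GB before activations and KV cache
```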

## 📊 Model Evaluation
- Dataset: HumanEval (Code Generation Benchmark)
- Metric: pass@1 (Functional Correctness)
- Status: Pending evaluation
- Expected Capabilities:
  - Code completion
  - Function generation
  - Technical documentation
  - General text generation
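
For reference, pass@k is typically computed with the unbiased estimator from the HumanEval paper: 1 − C(n−c, k)/C(n, k), where n samples are drawn per problem and c of them pass the unit tests. A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    # Unbiased estimator: 1 - C(n - c, k) / C(n, k)
    # n = samples generated per problem, c = samples passing the tests
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(n=10, c=3, k=1), 4))  # 0.3
```

pass@1 therefore reduces to the fraction of single samples that pass, averaged over problems.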

## 📝 License
Apache 2.0

## 🚀 Intended Use
- Code generation and completion
- Technical documentation
- Programming assistance
- General text generation tasks

## ⚠️ Limitations
- Inherits limitations from both parent models
- May show inconsistencies in code generation
- Limited by context window of base models
- Performance varies by task type