---
library_name: transformers
tags:
- text-generation-inference
- Deepseek
- code
- math
- RL
- R1
license: apache-2.0
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
---

![OR1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/OVqPEwaNZDvlDxsR0osHk.png)

# **Asterope-21-OpenR1**

> **Asterope-21-OpenR1** is fine-tuned from **DeepSeek-R1-Distill-Qwen-1.5B** with **distributed reinforcement learning (RL)**, purpose-built to enhance **coding proficiency**, **debugging accuracy**, and **step-by-step reasoning** in **software development tasks** across multiple programming languages. Compact yet capable, it is well suited to intelligent coding assistants, developer tools, and embedded reasoning engines.

## **Key Features**

1. **Code-Centric Chain-of-Thought Reasoning**  
   Optimized to generate structured, multi-step solutions for programming problems — including algorithm design, debugging, and code explanation — enabling developers to understand the "why" behind each step.

2. **Distributed Reinforcement Learning Fine-Tuning**  
   Trained with reinforcement learning across distributed environments to reinforce optimal coding strategies and accurate logical reasoning pathways.

3. **Multilingual Programming Support**  
   Supports various programming languages (e.g., **Python**, **JavaScript**, **C++**, **Java**, **Go**) and adapts to a wide range of development contexts from scripting to systems programming.

4. **Lightweight, Developer-Ready (1.5B Parameters)**  
   Designed for low-latency environments like IDE extensions, browser dev tools, and CLI bots, making it both fast and resource-efficient.

## **Quickstart with Transformers**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Asterope-21-OpenR1"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A deliberately buggy snippet for the model to debug: add() is called with one argument
prompt = "Debug the following Python code:\ndef add(a, b):\n  return a + b\nprint(add(5))"
messages = [
    {"role": "system", "content": "You are a skilled coding assistant capable of reasoning step-by-step to solve software development tasks."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens, keeping only the newly generated completion
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
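
For more control over decoding, sampling parameters can be passed to `generate`. The values below are illustrative starting points, not officially recommended settings for this model:

```python
# Illustrative sampling settings (assumed values, not official recommendations):
# sampling adds variety to explanations, while the penalty curbs repeated steps.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.1
)
```

For deterministic outputs, e.g. when reproducing a debugging session, leave `do_sample=False` (the default) for greedy decoding.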

## **Intended Use**

- **Code Debugging Assistants**: Identifying, explaining, and fixing bugs with precision.
- **Educational Coding Tools**: Helping users learn how and why code works, with rich step-by-step walkthroughs.
- **Multi-language Code Generation**: Writing clean, working code across languages and platforms.
- **Lightweight IDE Integration**: Embed into **editors**, **terminals**, or **web-based environments**.
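
As a minimal sketch of the lightweight-integration use case, the `model` and `tokenizer` loaded in the Quickstart can be wrapped in a small REPL-style helper. The `ask` function and loop below are illustrative, not part of any shipped tooling:

```python
def ask(question: str, max_new_tokens: int = 512) -> str:
    """Send one coding question to the model and return its reply (illustrative helper)."""
    messages = [
        {"role": "system", "content": "You are a skilled coding assistant capable of reasoning step-by-step to solve software development tasks."},
        {"role": "user", "content": question},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt
    reply_ids = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)

# Minimal terminal loop; an empty line exits
while True:
    question = input(">>> ")
    if not question.strip():
        break
    print(ask(question))
```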

## **Limitations**

1. **Focused Domain**:  
   Optimized for development workflows. May underperform in creative or non-technical tasks.

2. **Model Scale**:  
   Though efficient at 1.5B parameters, the model may struggle with complex multi-file or large-context debugging tasks, which can benefit from larger models.

3. **RL Bias Toward Code Tasks**:  
   Reinforcement learning favors coding-oriented reasoning paths, so outputs for general-purpose Q&A may be limited.

4. **Prompt Structure Matters**:  
   The model performs best when inputs include structured error messages, full code context, or a clear question.
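
To illustrate the point about prompt structure, a debugging request that bundles the code, the observed error, and an explicit question gives the model concrete material to reason over. The snippet and traceback below are made up for demonstration:

```python
# A structured debugging prompt: code + observed error + explicit question.
# The function and traceback are illustrative, not from a real session.
prompt = """Debug the following Python code.

Code:
def mean(values):
    return sum(values) / len(values)

print(mean([]))

Observed error:
ZeroDivisionError: division by zero

Question: Why does this fail, and how should mean() handle an empty list?"""
```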