	---
	library_name: transformers
	tags:
	- text-generation-inference
	- Deepseek
	- code
	- math
	- RL
	- R1
	license: apache-2.0
	language:
	- en
	base_model:
	- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
	pipeline_tag: text-generation
	---

	![OR1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/OVqPEwaNZDvlDxsR0osHk.png)

	# Asterope-21-OpenR1

	> Asterope-21-OpenR1 is a model fine-tuned with distributed reinforcement learning (RL) on top of DeepSeek-R1-Distill-Qwen-1.5B, purpose-built to improve coding proficiency, debugging accuracy, and step-by-step reasoning in software development tasks across multiple programming languages. Compact yet capable, it is well suited to intelligent coding assistants, developer tools, and embedded reasoning engines.

	## Key Features

	1. Code-Centric Chain-of-Thought Reasoning
	Optimized to generate structured, multi-step solutions for programming problems — including algorithm design, debugging, and code explanation — enabling developers to understand the "why" behind each step.

	2. Distributed Reinforcement Learning Fine-Tuning
	Trained with reinforcement learning across distributed environments to reinforce optimal coding strategies and accurate logical reasoning pathways.

	3. Multilingual Programming Support
	Supports various programming languages (e.g., Python, JavaScript, C++, Java, Go) and adapts to a wide range of development contexts from scripting to systems programming.

	4. Lightweight, Developer-Ready (1.5B Parameters)
	Designed for low-latency environments like IDE extensions, browser dev tools, and CLI bots, making it both fast and resource-efficient.
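To make the "lightweight" claim concrete, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It assumes the parameter count is rounded to 1.5 billion and counts weights only; the KV cache, activations, and framework overhead are not included, so real memory use will be higher.

```python
# Rough weight-memory estimate for a ~1.5B-parameter model.
# Assumptions: parameter count rounded to 1.5e9; weights only
# (no KV cache, activations, or framework overhead).
PARAMS = 1.5e9

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    dtype: weight_memory_gb(PARAMS, size)
    for dtype, size in BYTES_PER_PARAM.items()
}

for dtype, gb in estimates.items():
    print(f"{dtype}: ~{gb:.1f} GB")
```

At 16-bit precision the weights come to roughly 3 GB, which is what makes local use in editor or CLI tooling plausible on consumer hardware.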

	## Quickstart with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "prithivMLmods/Asterope-21-OpenR1"

	# Load the model and tokenizer; dtype and device placement are chosen automatically.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Debug the following Python code:\ndef add(a, b):\n    return a + b\nprint(add(5))"
	messages = [
	    {"role": "system", "content": "You are a skilled coding assistant capable of reasoning step-by-step to solve software development tasks."},
	    {"role": "user", "content": prompt},
	]

	# Render the chat messages into the prompt format the model expects.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)

	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
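Because the base model is a DeepSeek-R1 distillation, the decoded `response` may contain a reasoning trace delimited by `<think>` and `</think>` before the final answer. This README does not confirm that format, so treat the sketch below as an assumption: `split_reasoning` is a hypothetical helper, not part of `transformers`, and it returns the text unchanged as the answer when no closing tag is found.

```python
from typing import Tuple


def split_reasoning(response: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes an R1-style trace that ends with `close_tag`. The opening
    `<think>` tag is optional, since some chat templates place it in the prompt.
    """
    reasoning, sep, answer = response.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole response as the answer.
        return "", response.strip()
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Usage with a hand-written sample string (not real model output).
sample = "<think>add() needs two arguments.</think>\nCall add(5, 3) instead."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Splitting on the closing tag only keeps the helper tolerant of outputs where the opening tag is missing from the generated text.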

	## Intended Use

	- Code Debugging Assistants: Identifying, explaining, and fixing bugs.
	- Educational Coding Tools: Helping users learn how and why code works through step-by-step walkthroughs.
	- Multi-language Code Generation: Writing clean, working code across languages and platforms.
	- Lightweight IDE Integration: Embedding into editors, terminals, or web-based environments.

	## Limitations

	1. Focused Domain:
	Optimized for development workflows. May underperform in creative or non-technical tasks.

	2. Model Scale:
	Though the model is efficient, complex multi-file or large-context debugging tasks may benefit from larger models.

	3. RL Bias Toward Code Tasks:
	The reinforcement learning stage favors code-oriented reasoning paths, so output quality for general-purpose Q&A may be limited.

	4. Prompt Structure Matters:
	More effective when inputs include structured error messages, full code context, or clear questions.
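One way to follow the prompt-structure advice is to assemble the code, the error message, and the question into clearly labeled sections before sending them as the user message. The helper below is a minimal sketch: `build_debug_prompt` and its section labels are illustrative choices, not a format the model is known to require.

```python
def build_debug_prompt(code: str, error: str = "", question: str = "",
                       language: str = "Python") -> str:
    """Assemble a structured debugging prompt from labeled sections.

    The section labels are an illustrative convention (an assumption),
    not a format prescribed by the model.
    """
    default_question = (
        "Find the bug, explain it step by step, and give a corrected version."
    )
    parts = [f"Language: {language}", "Code:\n" + code.strip()]
    if error.strip():
        # Include the full error text when it is available.
        parts.append("Error message:\n" + error.strip())
    parts.append("Question:\n" + (question.strip() or default_question))
    return "\n\n".join(parts)


# Usage: the same buggy snippet as in the quickstart, plus its traceback text.
prompt = build_debug_prompt(
    code="def add(a, b):\n    return a + b\nprint(add(5))",
    error="TypeError: add() missing 1 required positional argument: 'b'",
)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in the quickstart's `messages` list.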