Uploaded model
- Developed by: alibidaran
- License: apache-2.0
- Finetuned from model: alibidaran/LLAMA3-instructive_reasoning
This model was fine-tuned with the GRPO algorithm to produce reasoning-style responses for mental health and counseling applications. The following notebook illustrates how to design reward functions for training the model with GRPO:
https://www.kaggle.com/code/alibidaran/reasoning-consueling
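The notebook's exact reward design is not reproduced here, but the general shape of a GRPO reward function can be sketched as follows. This is a minimal, hypothetical example: the `<reasoning>` tag, the function name, and the scoring rule are assumptions, not taken from the notebook. TRL-style GRPO reward functions receive a batch of completions and return one score per completion.

```python
import re

def reasoning_format_reward(completions, **kwargs):
    """Hypothetical format reward for GRPO training.

    Gives 1.0 to completions that contain an explicit
    <reasoning>...</reasoning> block (i.e. the model showed its
    reasoning before answering), and 0.0 otherwise. Functions with
    this signature can be passed to a GRPO trainer's reward_funcs.
    """
    pattern = re.compile(r"<reasoning>.*?</reasoning>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# Example: one completion with a reasoning block, one without
scores = reasoning_format_reward([
    "<reasoning>The user wants to drink less.</reasoning> Try suggesting alternatives.",
    "Just drink water instead.",
])
print(scores)  # [1.0, 0.0]
```

In practice several such functions (format, length, safety, answer quality) are usually combined, each contributing a component to the group-relative advantage that GRPO optimizes.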
Direct usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "alibidaran/GRPO_LLAMA3_Reasoning_Consultor"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to("cuda")

system_prompt = "..."  # supply your system prompt here

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I want to cut down on drinking alcohol, but when I am with my friends I need to drink. What should I do?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be added for generation
    return_tensors="pt",
).to("cuda")

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=1024,
    use_cache=True,
    do_sample=True,  # required for temperature/min_p to take effect
    temperature=0.7,
    min_p=0.9,
)
```