Uploaded model

  • Developed by: alibidaran
  • License: apache-2.0
  • Finetuned from model: alibidaran/LLAMA3-instructive_reasoning

This model is fine-tuned with the GRPO algorithm to produce reasoning responses for mental-health and counseling applications. The following notebook illustrates how to design reward functions for training the model with GRPO:

https://www.kaggle.com/code/alibidaran/reasoning-consueling
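The linked notebook defines the actual rewards used in training. As a minimal sketch of the idea, a GRPO reward function (here in the style accepted by TRL's `GRPOTrainer`) scores each sampled completion and returns one float per completion; the `<think>` tag format below is purely illustrative, not necessarily the format this model was trained on:

```python
import re

def reasoning_format_reward(completions, **kwargs):
    """Reward completions that wrap their reasoning in <think>...</think> tags.

    The tag names are hypothetical; substitute whatever reasoning format
    your reward design expects. Returns one score per completion.
    """
    rewards = []
    for completion in completions:
        # 1.0 if the completion contains a well-formed reasoning block, else 0.0
        has_block = re.search(r"<think>.+?</think>", completion, re.DOTALL)
        rewards.append(1.0 if has_block else 0.0)
    return rewards

print(reasoning_format_reward([
    "<think>weigh the triggers and social pressure</think> Try smaller goals.",
    "Just stop drinking.",
]))  # → [1.0, 0.0]
```

In practice several such functions (format, length, content quality) are combined, and GRPO uses the group-relative ranking of these scores to update the policy.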

Direct usage:

```python
from transformers import TextStreamer

# Assumes `model` and `tokenizer` are already loaded (e.g. via
# AutoModelForCausalLM / AutoTokenizer) and `system_prompt` is defined.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I want to cut down on drinking alcohol, but when I am with my friends I feel I need to drink. What should I do?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be added for generation
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=1024,
    use_cache=True,
    temperature=0.7,
    min_p=0.9,
)
```

Model tree for alibidaran/GRPO_LLAMA3_Reasoning_Consultor

  • Finetuned from: alibidaran/LLAMA3-instructive_reasoning