Uploaded model

  • Developed by: alibidaran
  • License: apache-2.0
  • Finetuned from model: alibidaran/LLAMA3-instructive_reasoning

This model is fine-tuned with the GRPO algorithm to produce reasoning responses for mental-health and counseling applications. The following notebook illustrates how to design reward functions for training the model with GRPO:

https://www.kaggle.com/code/alibidaran/reasoning-consueling
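The linked notebook defines the actual rewards used in training. As a minimal sketch of the idea, a GRPO reward function (here in the style accepted by TRL's `GRPOTrainer`) scores each sampled completion and returns one float per completion; the `<think>` tag format below is purely illustrative, not necessarily the format this model was trained on:

```python
import re

def reasoning_format_reward(completions, **kwargs):
    """Reward completions that wrap their reasoning in <think>...</think> tags.

    The tag names are hypothetical; substitute whatever reasoning format
    your reward design expects. Returns one score per completion.
    """
    rewards = []
    for completion in completions:
        # 1.0 if the completion contains a well-formed reasoning block, else 0.0
        has_block = re.search(r"<think>.+?</think>", completion, re.DOTALL)
        rewards.append(1.0 if has_block else 0.0)
    return rewards

print(reasoning_format_reward([
    "<think>weigh the triggers and social pressure</think> Try smaller goals.",
    "Just stop drinking.",
]))  # → [1.0, 0.0]
```

In practice several such functions (format, length, content quality) are combined, and GRPO uses the group-relative ranking of these scores to update the policy.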

Direct usage:

```python
from transformers import TextStreamer

# Assumes `model` and `tokenizer` are already loaded (e.g. via
# AutoModelForCausalLM / AutoTokenizer) and `system_prompt` is defined.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I want to cut down on drinking alcohol, but when I am with my friends I feel I need to drink. What should I do?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be added for generation
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=1024,
    use_cache=True,
    temperature=0.7,
    min_p=0.9,
)
```

Model tree for alibidaran/GRPO_LLAMA3_Reasoning_Consultor

  • Finetuned from: alibidaran/LLAMA3-instructive_reasoning