Why is a 'raw prompt' performing better than a 'chat template prompt'?
#250 by MLooten · opened
Hi
I currently have a prompt that contains instructions (let's call that the system part) and the sentence I want to use as the 'user' input (the user part).
On one side, I use a single prompt composed of system + user as a plain string, and I simply call the tokenizer followed by model.generate:
prompt = system_prompt + user_prompt
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=500, temperature=temperature, do_sample=False)
On the other side, I build a messages list, format it with the chat template, tokenize it, and call model.generate:
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=500, temperature=temperature, do_sample=False)
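For completeness, here is a small debugging sketch (assuming the same tokenizer, system_prompt, user_prompt, and messages as above) that prints the exact text each approach feeds to the model, so the two inputs can be compared directly:

# Rebuild both inputs and decode them back to text to see exactly what the model receives.
raw_inputs = tokenizer(system_prompt + user_prompt, return_tensors="pt")
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
chat_inputs = tokenizer(chat_prompt, return_tensors="pt")

print("raw prompt as seen by the model:")
print(tokenizer.decode(raw_inputs["input_ids"][0]))
print("chat-template prompt as seen by the model:")
print(tokenizer.decode(chat_inputs["input_ids"][0]))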
The issue I am currently facing, and do not understand, is that the 'raw prompt' performs better than the chat-formatted one.
Can somebody explain to me why?
If I look at the Hugging Face page of the model, it seems that the recommendation is to use system + user messages and format them correctly with the tokenizer's chat template.
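For reference, this is roughly the usage I understood from the model page; a minimal sketch, assuming a recent transformers version where apply_chat_template can tokenize and return tensors directly:

# Let apply_chat_template handle both formatting and tokenization (assumes a recent transformers release).
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=500, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))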
Thanks in advance.