Commit 3cf19e4
Parent(s): a89d0ee
Add granite-3.3-8b-instruct-lora-function-calling-scanner
granite-3.3-8b-instruct-lora-function-calling-scanner/README.md
ADDED
@@ -0,0 +1,97 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Granite 3.3 8B Instruct - Function Calling LoRA

Welcome to Granite Experiments!

Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite - we'll keep an eye out for feedback and questions. Happy exploring!

Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.

## Model Summary

This is a LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct), adding the capability to detect incorrect function calls from LLM agents.

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

### Intended use

This LoRA intrinsic is finetuned to detect incorrect function calls from an LLM agent. These errors can stem from simple LLM mistakes or from tool hijacking via jailbreak and prompt injection attacks.

**Function Call Scanning**: The model identifies potential risks when the special role `<|start_of_role|>function_calling<|end_of_role|>` is included in prompts. Without this role, the model behaves like the base model.

### Quickstart Example

The following code shows how to use the LoRA adapter to check whether a function call produced by an LLM agent is incorrect.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
LORA_NAME = "intrinsic/granite-3.3-8b-instruct-lora-function-calling-scanner"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the base model, tokenizer, and LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='right', trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
fc_scanner = PeftModel.from_pretrained(model_base, LORA_NAME)

# The special role that triggers function call scanning
invocation_sequence = "<|start_of_role|>function_calling<|end_of_role|>"

# An intentionally incorrect function call: the user asks for the current
# world champion, but the agent calls the opening-moves tool instead.
sample = {"prompt": "Who is the current chess world champion?",
          "tools": [
              {"name": "stats.elo_rating", "description": "Get the elo rating of a specified player.", "parameters": {"type": "dict", "properties": {"player_name": {"type": "string", "description": "Name of FIDE rated player to fetch the current elo for"}}, "required": ["player_name"]}},
              {"name": "stats.world_champion", "description": "Get the chess world champion by year.", "parameters": {"type": "dict", "properties": {"year": {"type": "int", "description": "Obtains the world champion of this year."}}, "required": ["year"]}},
              {"name": "stats.opening_moves", "description": "Get the main line moves for a chess opening.", "parameters": {"type": "dict", "properties": {"opening_name": {"type": "string", "description": "Name of the opening."}}, "required": ["opening_name"]}}
          ],
          "answer": {"name": "stats.opening_moves", "arguments": {"opening_name": "Caro-Kann Defense"}}}

# Render the conversation, then append the scanning role
chat = [{"role": "user", "content": sample["prompt"]},
        {"role": "tools", "content": str(sample["tools"])},
        {"role": "assistant", "content": str(sample["answer"])}]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)
chat = chat + invocation_sequence

# The scanner answers with a single token
inputs = tokenizer(chat, return_tensors="pt")
output = fc_scanner.generate(inputs["input_ids"].to(device), attention_mask=inputs["attention_mask"].to(device), max_new_tokens=1)
output_text = tokenizer.decode(output[0][-1])
print(f"FC Error Detected: {output_text}")

# Y - yes, function calling error detected.
# N - no, correct function call.
```
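
For repeated checks, the quickstart steps can be folded into a small helper. The sketch below is illustrative rather than part of the release: the `scan_function_call` name is invented here, and it reuses the `tokenizer`, `fc_scanner`, `device`, and `invocation_sequence` objects defined above.

```python
# Hypothetical convenience wrapper around the quickstart (not part of the release).
def scan_function_call(prompt: str, tools: list, answer: dict) -> bool:
    """Return True if the scanner flags the function call as incorrect."""
    chat = [{"role": "user", "content": prompt},
            {"role": "tools", "content": str(tools)},
            {"role": "assistant", "content": str(answer)}]
    text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)
    inputs = tokenizer(text + invocation_sequence, return_tensors="pt")
    output = fc_scanner.generate(inputs["input_ids"].to(device),
                                 attention_mask=inputs["attention_mask"].to(device),
                                 max_new_tokens=1)
    return tokenizer.decode(output[0][-1]).strip() == "Y"

# The sample above should be flagged:
# scan_function_call(sample["prompt"], sample["tools"], sample["answer"]) -> True
```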

## Evaluation

The LoRA was evaluated against [Granite Guardian](https://github.com/ibm-granite/granite-guardian/) on instances where a base LLM produced both correct and incorrect function calling responses.

| Model | Accuracy | TPR | FPR |
| --- | --- | --- | --- |
| Granite Guardian 3.3 8B | 0.9122 | 0.8643 | 0.0419 |
| Granite 3.3 8B LoRA FC Scanner | 0.989 | 0.983 | 0.006 |
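
Here TPR is the true positive rate (the share of incorrect calls that were flagged) and FPR the false positive rate (the share of correct calls that were wrongly flagged). For reference, a minimal sketch of how such metrics fall out of boolean predictions and labels; the example data is made up, not the evaluation set:

```python
# Illustrative metric computation (hypothetical data, not the evaluation set).
# Positive class: "function call is incorrect", i.e. the scanner outputs "Y".
def scanner_metrics(preds, labels):
    tp = sum(p and l for p, l in zip(preds, labels))          # errors correctly flagged
    fn = sum(not p and l for p, l in zip(preds, labels))      # errors missed
    fp = sum(p and not l for p, l in zip(preds, labels))      # correct calls flagged
    tn = sum(not p and not l for p, l in zip(preds, labels))  # correct calls passed
    return {"accuracy": (tp + tn) / len(labels),
            "tpr": tp / (tp + fn),
            "fpr": fp / (fp + tn)}

print(scanner_metrics([True, True, False, False], [True, False, False, False]))
# {'accuracy': 0.75, 'tpr': 1.0, 'fpr': 0.3333333333333333}
```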

## Contact

Greta Dolcetti, Giulio Zizzo, Ambrish Rawat
granite-3.3-8b-instruct-lora-function-calling-scanner/adapter_config.json
ADDED
@@ -0,0 +1,38 @@
{
    "alpha_pattern": {},
    "auto_mapping": null,
    "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
    "bias": "none",
    "corda_config": null,
    "eva_config": null,
    "exclude_modules": null,
    "fan_in_fan_out": false,
    "inference_mode": true,
    "init_lora_weights": true,
    "layer_replication": null,
    "layers_pattern": null,
    "layers_to_transform": null,
    "loftq_config": {},
    "lora_alpha": 32,
    "lora_bias": false,
    "lora_dropout": 0.05,
    "megatron_config": null,
    "megatron_core": "megatron.core",
    "modules_to_save": null,
    "peft_type": "LORA",
    "qalora_group_size": 16,
    "r": 32,
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
        "k_proj",
        "v_proj",
        "q_proj"
    ],
    "target_parameters": null,
    "task_type": "CAUSAL_LM",
    "trainable_token_indices": null,
    "use_dora": false,
    "use_qalora": false,
    "use_rslora": false
}
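
The config applies rank-32 LoRA (alpha 32, dropout 0.05) to the q_proj, k_proj, and v_proj attention projections. To inspect these settings programmatically instead of reading the JSON, peft can load them directly; a minimal sketch, assuming the adapter id resolves (substitute a local path otherwise):

```python
# Load and inspect the adapter config with peft (illustrative).
from peft import PeftConfig

cfg = PeftConfig.from_pretrained("intrinsic/granite-3.3-8b-instruct-lora-function-calling-scanner")
print(cfg.base_model_name_or_path)  # ibm-granite/granite-3.3-8b-instruct
print(cfg.r, cfg.lora_alpha)        # 32 32
print(sorted(cfg.target_modules))   # ['k_proj', 'q_proj', 'v_proj']
```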
granite-3.3-8b-instruct-lora-function-calling-scanner/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5eff6d002ca79c6e72366fd4b8fb41ebc1cd7e5cbe0d1901e8ef0d9c7545ada2
size 94404160
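
The weights file is stored as a Git LFS pointer: `oid` is the SHA-256 of the real adapter_model.safetensors and `size` is its byte count. A quick integrity check after fetching the actual file (e.g. with `git lfs pull`); the local path below is an assumption:

```python
# Verify the downloaded safetensors file against the LFS pointer (illustrative).
import hashlib, os

path = "granite-3.3-8b-instruct-lora-function-calling-scanner/adapter_model.safetensors"
expected_oid = "5eff6d002ca79c6e72366fd4b8fb41ebc1cd7e5cbe0d1901e8ef0d9c7545ada2"
expected_size = 94404160

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert h.hexdigest() == expected_oid, "sha256 mismatch"
print("adapter weights match the LFS pointer")
```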