GiulioZizzo committed on
Commit
3cf19e4
·
1 Parent(s): a89d0ee

Add granite-3.3-8b-instruct-lora-function-calling-scanner
granite-3.3-8b-instruct-lora-function-calling-scanner/README.md ADDED
@@ -0,0 +1,97 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Granite 3.3 8B Instruct - Function Calling LoRA

Welcome to Granite Experiments!

Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite - we'll keep an eye out for feedback and questions. Happy exploring!

Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.
## Model Summary

This is a LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct),
adding the capability to detect incorrect function calls from LLM agents.

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

### Intended use

This LoRA intrinsic is finetuned to detect incorrect function calls from an LLM agent. Such errors can arise from simple LLM mistakes, or from tool hijacking via jailbreak and prompt injection attacks.

**Function Call Scanning**: The model identifies potential risks when the special role `<|start_of_role|>function_calling<|end_of_role|>` is included in prompts. Without this role, the model behaves like the base model.
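As a minimal illustration of that switch (a hypothetical sketch; `rendered_chat` stands in for the output of the tokenizer's chat template, and `build_scan_prompt` is not part of the model's API), scanning is requested simply by appending the special role to the templated conversation:

```python
# Hypothetical helper: append the scanner role to an already-templated chat.
# Without the role, the adapter behaves like the base model; with it, the
# next generated token is the scanner's verdict.
SCAN_ROLE = "<|start_of_role|>function_calling<|end_of_role|>"

def build_scan_prompt(rendered_chat: str) -> str:
    return rendered_chat + SCAN_ROLE

prompt = build_scan_prompt("<|start_of_role|>user<|end_of_role|>Hi<|end_of_text|>")
```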

### Quickstart Example

The following code shows how to use the LoRA adapter to detect an incorrect function call in a model response.
44
+
45
+ ```python
46
+ import torch
47
+ from transformers import AutoTokenizer, AutoModelForCausalLM
48
+ from peft import PeftModel
49
+
50
+ BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
51
+ LORA_NAME = "intrinsic/granite-3.3-8b-instruct-lora-function-calling-scanner"
52
+ device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
53
+
54
+ # Load model
55
+ tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='right', trust_remote_code=True)
56
+ model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
57
+ fc_scaner = PeftModel.from_pretrained(model_base, LORA_NAME)
58
+
59
+ # Detect incorrect function call
60
+ invocation_sequence = "<|start_of_role|>function_calling<|end_of_role|>"
61
+
62
+ sample = {"prompt": "Who is the current chess world champion?",
63
+ "tools": [
64
+ {"name": "stats.elo_rating", "description": "Get the elo rating of a specified player.", "parameters": {"type": "dict", "properties": {"player_name": {"type": "string", "description": "Name of FIDE rated player to fetch the current elo for"}}, "required": ["player_name"]}},
65
+ {"name": "stats.world_champion", "description": "Get the chess world champion by year.", "parameters": {"type": "dict", "properties": {"year": {"type": "int", "description": "Obtains the world chamption of this year."}}, "required": ["year"]}},
66
+ {"name": "stats.opening_moves", "description": "Get the main line moves for a chess opening.", "parameters": {"type": "dict", "properties": {"opening_name": {"type": "string", "description": "Name of the opening."}}, "required": ["opening_name"]}}
67
+ ],
68
+
69
+ 'answer': {'name': 'stats.opening_moves', 'arguments': {"Caro-Kann Defense"}}}
70
+
71
+ chat = [{"role": "user", "content": sample["prompt"]},
72
+ {"role": "tools", "content": str(sample["tools"])},
73
+ {"role": "assistant", "content": str(sample["answer"])}]
74
+ chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)
75
+ chat = chat + invocation_sequence
76
+
77
+ inputs = tokenizer(chat, return_tensors="pt")
78
+ output = fc_scaner.generate(inputs["input_ids"].to(device), attention_mask=inputs["attention_mask"].to(device), max_new_tokens=1)
79
+ output_text = tokenizer.decode(output[0][-1])
80
+ print(f"FC Error Detected: {output_text}")
81
+
82
+ # Y - yes, function calling error detected.
83
+ # N - no, correct function call.
84
+ ```
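Downstream code will usually want that one-token verdict as a boolean. A small hypothetical helper (`parse_verdict` is not part of the model's API; the `Y`/`N` labels are the ones documented in the comments above):

```python
def parse_verdict(token_text: str) -> bool:
    """Map the scanner's single-token output to a boolean.

    True  -> "Y": function calling error detected.
    False -> "N": correct function call.
    """
    text = token_text.strip()
    if text == "Y":
        return True
    if text == "N":
        return False
    raise ValueError(f"Unexpected scanner output: {token_text!r}")
```

Raising on anything other than the documented labels keeps an unexpected generation from being silently treated as a clean result.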

## Evaluation

The LoRA was evaluated against [Granite Guardian](https://github.com/ibm-granite/granite-guardian/) on instances where a base LLM selected both correct and incorrect function calling responses.

| Model | Accuracy | TPR | FPR |
| --- | --- | --- | --- |
| Granite Guardian 3.3 8B | 0.9122 | 0.8643 | 0.0419 |
| Granite 3.3 8B LoRA FC Scanner | 0.989 | 0.983 | 0.006 |
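For reference, the reported metrics follow the standard confusion-matrix definitions, with an incorrect function call counted as a positive (a generic sketch with illustrative counts, not the evaluation harness used here):

```python
def scanner_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, true positive rate, and false positive rate from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),  # share of incorrect calls that were flagged
        "fpr": fp / (fp + tn),  # share of correct calls wrongly flagged
    }

# Illustrative counts only:
m = scanner_metrics(tp=983, fp=6, tn=994, fn=17)
```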

## Contact

Greta Dolcetti, Giulio Zizzo, Ambrish Rawat
granite-3.3-8b-instruct-lora-function-calling-scanner/adapter_config.json ADDED
@@ -0,0 +1,38 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "k_proj",
    "v_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
granite-3.3-8b-instruct-lora-function-calling-scanner/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5eff6d002ca79c6e72366fd4b8fb41ebc1cd7e5cbe0d1901e8ef0d9c7545ada2
size 94404160