CTCT-CT2
/

changeway_guardrails

Safetensors

deberta-v2

Model card Files Files and versions

xet

Community

JiaqiWei commited on Jul 25

Commit

9712682

verified ·

1 Parent(s): 6c522b4

Update README.md

Browse files

Files changed (1) hide show

README.md +4 -4

README.md CHANGED Viewed

@@ -141,7 +141,7 @@ Example:
 ### 2.3 Generation Algorithms for Prompt Injection Attacks
-In terms of technical approaches to induce AI systems into generating harmful content, researchers have proposed a variety of prompt injection generation algorithms, which continue to evolve rapidly. Notable influential algorithms include: Rewrite Attack[<sup>[Andriushchenko2024]</sup>](#Andriushchenko2024a)、PAIR[<sup>[Chao2025]</sup>](#Chao2025)、GCG[<sup>[Zou2023]</sup>](#Zou2023)、AutoDAN[<sup>[Liu2024]</sup>](#Liu2024)、TAP[<sup>[Mehrotra2024]</sup>](#Mehrotra2024)、Overload Attack[<sup>[Dong2024]</sup>](#Dong2024)、ArtPropmt[<sup>[Jiang2024]</sup>](#Jiang2024)、DeepInception[<sup>[Li2023]</sup>](#Li2023)、GPT4-Cipher[<sup>[Yuan2025]</sup>](#Yuan2025)、SCAV[<sup>[Xu2024]</sup>](#Xu2024)、RandomSearch[<sup>[Andriushchenko2024]</sup>](#Andriushchenko2024b)、ICA[<sup>[Wei2023]</sup>](#Wei2023)、Cold Attack[<sup>[Guo2024]</sup>](#Guo2024)、GPTFuzzer[<sup>[Yu2023]</sup>](#Yu2023)、ReNeLLMr[<sup>[Ding2023]</sup>](#Ding2023), among others.
 ## 3. Dataset Construction
@@ -459,7 +459,7 @@ We welcome feedback and contributions from the community!
 * ReNeLLM
 <a name="Ding2023"></a>
-[[Andriushchenko2024](https://arxiv.org/abs/2311.08268)] Ding, Peng, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, and Shujian Huang. "A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily." arXiv preprint arXiv:2311.08268 (2023).
 * Llama Prompt Guard2
@@ -490,8 +490,8 @@ We welcome feedback and contributions from the community!
 * goal prioritization
-<a name="Xie2024"></a>
-[[Xie2024](https://arxiv.org/abs/2311.09096)] Zhang, Zhexin, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. "Defending large language models against jailbreaking attacks through goal prioritization." arXiv preprint arXiv:2311.09096 (2023).
 * JailBreakBench

 ### 2.3 Generation Algorithms for Prompt Injection Attacks
+In terms of technical approaches to induce AI systems into generating harmful content, researchers have proposed a variety of prompt injection generation algorithms, which continue to evolve rapidly. Notable influential algorithms include: Rewrite Attack[<sup>[Andriushchenko2024]</sup>](#Andriushchenko2024a)、PAIR[<sup>[Chao2025]</sup>](#Chao2025)、GCG[<sup>[Zou2023]</sup>](#Zou2023)、AutoDAN[<sup>[Liu2024]</sup>](#Liu2024)、TAP[<sup>[Mehrotra2024]</sup>](#Mehrotra2024)、Overload Attack[<sup>[Dong2024]</sup>](#Dong2024)、ArtPropmt[<sup>[Jiang2024]</sup>](#Jiang2024)、DeepInception[<sup>[Li2023]</sup>](#Li2023)、GPT4-Cipher[<sup>[Yuan2025]</sup>](#Yuan2025)、SCAV[<sup>[Xu2024]</sup>](#Xu2024)、RandomSearch[<sup>[Andriushchenko2024]</sup>](#Andriushchenko2024b)、ICA[<sup>[Wei2023]</sup>](#Wei2023)、Cold Attack[<sup>[Guo2024]</sup>](#Guo2024)、GPTFuzzer[<sup>[Yu2023]</sup>](#Yu2023)、ReNeLLM[<sup>[Ding2023]</sup>](#Ding2023), among others.
 ## 3. Dataset Construction
 * ReNeLLM
 <a name="Ding2023"></a>
+[[Ding2023](https://arxiv.org/abs/2311.08268)] Ding, Peng, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, and Shujian Huang. "A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily." arXiv preprint arXiv:2311.08268 (2023).
 * Llama Prompt Guard2
 * goal prioritization
+<a name="Zhang2023"></a>
+[[Zhang2023](https://arxiv.org/abs/2311.09096)] Zhang, Zhexin, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. "Defending large language models against jailbreaking attacks through goal prioritization." arXiv preprint arXiv:2311.09096 (2023).
 * JailBreakBench