Marcus Cedric R. Idia committed

Commit bb8ceb9 · Parent: 51d6404

Update README.md

README.md CHANGED
@@ -9,20 +9,86 @@ language:

Unchanged context (README lines 9-11):

- en
pipeline_tag: question-answering
---

Removed (the previous README body: the bitsandbytes quantization settings listed below, restated as a `BitsAndBytesConfig` after the list, plus a "Framework versions" section):

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
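For reference, the removed bullets are `bitsandbytes` quantization settings; expressed with the `transformers` `BitsAndBytesConfig` API, the same values would read roughly as follows (a sketch for illustration, values copied from the list above):

```python
# The removed settings written out as a BitsAndBytesConfig (values from the list above).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)
```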
Added (the new README body):

# Archimedes Model

This README provides instructions for running the Archimedes conversational AI assistant locally.

## Requirements

- Python 3.8+
- [Transformers](https://huggingface.co/docs/transformers/installation)
- [PEFT](https://github.com/huggingface/peft)
- PyTorch
- Access to the LLAMA 2 model files or a cloned public model (the gated checkpoints require Hugging Face authentication; see the note after the install commands below)

Install requirements:

```
!pip install huggingface_hub
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb
```
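The `meta-llama` checkpoints on the Hugging Face Hub are gated, so downloading them requires logging in with an account that has been granted Llama 2 access. A minimal sketch using `huggingface_hub` (the token value is a placeholder):

```python
# Authenticate with the Hugging Face Hub so the gated Llama 2 weights can be downloaded.
# Assumes your account has been granted access to the meta-llama repositories.
from huggingface_hub import login

login(token="hf_...")  # or run `huggingface-cli login` in a terminal
```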

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base LLAMA 2 chat model; make sure this matches the base the LoRA adapter below was
# trained on (the adapter name "archimedes-300s-7b-chat" suggests the 7B chat variant).
model_name = "meta-llama/Llama-2-13b-chat-hf"

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load the trained Archimedes LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, "harpyerr/archimedes-300s-7b-chat")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Define prompt
text = "Can you tell me who made Space-X?"
prompt = "You are a helpful assistant. Please provide an informative response. \n\n" + text

# Generate response
device = "cuda:0"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This loads the LLAMA 2 base model with 4-bit quantization, attaches the Archimedes LoRA adapter, constructs a prompt, and generates a response.

See the [AutoModelForCausalLM docs](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) for more details.
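The prompt above is plain text. The LLAMA 2 chat checkpoints were trained on a specific instruction format, so responses may improve if the request is wrapped in that template. A minimal sketch, assuming the standard Llama 2 chat format and reusing the `model` and `tokenizer` loaded above:

```python
# Wrap the request in the Llama 2 chat format ([INST] ... [/INST] with an optional
# <<SYS>> system prompt); `model` and `tokenizer` come from the Usage section above.
system = "You are a helpful assistant. Please provide an informative response."
question = "Can you tell me who made Space-X?"
prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```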

## Training

Archimedes is a LoRA fine-tune of the LLAMA 2 chat model (the `harpyerr/archimedes-300s-7b-chat` adapter loaded above). See the [model card](https://huggingface.co/USERNAME/archimedes) for details.

## License

Archimedes is released under the Apache 2.0 license. Note that the underlying LLAMA 2 base weights remain subject to Meta's Llama 2 license.

## Citation

Coming soon!

Please ⭐ if this repository was helpful!