aisquared
/

dlite-v1-355m

@@ -44,87 +44,47 @@ Just as with any other LLM, we advise users of this technology to exercise good
 ## Usage
-The code below shows how to use `dlite-v1-355m` in the way which it was trained.  While the model can be used "out of the box" using the
-`transformers` library, using the function defined below to create a response from the model will achieve better results.
-### Load Model and Tokenizer from this Repository Using the `transformers` Package
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import numpy as np
-import re
-model_id = 'aisquared/dlite-v1-355m'
-tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side = 'left')
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code = True, device_map = 'auto')
 ```
-### Create the Prompt Format and Other Variables
 ```python
-PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
-### Instruction:
-{instruction}
-### Response:
-"""
-END_KEY = '### End'
-RESPONSE_KEY = '### Response:\n'
 ```
-### Create a Function to Retrieve a Response
 ```python
-def create_response(
-        instruction,
-        model,
-        tokenizer,
-        do_sample = True,
-        max_new_tokens = 256,
-        top_p = 0.92,
-        top_k = 0,
-        **kwargs
-):
-    """
-    Create a response from the model by using a formatted prompt
-    """
-    input_ids = tokenizer(
-        PROMPT.format(instruction=instruction), return_tensors="pt"
-    ).input_ids
-    gen_tokens = model.generate(
-        input_ids,
-        pad_token_id=tokenizer.pad_token_id,
-        do_sample=do_sample,
-        max_new_tokens=max_new_tokens,
-        top_p=top_p,
-        top_k=top_k,
-        **kwargs,
-    )
-    decoded = tokenizer.batch_decode(gen_tokens)[0]
-    # The response appears after "### Response:".  The model has been trained to append "### End" at the end.
-    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
-    response = None
-    if m:
-        response = m.group(1).strip()
-    else:
-        # The model might not generate the "### End" sequence before reaching the max tokens.  In this case, return
-        # everything after "### Response:".
-        m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
-        if m:
-            response = m.group(1).strip()
-        else:
-            pass
-    return response
 ```
 ### Model Performance Metrics
 We present the results from various model benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family.

 ## Usage
+To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
+From your terminal, run:
 ```python
+pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
 ```
+The instruction following pipeline can be loaded using the `pipeline` function as shown below.  This loads a custom `InstructionTextGenerationPipeline`
+found in the model repo [here](https://huggingface.co/aisquared/dlite-v1-355m/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
+Including `torch_dtype=torch.bfloat16` is generally recommended if this type is supported in order to reduce memory usage.  It does not appear to impact output quality.
+It is also fine to remove it if there is sufficient memory.
 ```python
+from transformers import pipeline
+import torch
+generate_text = pipeline(model="aisquared/dlite-v1-355m", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
+```
+You can then use the pipeline to answer instructions:
+```python
+res = generate_text("Who was George Washington?")
+print(res[0]["generated_text"])
 ```
+Alternatively, if you prefer to not use `trust_remote_code=True` you can download [instruct_pipeline.py](https://huggingface.co/aisquared/dlite-v1-355m/blob/main/instruct_pipeline.py),
+store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
 ```python
+from instruct_pipeline import InstructionTextGenerationPipeline
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+tokenizer = AutoTokenizer.from_pretrained("aisquared/dlite-v1-355m", padding_side="left")
+model = AutoModelForCausalLM.from_pretrained("aisquared/dlite-v1-355m", device_map="auto", torch_dtype=torch.bfloat16)
+generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
 ```
 ### Model Performance Metrics
 We present the results from various model benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family.