This model is part of the llava-v1.5-7b-gpt4OCR collection (5 items): checkpoints for llava-v1.5-7b fine-tuned for OCR tasks.
AWQ quant for nnethercott/llava-v1.5-7b-gpt4OCR-hf. The autoawq quantization config is included in the repo files.
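For reference, producing an AWQ checkpoint like this one with autoawq follows the library's standard recipe, sketched below. This is only a sketch under assumptions: the quant_config values shown are common autoawq defaults, not the confirmed settings for this checkpoint (see the quantization config in the repo files for the actual values), and quantizing a multimodal LLaVA checkpoint may need extra handling beyond this.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "nnethercott/llava-v1.5-7b-gpt4OCR-hf"  # base model to quantize
quant_path = "llava-v1.5-7b-gpt4OCR-hf-AWQ"          # output directory

# Typical 4-bit AWQ settings; the actual values for this repo live in its config
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate and quantize, then save the quantized weights plus config
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path, safetensors=True)
tokenizer.save_pretrained(quant_path)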
The two datasets used for fine-tuning are:
We use 10k samples from GRIT where each sample has an image-caption CLIP similarity larger than 0.35 and where the caption does not contain any proper nouns (filtered using spaCy); a sketch of this filter is shown below.
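As an illustration, the GRIT filtering described above might look roughly like the following. This is a hedged sketch, not the actual preprocessing script: it assumes each GRIT record already carries a precomputed CLIP similarity score (the field names clip_similarity and caption are illustrative) and uses spaCy part-of-speech tags to reject captions containing proper nouns.

import spacy

nlp = spacy.load("en_core_web_sm")

def keep_sample(record, min_clip_sim=0.35):
    """Return True if a GRIT record passes both filters.

    Assumes `record` is a dict with a precomputed image-caption CLIP
    similarity under 'clip_similarity' and the caption under 'caption'.
    """
    if record["clip_similarity"] <= min_clip_sim:
        return False
    doc = nlp(record["caption"])
    # Reject captions containing any proper noun (PROPN tag)
    return not any(token.pos_ == "PROPN" for token in doc)

grit_records = [
    {"caption": "a red car parked near a billboard", "clip_similarity": 0.41},
    {"caption": "Elon Musk standing next to a Tesla", "clip_similarity": 0.52},
]
filtered = [r for r in grit_records if keep_sample(r)]  # second record dropped (proper nouns)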
Use the code below to get started with the model:
import time

import requests
import torch
from PIL import Image
from transformers import AutoProcessor
from awq import AutoAWQForCausalLM

awq_model_id = "nnethercott/llava-v1.5-7b-gpt4OCR-hf-AWQ"

# Load the processor and the AWQ-quantized model onto GPU 0
processor = AutoProcessor.from_pretrained(awq_model_id)
model = AutoAWQForCausalLM.from_quantized(
    awq_model_id,
    safetensors=True,
    device_map={"": 0},
    fuse_layers=False,
)

image_url = "https://adquick-public.imgix.net/landing+images/media_formats/billboard-carvana.png?auto=format"
prompt = "USER: <image>\ngenerate a descriptive caption for this image. ASSISTANT: "
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

# Example generation settings (not specified in the original snippet; adjust to taste)
generation_kwargs = {"max_new_tokens": 256, "do_sample": False}

with torch.no_grad():
    inputs = processor(prompt, image, return_tensors="pt").to(0, torch.float16)
    start = time.perf_counter()
    out = model.generate(
        **inputs,
        **generation_kwargs,
    )
    stop = time.perf_counter()

# Strip the prompt tokens from the output before decoding
prompt_len = len(processor.tokenizer.encode(prompt))
print(processor.tokenizer.batch_decode(out[:, prompt_len:], skip_special_tokens=True)[0])
print(f"generation speed: {round(len(out[0]) / (stop - start), 1)} [t/s]")
Output for nnethercott/llava-v1.5-7b-gpt4OCR-hf-AWQ:
The image captures a Carvana billboard under a clear blue sky, showcasing a red sports car being towed by a white Carvana truck. The billboard prominently features the Carvana logo and the slogan "Buy your next car from your couch."