Degraded point performance on some examples compared to the playground
When I run this query on the image below in the playground, I get very good results: the majority of the spectators in the image are detected.
In contrast, running the code below locally on the same image finds only a single spectator. Does anyone know why? What is the difference between the playground and the model from Hugging Face?
import torch
import numpy as np
from PIL import Image
from transformers import AutoModelForCausalLM

moondream_model = AutoModelForCausalLM.from_pretrained(
    "moondream/moondream3-preview",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map={"": "cuda"},
)

image = Image.open("baseball.png")
result = moondream_model.point(image, "spectator")

for i, point in enumerate(result["points"]):
    print(f"Point {i+1}: x={point['x']:.3f}, y={point['y']:.3f}")
One thing that comes to mind is that the playground runs the model in bfloat16 -- since it was trained in that precision it's possible running in float32 causes issues?
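If you want to test that, it should only require changing the dtype passed to from_pretrained. A minimal sketch (untested, keeping the rest of your script unchanged):

from transformers import AutoModelForCausalLM
import torch

moondream_model = AutoModelForCausalLM.from_pretrained(
    "moondream/moondream3-preview",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # match the training precision instead of float32
    device_map={"": "cuda"},
)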
Me too. The performance on their website is much better than when I download the model and test it locally.
I ran the model in torch.bfloat16 but still got different results compared to the playground.
Hi, have you found a solution to this issue?
Can you share the image and your results from running it locally?
Thanks for your reply. The image is attached, and the text for the point query is "all floor areas".
The code I use locally is:
from transformers import AutoModelForCausalLM
from PIL import Image
import torch
import time
import matplotlib.pyplot as plt

if __name__ == '__main__':
    model = AutoModelForCausalLM.from_pretrained(
        "moondream/moondream3-preview",
        trust_remote_code=True,
        dtype=torch.bfloat16,
        device_map="cuda"
    )
    model.eval()
    model.compile()

    with torch.inference_mode():
        image = Image.open("/media/jcx/SSD_2T/NeRF_Dataset/Replica/room0/rgb/frame000000.jpg")
        encoded_image = model.encode_image(image)
        result = model.point(encoded_image, "all floor areas")
        points = result["points"]
        print(f"Found {len(points)} all floor areas")

        # Visualize the points
        plt.figure(figsize=(10, 10))
        plt.imshow(image)
        for point in points:
            # Convert normalized coordinates to pixel values
            x = point["x"] * image.width
            y = point["y"] * image.height

            # Plot the point
            plt.plot(x, y, 'ro', markersize=15, alpha=0.7)
            plt.text(
                x + 10, y, "Face",
                color='white', fontsize=12,
                bbox=dict(facecolor='red', alpha=0.5)
            )

        plt.axis('off')
        plt.savefig("output_with_points.jpg")
        plt.show()
And the results from the local code are:
I would also be very interested to know why performance differs so much between the playground and local deployment, because the playground version runs really well.
@JCX1999 Thanks for sharing your code. The Cloud API (and by extension the playground) uses our new inference engine (Kestrel), which could be contributing to a slight change in outputs. Has the playground been consistently better, or is it just this example?
Aside: can you give me some context as to your use case?
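If it helps to isolate whether the gap comes from the inference engine rather than your local setup, you could run the same image and query against the Cloud API from Python and compare the returned points with your local output. A rough sketch, assuming the `moondream` pip package's cloud client (`md.vl`) and its `point` method; check the current Cloud API docs for the exact interface, and note the API key is a placeholder:

import moondream as md
from PIL import Image

# Assumption: the `moondream` package exposes a cloud client via md.vl(api_key=...).
cloud_model = md.vl(api_key="YOUR_API_KEY")

image = Image.open("/media/jcx/SSD_2T/NeRF_Dataset/Replica/room0/rgb/frame000000.jpg")
result = cloud_model.point(image, "all floor areas")

# Compare the number and positions of points against the local run
print(f"Cloud API found {len(result['points'])} points")
for p in result["points"]:
    print(f"x={p['x']:.3f}, y={p['y']:.3f}")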
@err805 Thank you for your reply. I’ve found the Playground consistently better. I’m using Moondream for semantic labeling and was wondering if it’s possible to run the model locally with the new Kestrel inference engine.


