6 12 53

Jason Corkill

jasoncorkill

https://rapidata.ai

AI & ML interests

Human data annotation

Recent Activity

liked a dataset 2 days ago

Rapidata/text-2-video-human-preferences-genmo-mochi-1

updated a dataset 2 days ago

Rapidata/text-2-video-human-preferences-veo3

replied to their post 12 days ago

"Why did the bee get married?" "Because he found his honey!" This was the "funniest" joke out of 10'000 jokes we generated with LLMs. With 68% of respondents rating it as "funny". Original jokes are particularly hard for LLMs, as jokes are very nuanced and a lot of context is needed to understand if something is "funny". Something that can only reliably be measured using humans. LLMs are not equally good at generating jokes in every language. Generated English jokes turned out to be way funnier than the Japanese ones. 46% of English-speaking voters on average found the generated joke funny. The same statistic for other languages: Vietnamese: 44% Portuguese: 40% Arabic: 37% Japanese: 28% There is not much variance in generation quality among models for any fixed language. But still Claude Sonnet 4 slightly outperforms others in Vietnamese, Arabic and Japanese and Gemini 2.5 Flash in Portuguese and English We have release the 1 Million (!) native speaker ratings and the 10'000 jokes as a dataset for anyone to use: https://huggingface.co/datasets/Rapidata/multilingual-llm-jokes-4o-claude-gemini

View all activity

Organizations

liked a dataset 2 days ago

Rapidata/text-2-video-human-preferences-genmo-mochi-1

Viewer • Updated 2 days ago • 1.1k • 80 • 8

updated a dataset 2 days ago

Rapidata/text-2-video-human-preferences-veo3

Viewer • Updated 2 days ago • 1.02k • 339 • 18

replied to their post 12 days ago

Perhaps we can provide a couple of thousand human annotations

replied to their post 12 days ago

Interesting, what kind of data are you collecting?

replied to their post 14 days ago

Funny, we also noticed that these models will almost always revert to the Question - Answer Style Joke if not prompted otherwise.

reacted to their post with 👀 15 days ago

Post

3202

"Why did the bee get married?"

"Because he found his honey!"

This was the "funniest" joke out of 10'000 jokes we generated with LLMs. With 68% of respondents rating it as "funny".

Original jokes are particularly hard for LLMs, as jokes are very nuanced and a lot of context is needed to understand if something is "funny". Something that can only reliably be measured using humans.

LLMs are not equally good at generating jokes in every language. Generated English jokes turned out to be way funnier than the Japanese ones. 46% of English-speaking voters on average found the generated joke funny. The same statistic for other languages:

Vietnamese: 44%
Portuguese: 40%
Arabic: 37%
Japanese: 28%

There is not much variance in generation quality among models for any fixed language. But still Claude Sonnet 4 slightly outperforms others in Vietnamese, Arabic and Japanese and Gemini 2.5 Flash in Portuguese and English

We have release the 1 Million (!) native speaker ratings and the 10'000 jokes as a dataset for anyone to use:
Rapidata/multilingual-llm-jokes-4o-claude-gemini

7 replies

posted an update 15 days ago

Post

3202

"Why did the bee get married?"

"Because he found his honey!"

This was the "funniest" joke out of 10'000 jokes we generated with LLMs. With 68% of respondents rating it as "funny".

Original jokes are particularly hard for LLMs, as jokes are very nuanced and a lot of context is needed to understand if something is "funny". Something that can only reliably be measured using humans.

LLMs are not equally good at generating jokes in every language. Generated English jokes turned out to be way funnier than the Japanese ones. 46% of English-speaking voters on average found the generated joke funny. The same statistic for other languages:

Vietnamese: 44%
Portuguese: 40%
Arabic: 37%
Japanese: 28%

There is not much variance in generation quality among models for any fixed language. But still Claude Sonnet 4 slightly outperforms others in Vietnamese, Arabic and Japanese and Gemini 2.5 Flash in Portuguese and English

We have release the 1 Million (!) native speaker ratings and the 10'000 jokes as a dataset for anyone to use:
Rapidata/multilingual-llm-jokes-4o-claude-gemini

7 replies

liked a dataset 16 days ago

Rapidata/multilingual-llm-jokes-4o-claude-gemini

Viewer • Updated 26 days ago • 9.98k • 361 • 11

reacted to their post with 🧠❤️🚀 about 2 months ago

Post

2427

Imagine you could have an Image Arena score equivalent at each checkpoint during training. We released the first version of just that:
Crowd-Eval

Add one line of code to your training loop and you will have a new real human loss curve in your W&B dashboard.

Thousands of real humans from around the world rating your model in real time at the cost of a few dollars per checkpoint is a game changer.

Check it out here: https://github.com/RapidataAI/crowd-eval

First 5 people to put it in their loop get 100'000 human responses for free! (ping me)

posted an update about 2 months ago

Post

2427

New activity in Rapidata/text-2-video-human-preferences-veo3 about 2 months ago

Update README.md

#2 opened 2 months ago by

Winkinglilyox

replied to their post 2 months ago

Good catch :) yes, we uploaded them shortly after!

liked 3 datasets 2 months ago

replied to their post 2 months ago

Hey Jackson, can you please elaborate?

reacted to their post with 👍👀 2 months ago

Post

3900

Benchmark Update: @google Veo3 (Text-to-Video)

Two months ago, we benchmarked @google ’s Veo2 model. It fell short, struggling with style consistency and temporal coherence, trailing behind Runway, Pika, @tencent , and even @alibaba-pai .

That’s changed.

We just wrapped up benchmarking Veo3, and the improvements are substantial. It outperformed every other model by a wide margin across all key metrics. Not just better, dominating across style, coherence, and prompt adherence. It's rare to see such a clear lead in today’s hyper-competitive T2V landscape.

Dataset coming soon. Stay tuned.

5 replies

Jason Corkill

AI & ML interests

Recent Activity

Organizations

jasoncorkill's activity

Update README.md