Loïck BOURDOIS
lbourdois
Recent Activity
Updated a Space 1 day ago: lbourdois/Free_online_AI_courses_in_French
Updated a dataset 7 days ago: Bretagne/Lingua_Libre
New activity 9 days ago: agents-course/notebooks (Add French notebooks)
French prompts
French prompts dataset developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 30,000 downloads.
French QA
QA models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 150,000 downloads.
French caption datasets
Datasets I cleaned; each example has an image, a prompt question (like "describe this image") and an answer.
They can be used to train VLMs.
French retriever datasets
Datasets I cleaned; each example has an image and a question.
They can be used to train visual retrievers (ColPali and co.).
- CATIE-AQ/retriever-vidore-vdsid_french-clean
  Viewer • Updated • 5k • 50
- CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
  Viewer • Updated • 280 • 36
- CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
  Viewer • Updated • 1.83k • 41
- CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
  Viewer • Updated • 1.32k • 42
FAT5
Flash Attention T5 (FAT5) models developed when I worked at CATIE (https://hf.co/CATIE-AQ).
French NER
NER models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 170,000 downloads.
- CATIE-AQ/NERmembert-base-3entities
  Token Classification • 0.1B • Updated • 30 • 2
- CATIE-AQ/NERmembert-large-3entities
  Token Classification • 0.3B • Updated • 2.43k • 2
- CATIE-AQ/frenchNER_3entities
  Viewer • Updated • 425k • 77 • 1
- CATIE-AQ/NERmembert-base-4entities
  Token Classification • 0.1B • Updated • 12 • 2
French VQA datasets
VQA datasets I cleaned; each example has an image, a question and an answer.
They can be used to train VLMs.
French OCR datasets
Datasets I cleaned; each example has an image, a prompt question (like "transcribe the text in this image") and an answer.
They can be used to train VLMs.
French audio datasets (pretraining)
Around 117K hours of audio in French, for research purposes.
French Translations
Things I've translated: courses, blog posts, guides. More on my personal blog (https://lbourdois.github.io/blog/).