To help us better understand how the model is being used and by whom, we ask you to provide some basic information. This will support future improvements and help ensure the model continues to meet the needs of its user community. Please note: this model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

data2vec-HAT-1.4K-base

This repository provides access to a data2vec1-Base model for Haitian Creole (hat).

Model

Model and data description

The model was pretrained on the following data sets:

The pre-processing scripts are available at https://gin.g-node.org/CREAM/SSL-Haitian/ and the original fairseq models were converted to Hugging Face format using https://github.com/LLL-Orleans/convert_data2vec_to_hf

The original fairseq model is also available, so continued pre-training or fine-tuning can be done within fairseq.
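When aligning the model's hidden states with transcriptions or time stamps, it helps to know how many output frames a given waveform produces. The sketch below assumes the converted checkpoint keeps the standard data2vec/wav2vec 2.0 "Base" convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, the defaults of `Data2VecAudioConfig` in transformers) and 16 kHz input audio. Both are assumptions; check the `config.json` shipped with the model before relying on them.

```python
# ASSUMPTION: standard "Base" conv feature-encoder layout (transformers'
# Data2VecAudioConfig defaults); not read from this checkpoint's config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # ASSUMPTION: input audio sampled at 16 kHz


def num_output_frames(num_samples: int) -> int:
    """Number of frames the conv encoder emits for num_samples of audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
        if length <= 0:
            return 0  # input shorter than the encoder's receptive field
    return length


def frame_rate_hz() -> float:
    """Output frames per second: sample rate divided by the total stride."""
    total_stride = 1
    for stride in CONV_STRIDES:
        total_stride *= stride
    return SAMPLE_RATE / total_stride


print(num_output_frames(16_000))  # one second of audio -> 49 frames
print(frame_rate_hz())            # 16000 / 320 -> 50.0
```

Under these assumptions each frame covers a 400-sample (25 ms) receptive field with a 320-sample (20 ms) hop, so inputs shorter than 400 samples yield no frames.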

For more details, see the paper.

Intended uses & limitations

This model is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

This is a gated model. Access is granted on a per-user basis, pending formal approval by the CREAM principal investigator, Prof. Emmanuel Schang.

Acknowledgments

The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-20-CE38-0006 (project CREAM). Experiments were conducted using Grid'5000, developed under INRIA ALADDIN with support from CNRS, RENATER, and various universities (see https://www.grid5000.fr). Additional resources include the CaSciModOT cluster (https://cascimodot.fr/) at Centre de Calcul Scientifique en région Centre-Val de Loire and HPC resources from IDRIS provided by GENCI (allocation 2024-AD011014940).

Referencing this model

@inproceedings{havard-et-al-taln25,
    author = "Havard, William N. and Govain, Renauld and Lecouteux, Benjamin and Schang, Emmanuel",
    title = "Mod\`eles auto-supervis\'es de traitement de la parole pour le Cr\'eole Haitien",
    booktitle = "Actes de CORIA-TALN-RJCRI-RECITAL 2025. Actes des 32\`eme Conf\'erence sur le Traitement Automatique des Langues Naturelles (TALN),  volume 1 : articles scientifiques originaux",
    month = "6",
    year = "2025",
    address = "Marseille, France",
    publisher = "Association pour le Traitement Automatique des Langues",
    pages = "543-555",
    note = "",
    url = "https://talnarchives.atala.org/TALN/TALN-2025/98.pdf"
}
Model size: 93.2M parameters
Tensor type: F32 (Safetensors)
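The parameter count and tensor type give a rough estimate of the download size and of the memory needed just to hold the weights. The sketch below multiplies the parameter count by the bytes per element of the dtype; it ignores the safetensors header and any runtime activations, and since 93.2M is a rounded figure the result is approximate. The dtype table is an illustrative subset, not an exhaustive list.

```python
# Bytes per element for a few safetensors dtype names (illustrative subset).
BYTES_PER_ELEMENT = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I8": 1}


def weights_size_bytes(num_params: float, dtype: str = "F32") -> float:
    """Approximate raw size of the weights, excluding file metadata."""
    return num_params * BYTES_PER_ELEMENT[dtype]


size = weights_size_bytes(93.2e6, "F32")
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")  # 372.8 MB (355.5 MiB)
```

Casting the weights to F16 or BF16 would roughly halve this figure, at the cost of numerical precision.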