AI & ML interests
Natural Language Processing, Signal Processing
Latxa: An Open Language Model and Evaluation Suite for Basque

We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE.

Datasets and models for metaphor detection and interpretation via NLI in Spanish and English
- Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection (paper, arXiv:2210.10358)
- HiTZ/cometa (dataset)
- HiTZ/xlm-roberta-large-metaphor-detection-es (Token Classification)
- HiTZ/mdeberta-base-metaphor-detection-es (Token Classification)

Does Corpus Quality Really Matter for Low-Resource Languages?
			
	
	Alpaca LoRA MT models and dataset
			
	
	Basque Pretraining Datasets
			
	
	Basque Instruction Datasets
			
	
	OPT reward models
			
	
	An open-source text-to-text multilingual model for the medical domain.
			
	
	A Bilingual Corpus of Basque Parliamentary Transcriptions
			
	
Basque Speech to Text models
- Demo Basque ASR 🎤: Transcribe speech from an audio file
- HiTZ/stt_eu_conformer_ctc_large (Automatic Speech Recognition)
- HiTZ/stt_eu_conformer_transducer_large (Automatic Speech Recognition)
- Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages (paper, arXiv:2503.23542)

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque
			
	
	IXA Submission for the 2024 ODESIA Challenge
			
	
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
- Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque (paper, arXiv:2506.07597)
- HiTZ/Latxa-Llama-3.1-8B-Instruct (Text Generation, 8B)
- HiTZ/Latxa-Llama-3.1-70B-Instruct (Text Generation, 71B)
- HiTZ/Latxa-Llama-3.1-70B-Instruct-FP8 (Text Generation, 71B)

Truth Knows No Language: Evaluating Truthfulness Beyond English
			
	
Ask2Transformers models
- Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models (paper, arXiv:2101.02661)
- Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (paper, arXiv:2109.03659)
- Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (paper, arXiv:2205.01376)
- ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (paper, arXiv:2203.13602)

Vision-Language Models Struggle to Align Entities across Modalities
			
	
	Basque Encoders for Representing Natural Textual Diversity
			
	
On the Role of Morphological Information for Contextual Lemmatization
- On the Role of Morphological Information for Contextual Lemmatization (paper, arXiv:2302.00407)
- HiTZ/xlm-roberta-large-lemma-eu (Token Classification)
- HiTZ/xlm-roberta-large-lemma-en (Token Classification)
- HiTZ/xlm-roberta-large-lemma-tr (Token Classification)

Basque Evaluation Datasets
			
	
Basque Encoder Language Models
- ixa-ehu/roberta-eus-euscrawl-large-cased (Fill-Mask, 0.4B)
- ixa-ehu/roberta-eus-euscrawl-base-cased (Fill-Mask)
- ixa-ehu/roberta-eus-cc100-base-cased (Fill-Mask, 0.2B)
- ixa-ehu/roberta-eus-mc4-base-cased (Fill-Mask)

State-of-the-art encoder-only models for Spanish. From the paper "Lessons learned from the evaluation of Spanish Language Models".
			
	
	A Large Negation Benchmark to Challenge Large Language Models
			
	
	Counternarrative Generation in Basque and Spanish
			
	
	Give your Text Representation Models some Love: the Case for Basque
			
	
Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote)
- HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (paper, arXiv:2306.06029)
- Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (paper, arXiv:2404.07613)
- HiTZ/casimedicos-exp (dataset)
- HiTZ/casimedicos-squad (dataset)

XNLIeu: a dataset for cross-lingual NLI in Basque
			
	