Papers
arxiv:2309.14568

Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew

Published on Sep 25, 2023

Abstract

DictaLM and DictaLM-Rab are large-scale language models for Modern Hebrew and Rabbinic/Historical Hebrew, respectively, designed for various Hebrew-specific tasks and released under a Creative Commons license.

AI-generated summary

We present DictaLM, a large-scale language model tailored for Modern Hebrew. Boasting 7B parameters, this model is predominantly trained on Hebrew-centric data. As a commitment to promoting research and development in the Hebrew language, we release both the foundation model and the instruct-tuned model under a Creative Commons license. Concurrently, we introduce DictaLM-Rab, another foundation model geared towards Rabbinic/Historical Hebrew. These foundation models serve as ideal starting points for fine-tuning on various Hebrew-specific tasks, such as instruction following, Q&A, sentiment analysis, and more. This release represents a preliminary step, offering an initial Hebrew LLM for the Hebrew NLP community to experiment with.

Community

What is this about? Who wrote you?


Models citing this paper 0


Datasets citing this paper 0


Spaces citing this paper 0


Collections including this paper 2