Papers
arXiv:2511.03498

BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation

Published on Nov 5
Authors:
,
,
,

Abstract

BanglaSTEM, a dataset of technical Bangla-English sentence pairs, improves translation accuracy for technical content, enabling better use of English-focused language models by Bangla speakers.

AI-generated summary

Large language models work well for technical problem solving in English but perform poorly when the same questions are asked in Bangla. A simple solution would be to translate Bangla questions into English first and then use these models. However, existing Bangla-English translation systems struggle with technical terms. They often mistranslate specialized vocabulary, which changes the meaning of the problem and leads to wrong answers. We present BanglaSTEM, a dataset of 5,000 carefully selected Bangla-English sentence pairs from STEM fields including computer science, mathematics, physics, chemistry, and biology. We generated over 12,000 translations using language models and then used human evaluators to select the highest quality pairs that preserve technical terminology correctly. We train a T5-based translation model on BanglaSTEM and test it on two tasks: generating code and solving math problems. Our results show significant improvements in translation accuracy for technical content, making it easier for Bangla speakers to use English-focused language models effectively. Both the BanglaSTEM dataset and the trained translation model are publicly released at https://huggingface.co/reyazul/BanglaSTEM-T5.

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.03498 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.03498 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.