new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jul 29

DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language

The exponential growths of social media and micro-blogging sites not only provide platforms for empowering freedom of expressions and individual voices, but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize textual data for social and anti-social behaviour analysis, by predicting the contexts mostly for highly-resourced languages like English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, that lack computational resources for accurate natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection from the under-resourced Bengali language, which we called DeepHateExplainer. Bengali texts are first comprehensively preprocessed, before classifying them into political, personal, geopolitical, and religious hates using a neural ensemble method of transformer-based neural architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Important(most and least) terms are then identified using sensitivity analysis and layer-wise relevance propagation(LRP), before providing human-interpretable explanations. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of explanations w.r.t faithfulness. Evaluations against machine learning~(linear and tree-based models) and neural networks (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84%, for political, personal, geopolitical, and religious hates, respectively, outperforming both ML and DNN baselines.

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

In this work, we present DeepEraser, an effective deep network for generic text removal. DeepEraser utilizes a recurrent architecture that erases the text in an image via iterative operations. Our idea comes from the process of erasing pencil script, where the text area designated for removal is subject to continuous monitoring and the text is attenuated progressively, ensuring a thorough and clean erasure. Technically, at each iteration, an innovative erasing module is deployed, which not only explicitly aggregates the previous erasing progress but also mines additional semantic context to erase the target text. Through iterative refinements, the text regions are progressively replaced with more appropriate content and finally converge to a relatively accurate status. Furthermore, a custom mask generation strategy is introduced to improve the capability of DeepEraser for adaptive text removal, as opposed to indiscriminately removing all the text in an image. Our DeepEraser is notably compact with only 1.4M parameters and trained in an end-to-end manner. To verify its effectiveness, extensive experiments are conducted on several prevalent benchmarks, including SCUT-Syn, SCUT-EnsText, and Oxford Synthetic text dataset. The quantitative and qualitative results demonstrate the effectiveness of our DeepEraser over the state-of-the-art methods, as well as its strong generalization ability in custom mask text removal. The codes and pre-trained models are available at https://github.com/fh2019ustc/DeepEraser