SyntaxMind at BLP-2025 Task 1: Leveraging Attention Fusion of CNN and GRU for Hate Speech Detection


This paper describes our system for BLP-2025 Task 1: Hate Speech Detection. We participated in Subtask 1A and Subtask 1B, addressing hate speech classification in Bangla text. Our approach employs a unified architecture that integrates BanglaBERT embeddings with multiple parallel processing branches based on GRUs and CNNs, followed by attention and dense layers for final classification. The model is designed to capture both contextual semantics and local linguistic cues, enabling robust performance across subtasks. The proposed system demonstrated high competitiveness, obtaining a micro-F1 score of 0.7345 (2nd place) in Subtask 1A and 0.7317 (5th place) in Subtask 1B.


💡 Research Summary

The paper presents the authors’ entry to the BLP‑2025 Task 1 competition, which focuses on hate‑speech detection in Bangla text. Two subtasks were addressed: 1A, a multi‑class classification of hate‑speech types, and 1B, identification of the target group of hateful content. Both subtasks share a dataset of 35,522 annotated instances, heavily imbalanced toward a “None” (non‑hate) class, with several minority categories such as “Sexism” or “Religious Hate” representing only a few hundred examples.

To mitigate language‑specific noise, the authors devised a preprocessing pipeline that removes URLs, lower‑cases Latin characters, converts emojis to Bangla equivalents, merges comma‑separated numbers, normalizes Unicode using NFC and the bnUnicodeNormalizer, and replaces percentage symbols with Bangla terms. This pipeline aims to produce a clean, consistent input for downstream modeling.
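The steps above can be sketched in Python using only the standard library. The function name, regexes, and the Bangla spelling of "percent" are illustrative assumptions, not the authors' code; the paper additionally applies bnUnicodeNormalizer and emoji conversion, which are omitted here:

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Approximate the paper's Bangla preprocessing pipeline (a sketch)."""
    # Remove URLs
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Lower-case Latin characters (Bangla script is unaffected by str.lower)
    text = text.lower()
    # Normalize Unicode to NFC (the paper also applies bnUnicodeNormalizer)
    text = unicodedata.normalize("NFC", text)
    # Merge comma-separated numbers, e.g. "1,000" -> "1000"
    text = re.sub(r"(\d),(?=\d)", r"\1", text)
    # Replace the percent sign with the Bangla term (assumed spelling)
    text = text.replace("%", " শতাংশ ")
    # Collapse repeated whitespace left over from removals
    return re.sub(r"\s+", " ", text).strip()
```

A call such as `preprocess("Visit https://x.com NOW, price 1,000 at 50%")` yields a cleaned string with the URL stripped, Latin text lower-cased, and the number merged.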

The core model uses BanglaBERT to generate contextual token embeddings (max sequence length = 128). These embeddings are fed in parallel to two feature‑extraction branches. The CNN branch applies three parallel convolutions with kernel sizes 1, 2, 3, each with 128 filters, followed by ReLU, adaptive max‑pooling, layer normalization, and a single‑head self‑attention layer to capture salient n‑gram patterns. The Bi‑GRU branch consists of two bidirectional GRU layers (hidden size = 128) with layer normalization and a single‑head self‑attention mechanism to model sequential dependencies.
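The two branches can be sketched in PyTorch as follows. Kernel sizes, filter counts, and hidden sizes follow the paper; the pooled sequence length (32) and the mean-pooling of attention outputs are assumptions made to keep the sketch self-contained:

```python
import torch
import torch.nn as nn

EMB, HID = 768, 128  # BanglaBERT hidden size; branch width per the paper

class CNNBranch(nn.Module):
    """Parallel 1/2/3-gram convolutions with self-attention (a sketch)."""
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(EMB, HID, kernel_size=k, padding=k // 2) for k in (1, 2, 3)]
        )
        self.norm = nn.LayerNorm(3 * HID)
        self.attn = nn.MultiheadAttention(3 * HID, num_heads=1, batch_first=True)

    def forward(self, x):                    # x: (batch, seq, EMB)
        h = x.transpose(1, 2)                # -> (batch, EMB, seq)
        feats = [torch.relu(conv(h)) for conv in self.convs]
        # adaptive max-pool each conv map to a fixed length (32 is assumed)
        pooled = [nn.functional.adaptive_max_pool1d(f, 32) for f in feats]
        h = self.norm(torch.cat(pooled, dim=1).transpose(1, 2))
        h, _ = self.attn(h, h, h)
        return h.mean(dim=1)                 # (batch, 3 * HID)

class GRUBranch(nn.Module):
    """Two stacked BiGRU layers with self-attention (a sketch)."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(EMB, HID, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.norm = nn.LayerNorm(2 * HID)
        self.attn = nn.MultiheadAttention(2 * HID, num_heads=1, batch_first=True)

    def forward(self, x):
        h, _ = self.gru(x)                   # (batch, seq, 2 * HID)
        h = self.norm(h)
        h, _ = self.attn(h, h, h)
        return h.mean(dim=1)                 # (batch, 2 * HID)
```

For a batch of BanglaBERT embeddings of shape `(batch, 128, 768)`, the CNN branch returns a 384-dimensional vector per example and the BiGRU branch a 256-dimensional one.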

Outputs from both branches are concatenated in a fusion layer, projected to a 128‑dimensional space, passed through ReLU, layer normalization, and a dropout of 0.3. The final linear classifier maps the fused representation to the required number of classes, producing logits for each label.
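A minimal fusion head matching that description might look as follows; the branch widths (384 and 256) and the class count are assumptions for illustration, since the label set differs between subtasks:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate branch outputs, project to 128-d, classify (a sketch)."""
    def __init__(self, cnn_dim=384, gru_dim=256, num_classes=6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(cnn_dim + gru_dim, 128),  # project fused features to 128-d
            nn.ReLU(),
            nn.LayerNorm(128),
            nn.Dropout(0.3),                    # dropout rate from the paper
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, cnn_feat, gru_feat):
        fused = self.fuse(torch.cat([cnn_feat, gru_feat], dim=-1))
        return self.classifier(fused)           # logits: (batch, num_classes)
```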

Training employed a batch size of 16, learning rate 1e‑5, AdamW optimizer, Cross‑Entropy loss, and gradient clipping. These hyper‑parameters reflect the need for careful fine‑tuning of a large pretrained model on a relatively small, imbalanced dataset.
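A single training step with these hyper-parameters can be sketched as below; the clipping norm of 1.0 and the stand-in linear model are assumptions, since the paper does not report them here:

```python
import torch
import torch.nn as nn

# Hyper-parameters reported in the paper; MAX_NORM is an assumption
BATCH_SIZE, LR, MAX_NORM = 16, 1e-5, 1.0

model = nn.Linear(768, 6)  # stand-in for the full BanglaBERT-based model
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    """One AdamW step with gradient clipping, as described in the paper."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_NORM)
    optimizer.step()
    return loss.item()
```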

Results show that the system achieved a micro‑averaged F1 score of 0.7345 (2nd place) on Subtask 1A and 0.7317 (5th place) on Subtask 1B. The narrow gaps to the top‑ranked teams (0.0017 and 0.0039 points respectively) demonstrate the effectiveness of the hybrid CNN‑BiGRU‑Attention architecture in handling both contextual and local linguistic cues. The authors acknowledge that performance on minority classes remains a challenge and suggest future work involving class‑wise weighting, data augmentation, multi‑head attention, or transformer‑based encoders to further improve robustness.
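The class-wise weighting suggested as future work can be expressed with inverse-frequency weights passed to the loss; the counts below are illustrative only, not the dataset's actual distribution:

```python
import torch
import torch.nn as nn

# Illustrative per-class counts (a dominant "None" class plus minorities)
counts = torch.tensor([25000.0, 3000.0, 400.0, 300.0, 250.0, 200.0])
# Inverse-frequency weighting: rarer classes receive larger loss weights
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```

This penalizes errors on minority classes more heavily, one common way to counter the imbalance the authors describe.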

Overall, the paper contributes a well‑engineered, language‑aware pipeline that successfully combines a Bangla‑specific pretrained transformer with parallel convolutional and recurrent pathways, offering a solid baseline for future Bangla hate‑speech detection research.

