Source: CrisisMMD (Alam et al., 2018)
Data Type: Multimodal. Each sample includes:
tweet_text (social media text)
tweet_image (corresponding image from the tweet)
Total Samples Used: ~18,802 (from the dataset)
Class Labels:
0: Non-informative
1: Informative
Only samples whose tweet_text and tweet_image labels are equal were kept. This yielded 12,743 tweets, which were converted into train and test .pt files.
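A minimal sketch of this filtering step, assuming "equal" means that the text annotation and the image annotation of a tweet agree. The column names text_info and image_info, the label strings, and the sample rows are assumptions for illustration, not taken from the actual preprocessing script:

```python
# Hypothetical annotation rows; the column names and label strings are assumptions.
rows = [
    {"tweet_id": "1", "text_info": "informative", "image_info": "informative"},
    {"tweet_id": "2", "text_info": "informative", "image_info": "not_informative"},
    {"tweet_id": "3", "text_info": "not_informative", "image_info": "not_informative"},
]

# Class encoding used in this card: 0 = Non-informative, 1 = Informative.
LABEL_TO_ID = {"not_informative": 0, "informative": 1}


def keep_agreeing(rows):
    """Keep rows whose text and image labels match and attach an integer label."""
    kept = []
    for row in rows:
        if row["text_info"] == row["image_info"]:
            kept.append({**row, "label": LABEL_TO_ID[row["text_info"]]})
    return kept


agreed = keep_agreeing(rows)
print([(r["tweet_id"], r["label"]) for r in agreed])  # [('1', 1), ('3', 0)]
```

Tweet 2 is dropped because its two modalities disagree, so the remaining samples carry a single unambiguous label.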
Preprocessing Done
Text:
Tokenized using BERT tokenizer (bert-base-uncased)
Extracted input_ids and attention_mask
Image:
Processed using ResNet-50
Extracted 2048-dimensional feature vectors
Label:
Encoded to 0 or 1 as per class
The final preprocessed dataset was saved as .pt files:
train_info.pt
test_info.pt
Each contains: input_ids, attention_mask, image_vector, and label tensors.
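The following sketch shows one way such a file could be written and read back with PyTorch. It assumes each .pt file stores a dict of tensors keyed by the four names above; the real file layout, the sequence length of 128, the class name CrisisTensorDataset, and the dummy values are all assumptions:

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


class CrisisTensorDataset(Dataset):
    """Hypothetical wrapper around a dict of equally sized tensors."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data["label"].shape[0]

    def __getitem__(self, idx):
        # Return one sample as a dict with the same keys as the file.
        return {key: tensor[idx] for key, tensor in self.data.items()}


# Dummy stand-in data: 4 samples, assumed sequence length 128, 2048-d image vectors.
n, seq_len = 4, 128
dummy = {
    "input_ids": torch.randint(0, 30522, (n, seq_len)),
    "attention_mask": torch.ones(n, seq_len, dtype=torch.long),
    "image_vector": torch.randn(n, 2048),
    "label": torch.tensor([0, 1, 1, 0]),
}

# Save to a temporary location, then load it back as the training code would.
path = os.path.join(tempfile.mkdtemp(), "train_info.pt")
torch.save(dummy, path)
loaded = torch.load(path)

loader = DataLoader(CrisisTensorDataset(loaded), batch_size=2)
batch = next(iter(loader))
print({key: tuple(value.shape) for key, value in batch.items()})
```

If the published files use a different container (for example a tuple or a TensorDataset), only the wrapper class would need to change.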
Model Architecture
A custom multimodal neural network combining both BERT and ResNet features:
Component: Details
Text Encoder: BERT base model (bert-base-uncased), outputs pooler_output (768-d)
Image Encoder: ResNet-50 pre-extracted features (2048-d)
Fusion: Concatenation → FC layers → Softmax
Classifier: Fully connected layers with BatchNorm, ReLU, Dropout
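A rough sketch of the fusion and classifier stages, under stated assumptions. The hidden size (512), dropout rate (0.3), and the class name FusionClassifier are illustrative guesses, and random tensors stand in for the BERT pooler_output and the ResNet-50 features so the example runs without downloading any weights:

```python
import torch
from torch import nn


class FusionClassifier(nn.Module):
    """Concatenation fusion followed by an FC head; layer sizes are assumptions."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512,
                 num_classes=2, dropout=0.3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_vec, image_vec):
        # Concatenate along the feature axis: (batch, 768 + 2048) = (batch, 2816).
        fused = torch.cat([text_vec, image_vec], dim=1)
        return self.classifier(fused)  # raw logits, no softmax applied here


model = FusionClassifier()
model.eval()  # disable dropout and use BatchNorm running statistics

text_vec = torch.randn(4, 768)    # stand-in for BERT pooler_output
image_vec = torch.randn(4, 2048)  # stand-in for ResNet-50 features

with torch.no_grad():
    logits = model(text_vec, image_vec)
    probs = torch.softmax(logits, dim=1)  # class probabilities at inference time

print(logits.shape)
```

The sketch returns logits rather than probabilities because PyTorch's CrossEntropyLoss applies log-softmax internally; softmax is applied only when probabilities are needed at inference.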
Training Setup
Loss Function: CrossEntropyLoss
Optimizer: AdamW
Scheduler: StepLR (γ = 0.9)
Epochs: 8
Batch Size: 16
Device: CUDA (if available)
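The setup above could be wired together roughly as follows. The learning rate (2e-5) and the scheduler step_size (1, meaning one decay per epoch) are not stated in this card and are assumptions, and a small linear layer with one dummy batch stands in for the multimodal model and the real data loader:

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)  # placeholder for the multimodal model
criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model.parameters(), lr=2e-5)         # lr is an assumption
scheduler = StepLR(optimizer, step_size=1, gamma=0.9)  # step_size is an assumption

# One dummy batch of size 16 in place of the real training loader.
features = torch.randn(16, 10, device=device)
labels = torch.randint(0, 2, (16,), device=device)

lrs = []
for epoch in range(8):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used in this epoch
    scheduler.step()                        # multiply the learning rate by 0.9

print([f"{lr:.2e}" for lr in lrs])
```

With these assumed values the learning rate in the eighth epoch is the initial rate times 0.9 to the power 7, a little under half of where it started.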
Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
Test Accuracy: 0.8518
Precision: 0.8289
Recall: 0.8032
F1 Score: 0.8142
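The averaging mode for precision, recall, and F1 is not stated here. The reported F1 (0.8142) is slightly below the harmonic mean of the reported precision and recall (about 0.816), which would be consistent with per-class (macro) averaging, though that is only an inference. A small sketch of macro-averaged metrics on made-up predictions, with the function name macro_metrics being hypothetical:

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(classes),
        "recall": sum(recalls) / len(classes),
        "f1": sum(f1s) / len(classes),
    }


# Made-up labels, not data from this project.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

metrics = macro_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

In this toy case the macro F1 (about 0.733) differs from the harmonic mean of macro precision and macro recall (about 0.789), which illustrates why the three reported numbers need not satisfy the usual F1 formula.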
Newly created dataset: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task1