---
datasets:
- QCRI/CrisisMMD
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- google-bert/bert-base-uncased
- microsoft/resnet-50
---

Source: CrisisMMD dataset (Alam et al., 2018)

✅ Original Labels (8 classes from annotations):

- Infrastructure and utility damage
- Vehicle damage
- Rescue, volunteering, or donation efforts
- Affected individuals
- Injured or dead people
- Missing or found people
- Other relevant information
- Not humanitarian

✅ Label Preprocessing (Class Merging):

- Vehicle damage was merged into Infrastructure and utility damage
- Missing or found people was merged into Affected individuals
- Not humanitarian was retained as a separate class
- Very low-frequency categories were removed as separate classes (e.g., "Missing or found people")

✅ Final Label Set (5 classes total):

- Infrastructure and utility damage
- Rescue, volunteering, or donation efforts
- Affected individuals
- Injured or dead people
- Not humanitarian
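
The merging rules above can be sketched as a simple lookup table. This is only an illustration: the label strings are the display names used in this card (the raw annotation files may spell them differently), the integer ids are arbitrary, and `MERGE_MAP`, `LABEL_TO_ID`, and `merge_label` are hypothetical names. The card does not say how "Other relevant information" was handled, so the sketch assumes that any label outside the final set is dropped.

```python
# Hypothetical mapping from the 8 original labels to the 5 final classes.
MERGE_MAP = {
    "Infrastructure and utility damage": "Infrastructure and utility damage",
    "Vehicle damage": "Infrastructure and utility damage",
    "Rescue, volunteering, or donation efforts": "Rescue, volunteering, or donation efforts",
    "Affected individuals": "Affected individuals",
    "Missing or found people": "Affected individuals",
    "Injured or dead people": "Injured or dead people",
    "Not humanitarian": "Not humanitarian",
}

# Assumed id assignment: alphabetical order of the final class names.
FINAL_LABELS = sorted(set(MERGE_MAP.values()))
LABEL_TO_ID = {name: idx for idx, name in enumerate(FINAL_LABELS)}


def merge_label(original):
    """Return the merged class name, or None if the label is not kept (assumption)."""
    return MERGE_MAP.get(original)


print(merge_label("Vehicle damage"))
print(merge_label("Other relevant information"))
```

Keeping the mapping in one dictionary makes it easy to audit which original classes feed each final class.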

✅ Multimodal Consistency:

- Kept only the posts whose text and image annotations matched
- This resulted in a total of 8,219 consistent samples:
  - Train set: 6,574 posts
  - Test set: 1,644 posts
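
A minimal sketch of the consistency filter, assuming each post is a dictionary that carries one label per modality. The field names `text_label` and `image_label` and the helper `keep_consistent` are hypothetical; the real annotation files may use different column names.

```python
def keep_consistent(posts, text_key="text_label", image_key="image_label"):
    """Keep only posts whose text and image annotations agree."""
    return [p for p in posts if p[text_key] == p[image_key]]


# Toy records, not real CrisisMMD data.
posts = [
    {"id": 1, "text_label": "Affected individuals", "image_label": "Affected individuals"},
    {"id": 2, "text_label": "Not humanitarian", "image_label": "Injured or dead people"},
    {"id": 3, "text_label": "Not humanitarian", "image_label": "Not humanitarian"},
]

consistent = keep_consistent(posts)
print([p["id"] for p in consistent])
```

After this filter each post has a single unambiguous label, which is what lets one classifier head serve both modalities.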

✅ Preprocessing Done

- Text:
  - Tokenized using the BERT tokenizer (bert-base-uncased)
  - Extracted input_ids and attention_mask
- Image:
  - Processed using ResNet-50
  - Extracted 2048-dimensional image features
- The preprocessed data was saved in PyTorch .pt format:
  - train_human.pt and test_human.pt
  - Each contains: input_ids, attention_mask, image_vector, and label
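
The saved format can be illustrated with dummy tensors. This sketch assumes each .pt file holds a single dictionary of stacked tensors under the four keys listed above; the actual files may instead store a list of per-sample records. The sequence length, sample count, and the file name `train_human_demo.pt` are placeholders.

```python
import os
import tempfile

import torch

num_samples, seq_len = 4, 32  # toy sizes, not the real dataset dimensions

# Random stand-ins for the tokenizer output and the ResNet-50 features.
data = {
    "input_ids": torch.randint(0, 30522, (num_samples, seq_len)),  # 30522 = bert-base-uncased vocab size
    "attention_mask": torch.ones(num_samples, seq_len, dtype=torch.long),
    "image_vector": torch.randn(num_samples, 2048),
    "label": torch.randint(0, 5, (num_samples,)),
}

# Save to a temporary location and read it back.
path = os.path.join(tempfile.mkdtemp(), "train_human_demo.pt")
torch.save(data, path)
loaded = torch.load(path)

for key, tensor in loaded.items():
    print(key, tuple(tensor.shape))
```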

✅ Model Architecture

A custom multimodal classifier that combines BERT and ResNet-50 outputs:

| Component | Details |
|---|---|
| Text Encoder | BERT base (bert-base-uncased), outputs pooler_output (768-d) |
| Image Encoder | Pre-extracted ResNet-50 image features (2048-d) |
| Fusion | Concatenation → FC layers → Softmax over 5 classes |
| Classifier | Fully connected layers with BatchNorm, ReLU, Dropout |
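
The fusion and classifier stages in the table can be sketched as follows. This is not the released model code: the class name `FusionClassifier`, the hidden size (512), the dropout rate (0.3), and the use of a single hidden layer are all assumptions. To stay self-contained, the sketch takes an already computed 768-d text vector as input, where the real model would use BERT's pooler_output.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Concatenate text and image vectors, then classify into 5 classes."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512,
                 num_classes=5, dropout=0.3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),  # 768 + 2048 = 2816 inputs
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_vec, image_vec):
        fused = torch.cat([text_vec, image_vec], dim=1)
        return self.classifier(fused)  # raw logits, one per class


model = FusionClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(4, 768), torch.randn(4, 2048))
    probs = torch.softmax(logits, dim=1)  # class probabilities at inference time
print(tuple(logits.shape))
```

When training with CrossEntropyLoss, the logits are passed to the loss directly, since that loss applies log-softmax internally; the explicit softmax is only needed to read out probabilities.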

✅ Training Setup

- Loss Function: CrossEntropyLoss
- Optimizer: AdamW
- Scheduler: StepLR (γ = 0.9)
- Epochs Tried: 1, 3, 5, 8, 10
- Batch Size: 16
- Runtime: ~2 minutes 20 seconds per epoch on Google Colab (T4 GPU)
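
To show what the scheduler does, the sketch below reproduces the StepLR rule in plain Python: the learning rate is multiplied by γ once every `step_size` epochs. Only γ = 0.9 comes from this card; the base learning rate (2e-5) and the step size (1) are assumptions chosen for illustration, and `step_lr` is a hypothetical helper.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate under a StepLR-style schedule at a given epoch (0-indexed)."""
    return base_lr * gamma ** (epoch // step_size)


BASE_LR = 2e-5   # assumption: not stated in this card
STEP_SIZE = 1    # assumption: not stated in this card
GAMMA = 0.9      # from the training setup above

schedule = [step_lr(BASE_LR, GAMMA, STEP_SIZE, epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.3e}")
```

With these assumed values the learning rate falls to roughly 39% of its starting value by the tenth epoch (0.9 to the ninth power).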

✅ Evaluation Metrics

- Accuracy
- Precision
- Recall
- F1 Score
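
For reference, the four metrics can be computed from predicted and true class ids as sketched below. The card does not state how precision, recall, and F1 were averaged across the five classes; this sketch assumes macro averaging (the unweighted mean of per-class scores), and `macro_metrics` is a hypothetical helper.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy example with 3 classes, not real model output.
scores = macro_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(scores)
```

Under macro averaging every class counts equally, so weak performance on rare classes can pull precision, recall, and F1 well below accuracy.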

✅ Metrics (epoch 3, which gave the highest accuracy)

- ✅ Test Accuracy: 0.8820
- ✅ Precision: 0.6854
- ✅ Recall: 0.7176
- ✅ F1 Score: 0.7005

The cleaned dataset created for this model is available at: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task2