---
datasets:
- QCRI/CrisisMMD
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- google-bert/bert-base-uncased
- microsoft/resnet-50
---
Source: CrisisMMD dataset (Alam et al., 2018)

✅ Original Labels (8 classes from annotations):

- Infrastructure and utility damage
- Vehicle damage
- Rescue, volunteering, or donation efforts
- Affected individuals
- Injured or dead people
- Missing or found people
- Other relevant information
- Not humanitarian

✅ Label Preprocessing (Class Merging):

- Vehicle damage merged into Infrastructure and utility damage
- Missing or found people merged into Affected individuals
- Not humanitarian retained as a separate class
- Very low-frequency categories removed from the label set (e.g., Missing or found people no longer appears as a separate class); the full mapping is sketched after the final label set below

✅ Final Label Set (5 classes total):

- Infrastructure and utility damage
- Rescue, volunteering, or donation efforts
- Affected individuals
- Injured or dead people
- Not humanitarian
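
A minimal sketch of this label mapping. The snake_case label strings follow the naming convention of the CrisisMMD annotation files, but the exact strings are an assumption:

```python
# Map the original CrisisMMD humanitarian labels to the merged 5-class set.
LABEL_MAP = {
    "infrastructure_and_utility_damage": "infrastructure_and_utility_damage",
    "vehicle_damage": "infrastructure_and_utility_damage",        # merged
    "rescue_volunteering_or_donation_effort": "rescue_volunteering_or_donation_effort",
    "affected_individuals": "affected_individuals",
    "missing_or_found_people": "affected_individuals",            # merged
    "injured_or_dead_people": "injured_or_dead_people",
    "not_humanitarian": "not_humanitarian",
    # "other_relevant_information" has no entry: it is absent from the final
    # 5-class set, so those posts are assumed to be dropped.
}

# Final 5 classes, indexed for use as CrossEntropyLoss targets.
CLASSES = [
    "infrastructure_and_utility_damage",
    "rescue_volunteering_or_donation_effort",
    "affected_individuals",
    "injured_or_dead_people",
    "not_humanitarian",
]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}
```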

✅ Multimodal Consistency:

- Selected only those posts where the text and image annotations matched (see the filtering sketch below)
- This resulted in a total of 8,219 consistent samples:
  - Train set: 6,574 posts
  - Test set: 1,644 posts
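
A sketch of the consistency filter, assuming the annotation TSV exposes separate text and image labels per post (the column names are illustrative) and reusing `LABEL_MAP` from the sketch above:

```python
import pandas as pd

def filter_consistent(annotations_tsv: str) -> pd.DataFrame:
    """Keep only posts whose text and image labels agree after class merging."""
    df = pd.read_csv(annotations_tsv, sep="\t")
    # "label_text" / "label_image" are assumed column names; adjust to the actual schema.
    df["text_label"] = df["label_text"].map(LABEL_MAP)
    df["image_label"] = df["label_image"].map(LABEL_MAP)
    # Drop posts whose label was removed (maps to NaN) or whose modalities disagree.
    df = df.dropna(subset=["text_label", "image_label"])
    return df[df["text_label"] == df["image_label"]].reset_index(drop=True)
```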

✅ Preprocessing Done

Text:

- Tokenized using the BERT tokenizer (`bert-base-uncased`)
- Extracted `input_ids` and `attention_mask`
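
A minimal sketch of the text side; the maximum sequence length is not stated above, so 128 is an assumed value:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def encode_texts(texts, max_length=128):
    """Tokenize a list of tweet texts into padded input_ids and attention_mask."""
    enc = tokenizer(
        list(texts),
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    return enc["input_ids"], enc["attention_mask"]
```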

Image:

- Processed using ResNet-50
- Extracted 2048-dimensional image features

The preprocessed data was saved in PyTorch `.pt` format:

- `train_human.pt` and `test_human.pt`
- Each file contains: `input_ids`, `attention_mask`, `image_vector`, and `label`
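
A sketch of the image-feature extraction and the saved `.pt` layout. It assumes a torchvision ResNet-50 with its classification head replaced by an identity (yielding the 2048-d pooled feature) and standard ImageNet preprocessing, and it reuses `encode_texts` from the tokenizer sketch above; only the feature dimension and file names come from the card.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# ResNet-50 backbone with the classification head removed -> 2048-d pooled features.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = nn.Identity()
resnet.eval()

# Standard ImageNet preprocessing (assumed; the exact transforms are not stated above).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def image_vector(path: str) -> torch.Tensor:
    """Return the 2048-d ResNet-50 feature vector for one image file."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return resnet(x).squeeze(0)

def save_split(texts, image_paths, label_ids, out_path):
    """Pack one split into the .pt layout described above (e.g. train_human.pt)."""
    input_ids, attention_mask = encode_texts(texts)   # from the tokenizer sketch
    image_vectors = torch.stack([image_vector(p) for p in image_paths])
    torch.save(
        {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "image_vector": image_vectors,
            "label": torch.tensor(label_ids, dtype=torch.long),
        },
        out_path,
    )
```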

✅ Model Architecture

A custom multimodal classifier that combines BERT and ResNet-50 outputs:

| Component     | Details                                                           |
|---------------|-------------------------------------------------------------------|
| Text Encoder  | BERT base (`bert-base-uncased`) – outputs `pooler_output` (768-d) |
| Image Encoder | Pre-extracted ResNet-50 image features (2048-d)                   |
| Fusion        | Concatenation → FC layers → Softmax over 5 classes                |
| Classifier    | Fully connected layers with BatchNorm, ReLU, Dropout              |
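
A minimal sketch of such a fusion model; the hidden width and dropout rate are illustrative values, since the card only specifies the input dimensions, the layer types, and the 5-way output:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultimodalClassifier(nn.Module):
    """BERT pooler_output (768-d) + precomputed ResNet-50 features (2048-d)
    -> concatenation -> fully connected stack -> 5 humanitarian classes."""

    def __init__(self, num_classes=5, hidden_dim=512, dropout=0.3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),  # logits; softmax is applied inside CrossEntropyLoss
        )

    def forward(self, input_ids, attention_mask, image_vector):
        text_feat = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output  # (B, 768)
        fused = torch.cat([text_feat, image_vector], dim=1)                 # (B, 2816)
        return self.classifier(fused)
```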

✅ Training Setup

- Loss Function: CrossEntropyLoss
- Optimizer: AdamW
- Scheduler: StepLR (γ = 0.9)
- Epochs Tried: 1, 3, 5, 8, 10
- Batch Size: 16
- Runtime: ~2 minutes 20 seconds per epoch on Google Colab (T4 GPU)
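
A training-loop sketch using the stated loss, optimizer, scheduler, and batch size; the learning rate and the StepLR `step_size` are not given in the card, so those values are placeholders:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

data = torch.load("train_human.pt")
train_loader = DataLoader(
    TensorDataset(data["input_ids"], data["attention_mask"],
                  data["image_vector"], data["label"]),
    batch_size=16,      # batch size from the card
    shuffle=True,
)

model = MultimodalClassifier().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)                      # lr is a placeholder
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)  # step_size is a placeholder

for epoch in range(3):  # 3 epochs gave the best test accuracy among the values tried
    model.train()
    for input_ids, attention_mask, image_vector, labels in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids.to(device), attention_mask.to(device),
                       image_vector.to(device))
        loss = criterion(logits, labels.to(device))
        loss.backward()
        optimizer.step()
    scheduler.step()
```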

✅ Evaluation Metrics

- Accuracy
- Precision
- Recall
- F1 Score
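
These metrics can be computed with scikit-learn as sketched below; the averaging mode for precision, recall, and F1 is not stated in the card, so macro averaging is assumed:

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return accuracy, precision, recall, and F1 on a test DataLoader."""
    model.eval()
    preds, golds = [], []
    for input_ids, attention_mask, image_vector, labels in loader:
        logits = model(input_ids.to(device), attention_mask.to(device),
                       image_vector.to(device))
        preds.extend(logits.argmax(dim=1).cpu().tolist())
        golds.extend(labels.tolist())
    accuracy = accuracy_score(golds, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(
        golds, preds, average="macro", zero_division=0)  # averaging mode is an assumption
    return accuracy, precision, recall, f1
```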

✅ Metrics (epoch 3, which gave the highest test accuracy)

| Metric        | Value  |
|---------------|--------|
| Test Accuracy | 0.8820 |
| Precision     | 0.6854 |
| Recall        | 0.7176 |
| F1 Score      | 0.7005 |

The cleaned dataset is available at: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task2