t5-3b-samsum-deepspeed
This model was trained on Microsoft's AzureML using DeepSpeed's ZeRO stage 2 optimization. It was fine-tuned on the SAMSum corpus from the t5-3b checkpoint.
More information on the fine-tuning process (includes samples and benchmarks):
(currently still WIP, updates coming soon: 7/6/21~7/9/21)
Resource Usage
These results were retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.
| key | value |
|---|---|
| AzureML SKU | ND40rs_v2 (8 X V100 32GB) |
| Region | US West 2 |
| Run Duration | 43m 51.05s |
| Compute Cost (LowPriority/Dedicated) | $3.22/$16.10 (USD) |
| Average CPU Utilization | 46.0% |
| Average GPU Utilization | 56.9% |
| GPU Memory Usage (Avg/Peak) | 26.77/30.49 (GB) |
| Total GPU Energy Usage | 2448.69 (kJ) |
*Compute cost is calculated from the run duration and the SKU's price per hour. Current SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/
*Peak memory usage is the average of the per-GPU peak values across all utilized GPUs.
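The cost calculation described in the note above can be sketched as follows. The hourly rates here are assumptions back-derived from the reported costs and run duration; they are not official prices, so check the pricing page for current values.

```python
def run_cost_usd(duration_seconds: float, price_per_hour_usd: float) -> float:
    """Cost of a run billed at a flat hourly rate."""
    return duration_seconds / 3600.0 * price_per_hour_usd


# Run duration from the table: 43m 51.05s
duration_seconds = 43 * 60 + 51.05

# ASSUMPTION: hourly rates inferred from the reported costs, not official prices.
dedicated_rate_usd = 22.03
low_priority_rate_usd = 4.406

dedicated_cost = round(run_cost_usd(duration_seconds, dedicated_rate_usd), 2)
low_priority_cost = round(run_cost_usd(duration_seconds, low_priority_rate_usd), 2)

print(f"Low priority: ${low_priority_cost:.2f}, dedicated: ${dedicated_cost:.2f}")
```

With these assumed rates, the result agrees with the $3.22/$16.10 figures in the table.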
Carbon Emissions
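These results were obtained using CodeCarbon. Carbon emissions are estimated from the training runtime only (excluding setup and evaluation runtime). In the table below, CodeCarbon reports duration in seconds, emissions in kg of CO2-equivalent, and energy_consumed in kWh.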
CodeCarbon: https://github.com/mlco2/codecarbon
| key | value |
|---|---|
| timestamp | 2021-07-06T21:57:39 |
| duration | 1841.4621863365173 |
| emissions | 0.17802492531467784 |
| energy_consumed | 0.5982020339874927 |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
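As a rough sanity check on the table above, the reported values can be combined to estimate the implied carbon intensity and the average power draw. This is a minimal sketch that assumes CodeCarbon's usual units (kg CO2-equivalent, kWh, seconds); the variable names are illustrative only.

```python
# Values copied from the CodeCarbon table.
emissions_kg = 0.17802492531467784   # ASSUMPTION: kg CO2-equivalent
energy_kwh = 0.5982020339874927      # ASSUMPTION: kWh
duration_s = 1841.4621863365173      # ASSUMPTION: seconds

# Implied carbon intensity of the energy used (kg CO2eq per kWh).
carbon_intensity = emissions_kg / energy_kwh

# Average power draw over the training run (kW).
average_power_kw = energy_kwh / (duration_s / 3600.0)

print(f"Carbon intensity: {carbon_intensity:.4f} kg CO2eq/kWh")
print(f"Average power:    {average_power_kw:.3f} kW")
```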
Hyperparameters
fp16: True
per device batch size: 2
effective batch size: 16
epoch: 3.0
learning rate: 3e-5
weight decay: 0.0
seed: 1
*The same per-device batch size is used for evaluation.
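The relationship between these hyperparameters and the reported step count can be sketched as below. The gradient accumulation value is an assumption (it is not stated in this card), chosen because 2 samples on each of 8 GPUs already gives the effective batch size of 16. Rounding up also assumes the last partial batch of each epoch is kept.

```python
import math

per_device_batch_size = 2
num_gpus = 8                        # ND40rs_v2 has 8 x V100
gradient_accumulation_steps = 1     # ASSUMPTION: not stated in the card

effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps

train_samples = 14732               # from the results table
epochs = 3

# ASSUMPTION: the final partial batch of each epoch is not dropped.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size, steps_per_epoch, total_steps)
```

Under these assumptions the total matches the total_steps value reported in the results.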
DeepSpeed
Optimizer = AdamW, Scheduler = WarmupDecayLR, Offload = none
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 1000000000,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 1000000000,
"contiguous_gradients": true
}
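For context, the fragment above could sit inside a complete DeepSpeed configuration along the following lines. This is a sketch, not the exact file used for this model: the optimizer and scheduler types, fp16 setting, batch sizes, learning rate, and weight decay come from this card, while the `"auto"` placeholders (resolved by the Hugging Face Trainer integration, not by DeepSpeed on its own) stand in for values the card does not state.

```python
import json

ds_config = {
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-5, "weight_decay": 0.0},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        # ASSUMPTION: warm-up settings are not given in the card, so they are
        # left as "auto" for the Hugging Face Trainer to fill in.
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto",
        },
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1000000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1000000000,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": 2,
    "train_batch_size": 16,
}

# Serialize to JSON text, e.g. for saving as a config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

No offload section is included, consistent with "Offload = none" above.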
Usage
from transformers import pipeline
summarizer = pipeline("summarization", model="henryu-lin/t5-3b-samsum-deepspeed")
conversation = '''Henry: Hey, is Nate coming over to watch the movie tonight?
Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
Henry: Nice, I'm really looking forward to seeing them again.
'''
summarizer(conversation)
Results
| ROUGE | Score |
|---|---|
| eval_rouge1 | 54.7875 |
| eval_rouge2 | 30.565 |
| eval_rougeL | 45.7625 |
| eval_rougeLsum | 50.3915 |
| predict_rouge1 | 53.6628 |
| predict_rouge2 | 29.0196 |
| predict_rougeL | 45.1257 |
| predict_rougeLsum | 49.171 |
| Metric | Value |
|---|---|
| eval_gen_len | 25.3399 |
| predict_gen_len | 24.9133 |
| train_loss | 1.1206104169494209 |
| eval_loss | 1.0732421875 |
| predict_loss | 1.087890625 |
| train_runtime | 1841.3751 |
| train_samples | 14732 |
| train_samples_per_second | 24.002 |
| train_steps_per_second | 1.501 |
| eval_runtime | 163.8357 |
| eval_samples | 818 |
| eval_samples_per_second | 4.993 |
| eval_steps_per_second | 0.317 |
| predict_runtime | 168.8245 |
| predict_samples | 819 |
| predict_samples_per_second | 4.851 |
| predict_steps_per_second | 0.308 |
| total_steps | 2763 |
| total_flos | 1.84452086400811e+17 |
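The throughput rows in the table above follow from the sample counts, step count, and runtimes. A minimal sketch of that arithmetic, using only values from the table (the helper name is illustrative):

```python
def per_second(count: float, runtime_s: float) -> float:
    """Throughput rounded to three decimals, as in the table."""
    return round(count / runtime_s, 3)


epochs = 3

# Training processes every sample once per epoch.
train_samples_per_second = per_second(14732 * epochs, 1841.3751)
train_steps_per_second = per_second(2763, 1841.3751)

# Evaluation and prediction each make a single pass.
eval_samples_per_second = per_second(818, 163.8357)
predict_samples_per_second = per_second(819, 168.8245)

print(train_samples_per_second, train_steps_per_second)
print(eval_samples_per_second, predict_samples_per_second)
```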