---
datasets:
- KomeijiForce/Text2Emoji
language:
- en
metrics:
- bertscore
pipeline_tag: text2text-generation
---

# EmojiLM

This is a [T5](https://huggingface.co/t5-base) model pre-trained on the [Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji) dataset to translate sentences into sequences of emojis.

For instance, "I love pizza" will be translated into "πŸ•πŸ˜".

An example implementation for translation:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Replace with this model's Hub repo id (or a local checkpoint path).
path = "saved_models/t5-base-emolm"
tokenizer = T5Tokenizer.from_pretrained(path)
generator = T5ForConditionalGeneration.from_pretrained(path)

prefix = "translate into emojis:"
sentence = "I travel to enjoy the taste of sushi!"
inputs = tokenizer(prefix + " " + sentence, return_tensors="pt")
generated_ids = generator.generate(inputs["input_ids"], num_beams=4, do_sample=True, max_length=100)
decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True).replace(" ", "")
print(decoded)
```

Because decoding uses sampling, outputs vary from run to run; you will likely get something like "πŸ‡―πŸ‡΅πŸ£πŸ±πŸ˜‹".
34
+
35
+ If you find this model & dataset resource useful, please consider cite our paper:
36
+
37
+ ```
38
+ @article{DBLP:journals/corr/abs-2311-01751,
39
+ author = {Letian Peng and
40
+ Zilong Wang and
41
+ Hang Liu and
42
+ Zihan Wang and
43
+ Jingbo Shang},
44
+ title = {EmojiLM: Modeling the New Emoji Language},
45
+ journal = {CoRR},
46
+ volume = {abs/2311.01751},
47
+ year = {2023},
48
+ url = {https://doi.org/10.48550/arXiv.2311.01751},
49
+ doi = {10.48550/ARXIV.2311.01751},
50
+ eprinttype = {arXiv},
51
+ eprint = {2311.01751},
52
+ timestamp = {Tue, 07 Nov 2023 18:17:14 +0100},
53
+ biburl = {https://dblp.org/rec/journals/corr/abs-2311-01751.bib},
54
+ bibsource = {dblp computer science bibliography, https://dblp.org}
55
+ }
56
+ ```