Add pipeline tag and license

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +30 -15
README.md CHANGED
@@ -1,17 +1,32 @@
- **Bibtex**
- ```bibtex
- @misc{sternlicht2025chimeraknowledgebaseidea,
-       title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
-       author={Noy Sternlicht and Tom Hope},
-       year={2025},
-       eprint={2505.20779},
-       archivePrefix={arXiv},
-       primaryClass={cs.CL},
-       url={https://arxiv.org/abs/2505.20779},
- }
+ ---
+ datasets:
+ - togethercomputer/RedPajama-Data-V2
+ language:
+ - de
+ library_name: transformers
+ license: other
+ pipeline_tag: feature-extraction
+ tags:
+ - masked-lm
+ - long-context
+ - modernbert
+ ---
+
+ # ModernGBERT 1B
+
+ This is a German ModernBERT 1B language model trained from scratch using the ModernBERT [codebase](https://github.com/AnswerDotAI/ModernBERT) and the same German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2) as our [LLäMmlein](https://huggingface.co/collections/LSX-UniWue/llammlein-6732ff41f3705c686e605762) family.
+ Find more details in our [preprint](https://arxiv.org/abs/2505.13136)!
+
+ ### Usage
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained("LSX-UniWue/ModernGBERT_1B")
+
+ tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/ModernGBERT_1B")
  ```

- **Quick Links**
- - 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- - 📃 [Paper](https://arxiv.org/abs/2505.20779)
- - 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
+
+ ### Performance
+ We evaluated our model on the [SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/) benchmark.
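Since the PR sets `pipeline_tag: feature-extraction`, the usual next step after the usage snippet in the diff is turning the model's `last_hidden_state` into fixed-size sentence embeddings via masked mean pooling. This is a hedged sketch of that pooling arithmetic only, not the authors' documented recipe: a random tensor stands in for real model output so the math can be shown without downloading the 1B checkpoint.

```python
import torch

# Stand-ins for real outputs: in practice these come from
# model(**tokenizer(texts, padding=True, return_tensors="pt"))
# with shape (batch, seq_len, hidden) for last_hidden_state.
last_hidden_state = torch.randn(2, 4, 8)
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])  # 0 = padding token

# Masked mean pooling: average token vectors, ignoring padding positions.
mask = attention_mask.unsqueeze(-1).float()            # (2, 4, 1)
embeddings = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # torch.Size([2, 8])
```

Dividing by `mask.sum(dim=1)` rather than `seq_len` is what keeps padded positions from diluting the average; with real model outputs, `embeddings` can then be compared with cosine similarity.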