noystl/mistral-base-model

Add pipeline tag and license

Opened Jun 3 by nielsr (HF Staff)


Files changed (1)

README.md (changed)

@@ -1,17 +1,32 @@
-**Bibtex**
-```bibtex
-@misc{sternlicht2025chimeraknowledgebaseidea,
-      title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
-      author={Noy Sternlicht and Tom Hope},
-      year={2025},
-      eprint={2505.20779},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2505.20779},
-}
 ```
-**Quick Links**
-- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
-- 📃 [Paper](https://arxiv.org/abs/2505.20779)
-- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)

+---
+datasets:
+- togethercomputer/RedPajama-Data-V2
+language:
+- de
+library_name: transformers
+license: other
+pipeline_tag: feature-extraction
+tags:
+- masked-lm
+- long-context
+- modernbert
+---
+# ModernGBERT 1B
+This is a German ModernBERT 1B language model trained from scratch using the ModernBERT [codebase](https://github.com/AnswerDotAI/ModernBERT) and the same German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2) as our [LLäMmlein](https://huggingface.co/collections/LSX-UniWue/llammlein-6732ff41f3705c686e605762) family.
+Find more details in our [preprint](https://arxiv.org/abs/2505.13136)!
+### Usage
+```python
+from transformers import AutoModel, AutoTokenizer
+model = AutoModel.from_pretrained("LSX-UniWue/ModernGBERT_1B")
+tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/ModernGBERT_1B")
 ```
+### Performance
+We evaluated our model on the [SuperGLEBer](https://lsx-uniwue.github.io/SuperGLEBer-site/) benchmark.
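The usage snippet in the new README only loads the model and tokenizer. The metadata sets `pipeline_tag: feature-extraction`, so a common next step is to turn per-token hidden states into one vector per input. The README does not say how to do this. The sketch below assumes mean pooling over non-padding tokens, a widely used choice but not necessarily what the ModernGBERT authors recommend. `mean_pool` is a hypothetical helper written in plain Python with no extra dependencies so the pooling logic is easy to follow.

```python
from typing import List, Sequence


def mean_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Average token vectors per sequence, skipping padding positions.

    hidden_states: nested lists shaped [batch][seq_len][hidden_dim]
    attention_mask: nested lists shaped [batch][seq_len]; 1 = real token, 0 = padding
    (Mean pooling is an assumption here, not something the model card specifies.)
    """
    pooled: List[List[float]] = []
    for tokens, mask in zip(hidden_states, attention_mask):
        dim = len(tokens[0]) if tokens else 0
        total = [0.0] * dim
        count = 0
        for vec, keep in zip(tokens, mask):
            if keep:  # only accumulate positions the mask marks as real tokens
                for i, value in enumerate(vec):
                    total[i] += value
                count += 1
        # Guard against an all-padding row so we never divide by zero.
        pooled.append([t / max(count, 1) for t in total])
    return pooled


if __name__ == "__main__":
    # Toy batch: 2 sequences, 3 positions, hidden size 2.
    states = [
        [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last position is padding
        [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
    ]
    mask = [[1, 1, 0], [1, 1, 1]]
    print(mean_pool(states, mask))
```

With `transformers`, you could feed this helper the outputs of `inputs = tokenizer(texts, padding=True, return_tensors="pt")` and `outputs = model(**inputs)` by passing `outputs.last_hidden_state.tolist()` and `inputs["attention_mask"].tolist()`. In practice you would do the same masked average directly on the tensors for speed. The list version is only meant to make the masking explicit.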