+---
+language:
+- multilingual
+- bg
+- en
+- fr
+- de
+- ru
+- es
+- sw
+- tr
+- vi
+tags:
+- deberta
+- deberta-v3
+- mdeberta
+license: mit
+---
+# mdeberta-v3-base-lite
+This model is a vocabulary-pruned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base). Pruning reduces the vocabulary from 250,102 to 163,211 tokens. In the evaluation below, embedding similarity for the supported Latin- and Cyrillic-script languages is unchanged.
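The card does not describe the pruning procedure itself. Purely as an illustration of the general idea, the sketch below uses a hypothetical `prune_vocabulary` helper and toy data (neither comes from this repository): it drops unwanted tokens, renumbers the remaining ones, and keeps the matching embedding rows unchanged.

```python
def prune_vocabulary(vocab, embeddings, keep_tokens):
    """Keep only `keep_tokens`; return the new vocab and the matching embedding rows.

    vocab: dict mapping token string -> row index in `embeddings`
    embeddings: list of rows, one vector per token id
    keep_tokens: iterable of token strings to retain
    """
    keep = set(keep_tokens)
    # Sort by original id so the relative order (e.g. special tokens first) is preserved.
    kept = sorted((idx, tok) for tok, idx in vocab.items() if tok in keep)
    new_vocab = {tok: new_id for new_id, (_, tok) in enumerate(kept)}
    # Retained rows are copied as-is; only their indices change.
    new_embeddings = [embeddings[old_id] for old_id, _ in kept]
    return new_vocab, new_embeddings


# Toy example: a 5-token vocabulary with 2-dimensional embeddings (made-up values).
vocab = {"[PAD]": 0, "hello": 1, "привет": 2, "你好": 3, "hola": 4}
embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

new_vocab, new_embeddings = prune_vocabulary(
    vocab, embeddings, keep_tokens=["[PAD]", "hello", "привет", "hola"]
)
print(new_vocab)
print(new_embeddings)
```

Because retained rows are not modified, text that tokenizes only into retained tokens would be expected to produce the same embeddings before and after pruning.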
+## Supported Languages
+- Bulgarian
+- English
+- French
+- German
+- Russian
+- Spanish
+- Swahili
+- Turkish
+- Vietnamese
+## Usage
+```python
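+import torch
+from transformers import AutoTokenizer, AutoModel
+# Load the pruned tokenizer and encoder from the Hub
+tokenizer = AutoTokenizer.from_pretrained("rustemgareev/mdeberta-v3-base-lite")
+model = AutoModel.from_pretrained("rustemgareev/mdeberta-v3-base-lite")
+model.eval()
+# Encode a sentence and run a forward pass without tracking gradients
+text = "This is an example text in English."
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+# Token-level hidden states: (batch_size, sequence_length, hidden_size)
+embeddings = outputs.last_hidden_state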
+```
+## Performance Evaluation
+### Size Comparison
+| Metric | Original Model | Lite Model | Reduction |
+|--------|----------------|------------|-----------|
+| Vocabulary Size | 250,102 tokens | 163,211 tokens | 34.74% |
+| Disk Size | 1.06 GB | 817 MB | 23.23% |
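The "Reduction" column is a relative difference. A minimal sketch of that arithmetic, using a hypothetical `reduction_pct` helper and the vocabulary sizes from the table:

```python
def reduction_pct(original, reduced):
    """Relative reduction in percent, rounded to two decimals."""
    return round((original - reduced) / original * 100, 2)


# Vocabulary sizes taken from the table above.
print(reduction_pct(250_102, 163_211))  # 34.74
```

Both arguments must be in the same unit, so a GB figure has to be converted to MB before it is compared with an MB figure.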
+### VRAM Usage Comparison
+*Estimated using [Hugging Face Accelerate Model Estimator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).*
+| Metric | Original Model | Lite Model | Reduction |
+|--------|----------------|------------|-----------|
+| Largest Layer (float32) | 735.35 MB | 478.16 MB | 34.98% |
+| Total Size (float32) | 1.04 GB | 804.13 MB | ≈24% |
+| Training using Adam (Peak VRAM) | 4.15 GB | 3.14 GB | 24.34% |
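The "Largest Layer" row can be reproduced by hand under a few assumptions that are not stated in the card: the largest layer is the token embedding matrix, the hidden size is 768, the original checkpoint allocates 251,000 embedding rows (slightly more than its 250,102 tokenizer entries), and sizes are reported in binary units (1 MB = 1024² bytes). The `embedding_size_mib` helper is hypothetical.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def embedding_size_mib(rows, hidden_size=768):
    """Size of a float32 embedding matrix (rows x hidden_size) in binary megabytes."""
    return rows * hidden_size * BYTES_PER_FLOAT32 / MIB


original = embedding_size_mib(251_000)  # assumed row count of the original embedding matrix
lite = embedding_size_mib(163_211)      # vocabulary size of the lite model
print(f"{original:.2f} MB -> {lite:.2f} MB")  # 735.35 MB -> 478.16 MB
```

Under these assumptions the embedding matrix accounts for most of the model's parameters, which is why pruning the vocabulary has such a large effect on total size.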
+### Semantic Similarity Comparison
+**Evaluation Method**: For each language, the cosine similarity between the embedding of the translated sentence and the embedding of the English reference sentence, computed with both the original and the lite model.
+**Test Phrases Used**:
+- English: "Artificial intelligence learns to understand human languages and helps people communicate."
+- Bulgarian: "Изкуственият интелект се учи да разбира човешките езици и помага на хората да общуват."
+- French: "L'intelligence artificielle apprend à comprendre les langages humains et aide les gens à communiquer."
+- German: "Künstliche Intelligenz lernt, menschliche Sprachen zu verstehen und hilft Menschen bei der Kommunikation."
+- Russian: "Искусственный интеллект учится понимать человеческие языки и помогает людям общаться."
+- Spanish: "La inteligencia artificial aprende a entender los idiomas humanos y ayuda a las personas a comunicarse."
+- Swahili: "Akili ya kisasa inajifunza kuelewa lugha za wanadamu na kusaidia watu kuwasiliana."
+- Turkish: "Yapay zeka, insan dillerini anlamayı öğrenir ve insanların iletişim kurmasına yardımcı olur."
+- Vietnamese: "Trí tuệ nhân tạo học cách hiểu ngôn ngữ con người và giúp mọi người giao tiếp."
+**Similarity Results**:
+| Language Pair | Original Similarity | Lite Similarity | Difference |
+|---------------|-----------------|-----------------|------------|
+| English-Bulgarian | 0.9276 | 0.9276 | 0.0000 |
+| English-French | 0.9322 | 0.9322 | 0.0000 |
+| English-German | 0.9178 | 0.9178 | 0.0000 |
+| English-Russian | 0.9335 | 0.9335 | 0.0000 |
+| English-Spanish | 0.9228 | 0.9228 | 0.0000 |
+| English-Swahili | 0.9591 | 0.9591 | 0.0000 |
+| English-Turkish | 0.9450 | 0.9450 | 0.0000 |
+| English-Vietnamese | 0.7955 | 0.7955 | 0.0000 |
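The card does not say how sentence embeddings were pooled from the token-level outputs. The sketch below assumes masked mean pooling, which is one common choice and not necessarily the one used for the table above, and shows the cosine similarity computation on toy vectors in pure Python. In practice the token vectors would come from `outputs.last_hidden_state` and the mask from the tokenizer's `attention_mask`.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token vectors standing in for model hidden states (made-up values).
english = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])  # last token is padding
russian = mean_pool([[1.0, 0.0], [1.0, 1.0]], [1, 1])
print(round(cosine_similarity(english, russian), 4))  # 0.9487
```

Running the same computation with embeddings from both models and subtracting the two scores gives the "Difference" column.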
+## License
+This model is distributed under the [MIT License](https://opensource.org/licenses/MIT).