VCTI-RoBERTa-Fiber
Dataset: Custom Dataset
Model Summary
This model is a domain-adapted RoBERTa-base model fine-tuned using Masked Language Modeling (MLM) on optical communication and photonics data. It is optimized for generating domain-specific embeddings that capture the nuances and technical jargon of the optical domain.
⚠️ Note: This is the basic version of our ongoing development. A significantly improved version trained on much larger and more diverse optical corpora will be released soon!
Training Data
The model was trained on:
- 1000+ Optical Wikipedia Articles
- 120+ Optical Communication & Photonics Textbooks
- 500+ ITU-T and IEEE Papers
- 1000+ Web Articles
The training corpus includes content related to:
- Optical fibers
- Photonic devices
- Multiplexing (WDM, TDM, OTN)
- Optical amplifiers
- Modulation techniques
- Communication networks
- Laser systems, etc.
⚙️ Training Details
| Parameter | Value | Description |
|---|---|---|
| batch_size | 64 | Number of samples per training batch |
| epochs | 15 | Number of training epochs |
| patience | 6 | Early-stopping patience (epochs without improvement) |
| learning_rate | 5e-5 | Learning rate for the AdamW optimizer |
| weight_decay | 0.01 | Weight decay for regularization |
| objective | MLM | Masked Language Modeling |
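Under the MLM objective, a fraction of the input tokens is corrupted and the model is trained to recover the originals. As an illustration only (not the project's actual training code), here is a minimal sketch of the standard 15% selection with the 80/10/10 mask/random/keep split; `mask_id` and `vocab_size` are hypothetical arguments:

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, rng=None):
    """Standard MLM corruption: select ~15% of tokens; of those,
    80% become the mask token, 10% a random token, 10% stay unchanged.
    Returns (corrupted_ids, labels); labels are -100 where no prediction is made."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tid in token_ids:
        if rng.random() < mlm_prob:
            labels.append(tid)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_id)                    # 80%: mask token
            elif r < 0.9:
                corrupted.append(rng.randrange(vocab_size))  # 10%: random token
            else:
                corrupted.append(tid)                        # 10%: unchanged
        else:
            corrupted.append(tid)
            labels.append(-100)  # ignored by the cross-entropy loss
    return corrupted, labels
```

In practice this corruption is handled by `DataCollatorForLanguageModeling(mlm=True)` from `transformers`; the sketch just makes the objective explicit.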
The training was performed using the Hugging Face `transformers` library.
Core Use Case: Domain-Specific Embeddings
The fine-tuned model is particularly effective at generating context-aware embeddings for the optical domain. This makes it highly suitable for tasks such as:
- Semantic Search across technical documents
- Retrieval-Augmented Generation (RAG) for Q&A systems
- Topic Modeling and document clustering
- Similarity Matching between questions, answers, or papers
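All of these tasks reduce to comparing embeddings, most commonly by cosine similarity. A minimal semantic-search sketch over precomputed embedding vectors (pure NumPy; `cosine_search` is an illustrative helper, not part of the model):

```python
import numpy as np

def cosine_search(query_vec, doc_vecs, top_k=3):
    """Rank documents by cosine similarity to a query embedding.
    query_vec: shape (d,); doc_vecs: shape (n, d).
    Returns indices of the top_k most similar documents."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                 # cosine similarity per document
    return np.argsort(-scores)[:top_k]
```

The same ranking primitive underlies RAG retrieval and similarity matching; only the source of the vectors changes.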
How to Use
Load the model and generate a sentence embedding:

```python
from transformers import RobertaTokenizerFast, RobertaModel
import torch

tokenizer = RobertaTokenizerFast.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model = RobertaModel.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model.eval()

text = "Wavelength-division multiplexing increases the capacity of optical fibers."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into a single sentence embedding
embedding = outputs.last_hidden_state.mean(dim=1)
```
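The snippet above averages over every token position, which is fine for a single sentence. When embedding padded batches, a mask-aware mean that ignores padding tokens is usually preferable. A sketch, assuming PyTorch tensors shaped as in the example above:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, excluding padding positions.
    last_hidden_state: (batch, seq, dim); attention_mask: (batch, seq)."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # sum of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens
    return summed / counts                           # (batch, dim)
```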
MIT License
Copyright (c) 2025 Velankani Communications Technologies Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files, to deal in the Software without restriction
Model tree for quantum-leap-vcti/VCTI-RoBERTa-Fiber
Base model
FacebookAI/roberta-base