Dataset: Custom Dataset

VCTI-RoBERTa-Fiber

Model Summary

This model is a domain-adapted RoBERTa-base model fine-tuned using Masked Language Modeling (MLM) on optical communication and photonics data. It is optimized for generating domain-specific embeddings that capture the nuances and technical jargon of the optical domain.

โš ๏ธ Note: This is the basic version of our ongoing development.
A significantly improved version trained on much larger and more diverse optical corpora will be released soon!
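
Because the adaptation objective was MLM, a quick fill-mask probe is a simple way to inspect what the model has picked up. This is a minimal sketch: it assumes the MLM head was saved with the checkpoint, and the prompt is illustrative.

from transformers import pipeline

# Assumes the uploaded checkpoint includes the MLM head;
# if only the base encoder was saved, the head is randomly initialized.
fill = pipeline("fill-mask", model="quantum-leap-vcti/VCTI-RoBERTa-Fiber")
for pred in fill("An erbium-doped fiber <mask> boosts the optical signal over long spans."):
    print(pred["token_str"], round(pred["score"], 3))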


Training Data

The model was trained on:

  • 1000+ Optical Wikipedia Articles
  • 120+ Optical Communication & Photonics Textbooks
  • 500+ ITU-T and IEEE Papers
  • 1000+ Web Articles

The training corpus includes content related to:

  • Optical fibers
  • Photonic devices
  • Multiplexing (WDM, TDM, OTN)
  • Optical amplifiers
  • Modulation techniques
  • Communication networks
  • Laser systems, etc.

โš™๏ธ Training Details

| Parameter     | Value | Description                           |
|---------------|-------|---------------------------------------|
| batch_size    | 64    | Number of samples per training batch  |
| epochs        | 15    | Number of training epochs             |
| patience      | 6     | Early-stopping patience (epochs)      |
| learning_rate | 5e-5  | Learning rate for the AdamW optimizer |
| weight_decay  | 0.01  | Weight decay for regularization       |
| objective     | MLM   | Masked Language Modeling              |

The training was performed using the transformers library by Hugging Face.
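
For reference, a minimal MLM fine-tuning sketch with these hyperparameters could look like the following. This is not the exact training script: the corpus file name, tokenization settings, and evaluation split are assumptions.

from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# "optical_corpus.txt" is a placeholder for the optical-domain text described above.
dataset = load_dataset("text", data_files={"train": "optical_corpus.txt"})
splits = dataset["train"].train_test_split(test_size=0.05)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = splits.map(tokenize, batched=True, remove_columns=["text"])

# Randomly masks 15% of tokens on the fly for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="vcti-roberta-fiber",
    per_device_train_batch_size=64,
    num_train_epochs=15,
    learning_rate=5e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,          # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6)],
)
trainer.train()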


Core Use Case: Domain-Specific Embeddings

The fine-tuned model is particularly effective at generating context-aware embeddings for the optical domain. This makes it highly suitable for tasks such as:

  • Semantic Search across technical documents
  • Retrieval-Augmented Generation (RAG) for Q&A systems
  • Topic Modeling and document clustering
  • Similarity Matching between questions, answers, or papers
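
For example, documents can be ranked against a query by the cosine similarity of mean-pooled embeddings. A minimal sketch follows (the query and documents are illustrative; loading the model is covered again under "How to Use"):

import torch
from transformers import RobertaTokenizerFast, RobertaModel

tokenizer = RobertaTokenizerFast.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model = RobertaModel.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")

def embed(texts):
    # Mean-pool over real tokens only, using the attention mask to ignore padding.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

docs = [
    "EDFAs amplify C-band signals without optical-electrical conversion.",  # illustrative
    "OTN frames wrap client signals for transport over DWDM networks.",     # illustrative
]
query = "Which amplifier boosts optical signals directly?"

vecs = embed(docs + [query])
scores = torch.nn.functional.cosine_similarity(vecs[-1:], vecs[:-1])
print(docs[int(scores.argmax())], float(scores.max()))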

How to Use

Load the model

from transformers import RobertaTokenizerFast, RobertaModel

tokenizer = RobertaTokenizerFast.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")
model = RobertaModel.from_pretrained("quantum-leap-vcti/VCTI-RoBERTa-Fiber")

text = "Wavelength-division multiplexing increases the capacity of optical fibers."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Mean-pool the final hidden states into a single sentence embedding of shape (1, 768).
embedding = outputs.last_hidden_state.mean(dim=1)
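
Note that the plain mean above also averages any padding positions when embedding a batch. A padding-aware variant (a sketch, reusing inputs and outputs from the snippet above):

# Weight each position by the attention mask so padding does not dilute the embedding.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)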

MIT License

Copyright (c) 2025 Velankani Communications Technologies Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction.
