SentenceTransformer based on sentence-transformers/all-distilroberta-v1
This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-distilroberta-v1
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
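The modules above mean-pool the token embeddings and then L2-normalize the result. As a rough illustration of those two steps, here is a minimal NumPy sketch with toy inputs (not the library's actual implementation):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

def l2_normalize(x):
    """Scale each vector to unit length, as the Normalize() module does."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy batch: 1 sentence, 3 token slots (the last one is padding), 4-dim embeddings
tokens = np.array([[[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [9.0, 9.0, 9.0, 9.0]]])  # padding token, masked out below
mask = np.array([[1, 1, 0]])

pooled = mean_pool(tokens, mask)           # -> [[0.5, 0.5, 0.0, 0.0]]
sentence_embedding = l2_normalize(pooled)  # unit-length sentence vector
```

Because of the final normalization step, downstream cosine similarity between two sentence embeddings reduces to a dot product.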
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Vishnu7796/my-finetuned-model")

# Run inference
sentences = [
    'Marketing effectiveness measurement, content performance analysis, A/B testing for social media',
    'Skills:5+ years of marketing or business analytics experience with synthesizing large-scale data sets to generate insights and recommendations.5+ years of working experience using SQL, Excel, Tableau, and/or Power B. R & Python knowledge are preferred.Understanding of the data science models used for measuring marketing incrementality, e.g. multi-touch attribution, marketing mix models, causal inference, time-series regression, match market test, etc....Understanding of the full-funnel cross-platform marketing and media landscape and experience evolving analytics and measurement capabilities.Flexibility in priority shifts and fast iterations/agile working environment.Strong problem-solving skills, and ability to structure problems into an analytics plan.\nPride Global offers eligible employee’s comprehensive healthcare coverage (medical, dental, and vision plans), supplemental coverage (accident insurance, critical illness insurance and hospital indemnity), 401(k)-retirement savings, life & disability insurance, an employee assistance program, legal support, auto, home insurance, pet insurance and employee discounts with preferred vendors.',
    'Hi All,\nThis is Nithya from TOPSYSIT, We have a job requirement for Data Scientist with GenAI. If anyone interested please send me your updated resume along with contact details to [email protected]\nAny Visa is Fine on W2 except H1B ,OPT and CPT.If GC holders who can share PPN along with proper documentation are eligible\nJob Title Data Scientist with GenAILocation: Plano, TX-OnsiteEXP: 10 Years Description:Competencies: SQL, Natural Language Processing (NLP), Python, PySpark/ApacheSpark, Databricks.Python libraries: Numpy, Pandas, SK-Learn, Matplotlib, Tensorflow, PyTorch.Deep Learning: ANN, RNN, LSTM, CNN, Computer vision.NLP: NLTK, Word Embedding, BOW, TF-IDF, World2Vec, BERT.Framework: Flask or similar.\nThanks & Regards,Nithya Kandee:[email protected]:678-899-6898',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
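Since the model's final Normalize() module produces unit-length embeddings, the cosine similarity that `model.similarity` computes by default reduces to a dot product, and rows can be sorted to rank candidates against a query. A self-contained sketch of that ranking step, using stand-in vectors rather than real model outputs:

```python
import numpy as np

def normalize(v):
    """L2-normalize each row, mimicking the model's Normalize() module."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-in for model.encode output: three already-normalized embeddings,
# where the second vector is deliberately close to the first.
embeddings = normalize(np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]))

# Cosine similarity of normalized vectors is just the dot product.
similarities = embeddings @ embeddings.T

# Rank all sentences against the first one (treated as the query)
ranking = np.argsort(-similarities[0])
print(ranking)  # the query itself first, then the closest match
```

The same pattern applies to real embeddings: encode a query and a corpus, take the query row of the similarity matrix, and sort descending.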
Evaluation
Metrics
Triplet
- Datasets: `ai-job-validation` and `ai-job-test`
- Evaluated with `TripletEvaluator`
| Metric | ai-job-validation | ai-job-test |
|---|---|---|
| cosine_accuracy | 0.9875 | 0.9756 |
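Here `cosine_accuracy` is the fraction of triplets for which the anchor is closer (by cosine similarity) to its positive than to its negative. A small illustrative sketch of that computation, with toy vectors rather than the actual evaluation data:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where the anchor is more similar to its
    positive than to its negative -- the quantity TripletEvaluator reports."""
    hits = sum(
        cosine(a, p) > cosine(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)

# Two toy triplets, both correctly ordered -> accuracy 1.0
anchors   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
positives = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
negatives = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 1.0
```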
Training Details
Training Dataset
Unnamed Dataset
- Size: 647 training samples
- Columns: `query`, `job_description_pos`, and `job_description_neg`
- Approximate statistics based on the first 647 samples:
| | query | job_description_pos | job_description_neg |
|---|---|---|---|
| type | string | string | string |
| min tokens | 8 | 7 | 7 |
| mean tokens | 15.05 | 350.34 | 352.82 |
| max tokens | 40 | 512 | 512 |
- Samples (the first rows; long fields are truncated with `...` as in the original card):

  **Sample 1**
  - query: healthcare data analytics, pregnancy identification algorithms, causal modeling techniques
  - job_description_pos: experience in using, manipulating, and extracting insights from healthcare data with a particular focus on using machine learning with claims data. The applicant will be driven by curiosity, collaborating with a cross-functional team of Product Managers, Software Engineers, and Data Analysts. Responsibilities: Apply data science, machine learning, and healthcare domain expertise to advance and oversee Lucina’s pregnancy identification and risk-scoring algorithms. Analyze healthcare data to study patterns of care and patient conditions which correlate to specific outcomes. Collaborate on clinical committee research and development work. Complete ad hoc analyses and reports from internal or external customers prioritized by management throughout the year. Qualifications: Degree or practical experience in Applied Math, Statistics, Engineering, Information Management with 3 or more years of data analytics experience, Masters degree a plus. Experience manipulating and analyzing healthcare dat...
  - job_description_neg: Experience of Delta Lake, DWH, Data Integration, Cloud, Design and Data Modelling. Proficient in developing programs in Python and SQL. Experience with Data warehouse Dimensional data modeling. Working with event based/streaming technologies to ingest and process data. Working with structured, semi structured and unstructured data. Optimize Databricks jobs for performance and scalability to handle big data workloads. Monitor and troubleshoot Databricks jobs, identify and resolve issues or bottlenecks. Implement best practices for data management, security, and governance within the Databricks environment. Experience designing and developing Enterprise Data Warehouse solutions. Proficient writing SQL queries and programming including stored procedures and reverse engineering existing process. Perform code reviews to ensure fit to requirements, optimal execution patterns and adherence to established standards. Requirements: You are: Minimum 9+ years of experience is required. 5+ years...

  **Sample 2**
  - query: Data Engineer Python Azure API integration
  - job_description_pos: experience preferred but not required. Must-Have Skills: 10+ years of total IT experience required. of 4 years of proven and relevant experience in a similar Data Engineer role and/or Python Dev role. Strong proficiency in Python programming is essential for data manipulation, pipeline development, and integration tasks. In-depth knowledge of SQL for database querying, data manipulation, and performance optimization. Experience working with RESTful APIs and integrating data from external sources using API calls. Azure: Proficiency in working with Microsoft Azure cloud platform, including services like Azure Data Factory, Azure Databricks, and Azure Storage.
  - job_description_neg: requirements; Research & implement new data products or capabilities. Automate data visualization and reporting capabilities that empower users (both internal and external) to access data on their own thereby improving quality, accuracy and speed. Synthesize raw data into actionable insights to drive business results, identify key trends and opportunities for business teams and report the findings in a simple, compelling way. Evaluate and approve additional data partners or data assets to be utilized for identity resolution, targeting or measurement. Enhance PulsePoint's data reporting and insights generation capability by publishing internal reports about Health data. Act as the “Subject Matter Expert” to help internal teams understand the capabilities of our platforms, how to implement & troubleshoot. Requirements: What are the ‘must haves’ we’re looking for? Minimum 3-5 years of relevant experience in: Creating SQL queries from scratch using real business data; Highly proficient knowledge of Excel (...

  **Sample 3**
  - query: Data Engineer big data technologies, cloud data warehousing, real-time data streaming
  - job_description_pos: experience in machine learning, distributed microservices, and full stack systems. Utilize programming languages like Java, Scala, Python and Open Source RDBMS and NoSQL databases and Cloud based data warehousing services such as Redshift and Snowflake. Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community. Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment. Perform unit tests and conduct reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance. Basic Qualifications: Bachelor’s Degree. At least 2 years of experience in application development (Internship experience does not apply). At least 1 year of experience in big d...
  - job_description_neg: requirements of analyses and reports. Transform requirements into actionable, high-quality deliverables. Perform periodic and ad-hoc operations data analysis to measure performance and conduct root cause analysis for Claims, FRU, G&A, Provider and UM data. Compile, analyze and provide reporting that identifies and defines actionable information or recommends possible solutions for corrective actions. Partner with other Operations areas as needed to provide technical and other support in the development, delivery, maintenance, and enhancement of analytical reports and analyses. Collaborate with Operations Tower Leaders in identifying and recommending operational performance metrics; map metrics against targets and the company’s operational plans and tactical/strategic goals to ensure alignment and focus. Serve as a liaison with peers in other departments to ensure data integrity. Code and schedule reports using customer business requirements from Claims, FRU, G&A, Provider and UM data. Princi...

- Loss: `MultipleNegativesRankingLoss` with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
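MultipleNegativesRankingLoss treats every other positive in the batch as a negative for each anchor, then applies cross-entropy over scaled similarities. A hedged NumPy sketch of the idea (not the library's implementation), using the scale of 20.0 and cosine similarity from the parameters above:

```python
import numpy as np

def mnr_loss(anchor_emb, positive_emb, scale=20.0):
    """Sketch of MultipleNegativesRankingLoss: for each anchor, all other
    positives in the batch act as in-batch negatives. Cross-entropy over
    scaled cosine similarities, with the matching positive as the label."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p = normalize(anchor_emb), normalize(positive_emb)
    scores = scale * (a @ p.T)                    # (batch, batch) cosine sims
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # diagonal = correct pairs

# Well-separated pairs give near-zero loss, since each anchor's
# matching positive scores far above the in-batch negatives.
anchors   = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mnr_loss(anchors, positives))  # close to 0
```

This in-batch-negatives construction is why the `no_duplicates` batch sampler matters during training: duplicate positives in one batch would become false negatives.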
Evaluation Dataset
Unnamed Dataset
- Size: 80 evaluation samples
- Columns: `query`, `job_description_pos`, and `job_description_neg`
- Approximate statistics based on the first 80 samples:
| | query | job_description_pos | job_description_neg |
|---|---|---|---|
| type | string | string | string |
| min tokens | 8 | 14 | 31 |
| mean tokens | 14.9 | 354.31 | 334.05 |
| max tokens | 25 | 512 | 512 |
- Samples (the first rows; long fields are truncated with `...` as in the original card):

  **Sample 1**
  - query: Data analysis, operations reporting, SQL expertise
  - job_description_pos: requirements, determine technical issues, and design reports to meet data analysis needs. Developing and maintaining web-based dashboards for real-time reporting of key performance indicators for Operations. Dashboards must be simple to use, easy to understand, and accurate. Maintenance of current managerial reports and development of new reports. Develop and maintain reporting playbook and change log. Other duties in the PUA department as assigned. What YOU Will Bring To C&F: Solid analytical and problem solving skills. Intuitive, data-oriented with a creative, solutions-based approach. Ability to manage time, multi-task and prioritizes multiple assignments effectively. Ability to work independently and as part of a team. Able to recognize and analyze business and data issues with minimal supervision, ability to escalate when necessary. Able to identify cause and effect relationships in data and work process flows. Requirements: 3 years in an Analyst role is required. A Bachelor’s degree in associated f...
  - job_description_neg: experience in data engineering, software engineering, data analytics, or machine learning. Strong expertise working with one or more cloud data platforms (Snowflake, Sagemaker, Databricks, etc.) Experience managing Snowflake infrastructure with terraform. Experience building batch, near real-time, and real-time data integrations with multiple sources including event streams, APIs, relational databases, noSQL databases, graph databases, document stores, and cloud object stores. Strong ability to debug, write, and optimize SQL queries in dbt. Experience with dbt is a must. Strong programming experience in one or more modern programming languages (Python, Clojure, Scala, Java, etc.) Experience working with both structured and semi-structured data. Experience with the full software development lifecycle including requirements gathering, design, implementation, testing, deployment, and iteration. Strong understanding of CI/CD principles. Strong ability to document, diagram, and deliver detailed pres...

  **Sample 2**
  - query: AWS Sagemaker, ML Model Deployment, Feedback Loop Automation
  - job_description_pos: Qualifications: AWS tools and solutions including Sagemaker, Redshift, Athena. Experience with Machine learning libraries such as PyTorch. Hands-on experience with designing, developing and deploying workflows with ML models with feedback loops; Uses Bitbucket workflows and has experience with CI/CD. Deep experience in at least two of the following languages: PySpark/Spark, Python, C. Working knowledge of AI/ML algorithms. Large language models (LLMs), Retrieval-augmented generation (RAN), Clustering algorithms (such as K-Means), Binary classifiers (such as XGBoost). High level of self-starter, learning, and initiative behaviors. Preferred: Background as a software engineer and experience as a data scientist. Features Stores. Why Teaching Strategies: At Teaching Strategies, our solutions and services are only as strong as the teams that create them. By bringing passion, dedication, and creativity to your job every day, there's no telling what you can do and where you can go! We provide a competitive...
  - job_description_neg: requirements and metrics. Provide training and support to end-users on data quality best practices and tools. Develop and maintain documentation related to data quality processes. Education Qualification: Bachelor's degree in a related field such as Data Science, Computer Science, or Information Systems. Required Skills: Experience working as a BA/Data Analyst in a Data warehouse/Data governance platform. Strong analytical and problem-solving skills. Proficiency in SQL, data analysis, and data visualization tools. Critical thinking. Ability to understand and examine complex datasets. Ability to interpret Data quality results and metrics. Desired Skills: Knowledge of Data quality standards and processes. Proven experience in a Data Quality Analyst or similar role. Experience with data quality tools such as Informatica, PowerCurve, or Collibra DQ is preferred. Certifications in data management or quality assurance (e.g. Certified Data Management Professional, Certified Quality ...

  **Sample 3**
  - query: Financial analysis, process re-engineering, client relationship management
  - job_description_pos: skills: BA/BS degree in finance-related field and/or 2+ years working in finance or related field. Strong working knowledge of Microsoft Office (especially Excel). Ability to work in a fast-paced environment and attention to detail. This role includes reviews and reconciliation of financial information. General Position Summary: The Business Analyst performs professional duties related to the review, assessment and development of business systems and processes as well as new client requirements. This includes reviewing existing processes to develop strong QA procedures as well as maximizing review efficiencies and internal controls through process re-engineering. The Business Analyst will assist with the development of seamless solutions for unique requirements of new clients, delivered and implemented on time and within scope. This role will ensure that all activity, reconciliation, reporting, and analysis is carried out in an effective, timely and accurate manner and will look for cont...
  - job_description_neg: Skills / Experience: Required: Proficiency with Python, pyTorch, Linux, Docker, Kubernetes, Jupyter. Expertise in Deep Learning, Transformers, Natural Language Processing, Large Language Models. Preferred: Experience with genomics data, molecular genetics. Distributed computing tools like Ray, Dask, Spark. Thanks & Regards, Bharat Priyadarshan Gunti, Head of Recruitment & Operations, Stellite Works LLC, 4841 W Stonegate Circle Lake Orion MI - 48359. Contact: 313 221 [email protected]

- Loss: `MultipleNegativesRankingLoss` with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- batch_sampler: no_duplicates
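These non-default values would typically be passed through `SentenceTransformerTrainingArguments`. A hedged sketch of that configuration (the `output_dir` is a placeholder, and this is a plausible reconstruction of the setup rather than the exact training script):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/my-finetuned-model",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    # no_duplicates avoids in-batch false negatives with
    # MultipleNegativesRankingLoss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```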
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | ai-job-validation_cosine_accuracy | ai-job-test_cosine_accuracy |
|---|---|---|---|
| 0 | 0 | 0.85 | - |
| 1.0 | 41 | 0.9875 | 0.9756 |
Framework Versions
- Python: 3.11.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 2.14.4
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}