Whisper to Oliver - FP16

Fine-tuned Whisper for Real-World Conversational Audio (FP16 Version)

🎯 Model Description

Whisper to Oliver FP16 is the half-precision version of our specialized fine-tuned Whisper model, optimized for real-world conversational audio with challenging acoustic conditions. This FP16 version offers faster inference and reduced memory usage while maintaining excellent transcription quality.

✨ Key Features

  • 🎙️ Enhanced Performance on Poor Quality Audio: Fine-tuned on 170K conversational audio samples ranging from slightly degraded to poor audio quality
  • 📞 Phone Call Optimized: Specifically trained on short conversational segments typical of phone calls
  • 🚀 Turbo Performance: Inherits the speed advantages of whisper-large-v3-turbo
  • 💼 Enterprise Ready: Developed by Olib AI for business applications
  • ⚡ FP16 Optimized: Half-precision format for roughly 2x faster inference and about 50% lower memory usage
  • 🎯 Production Ready: Balances speed and accuracy for deployment

📊 Training Details

  • Base Model: openai/whisper-large-v3-turbo
  • Training Dataset: 170,000 conversational audio samples
  • Audio Characteristics: Recordings ranging from slightly degraded to poor quality
  • Focus: Short conversational segments typical of phone interactions
  • Precision: FP16 (converted from FP32 original)
  • Developer: Olib AI - Building AI Services for Businesses

🚀 Usage

Using the Transformers Library

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available; otherwise fall back to CPU with FP32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "olib-ai/whisper-to-oliver-fp16"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe audio
result = pipe("audio.mp3")
print(result["text"])

Advanced Usage with Parameters

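# For phone calls or other long recordings: split the audio into 30-second chunks,
# batch the chunks for faster inference, and return timestamps alongside the text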
result = pipe(
    "phone_call.mp3",
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
)
print(result["text"])
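
The call above requests timestamps but only prints the full text. As a minimal sketch, the timestamped segments could be rendered one per line as shown below. It assumes the result contains a "chunks" list of {"timestamp": (start, end), "text": ...} entries, which is the shape the Transformers pipeline returns when return_timestamps=True. The helper names and the sample result are hypothetical and used only for illustration.

```python
from typing import Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as MM:SS.ss; a missing timestamp becomes '??:??'."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(result: dict) -> list[str]:
    """Turn a pipeline result with a "chunks" list into printable lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return lines


# Hypothetical result, shaped like the output of pipe(..., return_timestamps=True).
sample_result = {
    "text": " Hello, thanks for calling. How can I help?",
    "chunks": [
        {"timestamp": (0.0, 2.4), "text": " Hello, thanks for calling."},
        {"timestamp": (2.4, 64.5), "text": " How can I help?"},
    ],
}

for line in format_chunks(sample_result):
    print(line)
```

In practice, `format_chunks(result)` would be called on the value returned by `pipe(...)` instead of the sample dictionary.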

🎯 FP16 vs FP32 Comparison

| Metric          | FP32 Version | FP16 Version    |
|-----------------|--------------|-----------------|
| Model Size      | ~1.5GB       | ~760MB          |
| Inference Speed | 1x           | ~2x faster      |
| Memory Usage    | 1x           | ~50% less       |
| Accuracy        | Baseline     | ~99.9% retained |

For maximum accuracy, consider using our FP32 version.
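
The roughly 50% memory figure follows from storage width: FP32 uses 4 bytes per parameter and FP16 uses 2. The sketch below works through that arithmetic for weights only. The parameter count is an illustrative placeholder, not this model's actual count, and the estimate ignores activations, caches, and file-format overhead.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Estimate the storage needed for the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


# Illustrative parameter count (an assumption, not this model's size).
example_params = 1_000_000

fp32_bytes = weight_bytes(example_params, "fp32")
fp16_bytes = weight_bytes(example_params, "fp16")
saving = 1 - fp16_bytes / fp32_bytes

print(f"FP32: {fp32_bytes / 1e6:.1f} MB, FP16: {fp16_bytes / 1e6:.1f} MB, saving: {saving:.0%}")
```

Speed gains do not follow from this arithmetic alone; they depend on hardware support for half-precision math.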

📈 Performance

Whisper to Oliver shows significant improvements over the base model when dealing with:

  • 📞 Phone call recordings
  • 🎙️ Low-quality microphone inputs
  • 🌐 Conversational speech with background noise
  • 💬 Short dialogue segments
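
This card does not state which metric the improvements were measured with. A common choice for speech recognition is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function below is a minimal sketch for comparing models on your own recordings. It assumes simple lowercasing and whitespace tokenization, which is cruder than the text normalization typically used in published Whisper evaluations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("please hold the line", "please hold line"))
```

Averaging this score over a held-out set of call recordings, once for the base model and once for the fine-tuned model, gives a like-for-like comparison.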

🎯 Intended Use

This model is designed for:

  • Customer service call transcription
  • Meeting transcription with variable audio quality
  • Voice assistant applications
  • Real-time conversation analysis
  • Accessibility applications for hearing-impaired users
  • Edge deployment where memory and speed are critical

⚠️ Limitations and Ethical Considerations

Following the ethical guidelines of the base Whisper model:

  • Should not be used to transcribe recordings without consent
  • Not recommended for "subjective classification" tasks
  • Should undergo robust evaluation before deployment in high-risk contexts
  • May show performance variations across different languages and demographics
  • FP16 precision may slightly change outputs on edge cases compared to FP32
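
To illustrate where FP16 differences come from, the sketch below rounds a few Python floats to IEEE 754 half precision using the standard library's `struct` module. It only demonstrates numeric rounding; it does not measure any effect on transcription accuracy, and the helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# 1.0 is exactly representable in FP16, while 0.1 and 2049.0 are not
# and get rounded to the nearest representable value.
for x in (1.0, 0.1, 2049.0):
    rounded = to_fp16(x)
    print(f"{x} -> {rounded} (absolute error {abs(rounded - x):.3g})")
```

Half precision keeps about three significant decimal digits, so individual weights and activations shift slightly. Most transcripts are unaffected, but borderline token choices can occasionally differ.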

📜 License

This model is released under the MIT License, allowing for commercial and non-commercial use with proper attribution.

📖 Citation

If you use this model in your research or applications, please cite both our work and the original Whisper paper:

@misc{whisper-to-oliver,
  author = {{Olib AI}},
  title = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver-fp16}},
}

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

👥 About Olib AI

Olib AI specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.

Built with ❤️ by Olib AI