	---
	language:
	- en
	license: mit
	tags:
	- whisper
	- automatic-speech-recognition
	- speech
	- audio
	- transcription
	- phone-calls
	- conversational
	pipeline_tag: automatic-speech-recognition
	---

	<div align="center">
	<img src="https://olib.ai/logo.png" alt="Olib AI Logo" width="200"/>

	# Whisper to Oliver

	Fine-tuned Whisper for Real-World Conversational Audio

	[![Model on HF](https://img.shields.io/badge/🤗-Model%20on%20HF-yellow.svg)](https://huggingface.co/olib-ai/whisper-to-oliver)
	[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
	[![Olib AI](https://img.shields.io/badge/🌐-Olib%20AI-green.svg)](https://www.olib.ai)
	</div>

	## 🎯 Model Description

	Whisper to Oliver is a fine-tuned version of OpenAI's `whisper-large-v3-turbo`, optimized for real-world conversational audio recorded under challenging acoustic conditions. It targets phone calls and other conversations where audio quality is compromised.

	### ✨ Key Features

	- 🎙️ Enhanced Performance on Poor Quality Audio: Fine-tuned on 170K conversational audio samples whose quality ranges from slightly degraded to poor
	- 📞 Phone Call Optimized: Specifically trained on short conversational segments typical of phone calls
	- 🚀 Turbo Performance: Inherits the speed advantages of whisper-large-v3-turbo
	- 💼 Enterprise Ready: Developed by [Olib AI](https://www.olib.ai) for business applications
	- 🔧 FP32 Precision: Full precision model for maximum accuracy
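
As a rough guide to what full precision costs, the sketch below estimates the size of the weights alone in FP32 and FP16. The parameter count is an assumption: it is the roughly 809 million figure OpenAI lists for the large-v3-turbo checkpoint, not a number stated in this card, and the helper name `weight_memory_gb` is purely illustrative. Activations and decoding buffers add to these totals.

```python
# Assumption: ~809 million parameters, the figure OpenAI lists for
# whisper-large-v3-turbo. This card does not state a count for the fine-tune.
PARAM_COUNT = 809_000_000

# Storage cost of one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(param_count: int, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(PARAM_COUNT, dtype):.1f} GB")
```

Under that assumption, loading in FP16 roughly halves the memory needed for the weights, which is the trade-off behind the `torch_dtype` choice in the usage code.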

	## 📊 Training Details

	- Base Model: [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)
	- Training Dataset: 170,000 conversational audio samples
	- Audio Characteristics: Recordings ranging from slightly degraded to poor quality
	- Focus: Short conversational segments typical of phone interactions
	- Developer: [Olib AI](https://www.olib.ai) - Building AI Services for Businesses

	## 🚀 Usage

	### Using the Transformers Library

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "olib-ai/whisper-to-oliver"

	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)

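	# Note: the published weights are FP32. The torch_dtype chosen above casts them
	# to FP16 on GPU for speed; use torch.float32 instead to keep full precision.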

	processor = AutoProcessor.from_pretrained(model_id)

	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Transcribe audio
	result = pipe("audio.mp3")
	print(result["text"])
	```

	### Advanced Usage with Parameters

	```python
	# Long phone calls: split the audio into 30-second chunks, transcribe the
	# chunks in batches, and return segment-level timestamps with the text
	result = pipe(
	    "phone_call.mp3",
	    chunk_length_s=30,
	    batch_size=16,
	    return_timestamps=True,
	)
	print(result["text"])
	```
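
With `return_timestamps=True`, the Transformers pipeline result also carries a `chunks` list, each entry holding a `(start, end)` timestamp in seconds and the segment text. The sketch below turns that list into readable lines. The helper names `format_timestamp` and `format_chunks` and the sample dictionary are illustrative assumptions, not part of the model or the library, and the end time is treated as optional because the final segment may not have one.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.ss, or '??:??' when the value is missing."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(float(seconds), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(result):
    """Return one '[start - end] text' line per timestamped segment."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return lines


# Hand-written stand-in for a pipeline result (not real model output).
example_result = {
    "text": " Hello, thanks for calling. How can I help?",
    "chunks": [
        {"timestamp": (0.0, 2.4), "text": " Hello, thanks for calling."},
        {"timestamp": (2.4, None), "text": " How can I help?"},
    ],
}

for line in format_chunks(example_result):
    print(line)
```

In practice, pass the `result` returned by `pipe(...)` in place of `example_result`.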

	## 📈 Performance

	Whisper to Oliver shows significant improvements over the base model when dealing with:
	- 📞 Phone call recordings
	- 🎙️ Low-quality microphone inputs
	- 🌐 Conversational speech with background noise
	- 💬 Short dialogue segments
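
This card does not report benchmark figures, so one way to check the comparison on your own recordings is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function below is a minimal dependency-free sketch. Its name and the lowercase, whitespace-split normalization are assumptions made for illustration; real evaluations usually also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("please hold the line", "please hold line"))
```

Running both the base model and this model over the same set of human-checked transcripts and averaging the WER gives a like-for-like comparison.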

	## 🎯 Intended Use

	This model is designed for:
	- Customer service call transcription
	- Meeting transcription with variable audio quality
	- Voice assistant applications
	- Real-time conversation analysis
	- Accessibility applications for deaf and hard-of-hearing users

	## ⚠️ Limitations and Ethical Considerations

	In line with the usage guidance for the base Whisper model, this model:
	- Should not be used to transcribe recordings without consent
	- Not recommended for "subjective classification" tasks
	- Should undergo robust evaluation before deployment in high-risk contexts
	- May show performance variations across different languages and demographics

	## 📜 License

	This model is released under the MIT License, which permits commercial and non-commercial use provided the copyright and license notice are retained.

	## 📖 Citation

	If you use this model in your research or applications, please cite both our work and the original Whisper paper:

	```bibtex
	@misc{whisper-to-oliver,
	author = {{Olib AI}},
	title = {Whisper to Oliver: Fine-tuned Whisper for Real-World Conversational Audio},
	year = {2024},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/olib-ai/whisper-to-oliver}},
	}

	@misc{radford2022whisper,
	doi = {10.48550/ARXIV.2212.04356},
	url = {https://arxiv.org/abs/2212.04356},
	author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
	title = {Robust Speech Recognition via Large-Scale Weak Supervision},
	publisher = {arXiv},
	year = {2022},
	copyright = {arXiv.org perpetual, non-exclusive license}
	}
	```

	## 👥 About Olib AI

	[Olib AI](https://www.olib.ai) specializes in building AI services for businesses. Our team focuses on creating practical AI solutions that solve real-world problems.

	Contact Us:
	- 🌐 Website: [www.olib.ai](https://www.olib.ai)
	- 📧 Akram H. Sharkar: [[email protected]](mailto:[email protected])
	- 📧 Maya M. Sharkar: [[email protected]](mailto:[email protected])
	- 💻 GitHub: [https://github.com/Olib-AI](https://github.com/Olib-AI)

	---

	<div align="center">
	<strong>Built with ❤️ by Olib AI</strong>
	</div>