---
title: ClipQuery
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.41.0
app_file: app.py
pinned: false
---
# ClipQuery – Ask Questions of Any Podcast / Video and Hear the Answer
ClipQuery turns any local audio or video file into a searchable, conversational experience. It automatically transcribes the media, indexes each sentence with embeddings, and lets you ask natural-language questions. It returns:
- A 30-second audio clip from the exact place in the media where the answer occurs.
- The timestamp of the clip.
- A live LangChain debug log so you can inspect what happened behind the scenes.
## How It Works
```text
┌─────────────┐   transcribe & segment   ┌─────────────┐
│ audio / mp4 │ ────────────────────────▶│ transcripts │
└─────────────┘                          └─────────────┘
       │                                        │
       │ build embeddings (SBERT)               │ metadata: {start, end}
       ▼                                        ▼
┌───────────────────────┐   store vectors   ┌────────────────────┐
│ HuggingFace           │ ─────────────────▶│ FAISS VectorStore  │
│ Sentence-Transformer  │                   └────────────────────┘
└───────────────────────┘                             ▲
                                                      │ retrieve top-k
                                                      ▼
                                            ┌────────────────────┐
                                            │ ChatOllama (phi3)  │
                                            │ RetrievalQA chain  │
                                            └────────────────────┘
```
- Transcription – `index_builder.py` uses `faster-whisper` to generate word-level timestamps, saved as `segments.json`.
- Embedding + Index – Sentence-Transformer (MiniLM) embeddings are stored in a FAISS index (`data/*`).
- Question Answering – A local LLM (Ollama `phi3`) is wrapped in `RetrievalQAWithSourcesChain` to pull the most relevant transcript chunks and generate an answer (see the pipeline sketch after this list).
- Clip Extraction – `clipper.py` calls `ffmpeg` to cut a 30 s MP3 between the `start` and `end` timestamps (extended to 30 s if shorter).
- Debug Logging – A custom `JSONLCallbackHandler` dumps every LangChain event to `langchain_debug.jsonl`; the Gradio UI streams it live in the Debug Log tab.
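A condensed sketch of that pipeline is below. It assumes `segments.json` entries of the form `{text, start, end}` and the model names mentioned above; the real `index_builder.py` and `qa_engine.py` differ in structure and error handling.

```python
# Hedged sketch of the transcribe → embed → retrieve → answer flow.
# Paths, field names, and model choices are assumptions, not the app's exact code.
import json
import os

from faster_whisper import WhisperModel
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain.chains import RetrievalQAWithSourcesChain

os.makedirs("data", exist_ok=True)

# 1. Transcribe with faster-whisper and keep per-segment timestamps
model = WhisperModel("base", compute_type="int8")
segments, _info = model.transcribe("downloads/audio.mp3", word_timestamps=True)
segs = [{"text": s.text, "start": s.start, "end": s.end} for s in segments]
with open("data/segments.json", "w") as f:
    json.dump(segs, f)

# 2. Embed each segment (Sentence-Transformer MiniLM) and index with FAISS
docs = [
    Document(
        page_content=s["text"],
        metadata={"start": s["start"], "end": s["end"], "source": f"{s['start']:.1f}s"},
    )
    for s in segs
]
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_documents(docs, embeddings)
store.save_local("data/faiss_index")

# 3. Answer questions with a local Ollama model over the top-k retrieved chunks
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=ChatOllama(model="phi3"),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
result = chain.invoke({"question": "What does the speaker say about pricing?"})
print(result["answer"], result["sources"])
```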
## Installation

### Prerequisites

- Python 3.9+ (3.10 recommended)
- FFmpeg (for audio processing)
- Optional but recommended: a CUDA-compatible GPU for acceleration
### Quick Start (CPU/Spaces Mode)

```bash
# 1. Clone and set up
python -m venv .venv && source .venv/bin/activate   # Linux/macOS
# OR on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# 2. Run the app (uses flan-t5-base by default)
python app.py
```
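On the CPU/Spaces path the default model is `flan-t5-base`. As an illustration only (not the app's exact wiring), a Hugging Face model can be wrapped as a LangChain LLM roughly like this; the `pipeline_kwargs` values are assumptions:

```python
# Sketch of a CPU-friendly fallback LLM built from flan-t5-base.
from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-base",
    task="text2text-generation",
    pipeline_kwargs={"max_new_tokens": 256},  # assumed setting, not from the repo
)
print(llm.invoke("Summarise: ClipQuery answers questions about podcasts."))
```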
### Local GPU Setup (Optional)

For better performance with local models:

#### Install Ollama

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from https://ollama.com/download
```

#### Download Models (pick one)

```bash
# Small & fast (4GB VRAM+)
ollama pull phi3
# Larger & more capable (8GB VRAM+)
ollama pull mistral

# Start Ollama in the background
ollama serve &
```

#### Run with Local Model

```bash
# The app will automatically detect Ollama if running
python app.py
```
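Automatic detection presumably comes down to checking whether the Ollama server answers on its default port (11434). A minimal check along those lines, with the fallback model name assumed:

```python
# Sketch: pick a local Ollama model if the server is reachable, else fall back.
import urllib.request

def ollama_available(url: str = "http://localhost:11434") -> bool:
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200  # Ollama replies "Ollama is running"
    except OSError:
        return False

model = "phi3" if ollama_available() else "flan-t5-base"
print(f"Using model: {model}")
```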
### FFmpeg Setup

`clipper.py` calls FFmpeg directly, so the binary must be installed and on your `PATH`:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (with Chocolatey)
choco install ffmpeg
```
## Quick Start

### Launch the App

```bash
python app.py
```

This starts a local web server at http://127.0.0.1:7860.
### First Run

- The first time you run with a new model, it will download the necessary files (1-5 GB for `flan-t5-base`).
- Subsequent starts will be faster as files are cached.
### Using the App

- Upload any audio/video file (mp3, mp4, etc.)
- Select a model:
  - For CPU/Spaces: `flan-t5-base`
  - For local GPU: `phi3` or `tinyllama` (requires Ollama)
- Ask a question in the Ask tab
- The app will:
  - Transcribe the media (first time only)
  - Find the most relevant 30-second clip
  - Play the audio and show the timestamp
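For orientation, here is a heavily stripped-down sketch of how such a Gradio UI can be wired; the real `app.py` adds model selection, the Debug Log tab, and the actual transcription and retrieval calls:

```python
# Hypothetical minimal UI wiring: upload + question in, answer + clip + timestamp out.
import gradio as gr

def ask(media_path, question):
    # Placeholder: the real app transcribes, retrieves, and cuts a 30 s clip here.
    answer_text = f"(answer to: {question})"
    clip_path = None          # the real app returns an MP3 produced by clipper.py
    timestamp = "00:01:23"
    return answer_text, clip_path, timestamp

with gr.Blocks() as demo:
    media = gr.File(label="Audio / video file")
    question = gr.Textbox(label="Your question")
    answer = gr.Textbox(label="Answer")
    clip = gr.Audio(label="30-second clip")
    ts = gr.Textbox(label="Timestamp")
    gr.Button("Ask").click(ask, inputs=[media, question], outputs=[answer, clip, ts])

demo.launch()  # serves on http://127.0.0.1:7860 by default
```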
### Debugging

- Check the terminal for transcription progress
- View the Debug Log tab for detailed LLM interactions
- Logs are saved to `langchain_debug.jsonl`
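That log is written by the custom `JSONLCallbackHandler` mentioned under How It Works. A minimal sketch of such a handler, assuming LangChain's `BaseCallbackHandler` API; the real implementation in `qa_engine.py` may record different events and fields:

```python
# Sketch of a JSONL-writing LangChain callback; event names and payloads are assumptions.
import json
import time

from langchain_core.callbacks import BaseCallbackHandler

class JSONLCallbackHandler(BaseCallbackHandler):
    def __init__(self, path: str = "langchain_debug.jsonl"):
        self.path = path

    def _write(self, event: str, payload: dict) -> None:
        # One JSON object per line (JSON Lines), appended as events arrive.
        with open(self.path, "a") as f:
            f.write(json.dumps({"ts": time.time(), "event": event, **payload}) + "\n")

    def on_chain_start(self, serialized, inputs, **kwargs):
        self._write("chain_start", {"inputs": {k: str(v) for k, v in inputs.items()}})

    def on_llm_start(self, serialized, prompts, **kwargs):
        self._write("llm_start", {"prompts": prompts})

    def on_llm_end(self, response, **kwargs):
        self._write("llm_end", {"generations": str(response.generations)})
```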
## Project Layout

```text
├── app.py             # Gradio UI and orchestration
├── clipper.py         # ffmpeg clip extraction helper
├── index_builder.py   # transcription + FAISS index builder
├── qa_engine.py       # load index, build RetrievalQA chain, JSONL logging
├── logging_config.py  # basic logger
├── requirements.txt
└── README.md
```
Generated artifacts:

- `downloads/audio.mp3` – copy of the uploaded audio
- `data/faiss_index*` – FAISS vector store
- `data/segments.json` – transcript chunks with timestamps
- `langchain_debug.jsonl` – streaming debug log
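Each `data/segments.json` entry carries `start`/`end` timestamps, which is all the clip extraction needs. A rough sketch of what `clipper.py` does conceptually; the function name and signature are assumptions:

```python
# Cut an MP3 clip between start and end, padded to at least 30 s, via ffmpeg.
import subprocess

MIN_CLIP_SEC = 30

def cut_clip(src: str, start: float, end: float, out: str = "clip.mp3") -> str:
    duration = max(end - start, MIN_CLIP_SEC)   # extend short spans to 30 s
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start),        # seek to the answer's start timestamp
            "-i", src,
            "-t", str(duration),      # clip length
            "-vn",                    # drop any video stream
            "-acodec", "libmp3lame",  # encode as MP3
            out,
        ],
        check=True,
    )
    return out
```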
## Customising

- Change minimum clip length – Modify the `MIN_CLIP_SEC` logic in `app.py` (currently hard-coded to 30 s).
- Use a different LLM – Change the `ChatOllama(model=...)` argument in `qa_engine.py` (any Ollama-served model works).
- Prompt template – Supply `chain_type_kwargs={"prompt": custom_prompt}` when calling `RetrievalQAWithSourcesChain.from_chain_type` (see the sketch after this list).
- Rotate / clean logs – Delete `langchain_debug.jsonl`; it will be recreated on the next query.
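For the prompt-template option, a hedged example: it reloads the saved index and assumes the default "stuff" chain, whose prompt expects `summaries` and `question` variables. The index path and model name follow the defaults above.

```python
# Sketch of passing a custom prompt into the retrieval chain.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQAWithSourcesChain

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Newer langchain_community versions require opting in to pickle deserialization.
store = FAISS.load_local("data/faiss_index", embeddings, allow_dangerous_deserialization=True)

custom_prompt = PromptTemplate(
    input_variables=["summaries", "question"],
    template=(
        "Answer the question using only the transcript excerpts below.\n\n"
        "{summaries}\n\nQuestion: {question}\nAnswer:"
    ),
)

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=ChatOllama(model="phi3"),
    retriever=store.as_retriever(),
    chain_type_kwargs={"prompt": custom_prompt},
)
```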
## Troubleshooting

### Common Issues

| Issue | Solution |
|---|---|
| Ollama not detected | Run `ollama serve` in a separate terminal |
| CUDA Out of Memory | Use a smaller model (`phi3` instead of `mistral`) or reduce batch size |
| FFmpeg not found | Install FFmpeg and ensure it's in your `PATH` |
| Slow performance on CPU | Use `phi3` or `tinyllama` with Ollama for GPU acceleration |
| Model download errors | Check internet connection and disk space |
### Advanced

Reducing VRAM Usage:

```python
# In app.py, reduce the context length
llm = ChatOllama(model="phi3", num_ctx=2048)  # default is 4096
```

Faster Transcriptions:

```bash
# Pre-convert to 16 kHz mono WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le input_16k.wav
```

Debug Logs:

- Check `langchain_debug.jsonl` for detailed traces
- Set `LOG_LEVEL=DEBUG` for verbose output
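The `LOG_LEVEL` switch is presumably read in `logging_config.py`; one possible shape for such a module (the real one may differ):

```python
# Sketch of a LOG_LEVEL-aware logger factory.
import logging
import os

def get_logger(name: str = "clipquery") -> logging.Logger:
    level = os.environ.get("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        level=getattr(logging, level, logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger(name)
```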
## Local Development

### Environment Variables

```bash
# For HuggingFace models (required for Spaces)
export HUGGINGFACEHUB_API_TOKEN="your_token_here"

# For debugging
export LOG_LEVEL=DEBUG
```
### Running Tests

```bash
# Install test dependencies
pip install pytest pytest-mock

# Run tests
pytest tests/
```
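The contents of `tests/` are not documented here, but a hypothetical test of the clip-padding behaviour could look like this, using the `mocker` fixture from `pytest-mock` and assuming a `clipper.cut_clip(src, start, end, out)` helper like the sketch above:

```python
# tests/test_clipper.py – hypothetical example, not the repo's actual test suite.
import clipper

def test_short_span_padded_to_30_seconds(tmp_path, mocker):
    fake_run = mocker.patch("clipper.subprocess.run")  # avoid invoking real ffmpeg
    clipper.cut_clip("episode.mp3", start=10.0, end=15.0, out=str(tmp_path / "clip.mp3"))
    cmd = fake_run.call_args[0][0]
    duration = float(cmd[cmd.index("-t") + 1])
    assert duration == 30.0  # a 5 s span is padded to the 30 s minimum
```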
### Building for Production

```bash
# Create a standalone executable (using PyInstaller)
pip install pyinstaller
pyinstaller --onefile app.py
```
## License

MIT – do what you want, no warranty.