Offline AI Music Search for Cars with Qdrant Edge
Build zero-latency, privacy-first in-car music discovery using local Whisper for voice transcription, FastEmbed for 384-dim embeddings, and Qdrant Edge for <10ms cosine HNSW search over 7,994 songs—no internet needed.
Semantic Search Pipeline Delivers Driver-Safe Latency
Process user queries (voice, text, or mood) through a fully local chain: OpenAI Whisper (small model) transcribes speech to text on-device; FastEmbed's all-MiniLM-L6-v2 generates 384-dimensional vectors; Qdrant Edge runs cosine-similarity HNSW approximate-nearest-neighbor search over a 7,994-song index, returning results in under 10 ms. This enables natural-language queries like "upbeat hip hop" or "calm folk acoustic guitar" with zero network dependency, which is critical for in-car safety, where delays distract drivers.
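The chain above can be sketched end to end. The collection name "songs", the payload field "title", and the shard path are assumptions, not confirmed project details; heavy model imports sit inside the functions so each stage loads lazily:

```python
EMBED_DIM = 384  # all-MiniLM-L6-v2 output dimension

def transcribe(wav_path: str) -> str:
    # Voice input: openai-whisper runs fully on-device
    # (model weights download on first load).
    import whisper
    return whisper.load_model("small").transcribe(wav_path)["text"]

def embed_query(text: str) -> list[float]:
    # Text -> 384-dim vector via FastEmbed's ONNX runtime.
    from fastembed import TextEmbedding
    model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
    return next(iter(model.embed([text]))).tolist()

def search_songs(text: str, limit: int = 5):
    # In-process, file-backed Qdrant: no server, no network round-trip.
    from qdrant_client import QdrantClient
    client = QdrantClient(path="data/qdrant_shard")
    hits = client.query_points(
        collection_name="songs",
        query=embed_query(text),
        limit=limit,
    ).points
    return [(h.payload.get("title"), h.score) for h in hits]
```

A text query becomes `search_songs("calm folk acoustic guitar")`; a voice query chains the stages as `search_songs(transcribe("query.wav"))`.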
Mood search maps one-tap buttons (Happy, Sad, Energetic, Chill, Romantic, Party) to predefined embeddings for instant filtering. Results feed into a Spotify-styled Streamlit UI with a dark theme, green accents, pill controls, the Inter font, and a custom HTML5 player for real MP3 playback from 8,000 royalty-free Free Music Archive tracks.
Data Ingestion Builds Portable On-Device Index
Start with the FMA-small dataset (8,000 MP3s): prepare_dataset.py uses mutagen to extract ID3 tags into songs.csv (7,994 rows × 13 columns; files that fail tag extraction are dropped). Then ingest.py embeds titles, descriptions, and artists with FastEmbed (~36 s at ~220 tracks/sec on CPU) and indexes them into a single Qdrant Edge shard on disk (data/qdrant_shard/).
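The two scripts might look roughly like this; the CSV column subset, the collection name "songs", and the text fields joined for embedding are assumptions for illustration:

```python
import csv
from pathlib import Path

COLUMNS = ["path", "title", "artist", "album", "genre"]  # subset of the 13 columns

def row_from_tags(path: str, tags: dict) -> dict:
    # Pure helper: mutagen's "easy" tags are lists; take the first value or "".
    def first(key: str) -> str:
        v = tags.get(key)
        return v[0] if isinstance(v, list) and v else ""
    return {"path": path, **{c: first(c) for c in COLUMNS[1:]}}

def build_csv(mp3_dir: str, out_csv: str) -> int:
    # prepare_dataset.py step: scan MP3s, extract ID3 tags, write songs.csv.
    from mutagen import File as MutagenFile
    rows = []
    for p in sorted(Path(mp3_dir).rglob("*.mp3")):
        audio = MutagenFile(p, easy=True)
        if audio is None:
            continue  # unreadable files are skipped (8,000 -> 7,994)
        rows.append(row_from_tags(str(p), dict(audio)))
    with open(out_csv, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=COLUMNS)
        w.writeheader()
        w.writerows(rows)
    return len(rows)

def index_songs(csv_path: str) -> None:
    # ingest.py step: embed metadata text and upsert into a file-backed shard.
    from fastembed import TextEmbedding
    from qdrant_client import QdrantClient, models
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    texts = [f'{r["title"]} {r["artist"]} {r["genre"]}' for r in rows]
    model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
    client = QdrantClient(path="data/qdrant_shard")
    client.create_collection(  # raises if the collection already exists
        collection_name="songs",
        vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
    )
    client.upsert(
        collection_name="songs",
        points=[
            models.PointStruct(id=i, vector=vec.tolist(), payload=row)
            for i, (vec, row) in enumerate(zip(model.embed(texts), rows))
        ],
    )
```

Storing the full CSV row as the point payload lets search results carry title, artist, and file path without a second lookup.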
Qdrant Edge outperforms cloud vector DBs for in-car use: <10ms in-process queries versus 50-200ms network round-trips; full privacy (no data leaves the device); offline operation; and zero-cost deployment as a Python library (no Docker or server). The tradeoff is single-shard scale (~8k points here), but portable disk storage suits embedded infotainment.
search.py handles queries; voice.py manages Whisper; player.py streams MP3 bytes; audio_player.py renders custom controls (play/pause/seek/volume).
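A minimal sketch of what the playback path might do: inline the MP3 bytes as a base64 data URI so the custom HTML5 `<audio>` element plays with no file server. The function name and markup are assumptions; in Streamlit such a snippet would typically be rendered via `st.components.v1.html`:

```python
import base64

def audio_html(mp3_bytes: bytes) -> str:
    # Embed raw MP3 bytes directly in the page as a base64 data URI,
    # giving play/pause/seek/volume via the browser's native controls.
    b64 = base64.b64encode(mp3_bytes).decode("ascii")
    return f'<audio controls src="data:audio/mp3;base64,{b64}"></audio>'
```

Base64 inflates the payload by about a third, which is acceptable for single-track playback on a local device.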
Streamlit Deployment for Quick Prototyping
app.py launches on localhost:8501. One-off setup: install dependencies from requirements.txt or pyproject.toml (via UV), download FMA-small, run prepare_dataset.py (scans the archive down to 7,994 usable tracks), run ingest.py (builds the shard), then launch. Icons load dynamically from PNGs in icons/ via icon_loader.py. The entire stack (Whisper, FastEmbed, Qdrant, audio) runs on CPU with ONNX inference, proving it viable for resource-constrained car hardware without GPUs.