Convert Song Metadata to Semantic Text Descriptions for Embedding

Extract ID3 tags from 8,000 Free Music Archive MP3s with Mutagen to build a songs.csv of 7,994 valid tracks (title, artist, album, genre, duration). Map each genre to heuristic audio features (energy 0-1, valence 0-1, danceability 0-1, tempo in BPM): hip-hop (0.75 energy, 0.80 danceability, 95 BPM), folk (0.35 energy, 0.40 danceability, 95 BPM), punk (0.90 energy, 150 BPM). Threshold the features into mood words (energy >0.7 = energetic, <0.3 = calm; valence >0.7 = happy, <0.3 = melancholic) to form descriptions like "Food by AWOL from AWOL - A Way Of Life. Genre: Hip-Hop. Mood: energetic, danceable." Embed the descriptions with all-MiniLM-L6-v2 (384-dim, via FastEmbed's ONNX runtime; ~220 tracks/sec on CPU, 36 s total) rather than embedding the raw feature floats, since text embeddings capture semantic ties like "calm acoustic folk" better. Result: 11.7 MB of raw vectors.
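The feature mapping and mood thresholding above can be sketched as a pair of pure functions. The energy/danceability/tempo values come from the notes; the valence values and the `build_description` helper name are illustrative assumptions.

```python
# Genre -> heuristic feature mapping. Energy, danceability, and tempo values
# are from the notes; valence values are assumed placeholders.
GENRE_FEATURES = {
    "Hip-Hop": {"energy": 0.75, "valence": 0.60, "danceability": 0.80, "tempo": 95},
    "Folk":    {"energy": 0.35, "valence": 0.50, "danceability": 0.40, "tempo": 95},
    "Punk":    {"energy": 0.90, "valence": 0.50, "danceability": 0.60, "tempo": 150},
}
DEFAULT_FEATURES = {"energy": 0.5, "valence": 0.5, "danceability": 0.5, "tempo": 120}

def mood_words(features):
    """Threshold numeric features into mood words per the rules above."""
    words = []
    if features["energy"] > 0.7:
        words.append("energetic")
    elif features["energy"] < 0.3:
        words.append("calm")
    if features["valence"] > 0.7:
        words.append("happy")
    elif features["valence"] < 0.3:
        words.append("melancholic")
    if features["danceability"] > 0.7:
        words.append("danceable")
    return words

def build_description(title, artist, album, genre):
    """Assemble the semantic text description fed to the embedding model."""
    feats = GENRE_FEATURES.get(genre, DEFAULT_FEATURES)
    moods = ", ".join(mood_words(feats)) or "neutral"
    return f"{title} by {artist} from {album}. Genre: {genre}. Mood: {moods}."

print(build_description("Food", "AWOL", "AWOL - A Way Of Life", "Hip-Hop"))
# -> Food by AWOL from AWOL - A Way Of Life. Genre: Hip-Hop. Mood: energetic, danceable.
```

Keeping the description builder a pure function of the CSV row makes the 7,994-track pass trivially parallelizable and easy to unit-test.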

Build Portable Vector Index with Qdrant Edge

Create an in-process Qdrant Edge shard (no server) with Cosine distance for the 384-dim vectors and an HNSW index for sub-millisecond ANN search at 95%+ recall. Upsert in batches of 500 points with full payloads (track_id, metadata, audio_path, features). The shard is portable: copy the data/qdrant_shard/ directory to any machine and it loads instantly without re-indexing. This beats the alternatives: SQLite-vec has slower HNSW search; FAISS lacks native persistence and payload filters; ChromaDB has a larger footprint; cloud vector DBs need internet. Lazy-load the shard and model as singletons so queries after the first incur zero startup cost.

Voice, Mood, and UI Pipeline for In-Car Use

Transcribe voice locally with Whisper small (461 MB on disk, CPU with fp16=False): tap a button, record WAV bytes to a temp file, and get text like "calm folk acoustic guitar." Expand mood keywords for richer embeddings: "chill" becomes "calm relaxing lo-fi ambient chill song." Search by embedding the query, applying an optional genre filter (MatchTextAny), and returning the top_k=5 results by score. A Streamlit UI (dark Spotify theme) shows the results; a custom HTML5 player base64-encodes the MP3 bytes into a data URI (autoplay, play/pause icons), which handles full tracks but bloats the DOM with large files. The player's state machine loads the relative or absolute audio_path from each payload. Config.py centralizes paths and model names (e.g., EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"). Fully offline: works in airplane mode on an automotive CPU.
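The mood-expansion step can be sketched as a keyword lookup applied before embedding. Only the "chill" entry appears in the notes; the other table entries and the `expand_query` name are illustrative assumptions.

```python
# Keyword -> richer phrase table. "chill" is from the notes; the other
# entries are assumed examples of the same pattern.
MOOD_EXPANSIONS = {
    "chill": "calm relaxing lo-fi ambient chill song",
    "workout": "energetic fast upbeat workout song",   # assumed entry
    "sad": "melancholic slow emotional sad song",      # assumed entry
}

def expand_query(text: str) -> str:
    """Replace a short spoken query with a richer phrase if it contains a
    known mood keyword; otherwise embed the transcript as-is."""
    for word in text.lower().split():
        if word in MOOD_EXPANSIONS:
            return MOOD_EXPANSIONS[word]
    return text

print(expand_query("play something chill"))
# -> calm relaxing lo-fi ambient chill song
```

Expanding a one-word transcript into a full phrase gives the sentence-embedding model more tokens to work with, which moves the query vector closer to the multi-word track descriptions it is matched against.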