Offline In-Car Music Search with Local AI Embeddings
CarTune enables voice-activated semantic music discovery over 7,994 songs using local Whisper transcription, FastEmbed vectors, and Qdrant Edge. It needs no internet connection and runs fully on-device, embedding about 220 tracks/sec on CPU.
Convert Song Metadata to Embeddable Text Descriptions for Semantic Search
Extract ID3 tags from the Free Music Archive's 8,000 royalty-free MP3s with Mutagen to build a songs.csv of 7,994 valid tracks, each with title, artist, album, genre, and duration (files shorter than 5 s are skipped). Because FMA lacks Spotify-style audio features, map each genre to heuristic mood values (energy, valence, danceability, tempo), e.g., hip-hop (0.75, 0.55, 0.80, 95) or folk (0.35, 0.55, 0.40, 95). Transform each track into a natural-language string, e.g., "Food by AWOL from AWOL - A Way Of Life. Genre: Hip-Hop. Mood: energetic, danceable." Mood words come from thresholds: energy >0.7 is energetic, <0.3 is calm; valence >0.7 is happy, <0.3 is melancholic; danceability >0.7 is danceable. Embed these strings into 384-dim vectors with FastEmbed's all-MiniLM-L6-v2 (ONNX, CPU-only) at about 220 tracks/sec, roughly 36 s for the whole library. Text descriptions outperform raw floats for capturing vibes like "calm folk acoustic guitar."
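The threshold rules and the description template can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the Track dataclass and the names mood_words and describe are hypothetical, and omitting the "Mood:" clause when no threshold fires is an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical container for one row of songs.csv plus heuristic moods."""
    title: str
    artist: str
    album: str
    genre: str
    energy: float
    valence: float
    danceability: float


def mood_words(track: Track) -> list[str]:
    """Apply the thresholds from the text; mid-range values add no word."""
    words = []
    if track.energy > 0.7:
        words.append("energetic")
    elif track.energy < 0.3:
        words.append("calm")
    if track.valence > 0.7:
        words.append("happy")
    elif track.valence < 0.3:
        words.append("melancholic")
    if track.danceability > 0.7:
        words.append("danceable")
    return words


def describe(track: Track) -> str:
    """Build the natural-language string that would be embedded."""
    text = f"{track.title} by {track.artist} from {track.album}. Genre: {track.genre}."
    moods = mood_words(track)
    # Assumption: leave out the Mood clause when no threshold fires.
    if moods:
        text += f" Mood: {', '.join(moods)}."
    return text


# Hip-hop heuristics from the text: energy 0.75, valence 0.55, danceability 0.80.
hip_hop = Track("Food", "AWOL", "AWOL - A Way Of Life", "Hip-Hop", 0.75, 0.55, 0.80)
print(describe(hip_hop))
```

With the hip-hop values, only energy and danceability cross 0.7, which reproduces the example string quoted above. The folk values (0.35, 0.55, 0.40) cross no threshold, so under this sketch a folk track gets no mood words at all.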
Index and Query with Portable Qdrant Edge Shard
Create a Qdrant Edge shard (in-process, no server) with Cosine distance, which compares vector direction rather than magnitude and so suits semantic similarity. Batch-upsert points with full payloads (track_id, metadata, audio_path, moods) in 500-song chunks. The resulting shard holds roughly 11.7 MB of vectors plus the HNSW index and is portable: copy the directory to another device and no re-indexing is needed. At query time, embed the input (e.g., "upbeat hip hop for long drive") and run an HNSW approximate nearest-neighbor search (sub-millisecond on 7,994 points, ~95% recall). Filter by genre via a payload condition (MatchTextAny). Expand mood keywords before embedding, e.g., "chill" → "calm relaxing lo-fi ambient chill song", to give the embedding richer coverage and boost recall on calm/ambient tracks. Lazy-load the shard and model as singletons so queries are instant after startup.
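The chunking, the size estimate, and the mood expansion are all simple enough to check with standard-library Python. The sketch below does not touch the Qdrant Edge API. The helper names (expand_query, chunked, raw_vector_mib) are hypothetical, the size figure assumes 4-byte float32 vectors, and only the "chill" expansion is taken from the text.

```python
from typing import Iterator, Sequence

# Only the "chill" expansion appears in the text; other moods would need
# their own entries, which are not invented here.
MOOD_EXPANSIONS = {"chill": "calm relaxing lo-fi ambient chill song"}


def expand_query(query: str) -> str:
    """Replace a bare mood keyword with its richer phrase, else pass through."""
    return MOOD_EXPANSIONS.get(query.strip().lower(), query)


def chunked(items: Sequence, size: int = 500) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items for batch upserts."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def raw_vector_mib(n_points: int, dim: int = 384, bytes_per_float: int = 4) -> float:
    """Raw vector storage in MiB, assuming float32 and no index overhead."""
    return n_points * dim * bytes_per_float / 2**20


track_ids = list(range(7994))
batches = list(chunked(track_ids))
print(len(batches), len(batches[-1]))   # 16 494
print(round(raw_vector_mib(7994), 2))   # 11.71
print(expand_query("chill"))
```

Under the float32 assumption, 7,994 vectors of 384 dimensions come to about 11.7 MiB, which lines up with the figure quoted above. The 500-song chunks give 15 full batches and a final batch of 494.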
Integrate Local Voice and Streamlit Playback
Transcribe voice with Whisper 'small' (461 MB on disk; set fp16=False on CPU to avoid failures) from uploaded bytes via a temporary WAV file. It handles accents for queries like "calm folk acoustic guitar." The Streamlit UI offers voice, text, and mood tabs (6 mood buttons: happy, sad, energetic, chill, romantic, party) in a dark Spotify-style theme. A custom HTML5 player base64-encodes the MP3 bytes into a data:audio/mpeg URI and adds play/pause icons. This needs no file server but suits shorter clips only, since full 4-minute tracks bloat the DOM. A player state machine loads the relative or absolute audio_path from the payload and persists across reruns. A central config.py unifies paths and models (e.g., EMBEDDING_DIM=384, TOP_K=5) for easy swaps. Compared with the alternatives, Qdrant Edge is preferred over SQLite-vec (slower HNSW), FAISS (no native persistence or payload filtering), Chroma (bigger footprint), and cloud services (require internet).
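The data-URI approach can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: mp3_data_uri and audio_tag are hypothetical names, a plain audio element with built-in controls stands in for the custom play/pause icons, and the input bytes are a placeholder, not a real MP3.

```python
import base64
import html


def mp3_data_uri(mp3_bytes: bytes) -> str:
    """Encode raw MP3 bytes as a data:audio/mpeg URI (no file server needed)."""
    encoded = base64.b64encode(mp3_bytes).decode("ascii")
    return f"data:audio/mpeg;base64,{encoded}"


def audio_tag(mp3_bytes: bytes) -> str:
    """Wrap the URI in a minimal HTML5 audio element.

    Assumption: built-in browser controls replace the custom icons.
    """
    src = html.escape(mp3_data_uri(mp3_bytes), quote=True)
    return f'<audio controls src="{src}"></audio>'


fake_clip = bytes(3000)  # placeholder bytes standing in for a short clip
uri = mp3_data_uri(fake_clip)
print(len(fake_clip), len(uri))
```

Base64 emits 4 characters for every 3 input bytes, so the embedded markup is about a third larger than the audio itself. That overhead is why this technique fits short clips better than full-length tracks. In a Streamlit app, the resulting HTML string could be rendered with something like st.components.v1.html.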