The Problem: Siloed Medication Safety Data
Patients seeking information on psychiatric medications are caught between two imperfect sources: authoritative but abstract regulatory data (FDA reports) and experience-near but unvalidated patient narratives (Reddit, WebMD). This fragmentation carries risk, because poorly contextualized information can trigger nocebo responses or medication non-adherence. The core challenge is integrating these sources without conflating clinical evidence with anecdotal experience.
A Provenance-Aware Multi-Agent Framework
The researchers developed a multi-agent system that unifies 466,525 Reddit posts, 60,782 WebMD reviews, and 20 years of FDA Adverse Event Reporting System (FAERS) records for nine antidepressants. To ensure auditability, they built a Neo4j knowledge graph grounded in standard medical vocabularies (ATC-N, ICD-10, and MedDRA). This structure preserves the provenance of every claim, ensuring that regulatory facts remain distinct from patient-reported experiences.
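To make the idea of provenance concrete, here is a minimal in-memory sketch in Python. It is not the researchers' Neo4j schema, which this summary does not describe. The class names, the `SOURCE_KIND` mapping, and the toy records are all assumptions made for illustration. The only point it demonstrates is that each drug-event claim carries its source, so regulatory and patient-reported claims can always be queried separately.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping from data source to evidence type (an assumption,
# not taken from the paper).
SOURCE_KIND = {
    "FAERS": "regulatory",
    "Reddit": "patient_reported",
    "WebMD": "patient_reported",
}


@dataclass(frozen=True)
class Claim:
    drug: str    # drug name (would map to an ATC code in a real graph)
    event: str   # adverse event term (would map to MedDRA in a real graph)
    source: str  # where the claim came from
    kind: str    # "regulatory" or "patient_reported"


class ProvenanceGraph:
    """Toy store of drug-event edges, each tagged with its sources."""

    def __init__(self):
        self._edges = defaultdict(set)  # (drug, event) -> set of sources

    def add(self, drug, event, source):
        if source not in SOURCE_KIND:
            raise ValueError(f"unknown source: {source}")
        self._edges[(drug, event)].add(source)

    def claims(self, drug, kind=None):
        """Return claims for a drug, optionally filtered by evidence type."""
        found = []
        for (d, event), sources in self._edges.items():
            if d != drug:
                continue
            for source in sources:
                if kind is None or SOURCE_KIND[source] == kind:
                    found.append(Claim(d, event, source, SOURCE_KIND[source]))
        return sorted(found, key=lambda c: (c.event, c.source))


# Made-up example records, for illustration only.
graph = ProvenanceGraph()
graph.add("sertraline", "nausea", "FAERS")
graph.add("sertraline", "nausea", "Reddit")
graph.add("sertraline", "insomnia", "WebMD")

for claim in graph.claims("sertraline", kind="regulatory"):
    print(claim.event, claim.source)
```

Because the source is stored on every edge rather than merged away, the same drug-event pair can appear once per source, and a reader can always tell which kind of evidence supports it.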
Key Findings on Data Concordance
- Independent Signals: The two patient-generated sources (Reddit and WebMD) agreed closely with each other, with Jaccard similarity scores (shared items divided by all items across both sets) of up to 0.905. This suggests that community platforms form a distinct, partly independent safety signal.
- Early Detection: Community data often acts as an early warning system. For sertraline, the researchers observed that many adverse events were reported in community sources hundreds of days before they appeared in official FDA records.
- High-Accuracy Extraction: The team implemented an LLM-based entity-recognition pipeline that achieved F1 scores of 0.969 for medications and 0.973 for conditions, validated against physician annotations.
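The three metrics behind these findings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code. The summary does not say exactly which sets were compared for the Jaccard scores, how first-report dates were determined, or how F1 was averaged, so the function names, event sets, dates, and counts below are all hypothetical.

```python
from datetime import date


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def lead_days(first_community, first_regulatory):
    """Days by which a community report precedes the regulatory one."""
    return (first_regulatory - first_community).days


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up adverse event sets for one drug on two platforms.
reddit_events = {"nausea", "insomnia", "headache", "dizziness"}
webmd_events = {"nausea", "insomnia", "headache", "fatigue"}
print(jaccard(reddit_events, webmd_events))  # 3 shared / 5 total = 0.6

# Made-up first-report dates for one adverse event.
print(lead_days(date(2020, 1, 1), date(2020, 7, 19)))  # 200

# Made-up extraction counts checked against reference annotations.
print(round(f1_score(tp=8, fp=2, fn=2), 3))  # 0.8
```

A positive `lead_days` value means the community report came first, which is the pattern described for sertraline above.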