ReMMD: Agentic Verification for Multimodal Misinformation

The Challenge of Modern Misinformation

Multimodal misinformation has evolved beyond simple text-image pairs. Modern viral content often combines long, multilingual narratives with multiple images, mixed provenance, and subtle framing errors. Existing detection benchmarks typically rely on isolated, binary-label tasks that fail to capture these real-world complexities. Furthermore, agentic verification, in which an AI system actively searches for evidence, has historically been prohibitively expensive and inefficient.

The ReMMD Framework

ReMMD (Realistic Multilingual Multi-Image Agentic Verification) addresses these gaps through two primary components:

ReMMDBench: A benchmark consisting of 500 samples and 2,756 images. It covers five languages in monolingual settings, two cross-lingual settings, and three text-length tiers. It provides five-way veracity labels and eight specific distortion labels, along with evidence provenance and rationales to support transparent verification.
ReMMD-Agent: A persistent-memory verifier designed to handle complex posts. Instead of treating a post as a single unit, the agent decomposes the content into atomic points. It then constructs a reusable evidence set to verify these points, enabling structured L1/L2/L3 outputs that offer more granular insight than binary classification.
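
The decompose-then-verify idea described for ReMMD-Agent can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names EvidenceMemory, decompose, and verify_post are hypothetical, sentence splitting is only a stand-in for real atomic-point extraction, evidence reuse is keyed on exact normalized text (an assumption), and the search and judge functions are stubs supplied by the caller. The per-point output is a placeholder and does not reproduce the L1/L2/L3 schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvidenceMemory:
    """Hypothetical persistent store: evidence fetched once is reused later."""
    store: dict = field(default_factory=dict)
    lookups: int = 0  # counts calls to the (stubbed) external search

    def get(self, query, search_fn):
        key = query.lower().strip()
        if key not in self.store:
            # Only search when this point has not been seen before.
            self.lookups += 1
            self.store[key] = search_fn(key)
        return self.store[key]


def decompose(post_text):
    """Naive stand-in for atomic decomposition: one point per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", post_text.strip())
    return [p for p in parts if p]


def verify_post(post_text, memory, search_fn, judge_fn):
    """Verify each atomic point against the shared evidence memory."""
    results = []
    for point in decompose(post_text):
        evidence = memory.get(point, search_fn)
        results.append({
            "point": point,
            "evidence": evidence,
            "verdict": judge_fn(point, evidence),
        })
    return results


if __name__ == "__main__":
    memory = EvidenceMemory()
    post = ("The bridge collapsed in 2020. The photo shows the bridge. "
            "The bridge collapsed in 2020.")
    # Stub search and judge functions; a real system would call tools or a model.
    out = verify_post(
        post,
        memory,
        search_fn=lambda q: ["stub evidence for: " + q],
        judge_fn=lambda point, evidence: "unverified",
    )
    for row in out:
        print(row["point"], "->", row["verdict"])
    print("external lookups:", memory.lookups)
```

Because the memory persists across points (and could persist across posts), a repeated claim triggers no second search. That reuse is one plausible source of the kind of cost savings the article attributes to the agent.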

Performance and Efficiency Gains

ReMMD-Agent leads in five-way veracity detection, achieving 41.80% accuracy and 39.12% macro-F1 using GPT-5.2. Beyond accuracy, the framework lowers resource consumption: it reduces verification costs by 17.5% compared to MMD-Agent and by 79.9% relative to T2-Agent. These results suggest that structured, atomic decomposition is a more cost-effective strategy for large-scale misinformation detection than monolithic approaches.
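
For readers unfamiliar with the metrics quoted above, the following Python sketch shows how accuracy, macro-F1, and a relative cost reduction are conventionally computed. It is an illustration under assumptions, not the paper's evaluation code: the function names are hypothetical, a class with no true or predicted instances is scored as F1 = 0 (one common convention), and the article does not state the unit in which cost is measured.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_reduction(baseline_cost, new_cost):
    """Fractional saving of new_cost relative to baseline_cost."""
    return (baseline_cost - new_cost) / baseline_cost


if __name__ == "__main__":
    # Toy three-class data; a five-way task would pass five labels instead.
    gold = ["a", "a", "b", "c"]
    pred = ["a", "b", "b", "c"]
    print("accuracy:", accuracy(gold, pred))
    print("macro-F1:", macro_f1(gold, pred, labels=["a", "b", "c"]))
    # A 17.5% reduction means the new cost is 82.5% of the baseline.
    print("reduction:", relative_reduction(100.0, 82.5))
```

Macro-F1 averages over classes rather than over samples, which is why it can sit below accuracy when some veracity classes are harder than others.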
