CRS AI Testing Reveals High Failure Rate for Legislative Summaries

The Limits of Off-the-Shelf AI in Legislative Work

After a two-year pilot program testing six different AI models on approximately 1,000 legislative bills, the Congressional Research Service (CRS) reported that fewer than 3% of the outputs met its internal standards for accuracy, coherence, relevance, and objectivity. Director Karen Donfried emphasized that current general-purpose models are insufficient for the high-stakes, non-partisan requirements of congressional research. The primary risks identified include hallucinations, outdated information, and inherent bias, which necessitate a "highly skilled human in the loop" to maintain the integrity of legislative products.

Strategic Investment and Future Governance

To bridge the gap between current performance and operational requirements, CRS is requesting $1.6 million in recurring funding for fiscal year 2027. This investment is earmarked for two specific objectives:

Infrastructure Upgrades: Transitioning to specialized, confidential models that can be trained on proprietary legislative data, rather than relying on public-facing chatbots.
Human Capital: Hiring five dedicated data scientists and AI developers to manage model integration and governance.

CRS is currently developing a formal AI governance framework to prioritize use cases where the technology can provide value without compromising quality. Planned testing for FY2027 includes evaluating ChatGPT, Claude, Google AI, Perplexity, and Microsoft Copilot, specifically for tasks like generating graphics and analyzing public comments on regulations.

The Tension Between Efficiency and Reliability

While lawmakers expressed concern over the low reliability of AI, the hearing underscored a broader tension within Congress: the rapid, widespread adoption of AI tools by staff versus the institutional need for accuracy. Despite reports of AI-generated content appearing in official submissions, leadership remains firm that AI should serve as an efficiency tool that augments the workforce, specifically by handling backlogs of summaries for legislation that does not reach the floor, rather than as a replacement for human analysts. The consensus among committee members and leadership is that the "unreliable information landscape" makes the role of trusted, human-verified research more critical than ever.
