Build Research Agents + Writers for AI Content
Replace manual research and technical writing with modular AI: an exploratory deep research agent followed by a constrained writer workflow, avoiding slop by favoring simple workflows over overkill agents.
Ditch AI Slop: Demand Precision Research for Valuable Content
Generic LLM outputs, like ChatGPT's LinkedIn-style posts, fail because of slop phrases ("delve into the intricacies," "rapidly evolving"), vague generalizations ("most teams miss"), hallucinations, outdated information, and shallow meaninglessness. High-quality technical AI content requires deep research, expert writing, editing, and iteration, which is expensive and time-intensive for teams like Towards AI, which produces courses and videos.
Their solution: automate with a deep research agent (searches the web and tools, plans and pivots, cites sources) that produces a summarized Markdown artifact, fed sequentially to a deterministic writer that generates runnable code, relevant images, and structured articles. Built iteratively by using it to create their own course and incorporating student feedback. Challenges include achieving high precision and recall in research (avoiding noise overload), reducing hallucinations, and keeping a human in the loop for relatable storytelling and jokes.
Key principle: Writer augmentation, not replacement—humans handle connection; AI handles grunt work. Off-the-shelf deep research tools (e.g., exhaustive web scrapers) gather too much noise for focused technical needs.
Common mistake: Over-relying on single LLMs without tooling/memory/context leads to unreliable text-only outputs. Fix: Augment with data injection, tools, memory for workflows.
Master the Autonomy Slider: Workflows Before Agents
AI engineering decisions hinge on constraints absent in traditional software: cost per task (model and architecture dependent), latency (especially with reasoning models), quality, and data privacy. The stack progresses: prompt engineering → context engineering → tools/orchestration → evals.
Use an "autonomy slider" to minimize complexity:
- Pure prompting if model knows the task (add few-shot examples for adaptation).
- Static context injection (<200k tokens; pre-cache the unchanging prefix for efficiency, e.g., repeated Q&A over the same report; see the sketch after this list).
- Dynamic retrieval for private/recent/domain data (RAG-like).
- Workflows for conditional/parallel/looped steps with routers/judges.
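A minimal sketch of static context injection, assuming the OpenAI Python SDK; the model name and file path are placeholders. Putting the unchanging report at the front of every request lets provider-side prompt caching reuse the long shared prefix (OpenAI caches identical prefixes past roughly 1k tokens), so repeated Q&A over the same report stays cheap and fast:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
report = open("research_report.md").read()  # static context, well under 200k tokens

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            # The long, unchanging report comes first so every call shares a prefix.
            {"role": "system", "content": f"Answer strictly from this report:\n\n{report}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content or ""

print(ask("What are the report's main findings?"))
print(ask("Which limitations does it list?"))  # same prefix, cache-friendly
```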
Workflow defined: LLM + data/tools/memory/chained prompts/routers/parallelism/loops/majority voting. Reliable, low-cost, predetermined sequences. Example: Ticket handler—classify → route → draft → validate → send (fixed order, no dynamism needed).
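A minimal sketch of that ticket handler as a workflow: a fixed chain with a router and an LLM judge, where code (not the model) decides each next step. The `llm` helper, prompts, and categories are illustrative, not a specific production system:

```python
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

TEAM_STYLE = {
    "billing": "polite, include refund policy",
    "bug": "technical, ask for repro steps",
    "other": "friendly, general support",
}

def handle_ticket(body: str) -> str:
    # 1. Classify with a constrained prompt so the router stays deterministic.
    category = llm(f"Classify as billing, bug, or other. One word only.\nTicket:\n{body}").strip().lower()
    category = category if category in TEAM_STYLE else "other"
    # 2-3. Route and draft: code picks the prompt for the chosen branch.
    draft = llm(f"Draft a reply ({TEAM_STYLE[category]}).\nTicket:\n{body}")
    # 4. Validate with an LLM judge; bounded single retry, not an open-ended loop.
    if "FAIL" in llm(f"Does the reply answer the ticket? Reply PASS or FAIL.\nTicket:\n{body}\nReply:\n{draft}"):
        draft = llm(f"Revise so the reply fully answers the ticket.\nTicket:\n{body}\nReply:\n{draft}")
    return draft  # 5. Send: hand off to your email/API layer here
```

The sequence never changes at runtime, which is exactly why it stays cheap and reliable.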
Agent threshold: Needs autonomous actions + environment reaction + planning/tool selection. Use when branching dynamically (e.g., API calls/DB writes). But start simple—most "agent" needs are workflows.
Real-world pivot example: a client's CRM marketing chatbot. The client wanted multi-agent hype for a grant; the real need was sequential: plan → retrieve client data → generate → validate/fix. Built as a single agent plus specialist tools (SMS, email, validation, each with its own prompts, LLMs, and evals). This keeps global context in one model and avoids inter-agent errors; a sketch follows.
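A minimal sketch of that single-agent-plus-specialist-tools shape, reusing the `llm` helper from the ticket sketch; tool names and prompts are illustrative, not the client's actual system. Each tool hides its own prompt (and could use its own cheaper model and evals), while one agent keeps the whole history and picks actions:

```python
def draft_sms(ctx: str) -> str:
    return llm(f"You write marketing SMS: 160 chars max, one call to action.\n{ctx}")

def draft_email(ctx: str) -> str:
    return llm(f"You write marketing emails: subject line plus a short body.\n{ctx}")

def validate(ctx: str) -> str:
    return llm(f"Flag any claim in the latest draft unsupported by the client data; reply OK or list fixes.\n{ctx}")

TOOLS = {"draft_sms": draft_sms, "draft_email": draft_email, "validate": validate}

def run_agent(goal: str, client_data: str, max_steps: int = 6) -> str:
    scratchpad = f"Goal: {goal}\nClient data:\n{client_data}"
    for _ in range(max_steps):
        # One model sees the full history, so there is no inter-agent telephone game.
        choice = llm(f"{scratchpad}\n\nNext action? Reply with exactly one of: "
                     f"{', '.join(TOOLS)}, FINISH.").strip().lower()
        if choice not in TOOLS:
            break  # FINISH or anything unrecognized ends the run
        scratchpad += f"\n[{choice}]\n{TOOLS[choice](scratchpad)}"
    return scratchpad
```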
Pitfall: context rot degrades performance well before the window limit, and it worsens fast past ~200k tokens, due to lost-in-the-middle effects (models are trained on needle-in-a-haystack retrieval, not holistic reasoning over the full window). Manage the budget: trim, summarize, retrieve selectively, compact periodically, and delegate to tools/sub-agents.
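A minimal sketch of context-budget management before the rot sets in: keep recent turns verbatim and compact older ones into a summary. The 4-chars-per-token estimate and the thresholds are rough assumptions; `llm` is the helper from the ticket sketch:

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic; use a real tokenizer in practice

def compact_history(turns: list[str], budget: int = 50_000, keep_recent: int = 6) -> list[str]:
    if estimate_tokens("\n".join(turns)) <= budget or len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Summarize instead of truncating, so key facts neither vanish into the
    # lost-in-the-middle zone nor fall off the end of the window.
    summary = llm("Compress these earlier turns into key facts, decisions, and open tasks:\n"
                  + "\n".join(old))
    return [f"[compacted earlier context]\n{summary}"] + recent
```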
Multi-agent trigger: >20 tools, massive context, autonomous sub-decisions, or security/isolation (e.g., agents kept local within a hospital).
Decision framework (ask sequentially; a code sketch follows the list):
- Model knows? Prompt.
- Static external context? Inject/cache.
- Dynamic/unknown? Retrieve.
- Conditional paths? Workflow.
- Dynamic branching/actions? Agent.
- Overflow context? Multi-agent/tools.
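The same checklist as straight-line code: answer each question about the task in order, and the first "yes" names the minimal architecture. The field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Task:
    model_already_knows: bool = False
    has_static_context: bool = False
    needs_dynamic_data: bool = False
    has_conditional_paths: bool = False
    needs_dynamic_actions: bool = False
    context_overflows: bool = False

def choose_architecture(t: Task) -> str:
    if t.model_already_knows:   return "prompt (add few-shot examples)"
    if t.has_static_context:    return "inject + cache context"
    if t.needs_dynamic_data:    return "retrieval (RAG)"
    if t.has_conditional_paths: return "workflow (routers, judges, loops)"
    if t.needs_dynamic_actions: return "agent (planning + tool selection)"
    if t.context_overflows:     return "multi-agent / delegate to tools"
    return "start with a prompt and escalate only when it fails"
```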
AI products blend all of these: workflows for reliability, agents for flexibility. Deep research exemplifies the agent end: goal-driven, web-exploratory, iterative, and citation-backed, replacing a human researcher.
Split for Success: Exploratory Research + Constrained Writing
Architecture rationale: research demands flexibility (plan, search, inspect, pivot, iterate, synthesize); writing needs determinism (tone, structure, no slop). The conflict is resolved by separating them sequentially: research agent → Markdown artifact → writer workflow. No orchestration layer: a simple script reruns the chain if something major changes, and users run both stages or neither.
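A minimal sketch of that sequential handoff, where the Markdown artifact on disk is the whole interface between the two systems. `run_research_agent` and `run_writer_workflow` are placeholders for the two systems described above, not functions from the team's repo:

```python
from pathlib import Path

def run_research_agent(topic: str) -> str: ...   # exploratory agent, returns Markdown
def run_writer_workflow(research_md: str) -> str: ...  # deterministic writer

def pipeline(topic: str, artifact: Path = Path("research.md")) -> str:
    if not artifact.exists():  # rerun research only when something major changes
        artifact.write_text(run_research_agent(topic))
    # The artifact is inspectable, human-editable, and versionable before writing starts.
    return run_writer_workflow(artifact.read_text())
```

Because the handoff is just a file, a human can edit the research before the writer ever runs.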
Research agent traits: High recall/precision, feedback loops (self/human), reliable citations. Handles web/API/user sources.
Writer traits: Structured output (code/images), hallucination-free, tone-compliant.
Build process lessons:
- Prototype early, use on real tasks (e.g., self-building course).
- Pivot on user feedback (e.g., sequential over alternating).
- Question everything: Worthwhile? (Yes, the human alternative is expensive.) Existing tools? (Too noisy.) Agent or workflow? (Split.) Communication? (Artifact handoff.)
Quality criteria: Runnable code, contextually relevant images, useful (not random), cited sources, no slop/vagueness. Eval via human feedback loops.
Prerequisites: AI engineering basics (prompting/tools), Python/TypeScript comfort. Fits early product dev: Automate content pipelines for education/SaaS.
Exercise: Fork their public GitHub repo (QR in workshop), input topic like "What is harness engineering?", run research → writer, iterate with human edits.
Notable quotes:
- "Most people are interested in building agents, but most of these agents that our clients want are actually somewhat super simple workflows." (On overhyping agents.)
- "The problem is that this context rot happens much before the actual context window limit... It worsens quite fast after like 200,000." (Explaining performance cliffs.)
- "We always try to start with questions and use our sort of autonomy slider." (Decision framework intro.)
- "AI products are never just you build an agent or a multi-agent crew... They basically combine all of that." (Holistic systems view.)
- "As a writer augmentation not to replace them... you need a human touch to make it relatable." (Human-AI balance.)
Key Takeaways
- Slide autonomy up only as far as needed: Prompt → Context → Retrieval → Workflow → Agent → Multi-agent.
- Build workflows for fixed sequences (e.g., ticket handling); agents only for dynamic reactions.
- Combat context rot: Trim/summarize/delegate to tools/sub-agents before 200k tokens.
- For content automation, sequence exploratory research agent → constrained writer via shared MD artifact.
- Prototype with real use (e.g., self-generate course) for rapid iteration/feedback.
- Use tools as "specialists" with isolated prompts/evals to keep main agent context lean.
- Question viability first: Cost of humans vs. build; off-the-shelf noise vs. custom precision.
- Human-in-loop for relatability; AI for research grunt/scaffolding.