Build Research Agents + Writers for AI Content

Replace manual research and technical writing with modular AI: an exploratory deep research agent followed by a constrained writer workflow, avoiding slop by preferring simple workflows over overkill agents.

Ditch AI Slop: Demand Precision Research for Valuable Content

Generic LLM outputs like ChatGPT's LinkedIn-style posts fail due to slop phrases ("delve into the intricacies," "rapidly evolving"), vague generalizations ("most teams miss"), hallucinations, outdated info, and shallow, meaningless prose. High-quality technical AI content requires deep research, expert writing, editing, and iteration—expensive and time-intensive for teams like Towards AI, which produces courses and videos.

Their solution: Automate with a deep research agent (searches web/tools, plans/pivots, cites sources) producing a summarized Markdown artifact, fed sequentially to a deterministic writer workflow generating runnable code, relevant images, and structured articles. Built iteratively by using it to create their own course and incorporating student feedback. Challenges include achieving high precision/recall in research (avoiding noise overload), reducing hallucinations, and keeping a human in the loop for relatable storytelling and jokes.

Key principle: Writer augmentation, not replacement—humans handle connection; AI handles grunt work. Off-the-shelf deep research tools (e.g., exhaustive web scrapers) gather too much noise for focused technical needs.

Common mistake: Over-relying on a single LLM without tooling, memory, or context leads to unreliable text-only outputs. Fix: Augment with data injection, tools, and memory inside workflows.

Master the Autonomy Slider: Workflows Before Agents

AI engineering decisions hinge on constraints absent in traditional software: cost-per-task (depends on model and architecture), latency (notably with reasoning models), quality, and data privacy. The stack progresses: prompt engineering → context engineering → tools/orchestration → evals.

Use an "autonomy slider" to minimize complexity:

  1. Pure prompting if model knows the task (add few-shot examples for adaptation).
  2. Static context injection (<200k tokens, pre-cache for efficiency, e.g., Q&A on same report).
  3. Dynamic retrieval for private/recent/domain data (RAG-like).
  4. Workflows for conditional/parallel/looped steps with routers/judges.

Workflow defined: LLM + data/tools/memory/chained prompts/routers/parallelism/loops/majority voting. Reliable, low-cost, predetermined sequences. Example: Ticket handler—classify → route → draft → validate → send (fixed order, no dynamism needed).
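
The fixed ticket sequence above can be sketched in a few lines. `call_llm` is a stand-in for any chat-completion client (stubbed here with canned replies so the flow runs end to end), and the prompts, categories, and template names are illustrative, not from the talk.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; canned replies keep the sketch runnable."""
    if prompt.startswith("Classify"):
        return "billing"
    if prompt.startswith("Does this reply"):
        return "PASS"
    return "Drafted reply based on: " + prompt.splitlines()[-1]

def handle_ticket(ticket: str) -> str:
    # 1. Classify: always the first step; no dynamic planning needed.
    category = call_llm(f"Classify this support ticket (billing/tech/other):\n{ticket}")
    # 2. Route: plain Python branching, not an agent decision.
    template = {"billing": "refund policy", "tech": "troubleshooting steps"}.get(
        category.strip().lower(), "general help docs"
    )
    # 3. Draft a reply grounded in the routed template.
    draft = call_llm(f"Using our {template}, draft a reply to:\n{ticket}")
    # 4. Validate: an LLM-as-judge pass before anything is sent.
    verdict = call_llm(f"Does this reply follow policy? Answer PASS or FAIL:\n{draft}")
    # 5. Send only on PASS; otherwise escalate to a human.
    return draft if "PASS" in verdict else "ESCALATE_TO_HUMAN"
```

The order never changes, so the sequence is testable, cheap, and predictable: exactly what "workflow, not agent" buys.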

Agent threshold: Needs autonomous actions + environment reaction + planning/tool selection. Use when branching dynamically (e.g., API calls/DB writes). But start simple—most "agent" needs are workflows.

Real-world pivot example: Client CRM marketing chatbot. Wanted multi-agent hype for grant; reality: Sequential plan → retrieve client data → generate → validate/fix. Built single agent + specialist tools (SMS/email/validation, each with own prompts/LLMs/evals). Keeps global context in one model, avoids inter-agent errors.
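
A rough sketch of the "one agent, specialist tools" shape, under stated assumptions: the specialist names, prompts, and canned outputs below are invented for illustration. Each specialist owns its prompt (and, in practice, its own model and evals), and only its return value enters the main agent's context.

```python
# Hypothetical specialist prompts; in a real build each tool could also pin
# its own model and eval suite.
SPECIALISTS = {
    "sms": "You write SMS marketing copy under 160 characters.",
    "email": "You write marketing emails with a subject line and body.",
    "validate": "You check copy against brand and compliance rules.",
}

def run_specialist(name: str, task: str) -> str:
    """Isolated call: the specialist's own prompt never enters the main agent's context."""
    prompt = SPECIALISTS[name]
    # A real implementation would send `prompt` plus `task` to that tool's LLM;
    # this canned reply (tagged with the prompt length) keeps the sketch runnable.
    return f"[{name} output ({len(prompt)}-char prompt) for: {task}]"

def marketing_agent(request: str) -> str:
    # One agent keeps the global context and delegates narrow jobs,
    # avoiding the error accumulation of chatty inter-agent handoffs.
    channel = "sms" if "text" in request.lower() else "email"
    draft = run_specialist(channel, request)
    return run_specialist("validate", draft)
```

The main model sees only short tool outputs, keeping its context lean while each specialist stays independently promptable and evaluable.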

Pitfall: Context rot degrades performance well before the actual window limit (~200k tokens) due to lost-in-the-middle effects (models are trained on needle-in-a-haystack retrieval, not holistic reasoning). Manage the budget: trim, summarize, retrieve selectively, compact, or delegate to tools/sub-agents.
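
One way to sketch the trim-and-summarize side of that budget management, assuming a crude word-count proxy for a tokenizer and a placeholder string instead of a real LLM summarization call:

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; word count is a rough proxy.
    return len(text.split())

def fit_to_budget(turns: list[str], budget: int) -> list[str]:
    """Drop oldest turns until the transcript fits, replacing them with a summary stub."""
    dropped = []
    while turns and sum(count_tokens(t) for t in turns) > budget:
        dropped.append(turns.pop(0))  # trim from the oldest end first
    if dropped:
        # In practice this would be an LLM call that summarizes `dropped`.
        turns.insert(0, f"[summary of {len(dropped)} earlier turns]")
    return turns
```

Running this guard before every call keeps the working context far below the region where rot sets in, at the cost of lossy compression of old turns.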

Multi-agent trigger: >20 tools, massive context, autonomous sub-decisions, security (e.g., local hospital agents).

Decision framework (ask sequentially):

  • Model knows? Prompt.
  • Static external context? Inject/cache.
  • Dynamic/unknown? Retrieve.
  • Conditional paths? Workflow.
  • Dynamic branching/actions? Agent.
  • Overflow context? Multi-agent/tools.
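
The question sequence above can be sketched as a single function returning the simplest architecture that fits; the requirement labels are invented names for the six questions, not terms from the talk.

```python
def choose_architecture(needs: set[str]) -> str:
    """Walk the autonomy slider: the most demanding flagged need wins,
    but absent any flag we fall back to the simplest option."""
    if "context_overflow" in needs:
        return "multi-agent / tool delegation"
    if "dynamic_actions" in needs:       # branching actions chosen at runtime
        return "agent"
    if "conditional_paths" in needs:     # known branches, fixed order
        return "workflow"
    if "dynamic_data" in needs:          # private/recent/domain data
        return "retrieval"
    if "static_context" in needs:        # fits in one cached prompt
        return "inject/cache context"
    return "prompt"
```

Checking from the most demanding requirement downward guarantees the answer is the minimal architecture that still covers every flagged need.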

AI products blend all: Workflows for reliability, agents for flexibility. Deep research exemplifies: Goal-driven, web-exploratory, iterative, citing—replaces human researcher.

Split for Success: Exploratory Research + Constrained Writing

Architecture rationale: Research demands flexibility (plan/search/inspect/pivot/iterate/synthesize); writing needs determinism (tone/structure/no slop). Conflict → Separate sequentially: Research agent → MD artifact → Writer workflow. No orchestration layer: a simple script reruns the research step if major changes arise, and users run both stages or neither.
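
The handoff might look like the following sketch; `run_research_agent` and `run_writer_workflow` are hypothetical stand-ins with canned bodies, and the only coupling is the Markdown artifact on disk, so either half can be rerun alone.

```python
from pathlib import Path

def run_research_agent(topic: str) -> str:
    """Exploratory phase: plan, search, pivot, cite. Returns a Markdown research brief."""
    # Canned output so the sketch runs; a real agent would search and cite here.
    return f"# Research: {topic}\n\n- finding 1 [source]\n- finding 2 [source]\n"

def run_writer_workflow(research_md: str) -> str:
    """Constrained phase: fixed structure and tone, grounded only in the artifact."""
    return research_md.replace("# Research:", "# Article:")

def pipeline(topic: str, artifact: Path) -> str:
    artifact.write_text(run_research_agent(topic))    # agent output persisted as MD
    return run_writer_workflow(artifact.read_text())  # writer reads only the artifact
```

Because the writer only ever sees the artifact, regenerating the article never re-triggers searches, and a human can edit the Markdown between the two stages.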

Research agent traits: High recall/precision, feedback loops (self/human), reliable citations. Handles web/API/user sources.

Writer traits: Structured output (code/images), hallucination-free, tone-compliant.

Build process lessons:

  • Prototype early, use on real tasks (e.g., self-building course).
  • Pivot on user feedback (e.g., sequential over alternating).
  • Question everything: Worthwhile? (Yes, the human alternative is expensive.) Existing tools? (Too noisy.) Agent or workflow? (Split.) Communication? (Artifact handoff.)

Quality criteria: Runnable code, contextually relevant images, useful (not random), cited sources, no slop/vagueness. Eval via human feedback loops.

Prerequisites: AI engineering basics (prompting/tools), Python/TypeScript comfort. Fits early product dev: Automate content pipelines for education/SaaS.

Exercise: Fork their public GitHub repo (QR in workshop), input topic like "What is harness engineering?", run research → writer, iterate with human edits.

Notable quotes:

  • "Most people are interested in building agents, but most of these agents that our clients want are actually somewhat super simple workflows." (On overhyping agents.)
  • "The problem is that this context rot happens much before the actual context window limit... It worsens quite fast after like 200,000." (Explaining performance cliffs.)
  • "We always try to start with questions and use our sort of autonomy slider." (Decision framework intro.)
  • "AI products are never just you build an agent or a multi-agent crew... They basically combine all of that." (Holistic systems view.)
  • "As a writer augmentation not to replace them... you need a human touch to make it relatable." (Human-AI balance.)

Key Takeaways

  • Always slide autonomy minimally: Prompt → Context → Retrieval → Workflow → Agent → Multi-agent.
  • Build workflows for fixed sequences (e.g., ticket handling); agents only for dynamic reactions.
  • Combat context rot: Trim/summarize/delegate to tools/sub-agents before 200k tokens.
  • For content automation, sequence exploratory research agent → constrained writer via shared MD artifact.
  • Prototype with real use (e.g., self-generate course) for rapid iteration/feedback.
  • Use tools as "specialists" with isolated prompts/evals to keep main agent context lean.
  • Question viability first: Cost of humans vs. build; off-the-shelf noise vs. custom precision.
  • Human-in-loop for relatability; AI for research grunt/scaffolding.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge