Build Research Agents + Writers for AI Content
Replace manual research and technical writing with modular AI: an exploratory deep research agent followed by a constrained writer workflow, avoiding slop by favoring simple workflows over overkill agents.
Ditch AI Slop: Demand Precision Research for Valuable Content
Generic LLM outputs, like ChatGPT's LinkedIn-style posts, fail because of slop phrases ("delve into the intricacies," "rapidly evolving"), vague generalizations ("most teams miss"), hallucinations, outdated information, and shallow meaninglessness. High-quality technical AI content requires deep research, expert writing, editing, and iteration, which is expensive and time-intensive for teams like Towards AI, which produces courses and videos.
Their solution: automate with a deep research agent (searches the web and tools, plans and pivots, cites sources) that produces a summarized Markdown artifact, fed sequentially to a deterministic writer that generates runnable code, relevant images, and structured articles. Built iteratively by using it to create their own course and incorporating student feedback. Challenges include achieving high precision and recall in research (avoiding noise overload), reducing hallucinations, and keeping a human in the loop for relatable storytelling and jokes.
Key principle: Writer augmentation, not replacement—humans handle connection; AI handles grunt work. Off-the-shelf deep research tools (e.g., exhaustive web scrapers) gather too much noise for focused technical needs.
Common mistake: Over-relying on single LLMs without tooling/memory/context leads to unreliable text-only outputs. Fix: Augment with data injection, tools, memory for workflows.
Master the Autonomy Slider: Workflows Before Agents
AI engineering decisions hinge on constraints absent in traditional software: cost per task (model and architecture dependent), latency (especially with reasoning models), quality, and data privacy. The stack progresses: prompt engineering → context engineering → tools/orchestration → evals.
Use an "autonomy slider" to minimize complexity:
- Pure prompting if model knows the task (add few-shot examples for adaptation).
- Static context injection (<200k tokens; pre-cache the unchanging prefix for efficiency, e.g., repeated Q&A over the same report; see the sketch after this list).
- Dynamic retrieval for private/recent/domain data (RAG-like).
- Workflows for conditional/parallel/looped steps with routers/judges.
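A minimal sketch of static context injection, assuming the OpenAI Python SDK; the model name and file path are placeholders. Putting the unchanging report at the front of every request lets provider-side prompt caching reuse the long shared prefix (OpenAI caches identical prefixes past roughly 1k tokens), so repeated Q&A over the same report stays cheap and fast:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
report = open("research_report.md").read()  # static context, well under 200k tokens

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            # The long, unchanging report comes first so every call shares a prefix.
            {"role": "system", "content": f"Answer strictly from this report:\n\n{report}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content or ""

print(ask("What are the report's main findings?"))
print(ask("Which limitations does it list?"))  # same prefix, cache-friendly
```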
Workflow defined: LLM + data/tools/memory/chained prompts/routers/parallelism/loops/majority voting. Reliable, low-cost, predetermined sequences. Example: Ticket handler—classify → route → draft → validate → send (fixed order, no dynamism needed).
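A minimal sketch of that ticket handler as a workflow: a fixed chain with a router and an LLM judge, where code (not the model) decides each next step. The `llm` helper, prompts, and categories are illustrative, not a specific production system:

```python
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

TEAM_STYLE = {
    "billing": "polite, include refund policy",
    "bug": "technical, ask for repro steps",
    "other": "friendly, general support",
}

def handle_ticket(body: str) -> str:
    # 1. Classify with a constrained prompt so the router stays deterministic.
    category = llm(f"Classify as billing, bug, or other. One word only.\nTicket:\n{body}").strip().lower()
    category = category if category in TEAM_STYLE else "other"
    # 2-3. Route and draft: code picks the prompt for the chosen branch.
    draft = llm(f"Draft a reply ({TEAM_STYLE[category]}).\nTicket:\n{body}")
    # 4. Validate with an LLM judge; bounded single retry, not an open-ended loop.
    if "FAIL" in llm(f"Does the reply answer the ticket? Reply PASS or FAIL.\nTicket:\n{body}\nReply:\n{draft}"):
        draft = llm(f"Revise so the reply fully answers the ticket.\nTicket:\n{body}\nReply:\n{draft}")
    return draft  # 5. Send: hand off to your email/API layer here
```

The sequence never changes at runtime, which is exactly why it stays cheap and reliable.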
Agent threshold: Needs autonomous actions + environment reaction + planning/tool selection. Use when branching dynamically (e.g., API calls/DB writes). But start simple—most "agent" needs are workflows.
Real-world pivot example: a client's CRM marketing chatbot. The client wanted multi-agent hype for a grant; the real need was sequential: plan → retrieve client data → generate → validate/fix. Built as a single agent plus specialist tools (SMS, email, validation, each with its own prompts, LLMs, and evals). This keeps global context in one model and avoids inter-agent errors; a sketch follows.
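A minimal sketch of that single-agent-plus-specialist-tools shape, reusing the `llm` helper from the ticket sketch; tool names and prompts are illustrative, not the client's actual system. Each tool hides its own prompt (and could use its own cheaper model and evals), while one agent keeps the whole history and picks actions:

```python
def draft_sms(ctx: str) -> str:
    return llm(f"You write marketing SMS: 160 chars max, one call to action.\n{ctx}")

def draft_email(ctx: str) -> str:
    return llm(f"You write marketing emails: subject line plus a short body.\n{ctx}")

def validate(ctx: str) -> str:
    return llm(f"Flag any claim in the latest draft unsupported by the client data; reply OK or list fixes.\n{ctx}")

TOOLS = {"draft_sms": draft_sms, "draft_email": draft_email, "validate": validate}

def run_agent(goal: str, client_data: str, max_steps: int = 6) -> str:
    scratchpad = f"Goal: {goal}\nClient data:\n{client_data}"
    for _ in range(max_steps):
        # One model sees the full history, so there is no inter-agent telephone game.
        choice = llm(f"{scratchpad}\n\nNext action? Reply with exactly one of: "
                     f"{', '.join(TOOLS)}, FINISH.").strip().lower()
        if choice not in TOOLS:
            break  # FINISH or anything unrecognized ends the run
        scratchpad += f"\n[{choice}]\n{TOOLS[choice](scratchpad)}"
    return scratchpad
```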
Pitfall: context rot degrades performance well before the window limit, and it worsens fast past ~200k tokens, due to lost-in-the-middle effects (models are trained on needle-in-a-haystack retrieval, not holistic reasoning over the full window). Manage the budget: trim, summarize, retrieve selectively, compact periodically, and delegate to tools/sub-agents.
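A minimal sketch of context-budget management before the rot sets in: keep recent turns verbatim and compact older ones into a summary. The 4-chars-per-token estimate and the thresholds are rough assumptions; `llm` is the helper from the ticket sketch:

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic; use a real tokenizer in practice

def compact_history(turns: list[str], budget: int = 50_000, keep_recent: int = 6) -> list[str]:
    if estimate_tokens("\n".join(turns)) <= budget or len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Summarize instead of truncating, so key facts neither vanish into the
    # lost-in-the-middle zone nor fall off the end of the window.
    summary = llm("Compress these earlier turns into key facts, decisions, and open tasks:\n"
                  + "\n".join(old))
    return [f"[compacted earlier context]\n{summary}"] + recent
```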
Multi-agent trigger: >20 tools, massive context, autonomous sub-decisions, or security/isolation (e.g., agents kept local within a hospital).
Decision framework (ask sequentially; a code sketch follows the list):
- Model knows? Prompt.
- Static external context? Inject/cache.
- Dynamic/unknown? Retrieve.
- Conditional paths? Workflow.
- Dynamic branching/actions? Agent.
- Overflow context? Multi-agent/tools.
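The same checklist as straight-line code: answer each question about the task in order, and the first "yes" names the minimal architecture. The field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Task:
    model_already_knows: bool = False
    has_static_context: bool = False
    needs_dynamic_data: bool = False
    has_conditional_paths: bool = False
    needs_dynamic_actions: bool = False
    context_overflows: bool = False

def choose_architecture(t: Task) -> str:
    if t.model_already_knows:   return "prompt (add few-shot examples)"
    if t.has_static_context:    return "inject + cache context"
    if t.needs_dynamic_data:    return "retrieval (RAG)"
    if t.has_conditional_paths: return "workflow (routers, judges, loops)"
    if t.needs_dynamic_actions: return "agent (planning + tool selection)"
    if t.context_overflows:     return "multi-agent / delegate to tools"
    return "start with a prompt and escalate only when it fails"
```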
AI products blend all of these: workflows for reliability, agents for flexibility. Deep research exemplifies the agent end: goal-driven, web-exploratory, iterative, and citation-backed, replacing a human researcher.
Split for Success: Exploratory Research + Constrained Writing
Architecture rationale: research demands flexibility (plan, search, inspect, pivot, iterate, synthesize); writing needs determinism (tone, structure, no slop). The conflict is resolved by separating them sequentially: research agent → Markdown artifact → writer workflow. No orchestration layer: a simple script reruns the chain if something major changes, and users run both stages or neither.
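A minimal sketch of that sequential handoff, where the Markdown artifact on disk is the whole interface between the two systems. `run_research_agent` and `run_writer_workflow` are placeholders for the two systems described above, not functions from the team's repo:

```python
from pathlib import Path

def run_research_agent(topic: str) -> str: ...   # exploratory agent, returns Markdown
def run_writer_workflow(research_md: str) -> str: ...  # deterministic writer

def pipeline(topic: str, artifact: Path = Path("research.md")) -> str:
    if not artifact.exists():  # rerun research only when something major changes
        artifact.write_text(run_research_agent(topic))
    # The artifact is inspectable, human-editable, and versionable before writing starts.
    return run_writer_workflow(artifact.read_text())
```

Because the handoff is just a file, a human can edit the research before the writer ever runs.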
Research agent traits: High recall/precision, feedback loops (self/human), reliable citations. Handles web/API/user sources.
Writer traits: Structured output (code/images), hallucination-free, tone-compliant.
Build process lessons:
- Prototype early, use on real tasks (e.g., self-building course).
- Pivot on user feedback (e.g., sequential over alternating).
- Question everything: Worthwhile? (Yes, the human alternative is expensive.) Existing tools? (Too noisy.) Agent or workflow? (Split.) Communication? (Artifact handoff.)
Quality criteria: Runnable code, contextually relevant images, useful (not random), cited sources, no slop/vagueness. Eval via human feedback loops.
Prerequisites: AI engineering basics (prompting/tools), Python/TypeScript comfort. Fits early product dev: Automate content pipelines for education/SaaS.
Exercise: Fork their public GitHub repo (QR in workshop), input topic like "What is harness engineering?", run research → writer, iterate with human edits.
Notable quotes:
- "Most people are interested in building agents, but most of these agents that our clients want are actually somewhat super simple workflows." (On overhyping agents.)
- "The problem is that this context rot happens much before the actual context window limit... It worsens quite fast after like 200,000." (Explaining performance cliffs.)
- "We always try to start with questions and use our sort of autonomy slider." (Decision framework intro.)
- "AI products are never just you build an agent or a multi-agent crew... They basically combine all of that." (Holistic systems view.)
- "As a writer augmentation not to replace them... you need a human touch to make it relatable." (Human-AI balance.)
Key Takeaways
- Slide autonomy up only as far as needed: Prompt → Context → Retrieval → Workflow → Agent → Multi-agent.
- Build workflows for fixed sequences (e.g., ticket handling); agents only for dynamic reactions.
- Combat context rot: Trim/summarize/delegate to tools/sub-agents before 200k tokens.
- For content automation, sequence exploratory research agent → constrained writer via shared MD artifact.
- Prototype with real use (e.g., self-generate course) for rapid iteration/feedback.
- Use tools as "specialists" with isolated prompts/evals to keep main agent context lean.
- Question viability first: Cost of humans vs. build; off-the-shelf noise vs. custom precision.
- Human-in-loop for relatability; AI for research grunt/scaffolding.