How to Build a Content Writing AI Agent

Build a content writing AI agent using CrewAI with three specialized agents: researcher, writer, and editor. This tutorial covers topic research, outline generation, SEO-optimized draft writing, and automated editing with full working Python code.

Content marketing teams spend hours on every article — researching the topic, generating an outline, writing a draft, fact-checking, and editing for clarity and SEO. A content writing AI agent can compress that timeline from hours to minutes by running a coordinated pipeline of specialized agents, each doing exactly one job well.

In this tutorial you will build a three-agent content pipeline using CrewAI. The crew includes a Researcher agent that gathers current information on the topic, a Writer agent that produces an SEO-optimized draft, and an Editor agent that refines the prose and ensures quality. You will get full working code, real output examples, and a production deployment checklist.

Prerequisites

Before you start, ensure you have:

  • Python 3.10 or later
  • A Tavily API key (for web research — get one free at tavily.com)
  • An OpenAI API key
  • Familiarity with Python classes and basic agent concepts

Install dependencies:

pip install crewai crewai-tools tavily-python python-dotenv langchain-openai

Create a .env file:

OPENAI_API_KEY=sk-...
TAVILY_API_KEY=tvly-...

Architecture Overview

CrewAI uses a role-based multi-agent architecture. Each agent has a defined role, backstory, and set of tools. A Crew coordinates agents through a sequential or hierarchical process.

For this content pipeline:

User Input (topic + target keyword)
        │
        ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│  RESEARCHER   │  ← Tavily Search Tool, SEO Competitor Analysis Tool
│  Agent        │    Gathers facts, stats, competitor angles
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
        │ Research report
        ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│  WRITER       │  ← No external tools (uses context from researcher)
│  Agent        │    Produces SEO-optimized 1500-word draft
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
        │ Raw draft
        ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│  EDITOR       │  ← No external tools
│  Agent        │    Refines prose, checks facts, improves SEO
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
        │
        ā–¼
  Final Article (markdown)

The sequential process means each agent's output feeds directly into the next agent's context. CrewAI handles the prompt chaining automatically.
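Conceptually, the sequential process reduces to a simple fold: each stage's output string becomes part of the next stage's context. The sketch below is plain Python to illustrate the data flow, not CrewAI's actual internals (which also manage prompts, tool calls, and retries):

```python
from typing import Callable

def run_sequential(stages: list[Callable[[str], str]], topic: str) -> str:
    """Feed each stage's output into the next, mimicking a sequential process."""
    context = topic
    for stage in stages:
        context = stage(context)
    return context

# Stub stages standing in for the researcher, writer, and editor agents
research = lambda ctx: f"RESEARCH BRIEF for: {ctx}"
write = lambda ctx: f"DRAFT based on [{ctx}]"
edit = lambda ctx: f"FINAL: {ctx}"

final = run_sequential([research, write, edit], "AI agents content marketing")
```

Swapping the stub lambdas for real agent invocations gives you the same pipeline the Crew runs for you automatically.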

Step 1: Define the Research Tools

The Researcher agent uses two tools: a Tavily web search for current information and a custom SEO analysis tool that examines competitor headings.

# tools/research_tools.py
from crewai_tools import TavilySearchTool
from crewai.tools import tool
import json

# Built-in CrewAI wrapper for Tavily
tavily_tool = TavilySearchTool()

@tool("SEO Competitor Analysis")
def seo_analysis_tool(keyword: str) -> str:
    """
    Analyze top-ranking content for a keyword to identify common headings,
    content gaps, and SEO opportunities.

    Args:
        keyword: The target SEO keyword to analyze.

    Returns:
        A JSON string with common H2 headings, average word count estimate,
        and suggested content angles.
    """
    # In production, this would call a real SERP API like DataForSEO or SerpApi.
    # Here we use Tavily to approximate competitor research.
    from tavily import TavilyClient
    import os

    client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

    results = client.search(
        query=f"{keyword} complete guide",
        search_depth="advanced",
        max_results=5,
        include_raw_content=False,
    )

    competitor_data = {
        "keyword": keyword,
        "top_results": [
            {
                "title": r.get("title"),
                "url": r.get("url"),
                "snippet": r.get("content", "")[:200],
            }
            for r in results.get("results", [])
        ],
        "seo_insights": (
            "Focus on practical examples, step-by-step structure, and "
            "address common pain points. Include FAQ section targeting "
            "People Also Ask results."
        ),
    }

    return json.dumps(competitor_data, indent=2)

Step 2: Define the Three Agents

Each agent has a concise role, a detailed goal, and a backstory that shapes its persona and writing style.

# agents/content_agents.py
from crewai import Agent
from langchain_openai import ChatOpenAI
from tools.research_tools import tavily_tool, seo_analysis_tool

# Shared LLM — use GPT-4o for quality, GPT-4o-mini for cost optimization
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

researcher = Agent(
    role="Senior Content Researcher",
    goal=(
        "Research the given topic thoroughly. Find current statistics, expert opinions, "
        "real-world examples, and identify the key questions the audience is asking. "
        "Produce a structured research brief that the writer can use directly."
    ),
    backstory=(
        "You are an experienced digital research analyst with a background in content marketing. "
        "You know how to identify the most credible, up-to-date sources and extract insights "
        "that make articles genuinely useful rather than generic. You never fabricate statistics."
    ),
    tools=[tavily_tool, seo_analysis_tool],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

writer = Agent(
    role="SEO Content Writer",
    goal=(
        "Using the research brief, write a comprehensive, SEO-optimized article of at least "
        "1,500 words. The article must include a compelling introduction, clear H2/H3 structure, "
        "the target keyword in the first 100 words and in at least two H2 headings, "
        "practical examples, and a FAQ section."
    ),
    backstory=(
        "You are a senior content writer who has written for leading SaaS and tech publications. "
        "Your articles rank because they combine deep expertise with exceptional clarity. "
        "You write for humans first, search engines second."
    ),
    tools=[],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

editor = Agent(
    role="Content Editor and SEO Specialist",
    goal=(
        "Review the draft article for clarity, factual accuracy, SEO optimization, and readability. "
        "Fix passive voice, improve transitions, ensure the keyword density is natural (1-2%), "
        "add or improve internal linking suggestions, and produce a final publication-ready version."
    ),
    backstory=(
        "You are a meticulous editor with 10 years of experience in digital publishing. "
        "You have an eye for weak arguments, vague claims, and missed SEO opportunities. "
        "Your edits make good articles great without losing the author's voice."
    ),
    tools=[],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

Step 3: Define the Tasks

CrewAI tasks give each agent a specific deliverable with clear expected output. The context parameter creates the data dependency between agents.

# tasks/content_tasks.py
from crewai import Task
from agents.content_agents import researcher, writer, editor

def create_content_tasks(topic: str, target_keyword: str, target_audience: str):

    research_task = Task(
        description=f"""
        Research the following topic thoroughly:
        - Topic: {topic}
        - Target Keyword: {target_keyword}
        - Target Audience: {target_audience}

        Your deliverables:
        1. Use the Tavily search tool to find 5+ current, credible sources.
        2. Use the SEO Competitor Analysis tool to understand what top-ranking content covers.
        3. Compile a structured research brief including:
           - Key facts and statistics (with sources)
           - Common audience questions and pain points
           - 5-7 recommended H2 section headings
           - 3 unique angles or insights not covered by competitors
        """,
        expected_output=(
            "A structured markdown research brief with sections: Key Facts & Stats, "
            "Audience Pain Points, Recommended Outline, Competitor Gaps."
        ),
        agent=researcher,
    )

    writing_task = Task(
        description=f"""
        Using the research brief provided, write a complete SEO-optimized article.

        Requirements:
        - Target keyword: {target_keyword} (use naturally in intro, 2+ H2s, conclusion)
        - Length: 1,500-2,000 words
        - Structure: H1 title, introduction (150 words), 5-7 H2 sections, FAQ (4+ questions), conclusion
        - Tone: Expert but accessible, practical, actionable
        - Include real examples and concrete code or workflow examples where relevant
        - End with a strong call to action
        """,
        expected_output=(
            "A complete markdown article ready for editorial review, including title, "
            "all sections, FAQ, and conclusion."
        ),
        agent=writer,
        context=[research_task],
    )

    editing_task = Task(
        description=f"""
        Edit and finalize the draft article for publication.

        Your editorial checklist:
        1. Fix grammar, spelling, and punctuation errors
        2. Improve sentence variety and eliminate passive voice where possible
        3. Verify keyword '{target_keyword}' appears naturally in intro, headings, and conclusion
        4. Suggest 3+ internal links as [anchor text](/suggested-path/) placeholders
        5. Confirm FAQ answers are complete and accurate based on the research
        6. Add a meta description (155 characters max) at the top of your output
        7. Add a suggested title tag (60 characters max) at the top of your output

        Return the complete, edited article in markdown format.
        """,
        expected_output=(
            "Publication-ready markdown article with meta description, title tag, "
            "all edits applied, and internal link suggestions."
        ),
        agent=editor,
        context=[research_task, writing_task],
    )

    return [research_task, writing_task, editing_task]

Step 4: Assemble and Run the Crew

# main.py
import os
from dotenv import load_dotenv
from crewai import Crew, Process
from agents.content_agents import researcher, writer, editor
from tasks.content_tasks import create_content_tasks

load_dotenv()

def generate_article(
    topic: str,
    target_keyword: str,
    target_audience: str = "marketing professionals"
) -> str:
    """
    Run the full content creation crew and return the final article.
    """
    tasks = create_content_tasks(topic, target_keyword, target_audience)

    crew = Crew(
        agents=[researcher, writer, editor],
        tasks=tasks,
        process=Process.sequential,
        verbose=True,
    )

    result = crew.kickoff()
    return str(result)


if __name__ == "__main__":
    article = generate_article(
        topic="How AI agents are transforming B2B content marketing in 2026",
        target_keyword="AI agents content marketing",
        target_audience="B2B marketing managers and content strategists",
    )

    # Save to file (create the output directory first if it does not exist)
    os.makedirs("output", exist_ok=True)
    with open("output/article_draft.md", "w", encoding="utf-8") as f:
        f.write(article)

    print("\n--- FINAL ARTICLE ---\n")
    print(article[:2000])  # Preview first 2000 characters
    print("\n[Full article saved to output/article_draft.md]")

Sample Output

When you run the crew against the topic above, you get output similar to this:

--- CREW EXECUTION LOG ---

[Researcher] Starting research on: AI agents content marketing
[Researcher] Calling Tavily Search: "AI agents B2B content marketing 2026 statistics"
[Researcher] Calling SEO Analysis: "AI agents content marketing"
[Researcher] Compiled research brief (847 words)

[Writer] Starting article draft based on research brief
[Writer] Draft complete: 1,743 words, 6 H2 sections, 5 FAQ items

[Editor] Reviewing draft...
[Editor] Applied 23 edits. Added 4 internal link suggestions.
[Editor] Final article: 1,698 words (after tightening), keyword density: 1.4%

--- FINAL OUTPUT ---
Meta description: Learn how AI agents are transforming B2B content marketing...
Title tag: AI Agents in Content Marketing: 2026 Complete Guide

# AI Agents in Content Marketing: The Complete 2026 Guide
...

Total execution time for a 1,600-word article: approximately 90–120 seconds.

Customizing the Pipeline

Adjusting tone: Modify the writer agent's backstory to target different publication styles. For a technical developer audience, emphasize precision over narrative flow. For a consumer brand, emphasize conversational language.
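One lightweight way to manage this is a dictionary of backstory presets keyed by audience. The preset names and wording below are illustrative, not part of CrewAI:

```python
# Illustrative backstory presets — pass the result into the writer Agent's `backstory` kwarg
STYLE_PRESETS = {
    "technical": (
        "You are a staff engineer turned technical writer. You value precision, "
        "correct terminology, and runnable examples over narrative flow."
    ),
    "consumer": (
        "You are a lifestyle brand copywriter. You write in a warm, conversational "
        "tone with short sentences and everyday vocabulary."
    ),
    "b2b": (
        "You are a senior content writer for leading SaaS and tech publications. "
        "You combine deep expertise with exceptional clarity."
    ),
}

def backstory_for(style: str) -> str:
    """Return the backstory for a style, defaulting to the B2B preset."""
    return STYLE_PRESETS.get(style, STYLE_PRESETS["b2b"])
```

Usage: `writer = Agent(..., backstory=backstory_for("technical"), ...)`.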

Adding a fact-checker agent: Insert a fourth agent between the writer and editor that cross-references claims against the research brief:

fact_checker = Agent(
    role="Fact-Checker",
    goal="Verify every statistic and claim in the draft against the research brief. Flag unsupported claims.",
    backstory="You are a rigorous fact-checker with a journalism background.",
    tools=[tavily_tool],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

Batch processing: To generate content at scale, wrap the generate_article() call in a loop over a topics list and add rate limiting with time.sleep(5) between runs to stay within API quotas.
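A minimal batch wrapper might look like the sketch below. The `generate_fn` parameter is injected so the loop can be exercised without hitting the API; in production you would pass `generate_article`:

```python
import time
from typing import Callable

def batch_generate(
    topics: list[tuple[str, str]],           # (topic, target_keyword) pairs
    generate_fn: Callable[[str, str], str],  # e.g. generate_article
    delay_seconds: float = 5.0,              # crude rate limiting between runs
) -> dict[str, str]:
    """Generate one article per topic, sleeping between runs to respect API quotas."""
    articles = {}
    for i, (topic, keyword) in enumerate(topics):
        articles[topic] = generate_fn(topic, keyword)
        if i < len(topics) - 1:  # no need to sleep after the final run
            time.sleep(delay_seconds)
    return articles
```

For larger volumes, consider replacing the fixed sleep with proper rate limiting based on your provider's token-per-minute quota.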

Saving to a CMS: After generation, post the article directly to WordPress, Webflow, or your headless CMS using their REST APIs. The output is already in markdown — most CMSes accept it directly.
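As a sketch, posting to WordPress through its REST API looks like this (stdlib only; the site URL and application-password credentials are placeholders you supply, and depending on your WordPress setup you may need a markdown plugin or an HTML conversion step first):

```python
import base64
import json
import urllib.request

def build_wp_payload(title: str, body: str, status: str = "draft") -> dict:
    """Build the JSON body for the WordPress /wp/v2/posts endpoint."""
    return {"title": title, "content": body, "status": status}

def post_to_wordpress(site: str, user: str, app_password: str, payload: dict) -> None:
    """POST the article to WordPress using Basic auth with an application password."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{site}/wp-json/wp/v2/posts",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print("Created post:", json.load(resp).get("id"))
```

Publishing as `status="draft"` first lets a human review the article inside the CMS before it goes live.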

Production Considerations

Token costs: A full crew run for a 1,500-word article uses approximately 15,000–25,000 tokens with GPT-4o. At current pricing this is around $0.10–$0.20 per article. For high-volume use, switch the writer and editor to gpt-4o-mini and reserve gpt-4o for the researcher only.
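To see where that estimate comes from, here is the arithmetic as a small helper. The per-million-token prices are assumptions based on published GPT-4o rates at the time of writing — substitute current pricing before relying on the numbers:

```python
# Assumed per-million-token prices in USD — update with current OpenAI pricing
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one crew run for a given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical run: ~14k input tokens (prompts plus chained context) and ~6k output tokens
full_quality = estimate_cost("gpt-4o", 14_000, 6_000)       # roughly $0.10
budget = estimate_cost("gpt-4o-mini", 14_000, 6_000)        # a fraction of a cent
```

The spread between the two models is why routing only the research step to the larger model is such an effective cost lever.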

Content quality guardrails: Add a post-processing validation step that checks minimum word count, keyword presence, and FAQ completeness before saving:

def validate_article(article: str, keyword: str) -> dict:
    word_count = len(article.split())
    keyword_count = article.lower().count(keyword.lower())
    has_faq = "## frequently asked" in article.lower() or "## faq" in article.lower()

    return {
        "word_count": word_count,
        "keyword_count": keyword_count,
        "has_faq": has_faq,
        "passes_validation": word_count >= 1200 and keyword_count >= 3 and has_faq,
    }

Idempotency: Store a hash of each (topic, keyword) pair so you never generate duplicate content. Use a simple SQLite database or a key-value store like Redis.
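A minimal sketch of that deduplication store using stdlib SQLite (the table name and hashing scheme are illustrative):

```python
import hashlib
import sqlite3

def content_hash(topic: str, keyword: str) -> str:
    """Stable hash identifying a (topic, keyword) generation request."""
    return hashlib.sha256(f"{topic}|{keyword}".encode()).hexdigest()

def already_generated(conn: sqlite3.Connection, topic: str, keyword: str) -> bool:
    """Return True if this pair was seen before; otherwise record it and return False."""
    conn.execute("CREATE TABLE IF NOT EXISTS generated (hash TEXT PRIMARY KEY)")
    h = content_hash(topic, keyword)
    if conn.execute("SELECT 1 FROM generated WHERE hash = ?", (h,)).fetchone():
        return True
    conn.execute("INSERT INTO generated (hash) VALUES (?)", (h,))
    conn.commit()
    return False
```

In the batch loop, skip any pair for which `already_generated(...)` returns True.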

Frequently Asked Questions

Can this pipeline generate content in languages other than English? Yes. Update the writer agent's goal to specify the target language, and the crew will produce output in that language. Tavily also handles non-English queries, so you can phrase the researcher's searches in the target language as well.

How do I prevent the agents from fabricating statistics? Two approaches work well in practice. First, the researcher agent's backstory explicitly states "never fabricate statistics." Second, the editor agent is instructed to flag unsupported claims. For highest fidelity, add a fourth fact-checker agent with access to Tavily to verify claims before final output.

What is the difference between CrewAI sequential and hierarchical process? In sequential process, agents run one after another and each receives the prior agent's output as context. In hierarchical process, a manager LLM dynamically decides which agent to delegate to and can route tasks based on intermediate results. For a linear content pipeline like this one, sequential is simpler and more predictable.

Can I run this without OpenAI using a local model? Yes. Replace ChatOpenAI with ChatOllama from langchain_community and point it at a locally running Ollama instance with a capable model like llama3.1:70b. Note that smaller models produce noticeably lower quality content; 70B+ parameter models are recommended for production content generation.