🤖 AI Agents Guide

Example · Data · 12 min read

AI Data Analyst Examples: 6 Real Setups

Explore 6 AI data analyst agent examples covering natural language SQL generation, automated chart creation, anomaly detection, report generation, and business intelligence workflows. Includes Python code for building production-ready data analysis agents.

Data dashboard charts and graphs showing business analytics
Photo by Luke Chesser on Unsplash
By AI Agents Guide Team • February 28, 2026

Table of Contents

  1. Example 1: Natural Language SQL Agent
  2. Example 2: Automated Chart and Dashboard Generator
  3. Example 3: Anomaly Detection and Alerting Agent
  4. Example 4: Automated Weekly Business Report Generator
  5. Example 5: Customer Cohort Analysis Agent
  6. Example 6: Real-Time Dashboard Narrative Agent
  7. Choosing the Right Data Agent Architecture
  8. Getting Started
Laptop showing spreadsheet and chart data analysis
Photo by Carlos Muza on Unsplash

Data analysis is one of the highest-leverage applications for AI agents. The bottleneck in most organizations isn't data access — it's the time required to formulate queries, explore datasets, interpret results, and communicate findings. An AI data analysis agent compresses that cycle from hours to minutes.

These six examples cover the most impactful data agent patterns: natural language SQL, automated visualization, anomaly detection, recurring report generation, cohort analysis, and narrative business reporting. Each includes architecture details and realistic Python code.

For the foundational agent pattern that most of these implement, review ReAct reasoning and the Data Analyst Agent tutorial.


Example 1: Natural Language SQL Agent

Use Case: Allow business users to query a database in plain English without writing SQL. The agent translates questions to SQL, executes safely, and explains results in plain language.

Architecture: LangChain SQLDatabase + create_sql_agent + read-only SQLAlchemy connection + result interpretation.

Key Implementation:

from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit

# Connect to read-only database
db = SQLDatabase.from_uri(
    "postgresql+psycopg2://readonly_user:password@localhost/analytics",
    include_tables=["orders", "customers", "products", "revenue_daily"],
    sample_rows_in_table_info=2
)

llm = ChatOpenAI(model="gpt-4o", temperature=0)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

agent = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    agent_type="openai-tools",
    verbose=True,
    prefix="""You are a data analyst assistant. When answering questions:
    1. Always check the table schema before writing SQL
    2. Write efficient queries (use LIMIT 1000 unless aggregating)
    3. After getting results, explain what they mean in business terms
    4. If results are surprising, note potential data quality issues
    5. Never modify data — SELECT only""",
    max_iterations=8
)

# Business user queries
questions = [
    "What were our top 5 products by revenue last month?",
    "How did customer acquisition cost change quarter over quarter this year?",
    "Which customer segments have the highest 90-day retention rate?"
]

for question in questions:
    print(f"\nQuestion: {question}")
    result = agent.invoke({"input": question})
    print(f"Answer: {result['output']}")

Outcome: Business analysts and executives can explore data without SQL knowledge. The agent shows its reasoning (query, results, interpretation), building trust in the analysis. Processing time: 5–15 seconds per question.


Example 2: Automated Chart and Dashboard Generator

Use Case: Given a dataset and analysis goal, automatically generate publication-quality charts with appropriate chart types, proper labels, and narrative explanations.

Architecture: Data loader → analysis agent (determines chart type) → Python code executor with matplotlib/plotly → output PNG/HTML files.

Key Implementation:

import pandas as pd
from openai import OpenAI

client = OpenAI()

def determine_optimal_charts(df_info: dict, analysis_goal: str) -> list[dict]:
    """Determine the best chart types for the data and goal."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"""Given this dataset and analysis goal, recommend 3-4 charts.
            Dataset: {df_info}
            Goal: {analysis_goal}
            Return JSON: {{"charts": [{{"type": str, "x_column": str, "y_column": str,
            "title": str, "rationale": str, "insight": str}}]}}"""
        }]
    )
    import json
    return json.loads(response.choices[0].message.content)["charts"]

def generate_chart_code(df_info: dict, chart_spec: dict) -> str:
    """Generate Python code to create a specific chart."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Write Python code to create this chart using matplotlib or plotly.
            Chart spec: {chart_spec}
            Dataset info: {df_info}

            Code requirements:
            - Load data from 'df' variable (already exists)
            - Use plotly for interactive charts if time-series, matplotlib for static
            - Include title, axis labels, and source annotation
            - Save to './charts/{chart_spec['type']}_{chart_spec['title'][:20]}.png'
            - Print: "Chart saved: [filename]"
            Return only the Python code."""
        }]
    )
    return response.choices[0].message.content

# Load data and generate charts
df = pd.read_csv("sales_data.csv")
df_info = {
    "columns": df.dtypes.to_dict(),
    "shape": df.shape,
    "sample": df.head(3).to_dict(),
    "date_range": f"{df['date'].min()} to {df['date'].max()}" if "date" in df.columns else "N/A"
}

chart_specs = determine_optimal_charts(df_info, "Understand revenue trends and product performance")
for spec in chart_specs:
    code = generate_chart_code(df_info, spec)
    print(f"Generating: {spec['title']}")
    print(f"Insight: {spec['insight']}")
    exec(code)  # In production, use sandboxed execution

Outcome: A full dashboard of contextually appropriate charts generated from raw data in under 2 minutes. The agent selects chart types based on data characteristics — line charts for time series, bar charts for comparisons, scatter plots for correlations.
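If the model call fails, or you want deterministic behavior in tests, a dtype-based fallback can stand in for the LLM-driven chart selection. This is a sketch of one possible heuristic, not part of the pipeline above; its rules (time series to line, categorical to bar, numeric pairs to scatter) mirror the selection logic just described:

```python
import pandas as pd

def fallback_chart_specs(df: pd.DataFrame) -> list[dict]:
    """Deterministic chart selection from column dtypes (no LLM call)."""
    specs = []
    datetime_cols = df.select_dtypes(include=["datetime64[ns]"]).columns.tolist()
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    category_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

    if datetime_cols:  # time series: one line chart per numeric column
        for col in numeric_cols:
            specs.append({"type": "line", "x_column": datetime_cols[0],
                          "y_column": col, "title": f"{col} over time"})
    if category_cols and numeric_cols:  # comparison across categories: bar chart
        specs.append({"type": "bar", "x_column": category_cols[0],
                      "y_column": numeric_cols[0],
                      "title": f"{numeric_cols[0]} by {category_cols[0]}"})
    if len(numeric_cols) >= 2:  # possible correlation: scatter plot
        specs.append({"type": "scatter", "x_column": numeric_cols[0],
                      "y_column": numeric_cols[1],
                      "title": f"{numeric_cols[0]} vs {numeric_cols[1]}"})
    return specs
```

The same spec shape feeds straight into `generate_chart_code`, so the two selection paths are interchangeable.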


Example 3: Anomaly Detection and Alerting Agent

Use Case: Monitor a time-series metric (revenue, error rate, user signups) and automatically identify anomalies, explain their likely causes, and generate alert summaries.

Architecture: Scheduled data pull → statistical anomaly detection → AI explanation agent → alert routing.

Key Implementation:

import pandas as pd
import numpy as np
from scipy import stats
from anthropic import Anthropic

client = Anthropic()

def detect_anomalies(df: pd.DataFrame, column: str, threshold_z: float = 2.5) -> list[dict]:
    """Detect statistical anomalies in a time series column."""
    # Calculate rolling mean and standard deviation
    df = df.copy()
    df["rolling_mean"] = df[column].rolling(window=14, min_periods=7).mean()
    df["rolling_std"] = df[column].rolling(window=14, min_periods=7).std()
    df["z_score"] = (df[column] - df["rolling_mean"]) / df["rolling_std"]

    anomalies = df[df["z_score"].abs() > threshold_z].copy()
    anomalies["direction"] = anomalies["z_score"].apply(lambda z: "spike" if z > 0 else "drop")
    anomalies["deviation_pct"] = ((anomalies[column] - anomalies["rolling_mean"]) / anomalies["rolling_mean"] * 100).round(1)

    return anomalies[["date", column, "rolling_mean", "z_score", "direction", "deviation_pct"]].to_dict("records")

def explain_anomaly(anomaly: dict, metric_name: str, context: str) -> str:
    """Use AI to explain a detected anomaly and suggest investigation steps."""
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=600,
        messages=[{
            "role": "user",
            "content": f"""Explain this metric anomaly for a business analyst:

            Metric: {metric_name}
            Date: {anomaly["date"]}
            Anomaly: {anomaly["direction"]} of {abs(anomaly["deviation_pct"])}% vs 14-day average
            Actual value: {anomaly[metric_name.lower().replace(" ", "_")]}
            Expected range: ~{anomaly["rolling_mean"]:.0f}

            Business context: {context}

            Provide:
            1. 3 most likely explanations (ranked by probability)
            2. 3 specific data sources to check to confirm root cause
            3. Recommended immediate actions
            Keep it concise — this goes to a Slack alert."""
        }]
    )
    return response.content[0].text

# Load metrics (create a read-only SQLAlchemy engine for the analytics DB)
from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg2://readonly_user:password@localhost/analytics")

df = pd.read_sql(
    "SELECT date, daily_revenue FROM revenue_daily WHERE date > current_date - 90 ORDER BY date",
    con=engine
)

anomalies = detect_anomalies(df, "daily_revenue", threshold_z=2.5)
context = "SaaS platform, monthly subscriptions, B2B customers, average deal $500/month"

for anomaly in anomalies[-5:]:  # Alert on last 5 anomalies
    explanation = explain_anomaly(anomaly, "daily_revenue", context)
    print(f"\n🚨 Anomaly detected: {anomaly['date']}")
    print(f"Revenue {anomaly['direction']}: {anomaly['deviation_pct']}% vs baseline")
    print(explanation)
    # post_to_slack(f"*Revenue Anomaly Detected*\n{explanation}")

Outcome: Proactive anomaly alerts with business-context explanations, not just raw statistical signals. The AI layer translates "2.8 standard deviations below mean" into "Revenue dropped 34% vs 14-day average — likely causes include payment processor outage, pricing change effect, or end-of-month churn spike."
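To see the detection mechanics without a database, the same rolling z-score computation can be run on synthetic data with one injected spike. This is a self-contained illustration; the window, minimum periods, and threshold match `detect_anomalies` above:

```python
import numpy as np
import pandas as pd

# Synthetic daily revenue: flat around 100 with unit noise, plus one spike
gen = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2026-01-01", periods=30, freq="D"),
    "daily_revenue": 100 + gen.normal(0, 1, 30),
})
df.loc[20, "daily_revenue"] = 150.0  # inject an anomaly on day 21

# Same rolling z-score as detect_anomalies (window=14, min_periods=7)
roll_mean = df["daily_revenue"].rolling(window=14, min_periods=7).mean()
roll_std = df["daily_revenue"].rolling(window=14, min_periods=7).std()
df["z_score"] = (df["daily_revenue"] - roll_mean) / roll_std

flagged = df[df["z_score"].abs() > 2.5]
print(flagged[["date", "daily_revenue", "z_score"]])
```

The injected spike clears the 2.5σ threshold even though the rolling window includes the spike itself, which inflates the local standard deviation.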



Example 4: Automated Weekly Business Report Generator

Use Case: Generate a comprehensive weekly business performance report automatically, combining data from multiple sources into a narrative summary with charts and key metrics.

Architecture: Parallel data pulls from multiple databases → metric computation → AI narrative generator → PDF/Notion export.

Key Implementation:

import asyncio
from anthropic import Anthropic
import pandas as pd

client = Anthropic()

async def fetch_weekly_metrics() -> dict:
    """Fetch all metrics needed for weekly report in parallel."""

    async def get_revenue_metrics():
        # Query your analytics DB
        return {
            "total_revenue": 142500,
            "vs_last_week_pct": 8.3,
            "vs_last_year_pct": 34.2,
            "top_product": "Enterprise Plan",
            "top_product_revenue": 85000
        }

    async def get_customer_metrics():
        return {
            "new_customers": 47,
            "churned_customers": 12,
            "net_new": 35,
            "mrr_change": 8750,
            "nps_score": 62,
            "active_users_7d": 1847
        }

    async def get_product_metrics():
        return {
            "feature_adoption": {"AI Chat": 0.73, "API": 0.51, "Webhooks": 0.38},
            "avg_sessions_per_user": 4.2,
            "p95_api_latency_ms": 234,
            "error_rate_pct": 0.12
        }

    revenue, customers, product = await asyncio.gather(
        get_revenue_metrics(), get_customer_metrics(), get_product_metrics()
    )
    return {"revenue": revenue, "customers": customers, "product": product}

def generate_report_narrative(metrics: dict, week_ending: str) -> str:
    """Generate narrative commentary on weekly metrics."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Write a weekly business performance report for the week ending {week_ending}.

            Metrics: {metrics}

            Structure:
            # Weekly Performance Report — Week Ending {week_ending}

            ## Executive Summary (3 sentences max)
            ## Revenue Performance
            ## Customer Growth
            ## Product Health
            ## Key Concerns (if any)
            ## Next Week Focus

            Tone: Professional but direct. Highlight notable trends and concerns.
            Use specific numbers. Flag anything that needs leadership attention."""
        }]
    )
    return response.content[0].text

metrics = asyncio.run(fetch_weekly_metrics())
report = generate_report_narrative(metrics, "2026-02-28")

# Save the report; distribution (Slack, email) would follow here
with open("reports/weekly_2026-02-28.md", "w") as f:
    f.write(report)

print("Weekly report generated")

Outcome: A complete weekly business report generated automatically every Monday morning, ready before the leadership standup. Analysis that previously took a data analyst 2–3 hours runs in under 5 minutes.
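The Monday-morning trigger itself isn't shown above. In production you would typically use cron or an orchestrator such as Airflow; as a dependency-free sketch, computing the next Monday run time takes a few lines (the 07:00 start is an assumed value, not from the example):

```python
from datetime import datetime, timedelta

def next_monday_run(now: datetime, hour: int = 7) -> datetime:
    """Return the next Monday at hour:00 strictly after `now`."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:  # it's already Monday at or past the run time
        candidate += timedelta(days=7)
    return candidate

print(f"Next report run: {next_monday_run(datetime.now()):%Y-%m-%d %H:%M}")
```

A worker would sleep until that timestamp, then call `fetch_weekly_metrics()` and `generate_report_narrative()` before the standup.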


Example 5: Customer Cohort Analysis Agent

Use Case: Analyze customer retention by acquisition cohort, identify patterns in churn timing, and surface actionable retention improvement opportunities.

Architecture: SQL agent (cohort queries) → cohort matrix builder → pattern analysis → recommendation generator.

Key Implementation:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from sqlalchemy import create_engine
import pandas as pd

# The cohort query uses PostgreSQL date functions (DATE_TRUNC, AGE, INTERVAL),
# so connect to Postgres via SQLAlchemy rather than SQLite
engine = create_engine("postgresql+psycopg2://readonly_user:password@localhost/analytics")

@tool
def run_cohort_query(months_back: int = 12) -> str:
    """Generate and run cohort retention analysis SQL."""
    sql = f"""
    WITH cohorts AS (
        SELECT
            customer_id,
            DATE_TRUNC('month', first_order_date) as cohort_month,
            DATE_TRUNC('month', order_date) as order_month
        FROM orders o
        JOIN (
            SELECT customer_id, MIN(created_at) as first_order_date
            FROM orders GROUP BY customer_id
        ) first_orders USING (customer_id)
        WHERE o.order_date >= CURRENT_DATE - INTERVAL '{months_back} months'
    )
    SELECT
        cohort_month,
        -- AGE() splits into years + months; combine both so month 13 isn't reported as 1
        (EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
         + EXTRACT(MONTH FROM AGE(order_month, cohort_month))) AS months_since_first,
        COUNT(DISTINCT customer_id) AS retained_customers
    FROM cohorts
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
    df = pd.read_sql(sql, engine)
    return df.to_csv(index=False)

@tool
def analyze_retention_patterns(cohort_csv: str) -> str:
    """This tool signals the agent to analyze the cohort data provided."""
    return f"Ready to analyze cohort retention patterns from: {cohort_csv[:200]}..."

from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [run_cohort_query, analyze_retention_patterns]
agent = create_react_agent(llm=llm, tools=tools, prompt=hub.pull("hwchase17/react"))

executor = AgentExecutor(agent=agent, tools=tools, max_iterations=6, verbose=True)

result = executor.invoke({
    "input": """Analyze customer retention by cohort for the past 12 months.
    Identify: 1) Average retention at M1, M3, M6, M12
    2) Whether retention is improving or declining across recent cohorts
    3) The cohort with the best M3 retention and what was different about that period
    4) Specific recommendations to improve M1 retention
    Present as a structured analysis report."""
})
print(result["output"])

Outcome: A complete cohort retention analysis with trend detection and specific recommendations, completed in minutes rather than the hours a traditional SQL + spreadsheet workflow requires.
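The query returns long-format rows (one per cohort-month pair). Before asking the model to reason about retention, it often helps to pivot that output into the classic cohort matrix and normalize by the month-0 cohort size. A small pandas sketch, with illustrative numbers:

```python
import pandas as pd

def build_retention_matrix(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format cohort rows into a cohort x month retention-rate matrix."""
    counts = long_df.pivot(index="cohort_month",
                           columns="months_since_first",
                           values="retained_customers")
    # Month 0 is the cohort size; dividing row-wise yields retention rates
    return counts.div(counts[0], axis=0).round(3)

long_df = pd.DataFrame({
    "cohort_month": ["2026-01", "2026-01", "2026-01", "2026-02", "2026-02"],
    "months_since_first": [0, 1, 2, 0, 1],
    "retained_customers": [100, 60, 45, 120, 78],
})
print(build_retention_matrix(long_df))
```

Passing the matrix (rather than raw rows) to the agent makes M1/M3/M6 comparisons across cohorts much easier to reason about.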


Example 6: Real-Time Dashboard Narrative Agent

Use Case: Monitor live business dashboards and automatically generate natural language summaries when metrics cross thresholds or show unusual patterns, sending context-rich alerts.

Architecture: Dashboard API polling → threshold checking → AI narration → alert distribution.

Key Implementation:

import asyncio
from anthropic import Anthropic
from datetime import datetime
import json

client = Anthropic()

class DashboardNarrator:
    def __init__(self, thresholds: dict):
        self.thresholds = thresholds
        self.previous_metrics = {}

    def check_thresholds(self, current: dict) -> list[dict]:
        """Identify metrics that crossed thresholds or changed significantly."""
        alerts = []
        for metric, value in current.items():
            if metric in self.thresholds:
                threshold = self.thresholds[metric]
                if value < threshold.get("min", float("-inf")):
                    alerts.append({"metric": metric, "value": value, "type": "below_min",
                                   "threshold": threshold["min"], "severity": threshold.get("severity", "medium")})
                elif value > threshold.get("max", float("inf")):
                    alerts.append({"metric": metric, "value": value, "type": "above_max",
                                   "threshold": threshold["max"], "severity": threshold.get("severity", "medium")})

            # Check for significant change vs last reading
            if metric in self.previous_metrics:
                change_pct = (value - self.previous_metrics[metric]) / self.previous_metrics[metric] * 100
                if abs(change_pct) > 20:
                    alerts.append({"metric": metric, "value": value, "change_pct": round(change_pct, 1),
                                   "type": "significant_change", "severity": "high"})
        return alerts

    def narrate_alert(self, alert: dict, all_current_metrics: dict) -> str:
        """Generate a narrative explanation for a dashboard alert."""
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=400,
            messages=[{
                "role": "user",
                "content": f"""Generate a concise, business-friendly alert for this dashboard signal:

                Alert: {json.dumps(alert)}
                Current dashboard state: {json.dumps(all_current_metrics)}
                Time: {datetime.now().strftime('%Y-%m-%d %H:%M UTC')}

                Include: What happened, likely cause, immediate action needed.
                Max 4 sentences. No jargon. Written for a business executive."""
            }]
        )
        return response.content[0].text

# Configure thresholds
narrator = DashboardNarrator(thresholds={
    "api_error_rate_pct": {"max": 1.0, "severity": "critical"},
    "checkout_conversion_pct": {"min": 2.5, "severity": "high"},
    "avg_order_value_usd": {"min": 45, "severity": "medium"},
    "active_sessions": {"min": 100, "severity": "medium"}
})

# Simulate dashboard polling
current_metrics = {
    "api_error_rate_pct": 3.2,  # Above threshold!
    "checkout_conversion_pct": 3.1,
    "avg_order_value_usd": 62,
    "active_sessions": 847,
    "revenue_today_usd": 8420
}

alerts = narrator.check_thresholds(current_metrics)
for alert in alerts:
    narrative = narrator.narrate_alert(alert, current_metrics)
    print(f"[{alert['severity'].upper()}] {alert['metric']}: {narrative}")
    # Send to Slack/PagerDuty based on severity

Outcome: Real-time dashboard alerts that come with business context rather than just raw metric values. On-call teams spend less time interpreting numbers and more time taking action.
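The severity field the narrator attaches can drive routing directly. A minimal sketch; the channel names are placeholders and the send call is left to whatever Slack/PagerDuty client you use:

```python
def route_alert(severity: str, narrative: str) -> str:
    """Map alert severity to a destination channel (names are placeholders)."""
    routes = {
        "critical": "pagerduty",    # page the on-call engineer
        "high": "#alerts-urgent",   # hypothetical Slack channel
        "medium": "#alerts",
        "low": "daily-digest",
    }
    channel = routes.get(severity, "#alerts")  # unknown severities fall back
    # client.send(channel, narrative)  # plug in your Slack/PagerDuty client here
    return channel

print(route_alert("critical", "API error rate at 3.2% (threshold 1.0%)"))
```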


Choosing the Right Data Agent Architecture

Use the SQL agent pattern (Examples 1, 5) for exploratory analysis where the user or agent needs to formulate queries dynamically. Use scheduled pipeline agents (Examples 3, 4) for recurring reporting and monitoring. Use the code execution pattern (Example 2) when you need to produce visual artifacts. The narration pattern (Example 6) works best for real-time monitoring where the bottleneck is human interpretation speed.

All of these agents work best with a read-only database connection — never give an analysis agent write access to production data.
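Beyond database permissions, a cheap application-level guard can refuse anything that isn't a single SELECT (or WITH ... SELECT) statement before it reaches the driver. This is a sketch of one possible check; a keyword blocklist like this is deliberately blunt and complements, rather than replaces, a read-only role:

```python
import re

FORBIDDEN_SQL = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)

def assert_select_only(sql: str) -> str:
    """Reject anything but a single SELECT (or WITH ... SELECT) statement."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"\s*(select|with)\b", stripped, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN_SQL.search(stripped):
        # Blunt by design: this also rejects e.g. a column literally named "update"
        raise ValueError("write/DDL keywords are not allowed")
    return stripped

print(assert_select_only("SELECT * FROM orders LIMIT 10"))
```

Wrapping the agent's SQL execution tool with this check means even a prompt-injected "DROP TABLE" never executes.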

Getting Started

The Data Analyst Agent tutorial walks through building a full SQL analysis agent with safe execution. Install langchain-community for the SQLDatabase integration and pandas for data manipulation. For anomaly detection, scipy.stats provides the statistical foundation shown in Example 3.

For connecting multiple data sources, the LangChain tutorial covers multi-tool agents that can query different databases and APIs in the same reasoning loop.


Related Examples

Agentic RAG Examples: 5 Real Workflows

Six agentic RAG examples with working Python code covering query routing, self-correcting retrieval with hallucination detection, multi-document reranking, iterative retrieval with web fallback, conversational RAG with memory, and corrective RAG with grade-and-retry loops.

7 AI Agent Coding Examples (Real Projects)

Discover 7 real-world AI coding agent examples covering code review, PR generation, test writing, bug diagnosis, documentation generation, and refactoring automation. Each example includes architecture details and working code for engineering teams.

AI Agent E-Commerce Examples: 7 Workflows

Six practical AI agent examples for e-commerce covering product recommendation, inventory management, customer service returns, dynamic pricing, abandoned cart recovery, and review analysis. Each example includes architecture details and production-ready Python code snippets.
