Google Analytics 4 holds the behavioral data of every user on your website — but most teams only review it reactively, when something has already gone wrong. AI agents connected to GA4 change this dynamic entirely: instead of scheduled dashboard reviews, you get proactive anomaly detection, automated reporting, and natural language answers to business questions directly from your live analytics data.
For marketing teams, product managers, and growth engineers responsible for traffic and conversion metrics, Google Analytics AI integration makes data-driven decision-making faster and more accessible across the entire organization.
What AI Agents Can Do With Google Analytics Access#
Traffic Intelligence
- Generate daily and weekly traffic summaries with automatic period-over-period comparisons
- Detect sudden drops in sessions, users, or conversions before stakeholders notice
- Identify which pages gained or lost the most traffic following a content or code change
- Surface the top acquisition channels driving qualified traffic in the current period
Conversion and Funnel Analysis
- Map drop-off points in checkout, signup, or any multi-step conversion sequence
- Compare conversion rates across traffic sources, devices, and landing pages
- Alert when a key conversion event stops firing — catching broken funnels immediately
- Identify pages with high traffic but low conversion rates for optimization targeting
Automated Reporting
- Send Monday morning traffic digests to Slack without manual report creation
- Generate executive summaries comparing this month to the prior quarter
- Track campaign performance automatically as new UTM-tagged traffic arrives
- Summarize geographic or device-based traffic shifts in plain language
Setting Up Google Analytics Data API Access#
pip install google-analytics-data langchain langchain-openai python-dotenv
Enable the API and Authenticate#
- Go to Google Cloud Console → APIs & Services → Enable APIs
- Search "Google Analytics Data API" and enable it
- Go to IAM & Admin → Service Accounts → Create service account → download the JSON key
- In Google Analytics → Admin → Property Access Management → add the service account email as Viewer
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
export GA4_PROPERTY_ID="123456789" # Numbers only — found in GA4 Admin → Property Details
Test your connection:
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest
import os
PROPERTY_ID = os.getenv("GA4_PROPERTY_ID")
client = BetaAnalyticsDataClient()
request = RunReportRequest(
property=f"properties/{PROPERTY_ID}",
dimensions=[Dimension(name="date")],
metrics=[Metric(name="sessions")],
date_ranges=[DateRange(start_date="7daysAgo", end_date="yesterday")]
)
response = client.run_report(request)
print(f"Connected — {response.row_count} days of data returned")
Option 1: No-Code with n8n#
Automated Weekly Analytics Report#
- Schedule Trigger: Monday 8am
- Google Analytics node (n8n built-in): Fetch sessions, active users, and conversions for the past 7 days vs. the prior 7 days
- Code node: Calculate week-over-week percentage changes for each metric
- OpenAI: "Write a 5-bullet weekly website performance summary. Highlight significant changes. Suggest one action based on the data."
- Slack: Post to the #marketing-metrics channel
n8n's Google Analytics node handles GA4 OAuth authentication and dimension/metric queries with simple field mapping — no custom code needed for most reporting workflows.
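The week-over-week calculation in the Code node step can be sketched in plain Python (the function name and sample figures below are illustrative, not part of the n8n workflow):

```python
def week_over_week(current: dict, prior: dict) -> dict:
    """Percentage change for each metric, comparing this period to the prior one."""
    changes = {}
    for metric, value in current.items():
        prev = prior.get(metric)
        # Guard against a missing or zero baseline before dividing
        changes[metric] = round((value - prev) / prev * 100, 1) if prev else None
    return changes

this_week = {"sessions": 4600, "activeUsers": 3100, "conversions": 92}
last_week = {"sessions": 4000, "activeUsers": 3000, "conversions": 80}
print(week_over_week(this_week, last_week))
# {'sessions': 15.0, 'activeUsers': 3.3, 'conversions': 15.0}
```

The same shape works for any metric set the Google Analytics node returns, since the function only compares keys present in the current period.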
Option 2: LangChain with Python#
Build Google Analytics Tools#
import os
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
DateRange, Dimension, Metric, RunReportRequest, RunRealtimeReportRequest
)
from langchain.tools import tool
from dotenv import load_dotenv
load_dotenv()
PROPERTY_ID = os.getenv("GA4_PROPERTY_ID")
client = BetaAnalyticsDataClient()
def run_ga_report(dimensions: list, metrics: list,
start_date: str = "7daysAgo",
end_date: str = "yesterday",
limit: int = 100) -> list:
"""Run a GA4 Data API report and return rows as list of dicts."""
request = RunReportRequest(
property=f"properties/{PROPERTY_ID}",
dimensions=[Dimension(name=d) for d in dimensions],
metrics=[Metric(name=m) for m in metrics],
date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
limit=limit
)
response = client.run_report(request)
rows = []
for row in response.rows:
row_dict = {}
for i, dim in enumerate(dimensions):
row_dict[dim] = row.dimension_values[i].value
for i, met in enumerate(metrics):
row_dict[met] = row.metric_values[i].value
rows.append(row_dict)
return rows
@tool
def get_traffic_summary(days: int = 7) -> str:
"""Get a traffic summary for the past N days including sessions, users, pageviews, and bounce rate."""
rows = run_ga_report(
dimensions=["date"],
metrics=["sessions", "activeUsers", "screenPageViews", "bounceRate"],
start_date=f"{days}daysAgo"
)
if not rows:
return "No traffic data available"
total_sessions = sum(int(r["sessions"]) for r in rows)
total_users = sum(int(r["activeUsers"]) for r in rows)
total_pageviews = sum(int(r["screenPageViews"]) for r in rows)
avg_bounce = sum(float(r["bounceRate"]) for r in rows) / len(rows) if rows else 0
return (f"Traffic summary (last {days} days):\n"
f"Sessions: {total_sessions:,}\n"
f"Active Users: {total_users:,}\n"
f"Pageviews: {total_pageviews:,}\n"
f"Avg Bounce Rate: {avg_bounce:.1%}")
@tool
def get_top_pages(days: int = 7, limit: int = 10) -> str:
"""Get top pages by pageviews for the past N days."""
rows = run_ga_report(
dimensions=["pagePath", "pageTitle"],
metrics=["screenPageViews", "activeUsers", "averageSessionDuration"],
start_date=f"{days}daysAgo",
limit=limit
)
if not rows:
return "No page data available"
lines = [f"Top {limit} pages (last {days} days):"]
for i, row in enumerate(rows, 1):
views = int(row["screenPageViews"])
users = int(row["activeUsers"])
path = row["pagePath"][:60]
lines.append(f" {i}. {path} | {views:,} views | {users:,} users")
return "\n".join(lines)
@tool
def get_traffic_by_channel(days: int = 30) -> str:
"""Get sessions broken down by marketing channel (Organic Search, Direct, Referral, Paid, etc.)."""
rows = run_ga_report(
dimensions=["sessionDefaultChannelGroup"],
metrics=["sessions", "activeUsers", "conversions"],
start_date=f"{days}daysAgo"
)
if not rows:
return "No channel data available"
total_sessions = sum(int(r["sessions"]) for r in rows)
lines = [f"Traffic by channel (last {days} days):"]
for row in sorted(rows, key=lambda x: int(x["sessions"]), reverse=True):
sessions = int(row["sessions"])
pct = sessions / total_sessions * 100 if total_sessions else 0
conversions = int(row["conversions"])
channel = row["sessionDefaultChannelGroup"]
lines.append(f" {channel}: {sessions:,} sessions ({pct:.1f}%) | {conversions:,} conversions")
return "\n".join(lines)
@tool
def detect_traffic_anomaly(threshold_pct: float = 20.0) -> str:
"""
Compare yesterday's traffic to the prior 7-day average to detect anomalies.
threshold_pct: percentage deviation to flag as anomalous (default 20%).
"""
yesterday_rows = run_ga_report(
dimensions=["date"],
metrics=["sessions", "activeUsers"],
start_date="yesterday", end_date="yesterday"
)
baseline_rows = run_ga_report(
dimensions=["date"],
metrics=["sessions", "activeUsers"],
start_date="8daysAgo", end_date="2daysAgo"
)
if not yesterday_rows or not baseline_rows:
return "Insufficient data for anomaly detection"
yesterday_sessions = int(yesterday_rows[0]["sessions"])
baseline_avg = sum(int(r["sessions"]) for r in baseline_rows) / len(baseline_rows)
deviation = ((yesterday_sessions - baseline_avg) / baseline_avg * 100) if baseline_avg else 0
status = "ANOMALY DETECTED" if abs(deviation) > threshold_pct else "NORMAL"
return (f"Traffic anomaly check:\n"
f"Yesterday: {yesterday_sessions:,} sessions\n"
f"7-day baseline avg: {baseline_avg:,.0f} sessions\n"
f"Deviation: {deviation:+.1f}%\n"
f"Status: {status}")
@tool
def get_realtime_users() -> str:
"""Get the number of users active on the site in the last 30 minutes by page."""
request = RunRealtimeReportRequest(
property=f"properties/{PROPERTY_ID}",
dimensions=[Dimension(name="unifiedScreenName")],
metrics=[Metric(name="activeUsers")],
limit=10
)
response = client.run_realtime_report(request)
total = sum(int(row.metric_values[0].value) for row in response.rows)
lines = [f"Active users right now: {total}"]
for row in response.rows[:5]:
page = row.dimension_values[0].value
users = int(row.metric_values[0].value)
lines.append(f" {page}: {users} users")
return "\n".join(lines)
Google Analytics Agent#
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [get_traffic_summary, get_top_pages, get_traffic_by_channel,
detect_traffic_anomaly, get_realtime_users]
prompt = ChatPromptTemplate.from_messages([
("system", f"""You are a web analytics assistant with access to Google Analytics 4 (Property: {PROPERTY_ID}).
When answering analytics questions:
1. Always specify the time period in your answer
2. Compare metrics to prior periods when possible to show trends
3. Translate raw numbers into business insights — not just "sessions increased 15%" but what that means
4. Flag any anomalies or unexpected patterns proactively
5. Suggest one actionable next step based on the data
Common GA4 dimensions: date, pagePath, sessionDefaultChannelGroup, deviceCategory, country
Common GA4 metrics: sessions, activeUsers, screenPageViews, bounceRate, conversions"""),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=6)
# Morning analytics brief
result = executor.invoke({
"input": "Run my morning analytics check: yesterday's traffic vs the prior week, any anomalies, and our top 5 pages."
})
print(result["output"])
Rate Limits and Best Practices#
| GA4 Data API limit | Value |
|---|---|
| Tokens per property per day (standard) | 200,000 |
| Concurrent requests | 10 |
| Max rows per response | 250,000 |
| Realtime report window | Last 30 minutes |
Best practices:
- Cache daily summaries: Store yesterday's traffic summary once so the agent doesn't re-query on every conversation turn
- Use NdaysAgo notation: Simpler and more reliable than calculating exact ISO dates — 7daysAgo, yesterday, today
- Limit dimensions per query: Each additional dimension increases token cost — request only what you need
- Handle sampling warnings: Check response.metadata.sampling_metadatas and note sampled data in agent output for transparency
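The caching practice above takes only a few lines. This sketch stores the summary in a temp file; the path, TTL, and helper name are arbitrary choices, not part of the GA4 API:

```python
import json
import tempfile
import time
from pathlib import Path

CACHE_PATH = Path(tempfile.gettempdir()) / "ga4_summary_cache.json"
CACHE_TTL_SECONDS = 6 * 3600  # re-query the API at most every six hours

def cached_traffic_summary(fetch, days: int = 7) -> str:
    """Return a cached summary when fresh; otherwise call fetch(days) and store it."""
    if CACHE_PATH.exists():
        cached = json.loads(CACHE_PATH.read_text())
        if cached["days"] == days and time.time() - cached["ts"] < CACHE_TTL_SECONDS:
            return cached["summary"]
    summary = fetch(days)
    CACHE_PATH.write_text(json.dumps({"ts": time.time(), "days": days, "summary": summary}))
    return summary

# In the agent, wrap the tool call, e.g.:
# cached_traffic_summary(lambda d: get_traffic_summary.invoke({"days": d}))
```

Because the cache key includes the day count, a 7-day and a 30-day summary never collide; anything fancier (per-property keys, Redis) can follow the same pattern.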
Next Steps#
- AI Agents BigQuery Integration — GA4 exports natively to BigQuery for deeper SQL-based analysis
- AI Agents Slack Integration — Send GA4 anomaly alerts and weekly digests to Slack automatically
- AI Agents Gmail Integration — Email daily analytics digests to stakeholders
- Build an AI Agent with LangChain — Complete agent framework tutorial