PagerDuty is the incident command center for engineering teams — and the minutes between alert firing and human response are the most expensive minutes in an outage. AI agents connected to PagerDuty can dramatically shrink that gap by triaging alerts before the on-call engineer even reaches their laptop: gathering context, identifying similar past incidents, checking deployment history, and adding a prioritized action list directly to the incident.
For DevOps teams, SREs, and engineering organizations with on-call rotations, PagerDuty AI integration transforms reactive firefighting into structured, context-rich incident response.
What AI Agents Can Do With PagerDuty Access
Incident Intelligence
- Query all active incidents by service, severity, and time open without opening the PagerDuty UI
- Identify the highest-priority open incident for immediate escalation
- Summarize incident history to spot recurring failures in the same service
- Check whether a new alert matches a pattern from a recent resolved incident
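One simple way to check a new alert against recent resolved incidents is fuzzy title matching. The sketch below is only an illustration: the function name `find_similar_incidents`, the 0.6 threshold, and the input shape (dicts with a `title` key) are assumptions, and a production system would likely also compare service IDs and alert payloads.

```python
from difflib import SequenceMatcher


def find_similar_incidents(new_title, past_incidents, threshold=0.6):
    """Return past incidents whose title resembles new_title, best match first.

    past_incidents: list of dicts with at least a "title" key (an assumed
    shape, loosely modeled on PagerDuty incident objects).
    threshold: minimum similarity ratio in [0, 1]; 0.6 is an arbitrary default.
    """
    scored = []
    for inc in past_incidents:
        ratio = SequenceMatcher(
            None, new_title.lower(), inc.get("title", "").lower()
        ).ratio()
        if ratio >= threshold:
            scored.append((ratio, inc))
    # Sort on the score only, so ties never fall back to comparing dicts
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [inc for _, inc in scored]


if __name__ == "__main__":
    history = [
        {"id": "P1", "title": "High error rate on checkout-api"},
        {"id": "P2", "title": "Disk usage above 90% on db-01"},
    ]
    for match in find_similar_incidents("High error rate on checkout-api (5xx)", history):
        print(match["id"], match["title"])
```

Title similarity is cheap and needs no extra infrastructure, but it will miss incidents whose titles differ while the root cause is the same, so treat matches as hints rather than conclusions.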
Automated Triage Support
- Add enrichment notes to new incidents with context pulled from monitoring, deployments, and logs
- Check on-call schedules to identify who should be notified and when escalation triggers
- Surface related incidents from the past 30 days that share the same service or error pattern
- Generate an initial investigation checklist based on the service and incident type
Post-Incident Automation
- Generate post-mortem templates pre-filled with incident timeline, duration, and responders
- Summarize the resolution steps from incident notes into a clean timeline
- Calculate mean time to acknowledge (MTTA) and mean time to resolve (MTTR) for recent incidents
- Create action items from retrospective notes as follow-up tasks
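The MTTA and MTTR figures mentioned above reduce to averaging two time differences per incident. The following is a minimal sketch, assuming you have already assembled one record per incident with `created_at`, `acknowledged_at`, and `resolved_at` timestamps. Those key names are hypothetical: in practice the acknowledgment time usually has to be pulled from an incident's log entries rather than read directly off the incident.

```python
from datetime import datetime
from statistics import mean


def _parse(ts):
    """Parse an ISO 8601 timestamp such as '2024-05-01T10:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def mtta_mttr_minutes(incidents):
    """Return (MTTA, MTTR) in minutes; a metric with no data is None.

    incidents: list of dicts with "created_at" and, optionally,
    "acknowledged_at" and "resolved_at" (assumed key names).
    """
    ack_times, resolve_times = [], []
    for inc in incidents:
        created = _parse(inc["created_at"])
        if inc.get("acknowledged_at"):
            delta = _parse(inc["acknowledged_at"]) - created
            ack_times.append(delta.total_seconds() / 60)
        if inc.get("resolved_at"):
            delta = _parse(inc["resolved_at"]) - created
            resolve_times.append(delta.total_seconds() / 60)
    return (
        mean(ack_times) if ack_times else None,
        mean(resolve_times) if resolve_times else None,
    )


if __name__ == "__main__":
    sample = [
        {"created_at": "2024-05-01T10:00:00Z",
         "acknowledged_at": "2024-05-01T10:04:00Z",
         "resolved_at": "2024-05-01T10:30:00Z"},
        {"created_at": "2024-05-02T08:00:00Z",
         "acknowledged_at": "2024-05-02T08:06:00Z",
         "resolved_at": "2024-05-02T09:00:00Z"},
    ]
    mtta, mttr = mtta_mttr_minutes(sample)
    print(f"MTTA: {mtta} min, MTTR: {mttr} min")
```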
Setting Up PagerDuty API Access
pip install requests langchain langchain-openai python-dotenv
Create a REST API key in the PagerDuty web app under Integrations → API Access Keys → Create New API Key (menu names can vary between PagerDuty versions and plans):
export PAGERDUTY_API_KEY="your-api-key"
export PAGERDUTY_FROM_EMAIL="your-email@company.com" # Required for write operations
Test your connection:
import os
import requests

PAGERDUTY_API_KEY = os.getenv("PAGERDUTY_API_KEY")
BASE_URL = "https://api.pagerduty.com"

headers = {
    "Authorization": f"Token token={PAGERDUTY_API_KEY}",
    "Accept": "application/vnd.pagerduty+json;version=2",
    "Content-Type": "application/json",
}

# total=true asks PagerDuty to populate "total", which is null by default
resp = requests.get(
    f"{BASE_URL}/incidents",
    headers=headers,
    params={"statuses[]": ["triggered", "acknowledged"], "limit": 5, "total": "true"},
    timeout=10,
)
resp.raise_for_status()  # Fail loudly on a bad key or a network error
data = resp.json()

print(f"Active incidents: {data['total']}")
for inc in data.get("incidents", []):
    print(f"  [{inc['urgency'].upper()}] {inc['title']} | {inc['service']['summary']}")
Option 1: No-Code with n8n
Incident Enrichment Workflow
- PagerDuty Trigger (n8n built-in): Fires when a new incident is created or triggered
- HTTP Request: Query Datadog or Grafana for recent metric spikes on the affected service
- HTTP Request: Query GitHub Deployments API for recent deploys to the affected service
- OpenAI: "Given this incident alert and recent data, write a 3-bullet initial investigation note. Include the most likely cause and first step to check."
- PagerDuty node: Add the OpenAI response as a note to the incident
n8n's PagerDuty node handles authentication and supports triggers for incident creation, acknowledgment, and resolution events.
Option 2: LangChain with Python
Build PagerDuty Tools
import os
import requests
from datetime import datetime, timedelta, timezone
from langchain.tools import tool
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("PAGERDUTY_API_KEY")
FROM_EMAIL = os.getenv("PAGERDUTY_FROM_EMAIL")
BASE_URL = "https://api.pagerduty.com"
def pd_request(method: str, endpoint: str, params: dict = None,
               json_data: dict = None) -> dict:
    """Execute a PagerDuty API request and return the decoded JSON body."""
    headers = {
        "Authorization": f"Token token={API_KEY}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
    }
    # Write operations need a From header naming a valid PagerDuty user
    if json_data and FROM_EMAIL:
        headers["From"] = FROM_EMAIL
    resp = requests.request(
        method, f"{BASE_URL}/{endpoint}",
        headers=headers, params=params or {}, json=json_data,
        timeout=10,  # Avoid hanging the agent on a stalled connection
    )
    resp.raise_for_status()
    # Some endpoints return an empty body, so guard the JSON decode
    return resp.json() if resp.content else {}
@tool
def get_active_incidents(urgency: str = None) -> str:
    """
    Get currently active (triggered or acknowledged) incidents.
    urgency: filter by 'high' or 'low' (optional).
    """
    params = {
        "statuses[]": ["triggered", "acknowledged"],
        "limit": 25,
        "sort_by": "urgency:desc,created_at:desc",
    }
    if urgency:
        params["urgencies[]"] = [urgency]
    data = pd_request("GET", "incidents", params=params)
    incidents = data.get("incidents", [])
    if not incidents:
        return "No active incidents"
    lines = [f"Active incidents ({len(incidents)} found):"]
    for inc in incidents:
        created = inc.get("created_at", "")
        if created:
            # Timestamps end in "Z"; normalize so fromisoformat accepts them
            dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
            age_mins = int((datetime.now(timezone.utc) - dt).total_seconds() / 60)
            age_str = f"{age_mins}m" if age_mins < 60 else f"{age_mins // 60}h {age_mins % 60}m"
        else:
            age_str = "unknown"
        service = inc.get("service", {}).get("summary", "Unknown service")
        status = inc.get("status", "unknown")
        inc_urgency = inc.get("urgency", "low")  # Distinct name: don't shadow the parameter
        title = inc.get("title", "No title")[:80]
        inc_id = inc.get("id", "")
        lines.append(f"  [{inc_urgency.upper()}] [{status}] {title}\n"
                     f"    Service: {service} | Open: {age_str} | ID: {inc_id}")
    return "\n".join(lines)
@tool
def get_incident_details(incident_id: str) -> str:
    """Get details for a specific PagerDuty incident, including assignees and recent notes."""
    data = pd_request("GET", f"incidents/{incident_id}")
    inc = data.get("incident", {})
    # Notes live on a separate endpoint
    notes_data = pd_request("GET", f"incidents/{incident_id}/notes")
    notes = notes_data.get("notes", [])
    service = inc.get("service", {}).get("summary", "Unknown")
    assigned = [a.get("assignee", {}).get("summary", "?") for a in inc.get("assignments", [])]
    lines = [
        f"Incident: {inc.get('title', 'No title')}",
        f"Status: {inc.get('status')} | Urgency: {inc.get('urgency')}",
        f"Service: {service}",
        f"Assigned to: {', '.join(assigned) if assigned else 'Unassigned'}",
        f"Created: {inc.get('created_at', 'Unknown')}",
        f"Last updated: {inc.get('last_status_change_at', 'Unknown')}",
    ]
    if notes:
        lines.append(f"\nNotes ({len(notes)}):")
        for note in notes[-3:]:  # Show last 3 notes
            author = note.get("user", {}).get("summary", "Unknown")
            content = note.get("content", "")[:200]  # Truncate long notes
            lines.append(f"  [{author}]: {content}")
    return "\n".join(lines)
@tool
def acknowledge_incident(incident_id: str) -> str:
    """Acknowledge a PagerDuty incident to stop escalation and signal it's being investigated."""
    data = pd_request("PUT", f"incidents/{incident_id}",
                      json_data={"incident": {"type": "incident", "status": "acknowledged"}})
    inc = data.get("incident", {})
    return f"Incident {incident_id} acknowledged (status: {inc.get('status', 'unknown')})"


@tool
def add_incident_note(incident_id: str, note_text: str) -> str:
    """Add an investigative note to a PagerDuty incident for the response team."""
    data = pd_request("POST", f"incidents/{incident_id}/notes",
                      json_data={"note": {"content": note_text}})
    note = data.get("note", {})
    return f"Note added to incident {incident_id} (Note ID: {note.get('id', 'unknown')})"
@tool
def get_on_call_now(schedule_id: str = None) -> str:
    """
    Get who is currently on call. If schedule_id is provided, check a specific schedule.
    Otherwise returns on-call info for all schedules.
    """
    params = {"time_zone": "UTC", "include[]": ["users"]}
    if schedule_id:
        params["schedule_ids[]"] = [schedule_id]
    data = pd_request("GET", "oncalls", params=params)
    on_calls = data.get("oncalls", [])
    if not on_calls:
        return "No on-call information found"
    lines = ["Currently on call:"]
    seen = set()  # De-duplicate user/schedule pairs
    for entry in on_calls[:10]:
        user = entry.get("user", {})
        user_name = user.get("summary", "Unknown")
        schedule = entry.get("schedule", {})
        schedule_name = schedule.get("summary", "Ad hoc") if schedule else "Ad hoc"
        escalation_level = entry.get("escalation_level", 1)
        key = f"{user_name}-{schedule_name}"
        if key not in seen:
            lines.append(f"  Level {escalation_level}: {user_name} ({schedule_name})")
            seen.add(key)
    return "\n".join(lines)
@tool
def get_recent_incident_history(service_name: str = None, days: int = 7) -> str:
    """
    Get incident history for the past N days to identify patterns and recurring failures.
    service_name: filter by service name (optional partial match).
    """
    since = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    params = {
        "since": since,
        "statuses[]": ["resolved"],
        "limit": 25,  # Only the first page is fetched; paginate for full history
        "sort_by": "created_at:desc",
    }
    data = pd_request("GET", "incidents", params=params)
    incidents = data.get("incidents", [])
    if service_name:
        incidents = [i for i in incidents
                     if service_name.lower() in i.get("service", {}).get("summary", "").lower()]
    if not incidents:
        return f"No resolved incidents in the last {days} days"
    # Count by service
    service_counts = {}
    for inc in incidents:
        svc = inc.get("service", {}).get("summary", "Unknown")
        service_counts[svc] = service_counts.get(svc, 0) + 1
    lines = [f"Incident history (last {days} days, {len(incidents)} most recent resolved):"]
    lines.append("\nBy service:")
    for svc, count in sorted(service_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
        lines.append(f"  {svc}: {count} incidents")
    lines.append("\nRecent incidents:")
    for inc in incidents[:5]:
        title = inc.get("title", "No title")[:60]
        service = inc.get("service", {}).get("summary", "?")
        duration_str = ""
        created = inc.get("created_at", "")
        # For a resolved incident, the last status change is the resolution
        resolved = inc.get("last_status_change_at", "")
        if created and resolved:
            created_dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
            resolved_dt = datetime.fromisoformat(resolved.replace("Z", "+00:00"))
            duration = int((resolved_dt - created_dt).total_seconds() / 60)
            duration_str = f" | {duration}min"
        lines.append(f"  {title} [{service}]{duration_str}")
    return "\n".join(lines)
PagerDuty Incident Response Agent
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [get_active_incidents, get_incident_details, acknowledge_incident,
         add_incident_note, get_on_call_now, get_recent_incident_history]

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an incident response assistant with access to PagerDuty.
When responding to incident queries:
1. Start with the highest urgency and longest-open incidents first
2. Always check recent incident history to identify if this is a recurring failure pattern
3. When suggesting investigation steps, be specific: check X metric, look at Y log, run Z command
4. For acknowledge actions: confirm the incident ID before proceeding
5. Never resolve incidents without explicit human confirmation; only acknowledge
Severity guide:
- HIGH urgency: customer-impacting, requires immediate response
- LOW urgency: degraded service, can be addressed in normal hours"""),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=8)

# Incident status check
result = executor.invoke({
    "input": "What's the current incident situation? Give me a summary of all active incidents and who's on call."
})
print(result["output"])
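The system prompt asks the model not to resolve incidents without confirmation, but a prompt is only a soft constraint. If you later add a resolve tool, it may be worth enforcing the rule in code as well. The sketch below shows one possible shape; `build_status_update` and its `confirmed_by` parameter are hypothetical names, and how you collect the human approval (Slack button, CLI prompt, ticket field) is left to your workflow.

```python
ALLOWED_WITHOUT_CONFIRMATION = {"acknowledged"}
SUPPORTED_STATUSES = {"acknowledged", "resolved"}


def build_status_update(status, confirmed_by=None):
    """Build the request body for an incident status change.

    status: target status, "acknowledged" or "resolved".
    confirmed_by: identifier of the human who approved the change
    (hypothetical parameter; required for anything beyond acknowledging).
    """
    if status not in SUPPORTED_STATUSES:
        raise ValueError(f"Unsupported status: {status!r}")
    if status not in ALLOWED_WITHOUT_CONFIRMATION and not confirmed_by:
        raise PermissionError("Resolving an incident requires human confirmation")
    # Same body shape the acknowledge_incident tool sends
    return {"incident": {"type": "incident", "status": status}}


if __name__ == "__main__":
    print(build_status_update("acknowledged"))
    try:
        build_status_update("resolved")
    except PermissionError as exc:
        print(f"Blocked: {exc}")
```

Keeping the check in a plain function means the guard holds no matter what the model decides to call, and it is easy to unit test.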
Rate Limits and Best Practices
| PagerDuty API detail | Value |
|---|---|
| Requests per minute | 960 |
| List response page size | Max 100 |
| Webhook delivery timeout | 5 seconds |
| Incident status values | triggered, acknowledged, resolved |
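
When a client exceeds the rate limit, PagerDuty responds with HTTP 429. A common way to handle that is to retry with exponential backoff. The helper below is a minimal sketch, not part of any PagerDuty SDK: `call_with_backoff`, its defaults, and the injectable `sleep` argument are all assumptions chosen to keep the example testable.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send: zero-argument callable returning an object with a .status_code
    attribute (for example, a lambda wrapping requests.get).
    sleep: injectable for tests; defaults to time.sleep.
    Returns the last response received, which may still be a 429.
    """
    resp = None
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            return resp
        if attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    return resp
```

With the connection test from earlier, usage could look like `call_with_backoff(lambda: requests.get(url, headers=headers, timeout=10))`. An agent that loops over many incidents is the most likely caller to hit the limit, so wrapping the shared request helper is usually enough.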
Best practices:
- Acknowledge, don't auto-resolve: Agents can acknowledge freely (which stops escalation) but should always require human confirmation to resolve. A false resolution during an active outage is dangerous
- Enrich before escalating: The highest-value pattern is adding context notes to an incident before the human engineer joins, which can save 10-15 minutes of context gathering per incident
- Use the `From` header for write operations: PagerDuty expects a `From` header (the email address of a valid user) on PUT/POST requests made with an account-level API key; omitting it causes write operations to fail
- Paginate incident history: List endpoints return at most 100 results per request, so use `offset` and check `more` in the response for complete history queries
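
The pagination advice above can be sketched as a small loop over `offset` that stops when `more` is false. This is an illustrative helper under stated assumptions: `fetch_all`, `fetch_page`, and the `max_pages` safety cap are hypothetical names, and the response is assumed to carry its items under an `incidents` key.

```python
def fetch_all(fetch_page, page_size=100, max_pages=50):
    """Collect every item from an offset-paginated list endpoint.

    fetch_page: callable (offset, limit) -> dict shaped like a PagerDuty
    list response, with an "incidents" list and a boolean "more" flag.
    max_pages: safety cap so a misbehaving endpoint cannot loop forever.
    """
    items, offset = [], 0
    for _ in range(max_pages):
        page = fetch_page(offset, page_size)
        batch = page.get("incidents", [])
        items.extend(batch)
        # Stop when the API reports no further pages, or returns nothing
        if not page.get("more") or not batch:
            break
        offset += len(batch)
    return items
```

With the `pd_request` helper defined earlier, a call might look like `fetch_all(lambda offset, limit: pd_request("GET", "incidents", params={"offset": offset, "limit": limit, "statuses[]": ["resolved"]}))`.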
Next Steps
- AI Agents Datadog Integration — Correlate PagerDuty alerts with Datadog metrics for richer incident context
- AI Agents Slack Integration — Bridge PagerDuty incident alerts with team Slack channels
- AI Agents GitHub Integration — Check recent deploys when investigating service regressions
- Build an AI Agent with LangChain — Complete agent framework tutorial