# Computer Use Agent Examples: Browser and Desktop Automation
Computer use agents represent a fundamental shift in automation: instead of writing brittle CSS selectors and XPath expressions, you describe what you want to accomplish and the agent navigates any interface visually. This makes them uniquely suited to legacy systems, complex browser workflows, and any situation where traditional scraping or API integration fails.
These six examples cover the most practical computer use and browser automation patterns — from Playwright-powered web scraping to Claude's computer use API for true visual desktop automation. Each includes working Python code you can adapt for your own use cases.
For browser-specific patterns using the browser-use library, see Browser Use Agent Examples. For the tutorial walkthrough, visit the Computer Use Agent Tutorial.
## Example 1: Web Scraping Agent with Playwright
Use Case: Extract structured product data from an e-commerce site that renders content via JavaScript, making static HTML scraping ineffective. The AI layer interprets the rendered DOM semantically rather than relying on fragile selectors.
Architecture: Playwright browser launch → AI-directed navigation → structured data extraction → Pydantic output validation.
Key Implementation:
```python
import asyncio
import json
from playwright.async_api import async_playwright
from anthropic import Anthropic
from pydantic import BaseModel
from typing import Optional

client = Anthropic()

class ProductData(BaseModel):
    name: str
    price_usd: float
    rating: Optional[float]
    review_count: Optional[int]
    in_stock: bool
    description: str
    url: str

async def extract_product_data(page_url: str) -> ProductData:
    """Extract structured product data from a rendered page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(page_url, wait_until="networkidle")

        # Get the full rendered page text
        page_text = await page.evaluate("() => document.body.innerText")
        page_title = await page.title()

        # Use Claude to extract structured data from the rendered text
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Extract product information from this page.

Page title: {page_title}
Page URL: {page_url}
Page content: {page_text[:3000]}

Return JSON: {{"name": str, "price_usd": float, "rating": float|null,
"review_count": int|null, "in_stock": bool, "description": str, "url": "{page_url}"}}
"""
            }]
        )
        text = response.content[0].text
        start, end = text.find('{'), text.rfind('}') + 1
        data = json.loads(text[start:end])

        await browser.close()
        return ProductData(**data)

async def scrape_product_listings(base_url: str, max_pages: int = 3) -> list[ProductData]:
    """Scrape multiple product pages with AI-assisted navigation."""
    products = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        for page_num in range(1, max_pages + 1):
            url = f"{base_url}?page={page_num}"
            await page.goto(url, wait_until="networkidle")

            # Extract all product links from the listing page
            links = await page.evaluate("""
                () => Array.from(document.querySelectorAll('a[href*="/product"]'))
                    .map(a => a.href)
                    .filter((v, i, a) => a.indexOf(v) === i)
                    .slice(0, 10)
            """)
            print(f"Page {page_num}: found {len(links)} product links")

            for link in links[:5]:  # Limit to 5 per page for demo
                try:
                    product = await extract_product_data(link)
                    products.append(product)
                    print(f"  Extracted: {product.name} — ${product.price_usd}")
                except Exception as e:
                    print(f"  Failed to extract {link}: {e}")

        await browser.close()
    return products

# Run scraper
products = asyncio.run(scrape_product_listings("https://example-shop.com/laptops"))
for p in products:
    print(f"{p.name}: ${p.price_usd} | {p.rating}/5 ({p.review_count} reviews) | {'In Stock' if p.in_stock else 'Out of Stock'}")
```
Outcome: Reliable product data extraction from JavaScript-rendered pages that defeat traditional scraping. The AI interpretation layer handles layout variations between product types without per-page selector maintenance.
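The `text.find('{')` / `rfind('}')` slicing used above (and repeated in later examples) breaks when the model wraps its answer in a code fence or appends prose that contains braces. A more defensive parser can walk the text tracking brace depth instead. This is a sketch, not part of the Anthropic SDK; the helper name `extract_json_object` is made up here:

```python
import json

def extract_json_object(text: str):
    """Best-effort extraction of the first complete JSON object in a string.

    Tracks brace depth while skipping braces inside JSON strings, so
    markdown fences or trailing prose around the JSON do not break parsing.
    Returns the parsed object, or None if no valid object is found.
    """
    start = text.find('{')
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == '{':
                    depth += 1
                elif ch == '}':
                    depth -= 1
                    if depth == 0:
                        try:
                            return json.loads(text[start:i + 1])
                        except json.JSONDecodeError:
                            break  # malformed candidate; try the next '{'
        start = text.find('{', start + 1)
    return None

# Survives fenced output and braces in surrounding prose:
reply = 'Here you go:\n```json\n{"name": "X1 Laptop", "price_usd": 999.0}\n```'
print(extract_json_object(reply))
```

Dropping this in place of the `find`/`rfind` pair in each example keeps a single malformed reply from crashing a whole scraping run.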
## Example 2: Form-Filling Automation Agent
Use Case: Automatically complete complex multi-step web forms — permit applications, compliance filings, vendor registrations — by reading form fields semantically and filling them with data from a structured source, then navigating pagination and handling validation errors.
Architecture: Playwright browser + form data source → AI field mapper → sequential fill + submit → confirmation extraction.
Key Implementation:
```python
import asyncio
import json
from playwright.async_api import async_playwright, Page
from anthropic import Anthropic

client = Anthropic()

async def get_form_fields(page: Page) -> list[dict]:
    """Use AI to identify and describe all form fields on the current page."""
    form_html = await page.evaluate("""
        () => {
            const form = document.querySelector('form');
            if (!form) return '';
            return form.outerHTML.substring(0, 5000);
        }
    """)
    page_text = await page.evaluate("() => document.body.innerText")

    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=800,
        messages=[{
            "role": "user",
            "content": f"""Identify all form fields on this page.

Form HTML: {form_html}
Page text (for labels): {page_text[:1000]}

Return JSON array: [{{"field_id": str, "label": str, "type": "text|email|select|checkbox|textarea|date",
"required": bool, "placeholder": str|null, "options": list|null}}]
"""
        }]
    )
    text = response.content[0].text
    start, end = text.find('['), text.rfind(']') + 1
    return json.loads(text[start:end]) if start != -1 else []

async def map_data_to_fields(fields: list[dict], form_data: dict) -> list[dict]:
    """Use AI to map structured data to the correct form fields."""
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=600,
        messages=[{
            "role": "user",
            "content": f"""Map this data to the form fields.

Form fields: {json.dumps(fields)}
Available data: {json.dumps(form_data)}

Return JSON array: [{{"field_id": str, "value": str, "action": "fill|select|check"}}]
Only include fields where a matching value exists in the data.
"""
        }]
    )
    text = response.content[0].text
    start, end = text.find('['), text.rfind(']') + 1
    return json.loads(text[start:end]) if start != -1 else []

async def fill_and_submit_form(url: str, form_data: dict) -> dict:
    """Navigate to a form, fill it with provided data, and submit."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # headless=False for oversight
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        max_steps = 5  # Handle multi-page forms
        for step in range(max_steps):
            print(f"Form step {step + 1}: {page.url}")

            # Identify fields
            fields = await get_form_fields(page)
            print(f"  Found {len(fields)} fields")

            # Map data to fields
            fill_instructions = await map_data_to_fields(fields, form_data)

            # Fill each field
            for instruction in fill_instructions:
                field_id = instruction["field_id"]
                value = instruction["value"]
                action = instruction.get("action", "fill")
                try:
                    if action == "fill":
                        await page.fill(f"#{field_id}, [name='{field_id}']", value)
                    elif action == "select":
                        await page.select_option(f"#{field_id}, [name='{field_id}']", label=value)
                    elif action == "check" and value.lower() == "true":
                        await page.check(f"#{field_id}, [name='{field_id}']")
                except Exception as e:
                    print(f"  Could not fill {field_id}: {e}")

            # Try to submit or continue
            submitted = False
            for btn_text in ["Submit", "Continue", "Next", "Save"]:
                try:
                    await page.click(f"button:has-text('{btn_text}')", timeout=2000)
                    await page.wait_for_load_state("networkidle")
                    submitted = True
                    break
                except Exception:
                    continue
            if not submitted:
                break

            # Check for confirmation
            page_text = await page.evaluate("() => document.body.innerText")
            if any(word in page_text.lower() for word in ["confirmation", "submitted", "success", "thank you"]):
                # Extract confirmation number
                response = client.messages.create(
                    model="claude-3-5-haiku-20241022", max_tokens=100,
                    messages=[{"role": "user", "content": f"Extract the confirmation or reference number from: {page_text[:500]}. Return only the number."}]
                )
                await browser.close()
                return {"success": True, "confirmation": response.content[0].text.strip()}

        await browser.close()
        return {"success": False, "last_url": page.url}

result = asyncio.run(fill_and_submit_form(
    "https://example.gov/business-registration",
    {"business_name": "Acme Analytics LLC", "owner_email": "owner@acme.com", "state": "California"}
))
print(f"Result: {result}")
```
Outcome: Multi-page government and compliance forms completed automatically without writing per-form selectors. The AI field mapper handles layout variations between different form systems while the structured data input keeps the process deterministic.
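The use case mentions handling validation errors, but the loop above simply advances to the next step. Before spending another model call, a deterministic first pass can often flag which fields failed from the error text alone. This is a sketch under stated assumptions: the helper name `find_validation_errors` and the regex patterns are illustrative, not drawn from any real form system:

```python
import re

# Common validation-message shapes; extend per the forms you target (assumed patterns)
VALIDATION_PATTERNS = [
    re.compile(r"(?P<field>[\w '-]{2,40}?) is required", re.IGNORECASE),
    re.compile(r"please enter a valid (?P<field>[\w '-]{2,40})", re.IGNORECASE),
    re.compile(r"invalid (?P<field>[\w '-]{2,40})", re.IGNORECASE),
]

def find_validation_errors(page_text: str) -> list[str]:
    """Return a deduplicated list of field names mentioned in validation errors."""
    found = []
    for pattern in VALIDATION_PATTERNS:
        for match in pattern.finditer(page_text):
            field = match.group("field").strip().lower()
            if field not in found:
                found.append(field)
    return found

errors = find_validation_errors(
    "Owner Email is required. Please enter a valid phone number."
)
print(errors)  # field names to re-map and re-fill before resubmitting
```

When this returns a non-empty list, the agent can re-run `map_data_to_fields` restricted to the flagged fields instead of restarting the whole form.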
## Example 3: UI Testing Agent
Use Case: Run regression tests on a web application by navigating user flows visually, detecting functional failures that go beyond what pixel-comparison or unit tests catch — broken checkout flows, missing validation messages, inaccessible navigation.
Architecture: Test scenario definitions → Playwright execution → AI visual assertion checker → structured test report.
Key Implementation:
```python
import asyncio
import base64
import json
from dataclasses import dataclass
from typing import List

from playwright.async_api import async_playwright
from anthropic import Anthropic

client = Anthropic()

@dataclass
class TestScenario:
    name: str
    url: str
    steps: List[str]
    assertions: List[str]

@dataclass
class TestResult:
    name: str
    passed: bool
    failed_assertions: List[str]
    screenshots: List[str]

async def run_test_scenario(scenario: TestScenario) -> TestResult:
    """Execute a test scenario and validate outcomes visually."""
    failed_assertions = []
    screenshots = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page(viewport={"width": 1280, "height": 800})
        await page.goto(scenario.url, wait_until="networkidle")

        # Execute each step
        for step in scenario.steps:
            page_text = await page.evaluate("() => document.body.innerText")
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{
                    "role": "user",
                    "content": f"""I am testing a web application. Current URL: {page.url}

The current page shows this visible text:
{page_text[:2000]}

Execute this test step by returning a Playwright action:
Step: {step}

Return JSON: {{"action": "click|fill|select|navigate|wait", "selector": str, "value": str|null}}
Use semantic selectors like role or text content.
"""
                }]
            )
            text = response.content[0].text
            try:
                start, end = text.find('{'), text.rfind('}') + 1
                action = json.loads(text[start:end])
                if action["action"] == "click":
                    await page.click(action["selector"], timeout=3000)
                elif action["action"] == "fill":
                    await page.fill(action["selector"], action["value"] or "")
                elif action["action"] == "select":
                    await page.select_option(action["selector"], label=action["value"])
                elif action["action"] == "navigate":
                    await page.goto(action["value"])
                await page.wait_for_timeout(500)
            except Exception as e:
                print(f"  Step failed: {step} — {e}")

        # Take screenshot for the test record
        screenshot_bytes = await page.screenshot()
        screenshots.append(base64.b64encode(screenshot_bytes).decode())

        # Validate assertions
        page_text = await page.evaluate("() => document.body.innerText")
        for assertion in scenario.assertions:
            response = client.messages.create(
                model="claude-3-5-haiku-20241022", max_tokens=100,
                messages=[{
                    "role": "user",
                    "content": f"""Does this page satisfy the assertion?

Page content: {page_text[:1000]}
Assertion: {assertion}

Answer: PASS or FAIL (one word only)"""
                }]
            )
            result_text = response.content[0].text.strip().upper()
            if "FAIL" in result_text:
                failed_assertions.append(assertion)
                print(f"  FAIL: {assertion}")

        await browser.close()

    return TestResult(
        name=scenario.name,
        passed=len(failed_assertions) == 0,
        failed_assertions=failed_assertions,
        screenshots=screenshots,
    )

# Define test scenarios
checkout_test = TestScenario(
    name="Guest Checkout Flow",
    url="https://example-shop.com",
    steps=[
        "Click on the first product in the listing",
        "Click the Add to Cart button",
        "Click the Cart icon to view cart",
        "Click Proceed to Checkout",
        "Select Guest Checkout option",
    ],
    assertions=[
        "Cart shows at least 1 item",
        "Checkout form is visible",
        "No error messages are displayed",
        "A shipping address section is present",
    ]
)

result = asyncio.run(run_test_scenario(checkout_test))
print(f"\n{result.name}: {'PASSED' if result.passed else 'FAILED'}")
for fail in result.failed_assertions:
    print(f"  FAIL: {fail}")
```
Outcome: UI regression detection that finds functional breakages invisible to unit tests — missing buttons, broken navigation, incorrect validation behavior. The AI assertion layer handles layout variations between deployments without requiring test updates for every UI change.
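Every assertion above costs a model call. For assertions that are really just literal text checks, a cheap pre-screen can short-circuit the LLM and reserve it for genuinely semantic judgments. A sketch; the helper `try_literal_assertion` and its phrase heuristics are assumptions, not part of the example's API, and it deliberately returns `None` when unsure so the model remains the fallback:

```python
def try_literal_assertion(assertion: str, page_text: str):
    """Decide an assertion by substring matching when possible.

    Returns True/False for simple presence/absence assertions, or None
    when the assertion needs semantic (LLM) evaluation.
    """
    lowered = page_text.lower()
    wanted = assertion.lower()
    # Positive presence checks: "X is visible" / "X is present"
    for suffix in (" is visible", " is present"):
        if wanted.endswith(suffix):
            phrase = wanted[: -len(suffix)].strip()
            return phrase in lowered
    # Negative checks: "No X are displayed"
    if wanted.startswith("no ") and "displayed" in wanted:
        phrase = wanted[3:].split(" are ")[0].strip()
        return phrase not in lowered
    return None  # fall back to the model

page = "Checkout form\nShipping address\nCard number"
print(try_literal_assertion("Checkout form is visible", page))         # True
print(try_literal_assertion("No error messages are displayed", page))  # True
print(try_literal_assertion("Cart shows at least 1 item", page))       # None: ask the model
```

Note the trade-off: literal matching is stricter than semantic intent ("A shipping address section is present" fails unless that exact phrase appears), so keep pre-screened assertions phrased literally and leave fuzzier ones to the model.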
## Example 4: Data Extraction from PDFs and Screenshots
Use Case: Extract structured data from unstructured documents — scanned invoices, PDF reports, screenshot tables — using Claude's vision capabilities to parse documents that no OCR template can handle.
Architecture: Document loader → Claude vision extraction → structured output → validation and storage.
Key Implementation:
```python
import base64
import json
from pathlib import Path
from anthropic import Anthropic
from pydantic import BaseModel
from typing import Optional, List

client = Anthropic()

class InvoiceData(BaseModel):
    vendor_name: str
    invoice_number: str
    invoice_date: str
    due_date: Optional[str]
    line_items: List[dict]  # [{description, qty, unit_price, total}]
    subtotal: float
    tax: Optional[float]
    total: float
    currency: str

def extract_invoice_from_pdf_page(image_path: str) -> InvoiceData:
    """Extract structured invoice data from a PDF page image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode()

    # Detect file type
    ext = Path(image_path).suffix.lower()
    media_type = "image/jpeg" if ext in (".jpg", ".jpeg") else "image/png"

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": media_type, "data": image_b64}
                },
                {
                    "type": "text",
                    "text": """Extract all invoice data from this document.

Return JSON: {
  "vendor_name": str,
  "invoice_number": str,
  "invoice_date": "YYYY-MM-DD",
  "due_date": "YYYY-MM-DD" | null,
  "line_items": [{"description": str, "qty": float, "unit_price": float, "total": float}],
  "subtotal": float,
  "tax": float | null,
  "total": float,
  "currency": "USD" | "EUR" | str
}

If a value is not visible, use null. Do not infer or estimate amounts."""
                }
            ]
        }]
    )
    text = response.content[0].text
    start, end = text.find('{'), text.rfind('}') + 1
    return InvoiceData(**json.loads(text[start:end]))

def batch_extract_invoices(image_dir: str) -> list[dict]:
    """Process all invoice images in a directory."""
    results = []
    # pathlib's glob() has no brace expansion, so match suffixes explicitly
    image_paths = [p for p in sorted(Path(image_dir).iterdir())
                   if p.suffix.lower() in (".png", ".jpg", ".jpeg")]
    for path in image_paths:
        print(f"Processing: {path.name}")
        try:
            invoice = extract_invoice_from_pdf_page(str(path))
            results.append({
                "file": path.name,
                "success": True,
                "data": invoice.model_dump()
            })
            print(f"  Extracted: {invoice.vendor_name} — Invoice #{invoice.invoice_number} — ${invoice.total}")
        except Exception as e:
            results.append({"file": path.name, "success": False, "error": str(e)})
            print(f"  Failed: {e}")

    # Summary statistics
    successful = [r for r in results if r["success"]]
    total_value = sum(r["data"]["total"] for r in successful)
    print(f"\nProcessed {len(results)} invoices: {len(successful)} successful, total value ${total_value:,.2f}")
    return results

results = batch_extract_invoices("./invoices")
```
Outcome: Invoice processing that handles dozens of vendor formats without per-vendor template creation. The vision model reads handwritten amounts, unusual table layouts, and multi-page documents that defeat rules-based OCR systems. Accuracy on clean scans: typically above 95% for numeric fields.
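Vision extraction can silently misread a digit, so it pays to cross-check the extracted numbers arithmetically before storing them. A minimal sketch; the helper name `check_invoice_arithmetic` and the one-cent tolerance are assumptions, not part of the pipeline above:

```python
def check_invoice_arithmetic(data: dict, tolerance: float = 0.01) -> list[str]:
    """Flag inconsistencies between line items, subtotal, tax, and total.

    Returns a list of human-readable problems; an empty list means the
    extracted numbers are internally consistent (within the tolerance).
    """
    problems = []
    line_total = sum(item.get("total", 0.0) for item in data.get("line_items", []))
    if data.get("line_items") and abs(line_total - data["subtotal"]) > tolerance:
        problems.append(
            f"line items sum to {line_total:.2f}, subtotal says {data['subtotal']:.2f}"
        )
    expected_total = data["subtotal"] + (data.get("tax") or 0.0)
    if abs(expected_total - data["total"]) > tolerance:
        problems.append(
            f"subtotal + tax = {expected_total:.2f}, total says {data['total']:.2f}"
        )
    return problems

invoice = {
    "line_items": [{"total": 120.00}, {"total": 80.00}],
    "subtotal": 200.00, "tax": 16.00, "total": 216.00,
}
print(check_invoice_arithmetic(invoice))  # [] means internally consistent
```

Invoices that fail the check can be routed to a human review queue instead of straight into the accounting system.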
## Example 5: Cross-App Workflow Agent
Use Case: Automate a business workflow that spans multiple applications — pull data from one system, process it in another, and post results to a third — without API access to any of them.
Architecture: Playwright multi-tab orchestration → sequential application navigation → data passing between contexts → completion verification.
Key Implementation:
```python
import asyncio
import json
from playwright.async_api import async_playwright, BrowserContext
from anthropic import Anthropic

client = Anthropic()

async def extract_data_from_app(context: BrowserContext, app_url: str, extract_task: str) -> dict:
    """Open an app in a new tab and extract data from it."""
    page = await context.new_page()
    await page.goto(app_url, wait_until="networkidle")
    page_text = await page.evaluate("() => document.body.innerText")

    response = client.messages.create(
        model="claude-3-5-haiku-20241022", max_tokens=600,
        messages=[{"role": "user", "content": f"Extract from this page: {extract_task}\nPage content: {page_text[:3000]}\nReturn JSON."}]
    )
    text = response.content[0].text
    await page.close()

    start, end = text.find('{'), text.rfind('}') + 1
    return json.loads(text[start:end]) if start != -1 else {}

async def post_data_to_app(context: BrowserContext, app_url: str, data: dict, post_task: str) -> bool:
    """Open an app in a new tab and post data to it."""
    page = await context.new_page()
    await page.goto(app_url, wait_until="networkidle")
    page_text = await page.evaluate("() => document.body.innerText")

    # Get fill instructions from AI
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022", max_tokens=800,
        messages=[{"role": "user", "content": f"""
Page content: {page_text[:2000]}
Task: {post_task}
Data to enter: {json.dumps(data)}
Return JSON array of actions: [{{"action": "fill|click|select", "selector": str, "value": str}}]
"""}]
    )
    text = response.content[0].text
    start, end = text.find('['), text.rfind(']') + 1
    actions = json.loads(text[start:end]) if start != -1 else []

    for action in actions:
        try:
            if action["action"] == "fill":
                await page.fill(action["selector"], action["value"])
            elif action["action"] == "click":
                await page.click(action["selector"])
                await page.wait_for_load_state("networkidle")
            elif action["action"] == "select":
                await page.select_option(action["selector"], label=action["value"])
        except Exception as e:
            print(f"  Action failed: {action} — {e}")

    final_text = await page.evaluate("() => document.body.innerText")
    success = any(word in final_text.lower() for word in ["saved", "submitted", "success", "created"])
    await page.close()
    return success

async def run_cross_app_workflow():
    """Orchestrate a workflow across multiple web applications."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()

        print("Step 1: Extract pending orders from OMS")
        orders_data = await extract_data_from_app(
            context,
            "https://internal-oms.company.com/orders?status=pending",
            "List all pending orders with order_id, customer_name, items, and total_amount"
        )
        print(f"  Found {len(orders_data.get('orders', []))} pending orders")

        print("\nStep 2: Post orders to fulfillment system")
        for order in orders_data.get("orders", [])[:3]:  # Process first 3 for demo
            success = await post_data_to_app(
                context,
                "https://fulfillment.company.com/new-order",
                order,
                "Fill in the new order form with the provided order data and submit it"
            )
            print(f"  Order {order.get('order_id')}: {'Success' if success else 'Failed'}")

        await browser.close()

asyncio.run(run_cross_app_workflow())
```
Outcome: Business workflows that span internal systems without requiring engineering resources to build custom integrations. Particularly valuable for mid-size companies with heterogeneous tool stacks where API integration between every pair of systems is impractical.
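One operational wrinkle: re-running the workflow would re-post the same orders to the fulfillment system. A small idempotency ledger keeps the agent from double-submitting across runs. This is a sketch under stated assumptions: the `posted_orders.json` path and the helper names are illustrative, not part of the systems above:

```python
import json
from pathlib import Path

LEDGER_PATH = Path("posted_orders.json")  # assumed location for the ledger file

def load_posted_ids(path: Path = LEDGER_PATH) -> set[str]:
    """Read the set of order IDs already posted in previous runs."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def mark_posted(order_id: str, path: Path = LEDGER_PATH) -> None:
    """Record an order ID as posted (idempotent)."""
    ids = load_posted_ids(path)
    ids.add(order_id)
    path.write_text(json.dumps(sorted(ids)))

def filter_unposted(orders: list[dict], path: Path = LEDGER_PATH) -> list[dict]:
    """Drop orders whose order_id was already posted in a previous run."""
    posted = load_posted_ids(path)
    return [o for o in orders if o.get("order_id") not in posted]

orders = [{"order_id": "A1"}, {"order_id": "A2"}]
mark_posted("A1")
print([o["order_id"] for o in filter_unposted(orders)])
```

In `run_cross_app_workflow`, you would call `filter_unposted` on the extracted orders and `mark_posted` after each successful `post_data_to_app` call.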
## Example 6: Competitive Research Agent
Use Case: Automatically gather competitive intelligence — pricing, feature lists, recent announcements — from competitor websites and compile it into a structured report for the product team.
Architecture: Competitor URL list → Playwright content extraction → Claude analysis → structured competitive report.
Key Implementation:
```python
import asyncio
import json
from playwright.async_api import async_playwright
from anthropic import Anthropic

client = Anthropic()

COMPETITOR_RESEARCH_TARGETS = [
    {"name": "CompetitorA", "urls": {
        "pricing": "https://competitor-a.com/pricing",
        "features": "https://competitor-a.com/features",
        "blog": "https://competitor-a.com/blog"
    }},
    {"name": "CompetitorB", "urls": {
        "pricing": "https://competitor-b.com/plans",
        "features": "https://competitor-b.com/product",
        "blog": "https://competitor-b.com/news"
    }},
]

async def extract_page_intelligence(url: str, extract_type: str) -> dict:
    """Extract competitive intelligence from a specific page type."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        try:
            await page.goto(url, wait_until="networkidle", timeout=15000)
            page_text = await page.evaluate("() => document.body.innerText")
        except Exception as e:
            await browser.close()
            return {"error": str(e), "url": url}
        await browser.close()

    prompts = {
        "pricing": "Extract all pricing tiers, prices, and what each tier includes. Return JSON with plans array.",
        "features": "List all product features and capabilities mentioned. Return JSON with features array.",
        "blog": "What are the 3 most recent blog posts? Summarize each. Return JSON with posts array."
    }
    response = client.messages.create(
        model="claude-3-5-haiku-20241022", max_tokens=800,
        messages=[{"role": "user", "content": f"""
Page URL: {url}
Page content: {page_text[:4000]}
Task: {prompts.get(extract_type, "Summarize key information")}
"""}]
    )
    text = response.content[0].text
    start, end = text.find('{'), text.rfind('}') + 1
    return json.loads(text[start:end]) if start != -1 else {"raw": text}

async def generate_competitive_report(competitor_data: list[dict]) -> str:
    """Synthesize competitive data into an actionable report."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022", max_tokens=2000,
        messages=[{"role": "user", "content": f"""
Create a competitive intelligence report from this data.

Competitor data: {json.dumps(competitor_data, indent=2)}

Format the report as:

## Pricing Comparison
[Summary of pricing tiers and key differences]

## Feature Comparison
[What each competitor offers, what we may be missing]

## Recent Activity
[Noteworthy blog/news items suggesting their roadmap or focus]

## Strategic Implications
[3-5 actionable takeaways for our product team]
"""}]
    )
    return response.content[0].text

async def run_competitive_research():
    all_data = []
    for competitor in COMPETITOR_RESEARCH_TARGETS:
        print(f"Researching {competitor['name']}...")
        comp_data = {"name": competitor["name"], "intelligence": {}}
        for data_type, url in competitor["urls"].items():
            print(f"  Extracting {data_type} from {url}")
            intel = await extract_page_intelligence(url, data_type)
            comp_data["intelligence"][data_type] = intel
        all_data.append(comp_data)

    report = await generate_competitive_report(all_data)
    with open("competitive_report.md", "w") as f:
        f.write(report)
    print("\nReport saved to competitive_report.md")
    return report

asyncio.run(run_competitive_research())
```
Outcome: A structured competitive intelligence report compiled in minutes rather than hours. The Playwright extraction handles JavaScript-rendered content that simple HTTP requests miss, and the Claude synthesis layer converts raw page text into actionable strategic insights.
## Choosing the Right Automation Approach
Use Playwright directly (Examples 1, 3, 4, 5, 6) when you need speed and reliability — it interacts with the DOM directly and costs no LLM tokens for navigation. Add the AI interpretation layer (Claude vision + structured extraction) when you need semantic understanding of content rather than raw HTML parsing.
Reserve Claude's full computer use API (screenshot + mouse/keyboard) for desktop applications and systems with no browser-accessible interface. The screenshot loop is significantly slower and more expensive than direct DOM interaction.
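For orientation, the computer use loop works roughly like this: you register a screen-sized `computer` tool, the model responds with `tool_use` blocks describing actions (`screenshot`, `left_click` at a coordinate, `type` text, and so on), and your code executes each action and returns a screenshot as the tool result. The sketch below shows the tool definition shape and a local action dispatcher; the tool `type` string and beta flag track the 2024-10 beta and change over time, so check the current Anthropic docs before relying on them, and the handler names in the dispatcher are placeholders:

```python
# Tool definition for Claude's computer use beta (shape as of the
# "computer-use-2024-10-22" beta; verify against current Anthropic docs).
COMPUTER_TOOL = {
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1280,
    "display_height_px": 800,
}
# The agent loop would pass this via something like:
#   client.beta.messages.create(..., tools=[COMPUTER_TOOL],
#                               betas=["computer-use-2024-10-22"])

def dispatch_computer_action(tool_input: dict) -> str:
    """Map a computer-use tool call to a local handler description.

    In a real agent, each branch would drive the mouse/keyboard (e.g. via
    xdotool or pyautogui inside an isolated VM) and return a screenshot;
    here each branch just names the handler it would invoke.
    """
    action = tool_input.get("action")
    if action == "screenshot":
        return "capture_screen()"
    if action in ("left_click", "right_click", "mouse_move"):
        x, y = tool_input["coordinate"]
        return f"pointer('{action}', {x}, {y})"
    if action == "type":
        return f"type_text({tool_input['text']!r})"
    if action == "key":
        return f"press_key({tool_input['text']!r})"
    return f"unsupported({action})"

print(dispatch_computer_action({"action": "left_click", "coordinate": [640, 400]}))
```

The dispatcher is where safety controls belong: reject actions outside the display bounds, block keystrokes that match credential patterns, and cap the number of loop iterations.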
## Getting Started
The Computer Use Agent Tutorial covers Claude's computer use API setup including VM isolation. For browser-use library patterns that abstract Playwright with a higher-level AI interface, see Browser Use Agent Examples. For the theoretical framework on autonomous agents, see What is an AI Agent.
Compare this automation approach with other agent frameworks in LangChain Agent Examples and OpenAI Agents SDK Examples.
## Frequently Asked Questions
The FAQ section renders from the frontmatter faq array above.