Build a Browser Automation Agent with browser-use
Traditional web automation with Playwright or Selenium requires writing explicit selectors, handling dynamic content, and maintaining brittle scripts that break whenever a website updates its design. browser-use is a Python library that solves this by combining a real browser (Playwright) with an AI model that can see the page, understand its content, and take the right action — even on sites it has never seen before.
In this tutorial you will build a browser agent that can autonomously research job listings, extract structured data, and submit forms — all guided by natural language instructions rather than hardcoded selectors.
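Under the hood, an agent like this runs an observe-decide-act loop: the model is shown the current page state, picks one action, and repeats until it declares the task done. A minimal, library-agnostic sketch of the idea (all names here are illustrative stand-ins, not browser-use's actual API):

```python
# Illustrative sketch of an observe-decide-act browser agent loop.
# `page` and `decide` are hypothetical stand-ins, not browser-use APIs.
def run_agent(task: str, page, decide, max_steps: int = 10):
    """Drive the browser until `decide` reports the task is done."""
    for _ in range(max_steps):
        state = page.snapshot()        # observe: DOM text + screenshot
        action = decide(task, state)   # the LLM picks the next action
        if action["type"] == "done":
            return action["result"]    # task finished, return the answer
        page.execute(action)           # act: click, type, scroll, navigate
    return None                        # step budget exhausted
```

Because every action is chosen from the live page state rather than a pre-written script, the loop keeps working when the layout shifts.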
What You'll Learn
- How to install and configure browser-use with different AI model backends
- How to run basic and multi-step browser automation tasks
- How to extract structured data from web pages
- How to handle authentication, forms, and multi-page workflows
- How to add custom browser actions to extend the agent's capabilities
- How to run agents headlessly in production environments
Prerequisites
- Python 3.11 or higher installed (browser-use requires 3.11+)
- An OpenAI or Anthropic API key
- Basic understanding of AI agents and web automation concepts
- No Playwright knowledge required — browser-use handles the browser layer
Step 1: Project Setup
```bash
mkdir browser-agent-demo && cd browser-agent-demo
python -m venv .venv && source .venv/bin/activate

# Install browser-use and a model provider
pip install browser-use langchain-openai python-dotenv

# Install the Playwright browser binaries
playwright install chromium
```
Create a `.env` file:

```bash
OPENAI_API_KEY=sk-...your-key...

# Or for Anthropic:
# ANTHROPIC_API_KEY=sk-ant-...
```
Step 2: Your First Browser Agent
A minimal browser-use agent needs only a task description and a model:
```python
# simple_agent.py
import asyncio

from dotenv import load_dotenv
from browser_use import Agent
from langchain_openai import ChatOpenAI

load_dotenv()


async def main():
    agent = Agent(
        task="Go to reddit.com/r/MachineLearning and find the top post from today. "
        "Return the title, author, and score.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```
Run it:

```bash
python simple_agent.py
```
A Chrome window will open, the agent will navigate to Reddit, identify the top post, and return the structured result — all without a single CSS selector.
Step 3: Structured Data Extraction
For production use cases you often need reliably structured output. Use Pydantic models to define the exact shape of data you want extracted:
```python
# structured_extraction.py
import asyncio

from dotenv import load_dotenv
from pydantic import BaseModel
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from langchain_openai import ChatOpenAI

load_dotenv()


class JobListing(BaseModel):
    """Structured representation of a job listing."""

    title: str
    company: str
    location: str
    salary_range: str | None
    job_type: str  # full-time, part-time, contract
    posted_date: str
    key_requirements: list[str]
    apply_url: str


class JobSearchResults(BaseModel):
    """Container for multiple job listings."""

    search_query: str
    total_found: int
    listings: list[JobListing]


async def search_jobs():
    # Configure the browser for production use
    browser = Browser(
        config=BrowserConfig(
            headless=True,  # No visible browser window
            disable_security=False,  # Keep security features on
            extra_chromium_args=["--no-sandbox"],
        )
    )
    agent = Agent(
        task="""Go to linkedin.com/jobs and search for 'AI Engineer' jobs in 'San Francisco, CA'.
        Extract details for the first 5 job listings including: title, company, location,
        salary range (if shown), job type, and posting date.
        Also save the URL for each job's apply button.""",
        llm=ChatOpenAI(model="gpt-4o"),
        browser=browser,
    )
    result = await agent.run()
    # The result is a string — you can parse it or use output_model for typed results
    print("Extracted Job Listings:")
    print(result)
    await browser.close()


if __name__ == "__main__":
    asyncio.run(search_jobs())
```
Step 4: Multi-Step Web Workflows
browser-use excels at multi-step tasks that involve navigating across multiple pages, filling forms, and maintaining context throughout:
```python
# multi_step_workflow.py
import asyncio

from dotenv import load_dotenv
from browser_use import Agent, Controller
from browser_use.browser.browser import Browser, BrowserConfig
from langchain_openai import ChatOpenAI

load_dotenv()


async def research_and_compare():
    """Multi-step research workflow across multiple websites."""
    browser = Browser(config=BrowserConfig(headless=False))

    # A Controller lets you add custom actions and capture intermediate state
    controller = Controller()

    @controller.action("Save research note")
    def save_note(note: str) -> str:
        """Save a research note for later compilation.

        Args:
            note: The research note text to save.
        """
        # In production, write to a database or file
        print(f"[NOTE SAVED]: {note}")
        return f"Note saved: {note[:50]}..."

    agent = Agent(
        task="""Research task: Compare the Python packages 'httpx' and 'requests'.
        Steps:
        1. Go to pypi.org/project/httpx and note the download stats and key features
        2. Go to pypi.org/project/requests and note the download stats and key features
        3. Visit the GitHub repositories for both and check star counts and recent activity
        4. Use the 'Save research note' action to record key findings for each library
        5. Provide a final comparison summary
        Be thorough and accurate — only report what you actually see on the pages.""",
        llm=ChatOpenAI(model="gpt-4o"),
        browser=browser,
        controller=controller,
    )
    result = await agent.run(max_steps=25)
    print("\n=== Final Research Summary ===")
    print(result)
    await browser.close()


if __name__ == "__main__":
    asyncio.run(research_and_compare())
```
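The decorator above follows a common registry pattern: each function is recorded under a human-readable description, and the model is told which descriptions it may invoke. A stdlib-only sketch of that mechanism (ActionRegistry is illustrative, not browser-use's actual implementation):

```python
class ActionRegistry:
    """Toy controller: maps action descriptions to Python functions."""

    def __init__(self):
        self._actions = {}

    def action(self, description: str):
        """Decorator that registers a function under a description."""
        def register(fn):
            self._actions[description] = fn
            return fn  # leave the function callable directly too
        return register

    def dispatch(self, description: str, **kwargs):
        """Call the action the model selected, by its description."""
        return self._actions[description](**kwargs)

    def available(self) -> list[str]:
        """Descriptions advertised to the model in its prompt."""
        return sorted(self._actions)


registry = ActionRegistry()


@registry.action("Save research note")
def save_note(note: str) -> str:
    return f"Note saved: {note[:50]}"
```

When the model outputs an action name plus arguments, the real controller does essentially what dispatch does here: look up the function and call it with the model-supplied keyword arguments.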
Step 5: Handling Authentication and Sessions
Many real-world tasks require logging in first. browser-use supports persistent browser contexts that preserve cookies and authentication state:
```python
# authenticated_agent.py
import asyncio
from pathlib import Path

from dotenv import load_dotenv
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig
from langchain_anthropic import ChatAnthropic  # Claude this time: pip install langchain-anthropic

load_dotenv()

# Path for browser session state (cookies, localStorage)
SESSION_PATH = Path("./browser_session")
SESSION_PATH.mkdir(exist_ok=True)


async def run_with_session():
    """Run an agent that preserves login state across runs."""
    session_file = SESSION_PATH / "session.json"
    browser = Browser(
        config=BrowserConfig(
            headless=False,
            new_context_config=BrowserContextConfig(
                # Persist the session to disk so you only log in once
                storage_state=str(session_file) if session_file.exists() else None,
                save_storage_state=True,
            ),
        )
    )
    # First run: the agent navigates to the site and handles login if needed
    agent = Agent(
        task="""Go to github.com. If you're not logged in, go to github.com/login
        and log in with the credentials: username 'demo_user', password 'demo_pass'.
        After logging in, go to your notifications page and summarize the first 3 notifications.""",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20241022"),
        browser=browser,
        # Sensitive — don't record actions that might contain credentials
        generate_gif=False,
    )
    result = await agent.run()
    print(result)

    # Save the session state for the next run
    context = await browser.new_context()
    await context.save_storage_state(str(session_file))
    await browser.close()


if __name__ == "__main__":
    asyncio.run(run_with_session())
```
Step 6: Custom Browser Actions
Extend the agent's default capabilities with custom actions using the @controller.action decorator:
```python
# custom_actions.py
import asyncio
import csv
from datetime import datetime
from pathlib import Path

from dotenv import load_dotenv
from browser_use import Agent, Controller
from browser_use.browser.browser import Browser, BrowserConfig
from langchain_openai import ChatOpenAI

load_dotenv()

controller = Controller()
collected_data = []


@controller.action("Save product data to CSV")
def save_product_to_csv(
    name: str,
    price: str,
    rating: str,
    review_count: str,
    url: str,
) -> str:
    """Save a product's data to the collection for CSV export.

    Args:
        name: Product name.
        price: Product price as displayed on the page.
        rating: Star rating (e.g., '4.5 out of 5').
        review_count: Number of reviews.
        url: Product page URL.
    """
    collected_data.append({
        "name": name,
        "price": price,
        "rating": rating,
        "review_count": review_count,
        "url": url,
        "scraped_at": datetime.now().isoformat(),
    })
    return f"Saved: {name} at {price}"


@controller.action("Export data to file")
def export_to_csv(filename: str = "products.csv") -> str:
    """Export all collected product data to a CSV file.

    Args:
        filename: Output filename for the CSV.
    """
    if not collected_data:
        return "No data collected yet."
    output_path = Path(filename)
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=collected_data[0].keys())
        writer.writeheader()
        writer.writerows(collected_data)
    return f"Exported {len(collected_data)} products to {output_path}"


async def scrape_products():
    browser = Browser(config=BrowserConfig(headless=True))
    agent = Agent(
        task="""Go to books.toscrape.com (a site built for scraping practice).
        Navigate through the first 2 pages of books.
        For each book, use 'Save product data to CSV' to record: name, price, rating, review count, and URL.
        After processing all books, use 'Export data to file' to save as 'books_data.csv'.""",
        llm=ChatOpenAI(model="gpt-4o"),
        browser=browser,
        controller=controller,
    )
    await agent.run(max_steps=40)
    await browser.close()
    print(f"\nFinal dataset: {len(collected_data)} products collected")


if __name__ == "__main__":
    asyncio.run(scrape_products())
```
Step 7: Production Configuration
For production deployments, configure browser-use for reliability and observability:
```python
# production_config.py
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig
from langchain_openai import ChatOpenAI


def create_production_agent(task: str) -> Agent:
    """Create a production-configured browser agent."""
    browser = Browser(
        config=BrowserConfig(
            headless=True,
            disable_security=False,
            extra_chromium_args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",  # Required in Docker
                "--disable-gpu",  # Required on headless servers
                "--window-size=1920,1080",
            ],
            new_context_config=BrowserContextConfig(
                viewport={"width": 1920, "height": 1080},
                user_agent="Mozilla/5.0 (compatible; ResearchBot/1.0)",
                java_script_enabled=True,
                accept_downloads=False,  # Disable file downloads for security
            ),
        )
    )
    return Agent(
        task=task,
        llm=ChatOpenAI(
            model="gpt-4o",
            temperature=0,  # Reduce randomness for more consistent runs
            timeout=60,
        ),
        browser=browser,
        max_failures=3,  # Retry limit per action
        retry_delay=2,  # Seconds between retries
        generate_gif=False,  # Disable GIF recording in production
    )
```
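Even with max_failures set, entire runs can still fail (network drops, model timeouts), so production deployments often wrap the whole run in a retry loop. A generic asyncio sketch that assumes nothing about browser-use beyond an awaitable factory (run_with_retry and make_run are our own names):

```python
import asyncio


async def run_with_retry(make_run, attempts: int = 3, delay: float = 2.0):
    """Await make_run() up to `attempts` times, sleeping `delay` seconds between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await make_run()
        except Exception as exc:  # in production, catch narrower exception types
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(delay)
    raise last_error
```

You would pass a factory such as `lambda: create_production_agent(task).run()` so that each attempt gets a fresh agent and browser rather than reusing a possibly broken one.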
What's Next
You have built a capable browser automation agent that works without brittle selectors. Recommended next steps:
- Computer use agent: For full desktop automation (not just browsers), see the computer use agent tutorial
- LangChain tools: Learn how LangChain's web tools compare to browser-use
- Playwright directly: If you need deterministic automation, learn pure Playwright for cases where AI decision-making adds unnecessary latency
- MCP integration: See connecting agents to MCP servers to expose browser-use capabilities as MCP tools
- Tool use patterns: Read the tool use glossary entry for context on how the agent decides which browser actions to take
Frequently Asked Questions
How does browser-use differ from traditional Playwright automation?
Traditional Playwright requires explicit CSS selectors, XPaths, or text matchers that break when website layouts change. browser-use uses a vision-language model to understand the current page state visually and semantically, then selects actions based on intent. This makes it far more robust to UI changes but introduces AI latency and non-determinism.
Which AI models work best with browser-use?
gpt-4o and claude-3-5-sonnet are the most reliable choices due to their strong visual understanding capabilities. gpt-4o-mini works for simpler tasks but struggles with complex layouts or ambiguous UI states. Gemini 2.0 Flash is a cost-effective alternative for straightforward navigation tasks.
Can browser-use handle JavaScript-heavy single-page applications?
Yes. Since browser-use uses a real Chromium browser, JavaScript rendering is handled natively. The agent waits for page loads and can interact with dynamically loaded content. For very slow SPAs, increase the action timeout in BrowserConfig.
Is browser-use appropriate for high-volume production scraping?
For high-volume (1000+ pages per day), browser-use is more expensive and slower than traditional scrapers due to AI inference costs. It is best suited for: complex interactive workflows, sites that actively block traditional scrapers, tasks requiring judgment about which elements to interact with, and low-to-medium volume data extraction.
How do I handle bot detection and CAPTCHAs?
browser-use does not bypass bot detection or CAPTCHAs by design — solving CAPTCHAs may violate terms of service. If you encounter a CAPTCHA, the agent will typically stop and report it. Provide a human operator callback or integrate a CAPTCHA service for legitimate workflows that encounter them.