Agent Browser
Add a real browser to a workspace so agents can use it through MCP, with live human handoff for logins, MFA, and CAPTCHAs.
Some work has no API. Internal tools, legacy portals, and dashboards behind a login sometimes require a real browser. Agent Browser adds a hosted Chrome session to a workspace so agents can navigate, read, click, type, download files, and ask a human to take over the same live session when a password, MFA challenge, or CAPTCHA appears.
For most teams, setup is simple: add agent-browser as a workspace integration, then create an MCP
server for that workspace. The agent receives browser tools the same way it receives Slack, Gmail,
Filesystem, State KV, Sandbox, and other workspace integrations.
It is not OAuth, and it is not a CAPTCHA bypass. The human signs in directly inside the browser; the agent never receives the password or MFA secret.
By default, browser profile persistence is scoped to the current end user. Pass your user's external
ID as endUserId when agents should keep using that user's saved sign-in state. If a workflow is
intentionally shared by the workspace or by a custom namespace, set settings.persistence on the
workspace integration.
Use agent-browser for deterministic browser tools that need no connection or auth.
Add agent-browser-ai only when you want natural-language browser actions
backed by your own LLM provider key.
Primary Path: Workspace Integration + MCP
Add Agent Browser to the workspace
Create a workspace integration with a stable alias such as browser. No
connection is required for the deterministic browser tools.
Create an MCP server for the workspace
Use Code Mode when you want compact search, read, and execute
meta-tools. Use Tool Mode when the MCP client should see every browser
action as an individual tool.
Let the agent use the browser
The first browser action starts a hosted session automatically. Later actions reuse the same live session for that workspace and end user when one is available.
Hand control to a human when needed
When a login, MFA challenge, or CAPTCHA blocks progress, the agent calls
request_human. Weavz returns a viewer link so a person can control the
same browser and then hand it back with resume.
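The two setup steps are plain REST calls. The TypeScript sketch below builds the same request bodies as the curl commands in the next section and sends them through an injected `post` function, so the request-building part can be checked without network access. `SetupRequest`, `buildSetupRequests`, `runSetup`, and the `post` transport are hypothetical helpers, not part of a Weavz SDK; in practice `post` would wrap your HTTP client and add the `Authorization: Bearer` header.

```typescript
// Sketch: the two setup calls as request descriptors.
// SetupRequest, buildSetupRequests, and runSetup are hypothetical helper names.

interface SetupRequest {
  url: string;
  body: Record<string, unknown>;
}

const API_BASE = "https://api.weavz.io/api/v1";

// Builds both request bodies; they mirror the curl examples field for field.
function buildSetupRequests(workspaceId: string): SetupRequest[] {
  return [
    {
      // 1. Add agent-browser to the workspace under a stable alias.
      url: `${API_BASE}/workspaces/${workspaceId}/integrations`,
      body: {
        integrationName: "agent-browser",
        alias: "browser",
        displayName: "Agent Browser",
        settings: { persistence: { scope: "end_user" } }, // the default scope, stated explicitly
      },
    },
    {
      // 2. Create a Code Mode MCP server for the same workspace.
      url: `${API_BASE}/mcp/servers`,
      body: {
        name: "Browser Agent Workspace",
        workspaceId,
        mode: "CODE",
        authMode: "oauth",
        endUserAccess: "restricted",
      },
    },
  ];
}

// Sends the requests one after another, integration first, matching the steps above.
// `post` is an injected transport (for example, a wrapper that adds the API key header).
async function runSetup(
  post: (url: string, body: Record<string, unknown>) => Promise<unknown>,
  workspaceId: string,
): Promise<void> {
  for (const req of buildSetupRequests(workspaceId)) {
    await post(req.url, req.body);
  }
}
```

Keeping request construction separate from sending makes the bodies easy to unit-test and to reuse with whichever HTTP client your backend already uses.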
Add It To A Workspace And Create MCP
curl -X POST https://api.weavz.io/api/v1/workspaces/YOUR_WORKSPACE_ID/integrations \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"integrationName": "agent-browser",
"alias": "browser",
"displayName": "Agent Browser",
"settings": { "persistence": { "scope": "end_user" } }
}'
curl -X POST https://api.weavz.io/api/v1/mcp/servers \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Browser Agent Workspace",
"workspaceId": "YOUR_WORKSPACE_ID",
"mode": "CODE",
"authMode": "oauth",
"endUserAccess": "restricted"
}'
How Sessions Are Handled
Agent Browser uses browser sessions behind the scenes, but normal MCP and action callers do not need to create sessions manually.
| Behavior | What happens |
|---|---|
| First browser action | Starts a hosted browser session automatically when no live session is available |
| Later browser actions | Reuse the active session for the same workspace and end user |
| sessionId omitted | Weavz uses the auto-managed session for that caller context |
| sessionId provided | The action targets that explicit session after access checks |
| request_human | Mints a viewer link and blocks agent control while the human is driving |
| resume | Returns control to the agent |
| end_session | Ends the live session and snapshots the profile to the configured persistence scope |
Pass the end user's externalId as endUserId when you want per-user browser identity. End-user
scoped sessions can reuse saved sign-in state across runs. With workspace or external scope, every
caller shares the same live browser session and saved browser profile for the scope configured on
the workspace integration.
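To make the scoping rules concrete, here is a hypothetical client-side check in TypeScript that mirrors what this page states: end_user (the default) needs an endUserId on each call, workspace shares one identity, and external needs settings.persistence.externalId. The `profileKey` function and its string format are invented for illustration; Weavz does not document how it keys saved profiles internally.

```typescript
// Hypothetical model of the persistence rules described on this page.
// The returned label format is invented; it only shows *which* callers share an identity.

type PersistenceSettings =
  | { scope: "end_user" }
  | { scope: "workspace" }
  | { scope: "external"; externalId: string };

// Returns a label for the browser identity a call would use,
// or throws when a required identifier is missing.
function profileKey(
  workspaceId: string,
  persistence: PersistenceSettings = { scope: "end_user" }, // end_user is the default
  endUserId?: string,
): string {
  switch (persistence.scope) {
    case "end_user":
      if (!endUserId) {
        throw new Error("end_user scope requires endUserId on the call");
      }
      return `${workspaceId}/end_user/${endUserId}`;
    case "workspace":
      // Every caller of the workspace integration shares one identity.
      return `${workspaceId}/workspace`;
    case "external":
      // A custom namespace such as a tenant, project, or account key.
      return `${workspaceId}/external/${persistence.externalId}`;
    default:
      throw new Error("unknown persistence scope");
  }
}
```

Under the default scope, two end users get different labels, so their saved sign-in state never mixes; switching the integration to workspace scope collapses them into one shared identity.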
Batched Code Mode Workflows
When Agent Browser is exposed through a Code Mode MCP server, agents should batch related browser
operations inside one weavz_execute call. A single run can navigate, inspect the page, click or
type by snapshot ref, take a screenshot, and return the observations the agent needs. This is faster
and more reliable than one execute call per browser action.
const session = await weavz.browser.start_session({ headless: true })
await weavz.browser.navigate({
sessionId: session.sessionId,
url: 'https://app.example.com',
})
const snapshot = await weavz.browser.snapshot({ sessionId: session.sessionId })
const status = await weavz.browser.read_text({
sessionId: session.sessionId,
target: '#status',
}).catch(() => null)
const screenshot = await weavz.browser.screenshot({
sessionId: session.sessionId,
quality: 55,
})
return {
sessionId: session.sessionId,
snapshot: String(snapshot.snapshot).slice(0, 3000),
status,
screenshot: {
mimeType: screenshot.mimeType,
width: screenshot.width,
height: screenshot.height,
},
}
Use sessionId across separate runs only when the workflow needs incremental state, human handoff,
or a later follow-up. For unfamiliar pages, the most robust loop is: take a snapshot, choose an
element ref from it such as e5, then call click, type, read_text, or screenshot with that ref.
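The snapshot-then-ref loop can be sketched as a small helper. This assumes the snapshot text marks interactive elements as `[ref=e5]` next to their role and accessible name (the Playwright MCP convention) and that click accepts the ref through the same `target` field that read_text uses in the example above; check the actual snapshot output before relying on either. `BrowserTools`, `findRef`, and `clickByLabel` are hypothetical names, with `BrowserTools` standing in for the `weavz.browser` object available in Code Mode.

```typescript
// Sketch of the snapshot -> ref -> action loop.
// ASSUMPTION: snapshot lines look like `- button "Sign in" [ref=e5]`.
// ASSUMPTION: click takes the ref in a `target` field.

interface BrowserTools {
  snapshot(args: { sessionId?: string }): Promise<{ snapshot: unknown }>;
  click(args: { sessionId?: string; target: string }): Promise<unknown>;
}

// Returns the ref of the first snapshot line containing `label`, or null.
function findRef(snapshotText: string, label: string): string | null {
  for (const line of snapshotText.split("\n")) {
    if (!line.includes(label)) continue;
    const match = /\[ref=(e\d+)\]/.exec(line);
    if (match) return match[1];
  }
  return null;
}

// Takes a fresh snapshot, resolves the label to a ref, then clicks that ref.
async function clickByLabel(
  browser: BrowserTools,
  label: string,
  sessionId?: string,
): Promise<string> {
  const { snapshot } = await browser.snapshot({ sessionId });
  const ref = findRef(String(snapshot), label);
  if (!ref) throw new Error(`No element labelled "${label}" in snapshot`);
  await browser.click({ sessionId, target: ref });
  return ref;
}
```

Taking a new snapshot before each lookup is the conservative choice: refs from an earlier snapshot may no longer point at the same element after the page changes.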
Persistence Scopes
Agent Browser and Agent Browser AI use the same settings.persistence object as Filesystem and
State KV:
| Scope | Use when |
|---|---|
| end_user | Each of your users should keep a separate browser identity. This is the default and requires endUserId on calls. |
| workspace | The browser identity is intentionally shared by the workspace. |
| external | You want a custom namespace such as a tenant, project, or account key. Set settings.persistence.externalId. |
{
"integrationName": "agent-browser",
"alias": "browser",
"settings": {
"persistence": {
"scope": "workspace"
}
}
}
Deterministic Browser Actions
Use these actions directly through REST or the SDK when you are not going through MCP, or when you want to test the integration before connecting an MCP client.
curl -X POST https://api.weavz.io/api/v1/actions/execute \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"workspaceId": "YOUR_WORKSPACE_ID",
"integrationName": "agent-browser",
"integrationAlias": "browser",
"actionName": "navigate",
"endUserId": "user_123",
"input": { "url": "https://app.acme.com" }
}'
The deterministic action set includes snapshot, navigate, navigate_back, click, type,
fill_form, select_option, hover, drag, press_key, file_upload, evaluate, read_text,
read_html, screenshot, wait_for, handle_dialog, tabs, request_human, resume,
start_session, session_status, and end_session.
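Every deterministic action goes through the same /actions/execute endpoint, so a typed request builder helps keep action names and scoping fields consistent. The action names below are copied from the list above; `ExecuteRequest`, `buildExecuteRequest`, and `isBrowserAction` are hypothetical names, the default alias is simply the `browser` alias used on this page, and each action's `input` is left as a loose record because this page only documents a few input shapes.

```typescript
// Hypothetical typed builder for POST https://api.weavz.io/api/v1/actions/execute.

const DETERMINISTIC_ACTIONS = [
  "snapshot", "navigate", "navigate_back", "click", "type", "fill_form",
  "select_option", "hover", "drag", "press_key", "file_upload", "evaluate",
  "read_text", "read_html", "screenshot", "wait_for", "handle_dialog", "tabs",
  "request_human", "resume", "start_session", "session_status", "end_session",
] as const;

type BrowserAction = (typeof DETERMINISTIC_ACTIONS)[number];

interface ExecuteRequest {
  workspaceId: string;
  integrationName: "agent-browser";
  integrationAlias: string;
  actionName: BrowserAction;
  endUserId?: string; // needed under the default end_user persistence scope
  input: Record<string, unknown>;
}

function buildExecuteRequest(
  workspaceId: string,
  actionName: BrowserAction,
  input: Record<string, unknown> = {},
  opts: { alias?: string; endUserId?: string } = {},
): ExecuteRequest {
  const req: ExecuteRequest = {
    workspaceId,
    integrationName: "agent-browser",
    integrationAlias: opts.alias ?? "browser", // alias chosen when the integration was added
    actionName,
    input,
  };
  // Only send endUserId when given; it must be an existing end user's external ID.
  if (opts.endUserId !== undefined) req.endUserId = opts.endUserId;
  return req;
}

// Runtime guard for action names that arrive as plain strings (e.g. from config).
function isBrowserAction(name: string): name is BrowserAction {
  return (DETERMINISTIC_ACTIONS as readonly string[]).includes(name);
}
```

For example, `buildExecuteRequest("YOUR_WORKSPACE_ID", "navigate", { url: "https://app.acme.com" }, { endUserId: "user_123" })` produces the same body as the curl example above.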
Screenshots
screenshot returns a browser image envelope:
{
"mimeType": "image/jpeg",
"width": 1280,
"height": 720,
"imageContent": "base64-encoded JPEG",
"url": "hosted screenshot URL"
}
MCP tool calls also include an MCP image content item, so agents can inspect the screenshot directly
without fetching the hosted URL. The URL is included when Filesystem can store the image for human
viewing or downstream download. By default screenshots are JPEG quality 60 at agent-friendly scale;
set fullResolution only when the agent or your backend needs the original device-scale image.
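When your backend receives the envelope itself rather than the MCP image item, imageContent has to be decoded before the image can be saved or inspected. A minimal dependency-free sketch, assuming imageContent is standard base64 without a `data:` URL prefix; `ScreenshotEnvelope`, `decodeBase64`, `looksLikeJpeg`, and `decodeScreenshot` are hypothetical names. In Node you would normally use `Buffer.from(s, "base64")` instead of the hand-rolled decoder shown here.

```typescript
// Sketch: decode the screenshot envelope's base64 payload and sanity-check it.

interface ScreenshotEnvelope {
  mimeType: string;     // "image/jpeg" by default
  width: number;
  height: number;
  imageContent: string; // base64-encoded image bytes (assumed: no data: prefix)
  url?: string;         // present only when Filesystem could store the image
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 decode; '=' padding and whitespace are skipped.
function decodeBase64(input: string): Uint8Array {
  const out: number[] = [];
  let acc = 0;
  let bits = 0;
  for (let i = 0; i < input.length; i++) {
    const value = B64.indexOf(input[i]);
    if (value < 0) continue; // not in the base64 alphabet
    acc = ((acc << 6) | value) & 0xffff; // keep only the low bits still needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
    }
  }
  return Uint8Array.from(out);
}

// JPEG data starts with the SOI marker FF D8, followed by another FF marker byte.
function looksLikeJpeg(bytes: Uint8Array): boolean {
  return bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
}

function decodeScreenshot(env: ScreenshotEnvelope): Uint8Array {
  const bytes = decodeBase64(env.imageContent);
  if (env.mimeType === "image/jpeg" && !looksLikeJpeg(bytes)) {
    throw new Error("imageContent does not look like a JPEG");
  }
  return bytes;
}
```

The magic-byte check is cheap and catches a truncated or mislabeled payload before it is written to storage.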
Optional LLM Driver
Natural-language browser actions live in a separate integration: agent-browser-ai. This integration
requires a connection because it uses your LLM provider key. If you do not add agent-browser-ai, the
auth-free agent-browser tools still work.
When configuring Agent Browser AI, choose the provider and model from dropdowns and store the API key
as a secret connection value. The deterministic agent-browser integration does not require auth.
agent-browser-ai provides:
- act: complete a natural-language browser task by looping over snapshots and browser actions.
- extract: extract structured data from the current page.
- observe: identify relevant page elements without taking action.
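The act description is an observe-act loop. The toy TypeScript version below shows the shape of that technique only, not how agent-browser-ai implements it: `Planner` stands in for the call to your LLM provider, and the `Step` union, the `Tools` interface, the `act` function name, and the step budget are all illustrative assumptions.

```typescript
// Toy observe-act loop: snapshot the page, ask a planner for one step, run it,
// and repeat until the planner reports done or the step budget runs out.
// Step, Planner, Tools, and act are hypothetical; this is not the agent-browser-ai API.

type Step =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string }
  | { kind: "done"; summary: string };

// Stand-in for an LLM call: given the task and current snapshot, pick one step.
type Planner = (instruction: string, snapshot: string) => Promise<Step>;

interface Tools {
  snapshot(): Promise<string>;
  click(ref: string): Promise<void>;
  type(ref: string, text: string): Promise<void>;
}

async function act(
  instruction: string,
  tools: Tools,
  plan: Planner,
  maxSteps = 10,
): Promise<string> {
  for (let i = 0; i < maxSteps; i++) {
    const snapshot = await tools.snapshot();        // observe
    const step = await plan(instruction, snapshot); // decide
    if (step.kind === "done") return step.summary;
    if (step.kind === "click") await tools.click(step.ref); // act
    else await tools.type(step.ref, step.text);
  }
  throw new Error(`Task not finished after ${maxSteps} steps`);
}
```

The step budget keeps a confused planner from looping forever; a real driver would also need to handle failed actions and pages that change between snapshots.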
curl -X POST https://api.weavz.io/api/v1/actions/execute \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"integrationName": "agent-browser-ai",
"actionName": "act",
"workspaceId": "YOUR_WORKSPACE_ID",
"endUserId": "user_123",
"input": { "instruction": "Open the latest invoice and download the PDF" }
}'
Human Handoff
Use request_human when the browser reaches a step the agent should not complete.
curl -X POST https://api.weavz.io/api/v1/actions/execute \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"integrationName": "agent-browser",
"actionName": "request_human",
"workspaceId": "YOUR_WORKSPACE_ID",
"integrationAlias": "browser",
"endUserId": "user_123",
"input": { "reason": "Login or MFA required" }
}'
curl -X POST https://api.weavz.io/api/v1/actions/execute \
-H "Authorization: Bearer wvz_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"integrationName": "agent-browser",
"actionName": "resume",
"workspaceId": "YOUR_WORKSPACE_ID",
"integrationAlias": "browser",
"endUserId": "user_123",
"input": {}
}'
While the session is in user state (a human has control), the viewer's clicks, typing, scrolling,
pasting, and navigation drive the live page. Agent browser actions are blocked until resume returns
control to the agent.
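The handoff rules amount to a two-state control flag per session. Here is a local TypeScript model of what this section describes: request_human moves control to the user and returns a fresh viewer link, agent actions are rejected until resume, and resume returns control to the agent. The `HandoffSession` class, its method names, and the viewer URL format are hypothetical; the real service enforces these rules on the server.

```typescript
// Local model of the agent/user control handoff. Not a Weavz API.

type Controller = "agent" | "user";

class HandoffSession {
  private controller: Controller = "agent";
  private linkCounter = 0;
  private readonly sessionId: string;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  get state(): Controller {
    return this.controller;
  }

  // Mirrors request_human: mint a fresh viewer link and hand control to the user.
  requestHuman(reason: string): { viewerUrl: string; reason: string } {
    this.controller = "user";
    this.linkCounter += 1;
    // Invented URL shape; the real link comes from the request_human response.
    const viewerUrl = `https://viewer.example/${this.sessionId}?n=${this.linkCounter}`;
    return { viewerUrl, reason };
  }

  // Mirrors resume: control returns to the agent.
  resume(): void {
    this.controller = "agent";
  }

  // Every agent-driven browser action passes through this guard.
  runAgentAction<T>(action: () => T): T {
    if (this.controller !== "agent") {
      throw new Error("Blocked: a human is controlling the browser; call resume first");
    }
    return action();
  }
}
```

In an agent loop, treat the blocked error as a signal to wait for the person to finish rather than to retry immediately.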
Restrict Browsing
Pass allowedHosts to start_session or the first browser action in a run when a workflow should
stay inside a known set of domains.
{
"allowedHosts": ["app.acme.com", "*.acme-cdn.com"]
}
Omit allowedHosts for unrestricted browsing.
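A client can pre-check a hostname against the same list before calling navigate, which gives a clearer error than a blocked navigation. This sketch assumes that `*.` matches any subdomain but not the bare domain, that matching is case-insensitive, and that an explicitly empty list allows nothing; confirm the service's exact wildcard semantics before relying on them. `isHostAllowed` is a hypothetical helper, and in Node the hostname can be taken from `new URL(url).hostname`.

```typescript
// Hypothetical client-side pre-check mirroring allowedHosts.
// ASSUMPTIONS: exact names match exactly; "*.example.com" matches any subdomain
// of example.com but not example.com itself; comparison is case-insensitive.

function isHostAllowed(hostname: string, allowedHosts?: readonly string[]): boolean {
  if (allowedHosts === undefined) return true; // omitted = unrestricted browsing
  const host = hostname.toLowerCase();
  return allowedHosts.some((pattern) => {
    const p = pattern.toLowerCase();
    if (p.startsWith("*.")) {
      const suffix = p.slice(1); // e.g. ".acme-cdn.com"
      return host.endsWith(suffix) && host.length > suffix.length;
    }
    return host === p;
  });
}
```

Matching on a leading dot rather than a plain suffix keeps look-alike hosts such as evil-acme-cdn.com out of the wildcard.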
Session Lifecycle
Agent Browser manages the hosted browser session behind the workspace integration. The first browser
action starts a session for the workspace and end user, later actions reuse it, and end_session
releases it when the workflow is finished.
When an action includes endUserId, it must be the external ID of an existing end user in that
workspace. This scopes browser identity and saved sign-in state to that user.
Use request_human to mint a fresh human viewer link for login, MFA, CAPTCHA, or payment steps, then
call resume after the person completes the step.
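Because end_session both releases the live browser and snapshots the profile to the configured persistence scope, it is reasonable to call it even when a step fails. A minimal sketch, assuming an injected `execute` function that posts to /actions/execute with the workspace, alias, and end user already filled in; `Execute` and `withBrowserSession` are hypothetical names.

```typescript
// Sketch: run a browser workflow and always end the session afterwards.
// `Execute` is a hypothetical injected transport for POST /actions/execute.

type Execute = (actionName: string, input?: Record<string, unknown>) => Promise<unknown>;

async function withBrowserSession<T>(
  execute: Execute,
  work: (execute: Execute) => Promise<T>,
): Promise<T> {
  try {
    // The first browser action inside `work` starts the session automatically.
    return await work(execute);
  } finally {
    // Release the hosted browser and snapshot the profile, even if `work` threw.
    await execute("end_session");
  }
}
```

One caveat of `finally`: if end_session itself fails, that error replaces the original one, so log both in production code.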