Browser LLM Stack
Browser-native model work without pretending every useful workflow has a clean API.
What this is
Some of the best model workflows still live behind a browser session: logged-in research tools, subscription-only UIs, file-upload features, visual review, and draft publishing surfaces. A browser LLM stack gives your agent a controlled Chromium lane for those jobs while keeping API calls as the default path when they are available.
The pattern is simple: run Chromium in a service-owned display, keep one persistent profile per provider or workflow, expose noVNC for human login and inspection, and wrap every fragile profile in a lock.
Why this way
API-first is still the cleanest default. It gives you structured responses, stable auth, lower operational risk, and fewer moving parts. Use it whenever the API has the feature you need.
The browser lane exists for the cases APIs do not cover well:
- subscription entitlements attached to a web account
- research answers that need visible citations from a logged-in provider
- long-context or multimodal web UI features that are not exposed through an API
- upload and visual workflows that require the provider’s first-party UI
- draft creation in web apps where auto-publish would be reckless
- manual review of exactly what the model saw before you trust the output
The key design choice is to treat browser automation as an explicit lane, not a hidden fallback. It should have its own health checks, profile policy, locks, artifacts, and failure modes.
Prerequisites
- Linux host with systemd user services
- Chromium or Chrome installed
- Playwright or equivalent browser automation library
- A virtual display service such as Xvfb
- Optional VNC/noVNC bridge for inspection and re-auth
- Comfort with file permissions, locks, and service logs
Do not start here if you only need ordinary completions, embeddings, code edits, or image generation. Start with provider APIs, ACP escalation, or local models, then add the browser lane for the gaps.
Before / After
Before:
- One-off browser scripts open whatever profile happens to be available.
- Auth breaks silently when cookies expire.
- Two jobs can collide on the same Chromium profile.
- Screenshots and page state vanish after failure.
- Cron jobs treat UI automation like any other stateless command.
After:
- Each provider or workflow has a named browser lane.
- Persistent profiles are stored outside the repo and treated like credentials.
- `flock` serializes access to each profile.
- Humans can inspect or refresh login state through noVNC.
- Browser skills return structured output plus artifacts.
- Cron jobs know browser work is slow, stateful, and lock-bound.
Implementation
1. Define the lanes
Create lanes around workflow ownership, not around vague provider names. If two jobs can safely share cookies and never run at the same time, they can share a lane. If they have different risk, artifact, or concurrency needs, split them.
Example lane map:
| Lane | Purpose | Notes |
|---|---|---|
| research-primary | logged-in research UI | returns answer text and source URLs |
| long-context-review | large upload or multimodal review | stores uploaded filenames in artifacts |
| visual-generation-review | web-only visual workflows | use native image APIs first |
| draft-publishing | web editor draft creation | saves drafts only, never publishes |
Keep the real account name, email address, billing tier, and recovery details out of the repo. Public guides should describe the lane shape, not the private account.
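A wrapper can make the lane map enforceable by refusing lane names it does not know, so a typo cannot quietly create a fresh, logged-out profile. This is a minimal sketch assuming bash and the example lane names from the table; `KNOWN_LANES`, `lane_known`, and `lane_profile` are hypothetical names, not part of any existing tool.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lane registry mirroring the example lane map.
# Only lane names belong here, never account names, emails, or billing details.
KNOWN_LANES=(research-primary long-context-review visual-generation-review draft-publishing)

# Succeeds only for a registered lane name.
lane_known() {
  local candidate="$1" lane
  for lane in "${KNOWN_LANES[@]}"; do
    [[ "$lane" == "$candidate" ]] && return 0
  done
  return 1
}

# Print the profile directory for a lane, refusing unknown names so a typo
# cannot create a new, logged-out profile by accident.
lane_profile() {
  local lane="$1"
  if ! lane_known "$lane"; then
    echo "unknown lane: $lane" >&2
    return 2
  fi
  printf '%s/agent-browser/profiles/%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}" "$lane"
}
```

Keeping the registry to bare lane names fits the rule above: the repo learns the lane shape, never the account behind it.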
2. Store profiles outside the repo
Use a private state path with restrictive permissions:
install -d -m 700 ~/.config/agent-browser/profiles
install -d -m 700 ~/.local/state/agent-browser/locks
install -d -m 700 ~/.local/state/agent-browser/artifacts
Recommended layout:
~/.config/agent-browser/
  profiles/
    research-primary/
    long-context-review/
    visual-generation-review/
    draft-publishing/
~/.local/state/agent-browser/
  locks/
    research-primary.lock
    long-context-review.lock
  artifacts/
    research-primary/
    long-context-review/
Treat a browser profile like a password vault. It contains live cookies, session tokens, extension state, history, and provider-specific account metadata.
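Before a lane launches, the wrapper can also refuse to run when a profile directory has drifted away from 700, for example after a restore or a careless `chmod`. This is a minimal sketch assuming GNU coreutils `stat` (the `-c` flag differs on BSD and macOS); `dir_is_private` and `check_profiles` are hypothetical helper names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds only when the directory exists and grants nothing to group or other.
# Assumes GNU stat; '%a' prints the permission bits in octal, e.g. "700".
dir_is_private() {
  local dir="$1" mode
  [[ -d "$dir" ]] || return 1
  mode="$(stat -c '%a' "$dir")"
  # Mask the group and other bits (octal 077); both must be zero.
  (( (8#$mode & 8#077) == 0 ))
}

# Report every lane profile under a root that is too permissive.
check_profiles() {
  local root="$1" dir status=0
  for dir in "$root"/*/; do
    [[ -d "$dir" ]] || continue   # glob matched nothing
    if ! dir_is_private "$dir"; then
      echo "not private: $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```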
3. Run a service-owned display
The browser should not depend on a developer’s interactive desktop. A common setup is:
- Xvfb provides a stable virtual display.
- Chromium runs headed inside that display.
- x11vnc exposes the display locally.
- noVNC gives a browser-accessible viewer for re-auth and inspection.
Keep the concrete port and bind address in local config, not in the guide or repo. If you publish a template, use placeholders:
[Service]
Environment=DISPLAY=<display>
Environment=NOVNC_LISTEN=<bind-address>:<vnc-port>
Bind the viewer to loopback or a private control plane unless you have a real access-control layer in front of it. Browser sessions are authenticated control surfaces.
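A display or viewer service could check its own bind address at startup and refuse to listen anywhere else. This sketch assumes the `<bind-address>:<vnc-port>` format from the template above and a strict loopback-only policy; a private control-plane address would need its own explicit allowlist. `listen_is_loopback` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds only when a "<bind-address>:<port>" value binds to loopback.
# Handles bracketed IPv6 ("[::1]:<port>"); any hostname other than
# "localhost" is rejected, as is every non-loopback address.
listen_is_loopback() {
  local value="$1" host
  host="${value%:*}"     # strip the trailing ":<port>"
  host="${host#\[}"      # strip IPv6 brackets, if present
  host="${host%\]}"
  case "$host" in
    127.*|::1|localhost) return 0 ;;
    *) return 1 ;;
  esac
}
```

A unit could call this on its `NOVNC_LISTEN` value and exit before starting the viewer when it fails.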
4. Lock every persistent profile
Chromium persistent profiles are not safe for multiple concurrent writers. If two jobs try to open the same profile at once, Playwright’s `launchPersistentContext` will fail, or the two browsers will interfere with each other's session state.
Wrap every lane command in flock:
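#!/usr/bin/env bash
set -euo pipefail
lane="research-primary"
lock="${XDG_STATE_HOME:-$HOME/.local/state}/agent-browser/locks/${lane}.lock"
profile="${XDG_CONFIG_HOME:-$HOME/.config}/agent-browser/profiles/${lane}"
# Create the lock and profile directories as private (700), matching step 2.
install -d -m 700 "$(dirname "$lock")" "$profile"
# Wait up to 15 minutes for the lane lock, then run the lane command while holding it.
flock -w 900 "$lock" \
  node scripts/browser-lane.mjs --lane "$lane" --profile "$profile" "$@"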
Use one lock per lane. Do not use a global browser lock unless every browser job truly blocks every other browser job.
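The per-lane behavior can be checked with throwaway lock files before anything touches real profiles. This sketch assumes util-linux `flock`; it holds one lane's lock on a file descriptor, as a running job would, and then probes both lanes without blocking. The lane names are the examples from step 1.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway lock directory so the demo never touches real lane locks.
lock_dir="$(mktemp -d)"
trap 'rm -rf "$lock_dir"' EXIT

# Hold the research-primary lock on file descriptor 9, as a running job would.
exec 9>"$lock_dir/research-primary.lock"
flock -n 9

# A second job on the same lane cannot get the lock...
if flock -n "$lock_dir/research-primary.lock" true; then same_lane=acquired; else same_lane=blocked; fi
# ...but a job on a different lane is unaffected.
if flock -n "$lock_dir/long-context-review.lock" true; then other_lane=acquired; else other_lane=blocked; fi

exec 9>&-   # closing the descriptor releases the lock
echo "same lane: $same_lane, other lane: $other_lane"
```

Run as-is, it should report the same lane as blocked and the other lane as acquired, which is the behavior the one-lock-per-lane rule relies on.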
5. Put a skill boundary around the browser
The orchestrator should not know how to click the provider UI. It should call a narrow skill or script that owns:
- profile path selection
- lock acquisition
- browser launch and timeout policy
- login-state detection
- screenshot and trace capture
- provider-specific scraping
- structured output
Good browser skills return JSON shaped like this:
{
"ok": true,
"lane": "research-primary",
"stage": "complete",
"text": "Short answer or saved draft summary.",
"sources": ["https://example.com/source"],
"artifacts": ["~/.local/state/agent-browser/artifacts/research-primary/run-id/page.png"],
"needsHuman": false,
"durationMs": 42000
}
When auth expires, return a structured pause instead of pretending the model failed:
{
"ok": false,
"lane": "research-primary",
"stage": "auth_required",
"needsHuman": true,
"message": "Open the noVNC viewer for this lane and refresh the login."
}
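If the skill's outer wrapper is a shell script, it can emit the pause shape directly. This sketch assumes lane names are limited to lowercase letters, digits, and hyphens so the values need no JSON escaping; `emit_auth_required` is a hypothetical helper, and a skill written in Node would normally build the object with `JSON.stringify` instead.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the structured auth pause for a lane instead of a generic failure.
# Assumption: lane names match ^[a-z0-9-]+$, so no JSON escaping is required.
emit_auth_required() {
  local lane="$1"
  [[ "$lane" =~ ^[a-z0-9-]+$ ]] || { echo "invalid lane name: $lane" >&2; return 2; }
  printf '{"ok":false,"lane":"%s","stage":"auth_required","needsHuman":true,"message":"%s"}\n' \
    "$lane" "Open the noVNC viewer for this lane and refresh the login."
}

emit_auth_required research-primary
```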
6. Use native tools first
The browser lane should not become a habit for jobs that have clean tool paths.
Use native APIs or built-in tools for:
- ordinary chat, coding, and summarization
- embeddings and search indexing
- deterministic file processing
- normal image generation
- jobs that need structured outputs with strict schemas
Use browser automation only when the browser session is part of the requirement.
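One way to keep the browser lane opt-in is a router whose default branch is the native path, so only explicitly named job kinds ever reach a lane. The job kinds below are hypothetical examples, and `route_job` is an illustrative name, not an existing interface.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical router: job kinds and their lanes are illustrative only.
# Anything not listed goes to the native API or tool path by default.
route_job() {
  local kind="$1"
  case "$kind" in
    cited-research)  echo "browser:research-primary" ;;     # needs the logged-in UI's citations
    web-only-upload) echo "browser:long-context-review" ;;  # feature only exists in the web UI
    cms-draft)       echo "browser:draft-publishing" ;;     # draft in a web editor
    *)               echo "api" ;;
  esac
}
```

Defaulting to `api` means a new job kind has to be added to the router on purpose before it can touch a browser profile.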
7. Make publish flows draft-only
Browser automation can type into real web apps, so it needs a hard publish boundary.
For social posts, articles, newsletters, and CMS work:
- generate the content
- open the editor
- fill a draft
- attach screenshots or source links as artifacts
- stop before the final publish action
- return a `needsHuman` review result
The automation should not click destructive, financial, external-send, or public-publish buttons unless the operator explicitly asked for that specific action in the live session.
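The publish boundary is easier to audit when the lane checks every UI action against a short allowlist. In this sketch the action names and the `ALLOW_ACTION` variable are assumptions: draft actions always pass, irreversible ones pass only when the operator named that exact action for the current session, and anything unrecognized is denied.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical action guard for a draft-publishing lane.
# ALLOW_ACTION is an assumed variable the operator sets in the live session.
action_allowed() {
  local action="$1"
  case "$action" in
    open-editor|fill-draft|save-draft|attach-artifact)
      return 0 ;;
    publish|send|delete|purchase)
      # Irreversible: only the single action the operator explicitly named.
      [[ "${ALLOW_ACTION:-}" == "$action" ]] ;;
    *)
      return 1 ;;   # unknown actions are denied by default
  esac
}
```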
Verification
Check that the display service exists and is healthy:
systemctl --user status agent-browser-display.service
Confirm profile and lock paths are private:
find ~/.config/agent-browser ~/.local/state/agent-browser -maxdepth 2 -type d -printf '%m %p\n'
Expected profile directory permissions are 700 or stricter.
Confirm lane locks work:
lane=research-primary
lock="${XDG_STATE_HOME:-$HOME/.local/state}/agent-browser/locks/${lane}.lock"
flock -n "$lock" true
If a job is running, the non-blocking lock check should fail. If nothing is running, it should exit cleanly.
Run a smoke query through the wrapper:
node scripts/browser-lane.mjs \
--lane research-primary \
--mode smoke \
--query "Return one sentence and one source URL."
Expected result:
- JSON output includes `ok`, `lane`, `stage`, and `durationMs`
- screenshots or traces are saved for failures
- source URLs are returned when the provider exposes them
- auth failures return `stage: "auth_required"` and `needsHuman: true`
Run two concurrent jobs against the same lane and verify one waits or exits with a lock timeout. Run two jobs against different lanes and verify they can proceed independently if the machine has enough resources.
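For the same-lane case, a lock timeout is easier to verify when it has its own exit code. util-linux `flock` supports this with `-E` (conflict exit code), which applies both to `-n` and to an expired `-w` wait. The sketch holds a lane lock, starts a second job with a one-second wait, and records its exit status; 75 is an arbitrary choice.

```bash
#!/usr/bin/env bash
set -euo pipefail

lock_dir="$(mktemp -d)"
trap 'rm -rf "$lock_dir"' EXIT
lock="$lock_dir/research-primary.lock"

# Simulate a running job by holding the lane lock on file descriptor 9.
exec 9>"$lock"
flock -n 9

# The second job waits up to 1 second, then exits with code 75 so callers can
# tell "lane busy" apart from a real job failure.
rc=0
flock -w 1 -E 75 "$lock" echo "should not run" || rc=$?

exec 9>&-   # release the simulated job's lock
echo "second job exit code: $rc"
```

It should print `second job exit code: 75`; the `echo` never runs because the lock is never granted.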
Gotchas
- Persistent profiles are credentials. Do not commit them, back them up casually, or expose them through shared mounts. A copied profile can carry live sessions.
- Provider UIs drift. Browser selectors will break. Prefer accessible labels, visible text, and stable role selectors where possible, and capture screenshots on every failure.
- Human login is part of the system. Auth expiry is not a crash. Return a structured `auth_required` state and route the operator to the viewer.
- Shared profiles hide concurrency bugs. A shared profile may work for weeks, then fail under one overlapping cron job. Lock by profile, not by script.
- noVNC is a control surface. If someone can reach the viewer, they can interact with authenticated browser sessions. Keep it local, tunneled, or protected.
- Uploads leave residue. Web UIs often keep recent files, drafts, or history. Scrub test data and keep sensitive uploads out of public demos.
- Browser answers need source discipline. Research skills should return source URLs or screenshots, not just model prose scraped from the page.
- Cron needs more patience. Browser jobs need bigger timeouts, fewer retries, and clear lock behavior. Fast retry loops can stack up behind a single expired login.
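A cron-side wrapper can encode the last two gotchas: few retries, long waits between them, and no retry at all once the lane reports expired auth. The exit code 77 for auth required is an assumption the lane wrapper and cron would have to agree on; `run_browser_job`, `MAX_ATTEMPTS`, and `BACKOFF_SECONDS` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed exit code agreed with the lane wrapper: 77 = auth_required.
# Any other non-zero code (including a lock timeout) is retried.
EXIT_AUTH_REQUIRED=77

# Run a browser job with few retries and growing waits between attempts.
# Auth expiry is never retried: it needs a human at the viewer, and
# retrying would only stack more jobs behind the expired login.
run_browser_job() {
  local max_attempts="${MAX_ATTEMPTS:-2}" backoff="${BACKOFF_SECONDS:-300}"
  local attempt=1 rc
  while true; do
    rc=0
    "$@" || rc=$?
    case "$rc" in
      0) return 0 ;;
      "$EXIT_AUTH_REQUIRED") return "$rc" ;;
    esac
    if (( attempt >= max_attempts )); then
      return "$rc"
    fi
    sleep $(( backoff * attempt ))   # linear backoff: 1x, 2x, ...
    attempt=$(( attempt + 1 ))
  done
}
```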
Templates
- `../templates/ai-stack/browser-lane-lock.sh` - tiny `flock` wrapper for lane commands
- `../templates/ai-stack/acp-wrapper.mjs` - wrapper shape for subprocess-owned escalation tools
- `../templates/scrubbers/` - publish-boundary scrubber skeleton and fixtures
Related
- `multi-model-orchestration.md` - where the browser lane fits among APIs, ACP, and local models
- `skills-development.md` - package browser workflows as discoverable skills
- `session-management.md` - isolate long-running browser work from chat sessions
- `../automation/cron-patterns.md` - schedule slow stateful browser jobs safely
- `../automation/hooks.md` - add scrubbers and tool-call boundaries around outbound content
- `../infrastructure/openclaw-host-topology.md` - host-level audit points for browser automation