Browser Becomes an Agent: Gemini in Chrome and A2A

The week the browser became an agent

For a decade the browser has been the place where we do things. It is now becoming the thing that does things. Google has begun rolling out Gemini inside Chrome on desktop in the United States, with agentic features on the roadmap that let it read a page, hop across tabs, and act for you on websites. As Google presents it, the official Chrome AI revamp is both a product update and a declaration of intent.

At the same time, Google is promoting an open Agent2Agent (A2A) protocol that allows agents from different vendors to talk to one another, discover capabilities, and coordinate. Put those pieces together and you start to see the outlines of an interoperable agent ecosystem in which the single chat screen gives way to a network of specialized helpers that meet in your browser to get work done. For background on the design, see the A2A protocol specification.

From chatbot to browser-native executor

A chatbot can answer a question. An agent can turn an intent into a series of steps. Gemini in Chrome aims to do the latter by living where the steps actually happen, inside the DOM of the page, across tabs, and with context from your Google services. Booking, applying, comparing, reconciling, and filing are classic browser chores. The new model collapses those chores into a conversational layer that can interpret a page, suggest the next best action, and then carry it out with your consent.

Three shifts matter:

Placement: The agent is first-class inside the browser chrome, not a floating overlay or separate app.
Context: It can see what you see, plus what you saw a moment ago in another tab, which matches real workflows.
Action: The roadmap points to form filling, navigation, stateful follow-ups, and long-running tasks you can resume.

When the execution surface is the browser, an agent does not need brittle automations built on top of screenshots or fragile scripts. It can use structure the browser already understands and rely on the runtime for permissions, identity, and network control. This lowers the cost of reliable automation and raises expectations for what everyday users will accept. For a grounding overview, see our agentic browsing primer.

A2A is the missing piece for multi-agent workflows

Single agents hit limits quickly. Your personal agent may be great at reading web pages, but not at filing an expense in your company ERP or reconciling an invoice against a vendor API. The A2A protocol addresses this with a standardized way for agents to find each other, declare what they can do, negotiate user experience, hand off tasks, and keep state while they collaborate. It complements tool protocols rather than replacing them by defining how independent agents describe capabilities, exchange requests, and coordinate securely.

Interoperability matters for two reasons. First, enterprises already live in a world of many vendors, so a protocol that lets a sales agent, an HR agent, and a procurement agent coordinate across systems reduces integration toil. Second, a protocol introduces predictable boundaries for permissions and identity. If your sales agent needs a finance agent to approve a credit limit, it should not impersonate a human or scrape a back door. It should make a request the finance agent can verify and audit.

How it compares to OpenAI and Anthropic

OpenAI: Tool calling and agent runtimes already let developers orchestrate multi-step workflows and attach tools that a model can invoke automatically. What does not exist yet in common use is an inter-vendor handshake for agents to discover each other and coordinate across company boundaries. Teams still build per-vendor integrations or rely on a shared data layer. For practical patterns, see our internal Assistants API playbook.
Anthropic MCP: The Model Context Protocol standardizes how models attach to tools, APIs, and external resources. Think of MCP as the plug for giving an agent structured access to data and actions. It does not focus on inter-agent routing or UX handshakes between independent agents. In that sense, MCP and A2A are complementary. For a deeper overview, read our Model Context Protocol guide.

The pattern that emerges is a stack. At the bottom are tool adapters and data connectors that MCP helps rationalize. Above that are agents with planning and memory. On top sits an inter-agent protocol that lets them collaborate without collapsing into one vendor’s walled garden.

What this unlocks for developers

Developers should think in terms of marketplaces of capabilities rather than monolithic assistants. That shifts day-to-day concerns in five ways:

Capability discovery and contracts

Publish capability manifests with inputs, outputs, prerequisites, and cost estimates.
Treat capabilities like APIs with versioning and deprecation, exposed through natural language as well.
Provide testable stubs and sandboxes so other agents can validate behavior before production use.
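
As a rough illustration of treating a capability like a versioned API, here is a minimal sketch of a manifest plus two checks a calling agent might run before using it. The `CapabilityManifest` shape, its field names, and the compatibility rule (same major version, at least as new, not deprecated) are assumptions made for this example; they are not taken from the A2A specification.

```python
from dataclasses import dataclass


@dataclass
class CapabilityManifest:
    """Hypothetical manifest shape; field names are illustrative only."""
    name: str
    version: str                       # "MAJOR.MINOR.PATCH"
    inputs: dict[str, str]             # input field name -> type label
    outputs: dict[str, str]            # output field name -> type label
    prerequisites: tuple[str, ...] = ()
    estimated_cost_usd: float = 0.0
    deprecated: bool = False


def parse_version(version: str) -> tuple[int, int, int]:
    """Split "1.4.0" into (1, 4, 0)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(manifest: CapabilityManifest, required: str) -> bool:
    """Assumed rule: same major version, manifest at least as new, not deprecated."""
    have = parse_version(manifest.version)
    want = parse_version(required)
    return not manifest.deprecated and have[0] == want[0] and have >= want


def missing_inputs(manifest: CapabilityManifest, payload: dict) -> list[str]:
    """Return the declared inputs that the payload does not provide."""
    return sorted(name for name in manifest.inputs if name not in payload)


manifest = CapabilityManifest(
    name="expense.file",
    version="1.4.0",
    inputs={"amount": "number", "currency": "string"},
    outputs={"receipt_id": "string"},
    prerequisites=("employee_id",),
    estimated_cost_usd=0.02,
)
print(is_compatible(manifest, "1.2.0"))
print(missing_inputs(manifest, {"amount": 12.5}))
```

A sandbox stub could expose the same manifest and return canned outputs, so a counterpart agent can run checks like these before any production call.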

UX negotiation

Decide when your agent takes over the UI, shows a card, or asks for a human decision.
Expose a preference profile for interaction style such as terse versus explanatory, batch versus incremental, immediate execution versus review before submit.
Support receipts and progress updates so hosting surfaces like the browser can render stable, cancelable tasks.
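
One way to make the takeover decision explicit is a small function that maps a user preference profile and the risk of a step to an interaction mode. This is only a sketch: `Mode`, `PreferenceProfile`, and the rule that irreversible or payment-related steps always go to a human are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    EXECUTE = "execute"        # act immediately, report afterward
    SHOW_CARD = "show_card"    # render a reviewable summary card first
    ASK_HUMAN = "ask_human"    # block until the user decides


@dataclass
class PreferenceProfile:
    """Hypothetical profile; a real one would carry more interaction settings."""
    review_before_submit: bool = True
    explanatory: bool = False


def choose_mode(profile: PreferenceProfile, *, irreversible: bool,
                touches_payment: bool) -> Mode:
    # Assumed safety rule: risky steps always require an explicit human decision.
    if irreversible or touches_payment:
        return Mode.ASK_HUMAN
    # Otherwise honor the user's stated preference.
    if profile.review_before_submit:
        return Mode.SHOW_CARD
    return Mode.EXECUTE


fast_user = PreferenceProfile(review_before_submit=False)
print(choose_mode(fast_user, irreversible=False, touches_payment=False))
print(choose_mode(fast_user, irreversible=True, touches_payment=False))
```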

Long-running tasks and reliability

Design jobs that survive tab closes and computer sleep. Persist intermediate artifacts and expose a resume token.
Build idempotent actions and compensating steps so partial execution can roll forward or roll back safely.
Log key decisions with rationales for later audit and debugging.

Security and trust

Adopt least privilege by default. Ask for the minimum scopes your capability needs and show what will be used.
Sign capability manifests and responses so hosts can verify agent identity and integrity end-to-end.
Provide cost and data use disclosures up front so other agents can plan within a budget.
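
To make the signing idea concrete, the sketch below signs a manifest with HMAC-SHA256 from the Python standard library and verifies it with a constant-time comparison. This is a simplification: HMAC relies on a shared secret, whereas agents from different vendors would more plausibly use asymmetric signatures. The manifest fields and the key are made up for the example.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize deterministically so key order does not change the signature."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    expected = sign_manifest(manifest, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


key = b"demo-shared-secret"  # assumption: illustrative key, never hard-code real keys
manifest = {"name": "credit.approve", "version": "1.0.0", "scopes": ["finance.read"]}
signature = sign_manifest(manifest, key)

print(verify_manifest(manifest, signature, key))
tampered = dict(manifest, scopes=["finance.read", "finance.write"])
print(verify_manifest(tampered, signature, key))
```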

Evaluation and billing

Move beyond accuracy alone. Evaluate task success rate, time to completion, human time saved, and run-to-run variance.
Price per action or per successful outcome, not just tokens. Offer predictable pricing for common workflows.
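
A small worked example shows how these metrics and outcome-based billing might be computed from logged runs. The `TaskRun` record, the choice to measure variance on completion time, and the flat price per success are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class TaskRun:
    """Hypothetical record of one agent attempt at a task."""
    succeeded: bool
    seconds: float
    human_seconds_saved: float


def summarize(runs: list[TaskRun], price_per_success: float) -> dict:
    if not runs:
        raise ValueError("need at least one run")
    durations = [run.seconds for run in runs]
    successes = [run for run in runs if run.succeeded]
    return {
        "success_rate": len(successes) / len(runs),
        "mean_seconds": mean(durations),
        "variance_seconds": pvariance(durations),   # population variance
        "human_seconds_saved": sum(run.human_seconds_saved for run in successes),
        "billed": price_per_success * len(successes),  # failed runs are not billed
    }


runs = [
    TaskRun(True, 10.0, 120.0),
    TaskRun(False, 30.0, 0.0),
    TaskRun(True, 20.0, 60.0),
    TaskRun(True, 20.0, 60.0),
]
print(summarize(runs, price_per_success=0.5))
```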

Enterprise implications: governance without killing velocity

Enterprises will love interoperable agents and will fear uncontrolled automation. The path forward is governance that feels like enablement rather than lockdown.

Policy as code for agents: Define what classes of tasks are allowed and where human-in-the-loop review is mandatory. Use allowlists for counterpart agents by capability and vendor.
Central capability registry: Maintain an internal catalog of approved agent capabilities with ownership, contact, and SLA. Let teams submit new capabilities for review the same way they would an API.
Identities and scopes: Tie every agent to a service identity in IAM, with fine-grained scopes mapped to business roles. Rotate credentials automatically and support short-lived session tokens.
Data boundaries: Label data by sensitivity and jurisdiction. Require that inter-agent requests include data classification metadata so receiving agents can enforce residency and retention rules.
Observability and audit: Capture structured traces of plans, tool calls, and outcomes. Keep redacted copies of prompts and responses for compliance. Provide replay tooling that can reproduce decisions from logged state.
Red teams and change windows: Treat new agent capabilities like production changes. Run adversarial tests. Roll out gradually behind flags. Measure behavior under load and in failure modes.
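
Several of these controls can be combined into one policy check that runs before any inter-agent request is honored. The sketch below assumes a three-level sensitivity scale and hypothetical `Policy` and `AgentRequest` types; it returns a decision string rather than enforcing anything itself.

```python
from dataclasses import dataclass

# Assumed sensitivity scale, ordered from least to most sensitive.
SENSITIVITY_ORDER = ["public", "internal", "confidential"]


@dataclass
class AgentRequest:
    vendor: str
    capability: str
    data_classification: str
    data_region: str


@dataclass
class Policy:
    allowed: set[tuple[str, str]]   # allowlisted (vendor, capability) pairs
    human_review: set[str]          # capabilities that need human sign-off
    max_classification: str         # highest sensitivity that may be shared
    allowed_regions: set[str]       # residency constraint


def evaluate(policy: Policy, request: AgentRequest) -> str:
    """Return "allow", "needs_human", or "deny" for a single request."""
    if (request.vendor, request.capability) not in policy.allowed:
        return "deny"
    if request.data_region not in policy.allowed_regions:
        return "deny"
    if request.data_classification not in SENSITIVITY_ORDER:
        return "deny"  # unknown labels are rejected rather than guessed at
    level = SENSITIVITY_ORDER.index(request.data_classification)
    if level > SENSITIVITY_ORDER.index(policy.max_classification):
        return "deny"
    if request.capability in policy.human_review:
        return "needs_human"
    return "allow"


policy = Policy(
    allowed={("acme", "credit.approve"), ("acme", "invoice.read")},
    human_review={"credit.approve"},
    max_classification="internal",
    allowed_regions={"eu"},
)
print(evaluate(policy, AgentRequest("acme", "credit.approve", "public", "eu")))
```

Keeping the decision a pure function of policy and request makes it easy to log, replay, and test alongside the audit traces described above.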

With those guardrails, teams can let browser-resident agents do real work such as vendor onboarding or contract triage while staying inside compliance lines.

What this means for search and ads

If more tasks complete inside Chrome without a classic search results page, the query substrate that powers ads changes. Three shifts look likely:

From clicks to completions: When an agent books a service or fills a form, the measurable unit becomes a completed action rather than a click. Expect more cost-per-action pricing and attribution that credits agent plans.
Inventory moves upstream: The browser becomes a premium surface where sponsored options may appear inside agent suggestions when intent is high. That raises questions about labeling, fairness, and bidding rules inside agent-ranked plans.
Vertical competition: Marketplaces, aggregators, and brands will compete to be preferred counterpart agents. An airline agent might negotiate directly with a travel planner agent inside Chrome, bypassing a traditional search session. Winning distribution may look like publishing excellent, verifiable capabilities rather than chasing SEO.

Google will need to show that agent recommendations are transparent, that sponsored placement is clearly labeled, and that organic options remain competitive. Regulators will watch how default surfaces like the Chrome UI present choices, how browsing data flows into ad systems, and whether competing agents can participate on reasonable terms.

Risks, consent, and the UX we will accept

Agentic browsing fails if users feel out of control. A good default experience should:

Ask for scoped consent just in time, not as one blanket grant. Show what fields will be filled, what accounts will be touched, and what will be sent.
Provide a readable plan before execution and a receipt afterward. Let users copy the plan, see which steps were automated, and verify the outcome.
Offer easy cancel and undo. Long-running tasks must be pausable, inspectable, and reversible when possible.
Preserve a clear mental model. Users should always know whether they are typing to the browser, to a page, or to a particular agent.

On the developer side, resist opaque magic. Favor interfaces that make planning and state visible. Give your agent a personality that is helpful but calm. Remember that trust is earned through reliability, good defaults, and clear explanations.

What to build next

Browser-native workflows: Start with tedious, multi-tab tasks where you can remove time and friction such as benefits enrollment, reimbursements, or vendor quote comparisons.
Interop-first backends: Publish capability manifests for core systems and consume others through a protocol like A2A. Design agents to either perform a task directly or refer it when another agent is better qualified.
Guardrails as a service: Build libraries for consent prompts, signed capability cards, policy checks, and audit trails that any agent can drop in.
Evaluation harnesses: Ship test suites that simulate noisy pages, network glitches, CAPTCHAs, and partial failures. Benchmark your agent on realistic obstacles rather than ideal demos.
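
As a starting point for such a harness, the sketch below injects simulated network glitches into a fake page and measures how often a simple retry loop still completes the task. `FlakyPage`, `submit_with_retry`, and `benchmark` are hypothetical names, and the seeded random generator is there only to keep runs reproducible.

```python
import random


class FlakyPage:
    """Simulated page whose submit call fails with a fixed probability."""

    def __init__(self, failure_rate: float, seed: int):
        self.rng = random.Random(seed)   # seeded for reproducible test runs
        self.failure_rate = failure_rate
        self.attempts = 0

    def submit(self, form: dict) -> str:
        self.attempts += 1
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("simulated network glitch")
        return "ok"


def submit_with_retry(page: FlakyPage, form: dict, max_attempts: int = 3) -> bool:
    """Stand-in for the agent under test: retry a bounded number of times."""
    for _ in range(max_attempts):
        try:
            page.submit(form)
            return True
        except ConnectionError:
            continue
    return False


def benchmark(failure_rate: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the task completed despite injected faults."""
    wins = 0
    for i in range(trials):
        page = FlakyPage(failure_rate, seed=seed + i)
        wins += submit_with_retry(page, {"item": "vendor quote"})
    return wins / trials


for rate in (0.0, 0.3, 0.6):
    print(rate, benchmark(rate, trials=200))
```

The same pattern extends to other obstacles: swap the failure mode for a malformed page, a timeout, or a partial submission, and track the success rate per obstacle.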

The browser becoming an agent is more than convenience. It is a re-architecture of how software collaborates across companies and across the open web. A2A puts shape around that collaboration. MCP keeps tools and data within reach. OpenAI and others will keep pushing agent runtimes forward. The winners will treat agents like first-class software components, publish clear capabilities, and earn trust through transparent execution.

We will still type into boxes and click on links. We will also increasingly tell the browser what needs to be accomplished and watch as a network of agents negotiates, plans, and delivers. When that happens reliably, the web starts to feel like a programmable substrate and the browser becomes the orchestrator we have always needed.