Agentic coding goes mainstream as IDE agents execute
In May and June 2025, GitHub and Google put agentic coding directly into the IDE. Copilot’s coding agent and Agent Mode in VS Code, plus Gemini’s Agent Mode in Android Studio, now plan work, edit projects, run builds, and pause for your approval before changes land.


The shift from chat to execution
For the past two years, developers have chatted with AI to draft tests, explain errors, and generate snippets. That phase is giving way to something operational. In May and June 2025, GitHub and Google shipped IDE‑native agents that do not just answer. They plan tasks, change code across multiple files, run builds and tests in a sandbox, and present their work for approval.
On May 19, 2025, GitHub introduced its Copilot coding agent. It spins up an isolated workspace via Actions, iterates in a background session, and pushes a draft pull request that requires human approval. It is the first time many teams will see an AI teammate that runs unattended but cannot merge without a review. The agent also shows detailed reasoning logs so you can trace decisions and validate changes.
A few weeks earlier and then through June, Copilot’s Agent Mode moved from Insiders to stable in Visual Studio Code, expanding from chat into plan‑and‑execute workflows inside the editor. Around the same time, Google used I/O 2025 to debut Agent Mode in Android Studio. Gemini’s agent plans multi‑step work, makes edits across the project, runs the build, and pauses for approvals before applying changes. Auto approve exists if you flip a switch, but the default keeps you in control.
The upshot is clear. Agentic coding moved from interesting demos to default IDE capabilities that are ready to try on real repositories.
What these agents actually do
Agentic systems revolve around three core loops:
- Plan: From a high level prompt like "add offline support" or "upgrade the networking stack", the agent decomposes work into ordered steps. It locates relevant modules, gathers context from repo rules, issues, and tests, and drafts a plan that includes code edits and tool calls.
- Act: The agent edits multiple files, creates new ones when needed, rewrites imports, and updates configuration. It may synthesize tests or test scaffolds, adjust mocks, and regenerate fixtures. In VS Code, it can suggest terminal commands or tool invocations and ask you to run them. In Android Studio, it can kick off a Gradle build and iterate until a specific error is resolved.
- Verify: The agent runs lint, unit tests, and builds in a controlled environment. It logs what it tried and why. If verification fails, it loops, applies fixes, and runs again. When it believes the task is complete, it hands the result back as staged changes for your review.
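The three loops above can be sketched as a minimal Python loop. Everything here is illustrative: plan_task, apply_edits, and run_checks stand in for the agents' real planning, editing, and verification tooling, not any vendor API.

```python
# Illustrative sketch of the plan/act/verify loop, not a real agent API.

def plan_task(task: str) -> list[str]:
    # Plan: decompose a high-level prompt into ordered steps.
    return [f"locate modules for: {task}", "draft edits", "write tests"]

def apply_edits(steps: list[str], state: dict) -> None:
    # Act: pretend each applied step resolves one outstanding failure.
    state["failures"] = max(0, state["failures"] - len(steps))

def run_checks(state: dict) -> int:
    # Verify: lint, unit tests, build -- here just a failure count.
    return state["failures"]

def agent_loop(task: str, initial_failures: int, max_iterations: int = 5) -> list[str]:
    state = {"failures": initial_failures}
    log = []
    steps = plan_task(task)
    for attempt in range(1, max_iterations + 1):
        apply_edits(steps, state)
        remaining = run_checks(state)
        log.append(f"attempt {attempt}: {remaining} failures remaining")
        if remaining == 0:
            break  # hand staged changes back for human review
        steps = ["re-plan from failure output"]  # loop with new context
    return log

print(agent_loop("add offline support", initial_failures=4))
```

The important property is the exit condition: the loop terminates on green verification or an iteration cap, and the result is always staged for review rather than applied directly.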
A new approve to apply workflow
The most significant UI change is not a button. It is a gate. Instead of copy pasting diffs from a chat sidebar, the agent proposes a change set and waits. You review the plan steps and the diffs, then decide.
- In GitHub, the agent works in a draft branch and PR. Organization policies, branch protections, and required reviewers still apply. By design, CI and deployment workflows do not run until a human approves the agent’s PR. That adds a protective control without bending existing governance.
- In Android Studio, Gemini pauses for your consent before applying edits. You can approve changes one by one, accept the whole proposal, or request revisions. Auto approve exists for solo prototyping, but the default is review first.
This approve to apply pattern reframes AI from a chat toy into a contributor that works within your SDLC. The agent can do the work. You decide what ships.
Design differences you will notice in practice
Although they rhyme, GitHub’s and Google’s implementations make different bets.
Where they run
- GitHub’s coding agent runs a fresh environment per task in Actions. It clones the repo, installs dependencies, runs builds and tests, and pushes commits to a draft branch. That environment is isolated and easy to audit. VS Code’s Agent Mode operates in the IDE, orchestrating multi file edits and tool calls with your local or remote dev setup.
- Gemini’s Agent Mode is built into Android Studio. It integrates with Gradle, Logcat, the emulator, and UI tooling. It fetches files through the IDE, applies edits, and can drive your Android build to verify fixes.
How they plan and execute
- GitHub leans on a background session that explains each step in logs attached to the PR. The session can incorporate repository instructions and issue context to follow your standards. It is comfortable with low to medium complexity tickets in well tested codebases and it prefers features with clear acceptance tests.
- Gemini’s Agent Mode lays out a step list and then obtains permission to use tools that search files, read code, call Model Context Protocol servers, and perform refactors. If a build fails, it captures the error, proposes a fix, and tries again until verification passes.
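That fix-and-retry behavior can be sketched roughly as follows; the error messages and the error-to-fix mapping are invented for illustration, not drawn from Gemini's actual implementation.

```python
# Toy model of the capture-error, propose-fix, rebuild loop described above.

KNOWN_FIXES = {  # invented error-to-fix mapping for the example
    "Unresolved reference: Retrofit": "declare the retrofit dependency in build.gradle",
    "Missing INTERNET permission": "add <uses-permission> to AndroidManifest.xml",
}

def fix_until_green(build_errors: list[str], max_attempts: int = 5) -> list[str]:
    actions = []
    remaining = list(build_errors)
    for _ in range(max_attempts):
        if not remaining:
            actions.append("verification passed: awaiting user approval")
            break
        fix = KNOWN_FIXES.get(remaining[0])
        if fix is None:
            actions.append(f"no known fix, escalate: {remaining[0]}")
            break
        actions.append(fix)
        remaining.pop(0)  # simulate the rebuild clearing the fixed error
    return actions

print(fix_until_green(["Missing INTERNET permission"]))
```

Note that even a fully green build ends in a pause for approval, matching the default human-in-the-loop behavior described above.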
How you approve
- GitHub fits review into the PR you already use. The agent tags you when the PR is ready and will respond to review comments with revisions. You can read logs that show its reasoning and validations.
- Android Studio shows a change review panel before code lands. You can approve individual actions or the full change set. It is optimized for IDE first loops where the developer wants to stay in the editor.
Model choices and tooling
- Copilot emphasizes choice. Agent Mode in VS Code supports multiple top tier models and can attach external tools via the Model Context Protocol. The coding agent also benefits from MCP, which lets you grant access to specific tools and data sources while keeping a tight security envelope. For a broader view of hybrid model strategies, see the multi model enterprise playbook.
- Gemini’s Agent Mode is powered by Gemini models and also supports MCP servers. That means Android shops can add custom tools for internal docs, build telemetry, or bug trackers while keeping the agent grounded in your environment. The default experience is designed for mobile projects that live and build in Android Studio.
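As a concrete sketch, registering an MCP server in VS Code looks roughly like the following; recent builds read this from .vscode/mcp.json, but check the current Copilot docs for the exact schema, and note the server name and package here are placeholders for your own tooling:

```json
{
  "servers": {
    "internal-docs": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@yourorg/internal-docs-mcp-server"]
    }
  }
}
```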
Guardrails
- GitHub enforces branch protection and review policies. The agent can only push to branches it creates. CI and deployment do not run until a human approves, which stops a compromised or mistaken agent from touching production. Internet access for the agent can be limited to approved destinations and everything is logged. For hardening patterns, review agent security stack insights.
- Google’s IDE workflow requires explicit user approval for code application. You can enable auto approve for experiments, but the normal path is human in the loop. The agent’s tool usage is permissioned step by step, which reduces the chance of surprise edits or destructive actions.
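The step-by-step permissioning can be modeled as a simple allowlist gate. This is a toy sketch: the tool names and the approval callback are assumptions, not either vendor's API.

```python
# Toy permission gate: read-only tools pass, mutating tools need explicit
# approval, and anything unrecognized is blocked outright.

READ_ONLY_TOOLS = {"search_files", "read_file"}
MUTATING_TOOLS = {"apply_edit", "run_build"}

def gate_tool_call(tool: str, approve) -> str:
    if tool in READ_ONLY_TOOLS:
        return "allowed"
    if tool in MUTATING_TOOLS:
        return "allowed" if approve(tool) else "denied"
    return "blocked"  # unknown tools are never run

# With auto approve off, a human callback decides every mutating call.
print(gate_tool_call("read_file", approve=lambda t: False))   # allowed
print(gate_tool_call("apply_edit", approve=lambda t: False))  # denied
```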
What this means for enterprises
Pull request velocity
With an agent able to draft PRs for chores, bug fixes, and refactors, teams can keep the mainline moving without pulling seniors off feature work. The most immediate gains show up in lead time reduction for small to medium tickets.
Dependency and platform upgrades
Repetitive upgrade work is a natural fit. An agent can bump a dependency, adjust breaking changes, regenerate configs, run tests, and propose the PR. The upgrade still gets a thorough review, but the boilerplate is offloaded.
Security patching and hygiene
When a vulnerability drops, the agent can search for vulnerable imports, make the minimum safe change, run your suite, and post a PR with the reasoning and logs attached. That shortens exposure windows while keeping a full audit trail.
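The search step might look like the following sketch, which scans a repository for imports of a vulnerable package. The package name legacy_http is hypothetical, and a real scan would also consult lockfiles for affected versions.

```python
import re
from pathlib import Path

# Hypothetical vulnerable package; a real agent would take this from an advisory.
VULNERABLE_PACKAGE = "legacy_http"

def find_vulnerable_imports(repo_root: str) -> list[str]:
    # Match "import legacy_http" or "from legacy_http import ..." lines.
    pattern = re.compile(rf"^\s*(?:from|import)\s+{VULNERABLE_PACKAGE}\b")
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern.match(line):
                hits.append(f"{path}:{lineno}")
    return hits  # file:line locations to target with the minimum safe change
```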
Cross repo consistency
Agents paired with repository instructions bring consistency to how code is structured, how tests are written, and how docs are formatted. They can apply patterns at scale while respecting codeowners.
Onboarding and knowledge capture
The logs, plans, and diffs double as training material. New joiners can learn project standards by watching the agent justify changes and react to review comments.
Pitfalls to expect and plan for
Hallucinated edits
Agents can still infer the wrong intent and confidently propose bad changes. Keep tests reliable and expect to send revisions.
Context limits
Agents do not ingest your entire monolith. If your repo structure and docs are chaotic, the plan may miss important modules. Invest in repository instructions, clear project READMEs, and maintainable code maps.
Build flakiness
Agents iterate until builds and tests pass. Flaky tests burn time and credits. Fix the flakes before you scale agents.
Cost visibility
Premium model calls and long sessions can rack up costs. Set budgets, rate limits, and model selection rules by task type.
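One lightweight way to encode model selection rules by task type is a budget table like this sketch; the task types, model tiers, and limits are placeholders for whatever your platform actually meters.

```python
# Per-task-type budget rules; all values are illustrative placeholders.
BUDGETS = {
    "docs": {"model": "small", "max_premium_calls": 5},
    "bugfix": {"model": "standard", "max_premium_calls": 25},
    "refactor": {"model": "premium", "max_premium_calls": 60},
}

def within_budget(task_type: str, premium_calls_used: int) -> bool:
    rule = BUDGETS.get(task_type)
    if rule is None:
        return False  # unknown task types get no budget by default
    return premium_calls_used < rule["max_premium_calls"]

assert within_budget("bugfix", 10)        # still under the 25-call cap
assert not within_budget("docs", 5)       # cap reached, stop the session
```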
Secret management
Do not give agents broad access to production credentials. Restrict egress, use dedicated service identities, and prefer read only access where possible.
Code ownership and review load
A flood of agent PRs can overwhelm reviewers. Use codeowners, auto assignment, and batch windows to smooth the queue.
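A CODEOWNERS file is the simplest lever for routing agent PRs; the paths and team names below are placeholders. Note that in CODEOWNERS, later entries take precedence, so the catch-all rule goes first.

```
# .github/CODEOWNERS -- later, more specific entries take precedence
*            @yourorg/default-reviewers
/payments/   @yourorg/payments-senior-reviewers
/infra/      @yourorg/platform-team
```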
A pragmatic playbook to pilot without breaking governance
Choose the right work
Start with small, deterministic tasks. Bug fixes with clear repro steps, simple feature flags, docs improvements, and well understood dependency bumps. Avoid complex migrations and stateful changes until your team has muscle memory.
Set hard guardrails
- GitHub: require approvals from codeowners for agent PRs, keep branch protections strict, and enforce status checks. Disable secrets access in the agent environment unless a task clearly needs it. Limit outbound network access.
- Android Studio: keep auto approve off by default. Require a human to confirm each tool action that changes files or runs a build.
Sandbox the runtime
Use ephemeral runners or isolated devcontainers for the agent’s work. Mount only the repo and test resources. Do not mount production data. Log everything the agent does and forward those logs to your SIEM.
Ground the agent with context
Add repository instructions that encode your coding standards, testing requirements, and commit message format. Link to architecture docs. Keep the issue descriptions crisp with acceptance criteria and test hints. In Android Studio, configure rules so Gemini formats output and follows your team’s styles.
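For GitHub Copilot, repository instructions live in a .github/copilot-instructions.md file. A minimal sketch, with contents you would adapt to your own standards:

```markdown
# Repository instructions

- We use TypeScript with strict mode; never introduce `any` types.
- Every behavior change needs a unit test beside the code it touches.
- Commit messages follow Conventional Commits (feat:, fix:, chore:).
- Architecture overview lives in docs/architecture.md; read it before
  proposing cross-module changes.
```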
Define a reviewer playbook
Teach reviewers how to read agent logs, what to skim, and what to scrutinize. Require a quick risk classification before merge. If a change touches sensitive surfaces, escalate to a senior codeowner.
Measure what matters
Track baseline metrics for four weeks, then run an agent pilot for four to six weeks on the same repos.
- Throughput: PRs merged per week and merged lines per author hour.
- Speed: mean lead time from issue open to merge.
- Quality: first pass review accept rate, rework rate within seven days, and test pass rate on first CI run.
- Security: time to patch for high severity issues.
- Cost: premium model calls per merged PR and average agent session time.
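Two of those metrics can be computed from ticket records with a short script; the record shape here is invented for the example.

```python
from datetime import datetime
from statistics import mean

def lead_time_days(opened: str, merged: str) -> float:
    # Whole days between issue open and merge, from YYYY-MM-DD dates.
    fmt = "%Y-%m-%d"
    return (datetime.strptime(merged, fmt) - datetime.strptime(opened, fmt)).days

def pilot_metrics(tickets: list[dict]) -> dict:
    # Mean lead time plus first-CI-run pass rate across the pilot tickets.
    lead_times = [lead_time_days(t["opened"], t["merged"]) for t in tickets]
    first_pass = [t["first_ci_green"] for t in tickets]
    return {
        "mean_lead_time_days": mean(lead_times),
        "first_ci_pass_rate": sum(first_pass) / len(first_pass),
    }

tickets = [
    {"opened": "2025-06-01", "merged": "2025-06-03", "first_ci_green": True},
    {"opened": "2025-06-02", "merged": "2025-06-08", "first_ci_green": False},
]
print(pilot_metrics(tickets))
```

Compute the same numbers over the four-week baseline window first, so the pilot comparison is apples to apples.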
Run true A/B tests
Use similar tickets and split them between agent assisted and human only paths. Hold weekly readouts with the team to capture qualitative feedback and failure modes.
Create a rollback plan
Decide in advance when to turn the agent off. For example, if rework exceeds a threshold or if a production incident is traced to an agent merge. Make the kill switch a documented step with owners.
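The kill-switch rule is worth writing down as code, even as a sketch like this one; the thresholds are examples, not recommendations.

```python
# Example kill-switch thresholds; tune these to your own risk tolerance.
REWORK_RATE_THRESHOLD = 0.25      # fraction of agent PRs reworked within 7 days
MAX_AGENT_LINKED_INCIDENTS = 0    # any agent-traced incident trips the switch

def should_disable_agent(rework_rate: float, agent_linked_incidents: int) -> bool:
    return (rework_rate > REWORK_RATE_THRESHOLD
            or agent_linked_incidents > MAX_AGENT_LINKED_INCIDENTS)

assert not should_disable_agent(rework_rate=0.10, agent_linked_incidents=0)
assert should_disable_agent(rework_rate=0.40, agent_linked_incidents=0)
assert should_disable_agent(rework_rate=0.10, agent_linked_incidents=1)
```

Pair the check with named owners and a documented disable procedure so the decision is mechanical, not a debate during an incident.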
Expand by template
Once the pilot clears your thresholds, publish a template repo with repository instructions, CI settings, MCP tool configuration, and review rules. Scale to more teams by copying the template and adjusting only what is unique.
What to watch next
More IDEs by default
Copilot’s Agent Mode is moving beyond VS Code into other popular IDEs, and Android Studio’s Gemini features are expanding across testing, UI, and cloud hosted workflows. Expect agent experiences to feel standard in 2026 era toolchains.
Stronger planning and verification
Planning quality and test generation will keep improving. Look for tighter integration with coverage analysis, flaky test detection, and change impact analysis so agents can route risky edits to humans faster.
Deeper tool ecosystems
MCP servers make it easy to add capabilities. Enterprises will build adapters for internal APIs so agents can learn from bug databases, incident runbooks, and deployment telemetry without broad internet access. For the broader platform angle, see enterprise agent stack with AWS.
Policy as code for AI changes
Approve to apply creates the hook to codify AI policies. Expect org wide rulesets that restrict which repos agents can touch, what tasks they can attempt, and what evidence is required in the PR body before review.
Agentic coding is not magic. It is a set of loops that move work forward while keeping humans in charge of what ships. With guardrails and measurement, these agents can compress the grind of software delivery and let developers spend more time on design, architecture, and hard problems. The winners will not be the teams that throw agents at everything. They will be the teams that teach the agent to work like their own, then scale that playbook with discipline.