Intent is the better choice for teams managing complex, multi-service codebases that require parallel agent orchestration and semantic dependency analysis across 400,000+ files. Gemini CLI is the stronger option for individual developers who need a free, terminal-native AI agent with a 1M-token context window for focused, single-task workflows.
TL;DR
Gemini CLI gives individual developers a powerful free terminal agent with 60 requests per minute and a 1,048,576-token context window, using a blend of Gemini Pro and Flash models with intelligent routing. Intent provides a multi-agent workspace with coordinator-based orchestration, isolated git worktrees, and a Context Engine that semantically analyzes 400,000+ files. Gemini CLI handles focused tasks well; Intent handles work that requires structured coordination across agents, models, and services.
Two Different Approaches to AI-Assisted Development
The choice between Intent and Gemini CLI comes down to whether your work fits inside a single terminal session or requires structured orchestration across parallel workstreams. Both tools generate quality code on individual prompts, but the workflows they support diverge sharply once task complexity increases.
Several weeks of evaluation across different project types (focused bug fixes, multi-file refactors, and cross-service feature implementations spanning dozens of modules) made each tool's strengths clear. Gemini CLI is a scalpel: precise, quick, and free for focused work that fits inside one context window. Intent is an operating room: coordinated, parallel, and built for the kind of surgery where multiple specialists need to work on the same patient without conflict.
These tools also complement each other rather than compete directly. Intent can use Gemini CLI as an execution agent through its BYOA (Bring Your Own Agent) support and MCP integration. The real question is whether your work needs a single agent or a workspace that orchestrates many.
See how Intent orchestrates parallel agents across complex codebases.
Free tier available · VS Code extension · Takes 2 minutes
Core Architecture: Single Agent vs. Coordinator Pattern
The architectural gap between these tools reflects different philosophies about how AI should participate in development workflows. Gemini CLI treats development as a conversation between one developer and one model. Intent treats development as a coordination problem requiring multiple specialized agents, shared state, and quality gates.
Gemini CLI operates as a single-agent terminal tool powered exclusively by Google's Gemini models. Every interaction runs through one model, one context window, and one conversation thread. The Gemini CLI repository provides access to Gemini models directly in the terminal with a 1,048,576-token context window, enough for roughly 50,000 lines of code at standard line lengths. The free tier uses a blend of Pro and Flash models with intelligent routing: Flash handles simpler operations while Google reserves Pro for complex reasoning tasks, and the CLI may fall back from Pro to Flash after slow responses.
Intent implements a coordinator-based multi-agent architecture. A Coordinator agent uses the Context Engine to understand the task and propose a plan as a living spec. The Coordinator breaks down that spec into tasks and delegates to Implementors that run in waves. A Verifier agent then checks results against the spec.
Intent's architecture includes six built-in specialist agent personas: Investigate, Implement, Verify, Critique, Debug, and Code Review. Each workspace runs in an isolated git worktree, so parallel agents cannot create merge conflicts or corrupt each other's state.
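The wave-based delegation described above can be sketched as a topological sort over a task graph. This is a toy illustration, not Intent's actual API, and the task names are hypothetical:

```python
# Toy sketch of wave-based delegation: break a spec into tasks with
# dependencies, then run every ready task in parallel as one "wave".
# Task names are hypothetical; this is NOT Intent's actual API.
from graphlib import TopologicalSorter

spec = {                                  # task -> prerequisite tasks
    "update-schema": set(),
    "migrate-service-a": {"update-schema"},
    "migrate-service-b": {"update-schema"},
    "verify": {"migrate-service-a", "migrate-service-b"},
}

sorter = TopologicalSorter(spec)
sorter.prepare()
wave = 0
while sorter.is_active():
    ready = sorted(sorter.get_ready())    # tasks with no pending deps
    wave += 1
    print(f"wave {wave}: {ready}")        # delegate these in parallel
    sorter.done(*ready)
# wave 1: ['update-schema']
# wave 2: ['migrate-service-a', 'migrate-service-b']
# wave 3: ['verify']
```

The verification task only becomes ready once both migrations complete, which mirrors the handoff sequencing the coordinator pattern enforces.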
| Dimension | Gemini CLI | Intent |
|---|---|---|
| Architecture | Single-agent, single-thread | Coordinator + specialist agents |
| Execution model | Sequential, one prompt at a time | Parallel waves with handoffs |
| Workspace isolation | None (single terminal session) | Isolated git worktrees per task |
| Planning approach | Periodic compression summary | Living specs that evolve in real-time |
| Quality gates | Manual developer review | Verifier agent validates against spec |
| Session persistence | Manual `/chat save` commands | Context preservation with pause/resume |
The architectural difference matters most during complex tasks. Issue #7383 on the Gemini CLI repository documents five constraints developers encounter with the single-agent approach: no systematic planning, context loss during long operations, poor dependency handling, no progress visibility, and inability to resume interrupted work. Intent's coordinator pattern addresses each of these through its living spec and parallel delegation model.
Context Strategy: 1M Token Window vs. Semantic Analysis
This dimension produced the sharpest technical divergence between the two tools, and the research data reveals dynamics that initial impressions miss entirely.
Gemini CLI's Context Window Approach
Gemini CLI uses Gemini 2.5 Pro's full 1,048,576-token input capacity. The Gemini 2.5 documentation specifies support for up to 500 MB of input data with a maximum output of 65,535 tokens. That context window translates to approximately 50,000 lines of code, 8 average-length English novels, or multimodal inputs consuming 258 tokens per image, 263 tokens per second of video, and 32 tokens per second of audio. Note that full Pro access and the complete 1M token window are primarily available on paid tiers; the free tier routes most requests through Flash with limited Pro access for complex tasks.
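As a back-of-envelope check on those figures, the arithmetic works out as follows. The tokens-per-line average is an assumption for typical source code; the image and video rates are the ones quoted above:

```python
# Back-of-envelope capacity check for a 1,048,576-token window.
# Tokens-per-line is an assumed average for source code; the image
# and video rates come from the figures quoted above.
CONTEXT_WINDOW = 1_048_576
TOKENS_PER_LINE = 21             # assumption: typical code line
TOKENS_PER_IMAGE = 258
TOKENS_PER_VIDEO_SECOND = 263

lines_of_code = CONTEXT_WINDOW // TOKENS_PER_LINE
n_images = CONTEXT_WINDOW // TOKENS_PER_IMAGE
video_minutes = CONTEXT_WINDOW / TOKENS_PER_VIDEO_SECOND / 60

print(f"~{lines_of_code:,} lines of code")    # ~49,932
print(f"~{n_images:,} images")                # ~4,064
print(f"~{video_minutes:.0f} minutes of video")
```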
For projects that fit comfortably within this window, the approach works well. Testing on a focused 15,000-line service showed Gemini CLI maintaining a coherent understanding of the entire codebase without degradation.
Intent's Context Engine Approach
The Context Engine takes a different strategy: semantic indexing rather than wholesale inclusion. The Context Engine processes 400,000+ files through semantic dependency analysis, building a knowledge graph that maps relationships, dependencies, and call trees across entire codebases. Full indexing completes in 6 minutes for 500,000+ files, with incremental updates taking 45 seconds.
Rather than dumping an entire codebase into a context window, the Context Engine retrieves only the relevant code for each query through semantic search. This approach goes beyond grep or keyword matching to provide cross-file contextual understanding spanning modules, services, and even languages (Python services calling Node.js APIs, for example).
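The shape of that retrieve-then-prompt loop can be illustrated with a toy index. A bag-of-words cosine stands in for a real embedding model, and the file names and snippets are hypothetical; this is not Augment's Context Engine:

```python
# Illustrative semantic-retrieval sketch: rank indexed snippets by
# similarity to the query and put only the top hits in the prompt.
# Bag-of-words cosine stands in for a real embedding model.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = {  # hypothetical snippets spanning services and languages
    "billing/invoice.py": "charge customer invoice apply tax rules",
    "api/routes.ts": "router post invoice create invoice calls billing service",
    "auth/session.go": "validate session token check expiry",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda f: cosine(q, embed(index[f])), reverse=True)
    return ranked[:k]

print(retrieve("where is the invoice charge logic"))
# ['billing/invoice.py', 'api/routes.ts']
```

Only the top-ranked files enter the prompt, so context stays small regardless of how large the indexed codebase grows.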
Why Context Quality Outperforms Context Quantity
Empirical stress testing published by Kilo AI reveals a performance cliff that the raw token count obscures:
- Normal code quality from 50K to 180K tokens
- Degradation beginning at 180K+ tokens
- Approximately 5-minute response times at 336K tokens
- 15 to 20-minute response times with frequent timeouts at 600K tokens
Beyond the 200K token threshold, models generate scattered good ideas but produce non-functional code with systematic issues: missing exports, simplified functionality to bypass errors, dead code, and improper error fixes.
Comparative benchmarking measured hallucination rates across different context architectures. These figures are self-reported and have not been independently verified:
| Context Strategy | Hallucination Rate | Cost per Query | Memory Usage |
|---|---|---|---|
| Semantic retrieval (200K optimized) | 12% | $0.08 | 24.4 GB |
| 1M token context (Copilot) | 28% | $0.42 | 122 GB |
| 1M token context (Cursor) | 31% | $0.38 | 122 GB |
A blind study on the Elasticsearch repository (3.6M Java LoC) comparing 500 agent-generated pull requests showed +18.2% improvement in completeness and +12.4% in best practice adherence versus baseline approaches. These figures come from Augment Code's own evaluation and have not been independently replicated.
For focused work on smaller projects, Gemini CLI's 1M token window provides more than enough capacity. For enterprise codebases averaging 400,000+ files, where 73% of completions compile locally but violate patterns elsewhere, semantic analysis becomes the difference between code that compiles and code that integrates correctly.
Free Tier Reality: What 60 Requests per Minute Actually Gets You
Gemini CLI's free tier is generous and worth understanding in detail, because the nominal numbers and the effective throughput diverge in practice.
The free tier provides 60 requests per minute and 1,000 requests per day with personal Google account authentication. No API key management required. Automatic updates to the latest Gemini models. For a developer who wants a terminal-native AI agent without spending anything, this is the strongest free offering in the market.
An important nuance: the free tier does not provide dedicated Gemini Pro access. The free tier uses a blend of Pro and Flash, with intelligent routing that sends simpler operations to Flash and reserves Pro for complex reasoning tasks. After two or more slow responses, the CLI falls back to Flash for the remainder of the session. Paid tiers (Google AI Pro, AI Ultra, or Vertex AI authentication) provide more reliable access to Pro models and features like Deep Think mode and Google Search grounding.
The effective limits are also lower than the nominal numbers suggest. Google's quota documentation includes a critical note: "When in agent mode or when using the Gemini CLI, one prompt might result in multiple model requests." A complex agentic task that plans, executes, and validates can consume 3 to 5 requests per interaction. That reduces effective throughput to 12 to 20 usable interactions per minute.
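A quick sketch of that arithmetic, using the 3-to-5 requests-per-interaction estimate above:

```python
# Effective free-tier throughput when one agentic prompt fans out
# into several model requests (3-5 is the estimate cited above).
RPM_LIMIT = 60      # requests per minute, free tier
RPD_LIMIT = 1_000   # requests per day, free tier

for fan_out in (3, 5):
    print(f"{fan_out} requests/interaction -> "
          f"{RPM_LIMIT // fan_out}/min, {RPD_LIMIT // fan_out}/day")
# 3 requests/interaction -> 20/min, 333/day
# 5 requests/interaction -> 12/min, 200/day
```

Note that the daily cap bites harder than the per-minute one: at a 5x fan-out, 1,000 requests per day is roughly 200 agentic interactions.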
When You Need More Than Free
Developers who exceed the free tier can authenticate through Vertex AI for enterprise-grade access. Vertex AI pricing breaks down as follows:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 2.5 Pro (≤200K context) | $1.25 | $10.00 |
| Gemini 2.5 Pro (>200K context) | $2.50 | $15.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |
| Gemini 2.0 Flash | $0.15 | $0.60 |
Enterprise rate limits scale through automatic tier progression based on 30-day spend: Tier 1 ($10 to $250 spend) provides 500,000 tokens per minute for Pro models, climbing to 2,000,000 tokens per minute at Tier 3 (>$2,000 spend).
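To translate those rates into per-query cost, here is a small sketch. The rates come from the table above; the example token counts are hypothetical:

```python
# Rough per-query cost from the Vertex AI rate card above.
# Rates are USD per 1M tokens; the example token counts are made up.
PRICING = {  # model: (input_rate, output_rate)
    "2.5-pro-short": (1.25, 10.00),   # <=200K context
    "2.5-pro-long":  (2.50, 15.00),   # >200K context
    "2.5-flash":     (0.30, 2.50),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical query: 150K-token prompt, 4K-token completion.
print(f"${query_cost('2.5-pro-short', 150_000, 4_000):.4f}")  # $0.2275
print(f"${query_cost('2.5-flash', 150_000, 4_000):.4f}")      # $0.0550
```

The same query is roughly 4x cheaper on Flash, which is why routing simple tasks to cheaper models matters at volume.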
Intent's Pricing Model
Intent takes a different approach to access. Developers can bring their own agent (Claude Code, Codex, or OpenCode) without an Augment Code subscription, though an Augment Code account is still required. This path provides spec-driven development and agent orchestration for free; the Context Engine and native Auggie agent require an Augment Code subscription.
The pricing comparison is asymmetric by design. Gemini CLI is a free single-agent tool; Intent is a paid multi-agent workspace. The question is whether orchestration, workspace isolation, and semantic analysis justify the cost for your specific workflow.
Model Flexibility: Google Lock-in vs. BYOA
Gemini CLI is architecturally locked to Google's Gemini models. Every request goes through the Gemini API. Supported models include Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and the Gemini 3.x series: Gemini 3 Flash (available in CLI since December 2025), Gemini 3.1 Pro (rolling out in preview), and Gemini 3 Pro Image. No third-party LLM providers are supported.
Intent explicitly supports multi-provider flexibility. The BYOA architecture allows developers to bring Claude Code, OpenAI Codex, or OpenCode as execution agents. Users can mix models per task type: Opus for complex architecture, Sonnet for rapid iteration, GPT 5.2 for deep code analysis or code review.
This matters for four practical reasons:
- Vendor risk: Google actively deprecates older models on fixed timelines. Gemini 2.0 Flash and Flash-Lite are scheduled for retirement on June 1, 2026, and Gemini 3 Pro Preview was shut down on March 9, 2026. A locked-in tool means workflows break on Google's timeline.
- Task-specific selection: Different models have measurably different strengths. Claude excels at complex reasoning; GPT variants handle certain domain-specific tasks better. A single-model architecture forces compromise.
- Rate limit mitigation: When one provider throttles mid-workflow, multi-model support enables switching providers instead of waiting.
- Cost optimization: Different providers price differently per token. Routing simple tasks to cheaper models and complex tasks to premium models reduces overall spend.
Explore how Intent's BYOA model routing works across agent types.
Free tier available · VS Code extension · Takes 2 minutes
Session Management and Persistence
How a tool handles session state determines whether complex, multi-day tasks remain coherent or require constant re-establishment of context. This gap between the two tools is significant for workflows that span more than a single sitting.
Gemini CLI provides manual session persistence through file-based commands. Developers can save sessions with `/chat save <tag>`, resume with `/chat resume <tag>`, or use `gemini --resume` to resume the most recent session. `GEMINI.md` context files auto-load every session, and the `/memory add` command persists information to global context files.
However, Issue #5101 on the Gemini CLI repository proposes enhancements to automatic chat session logging, identifying that users must remember to execute `/chat save` manually or risk permanent loss of conversation history. When chat history undergoes token compression, information from the beginning of the conversation is permanently lost.
Intent treats session persistence as a core architectural feature. Each workspace preserves full context, allowing developers to pause work, switch contexts, or hand off instantly. Living specs serve as a shared system of record, maintaining continuity across agents and sessions without manual intervention.
| Session Feature | Gemini CLI | Intent |
|---|---|---|
| Session save/resume | Manual (`/chat save`, `/chat resume`) | Automatic per workspace |
| Context persistence | GEMINI.md files, `/memory` commands | Living specs + workspace state |
| Automatic logging | Not available (requested in Issue #5101) | Built-in |
| Token compression | Automatic, can lose early context permanently | Context Engine retrieves as needed |
| Multi-task context | Single thread per session | Isolated worktrees per task |
| Handoff support | Not supported | Coordinator handles agent handoffs |
Gemini CLI as an Execution Agent Inside Intent
These tools are not mutually exclusive, which may be the most practical insight from this comparison. Intent's MCP integration and Gemini CLI's MCP client support create a viable path for using both tools together.
Gemini CLI supports MCP server integration through a discovery layer, OAuth 2.0 authentication for remote servers, and configuration via `~/.gemini/settings.json`. The Context Engine MCP server makes semantic codebase understanding available to any MCP-compatible agent, including Gemini CLI.
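Registering an MCP server for Gemini CLI goes through the `mcpServers` block of `~/.gemini/settings.json`. A minimal sketch, where the server name and launch package are placeholders rather than the actual Context Engine distribution:

```json
{
  "mcpServers": {
    "context-engine": {
      "command": "npx",
      "args": ["-y", "example-context-engine-mcp"]
    }
  }
}
```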
The documented performance improvements from Context Engine MCP integration are substantial (measured across 300 Elasticsearch PRs with 900 attempts):
- Cursor + Claude Opus 4.6: 71% improvement (completeness +60%, correctness +5x)
- Claude Code + Opus 4.6: 80% improvement
- Cursor + Composer-1: 30% improvement
A developer could use Gemini CLI for quick terminal tasks during focused work, then use Intent's workspace for complex multi-agent orchestration, with both tools accessing the same semantic understanding of the codebase through MCP.
When Free Is Enough vs. When You Need Orchestration
After evaluating both tools across different project types and scales, the decision framework comes down to three factors: task complexity, codebase scale, and team coordination needs.
Choose Gemini CLI if:
- Your work is focused and sequential. Bug fixes, single-feature implementations, and exploratory prototyping fit naturally into a single-agent terminal workflow. Independent reviews describe Gemini CLI as particularly strong for fast iterations and lightweight development tasks.
- Your project spans a few hundred files or fewer. Gemini CLI's 1M token context window comfortably holds projects of this size, though performance may degrade as total token usage approaches the higher end of the window.
- You work solo and budget matters. The free tier (60 RPM, 1,000 RPD) is the strongest zero-cost offering for terminal-native AI assistance, though most requests will be routed through Flash rather than Pro.
- You prefer terminal-first workflows. If your development loop is shell-centric and you dislike context-switching to a browser or IDE, Gemini CLI stays in your existing environment.
Developers who mostly ship single-service changes and want a free tool that stays out of their way will find Gemini CLI hard to beat.
Choose Intent if:
- Your tasks require parallel execution. Multi-service refactors, feature implementations spanning dozens of modules, and cross-team changes benefit from Intent's coordinator pattern delegating to specialist agents running in waves.
- Your codebase spans thousands of files or multiple repositories. The Context Engine's semantic analysis delivers architectural understanding that raw token windows cannot match at scale, with 6-minute full indexing and 45-second incremental updates for codebases exceeding 400,000+ files.
- You need model flexibility. Intent's BYOA support enables routing complex architecture tasks to Claude Opus, rapid iteration to Sonnet, and code review to GPT 5.2, selecting the right model per task rather than accepting Google-only access.
- Quality gates matter for your workflow. Intent's Verifier agent validates implementation against living specs automatically, reducing the cleanup-and-verification overhead that often dominates agentic workflows. METR research reports that experienced developers took longer when using AI tools without structured verification.
Teams whose work routinely spans services and teams will find that the coordinator pattern plus workspace isolation pays back in fewer integration surprises.
| Decision Factor | Gemini CLI | Intent |
|---|---|---|
| Best for codebase size | A few hundred files or fewer | Thousands of files, multi-repo |
| Best for task type | Focused, sequential tasks | Parallel, multi-service changes |
| Best for team size | Solo developers | Teams with coordination needs |
| Cost | Free (or Vertex AI pay-as-you-go) | Subscription (BYOA free for orchestration) |
| Model access | Google Gemini only (Pro/Flash blend on free tier) | Claude Code, Codex, OpenCode |
| Learning curve | Low (terminal commands) | Higher (workspace + spec-driven workflow) |
Match the Tool to the Work
The core tension in this comparison separates a tool that handles one task well in isolation from a workspace that coordinates multiple agents across a codebase with architectural awareness. Gemini CLI's 60 RPM free tier and 1M token context window make it the strongest zero-cost terminal agent available, though free-tier users should expect Flash-level quality for most interactions with limited Pro access. Intent's coordinator pattern, Context Engine semantic analysis, and BYOA model flexibility solve the orchestration problems that single-agent tools structurally cannot address.
If the next step is a focused bug fix or a contained feature in a manageable codebase, open a terminal and use Gemini CLI. If the backlog includes cross-service refactors, multi-module feature rollouts, or architectural changes that need parallel specialist agents with quality gates, the structured workspace pays for itself in cleanup time alone.
See how Intent's living specs and multi-agent orchestration handle complex development workflows.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
GTM and Customer Champion
