
vitessio/vitess

Vitess: Cloud-Native Distributed MySQL

Last updated on Dec 18, 2025 (Commit: e48c2d5)

Overview

Relevant Files
  • README.md
  • doc/Vision.md
  • go/vt/vitessdriver/doc.go

Vitess is a cloud-native, horizontally-scalable distributed database system built around MySQL. It enables unlimited scaling through generalized sharding while keeping application code and database queries agnostic to data distribution across multiple servers.

What is Vitess?

Vitess transforms MySQL/MariaDB into a fast, scalable, and highly-available distributed database. Originally developed as a core component of YouTube's database infrastructure beginning in 2011, it has since been adopted by major companies including Slack, Square (now Block), and JD.com. The system allows you to split and merge shards as your needs grow, with atomic cutover steps taking only seconds.

Core Architecture

VTGate is the central proxy server that routes queries to appropriate shards. It breaks up queries, routes them to the correct shards, and combines results, giving the appearance of a unified database. The V3 API doesn't require specifying routing information—VTGate analyzes queries and uses VSchema metadata to perform routing automatically.

Key Priorities

  • Scalability: Achieved through replication and sharding across multiple servers
  • Efficiency: Proxy servers multiplex queries into fixed-size connection pools and optimize updates
  • Manageability: Tools backed by a lockserver (ZooKeeper) track and administer distributed infrastructure
  • Simplicity: VTGate provides a unified view that hides complexity from applications

Design Trade-offs

Vitess balances scalability with practical constraints:

  • Consistency: Replica reads offer eventual consistency and better scalability; primary reads provide read-after-write consistency
  • Transactions: Guaranteed per-keyspace-id (single shard); distributed transactions require application-level sequencing
  • Latency: Proxy servers introduce minimal latency while extracting more throughput from MySQL

Preserved MySQL Features

Since MySQL remains the underlying storage layer, Vitess preserves important relational features:

  • Secondary Indexes: Efficiently query rows using multiple keys
  • Joins: Keep data normalized across tables for efficient storage and join it on demand

Getting Started

The Go SQL driver makes connecting to Vitess straightforward:

import "vitess.io/vitess/go/vt/vitessdriver"

// Open returns a *sql.DB that routes queries through VTGate, targeting primary tablets.
db, err := vitessdriver.Open("localhost:15991", "@primary")
if err != nil {
    log.Fatal(err)
}
// Use db via the standard database/sql interface.
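
A minimal usage sketch follows; the users table and its columns are illustrative assumptions rather than part of the Vitess examples, and the standard fmt and log packages are assumed to be imported:

rows, err := db.Query("SELECT id, name FROM users")
if err != nil {
    log.Fatal(err)
}
defer rows.Close()
for rows.Next() {
    var id int64
    var name string
    if err := rows.Scan(&id, &name); err != nil {
        log.Fatal(err)
    }
    fmt.Println(id, name)
}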

For more details, see the full example and local tutorial.

Architecture & Core Components

Relevant Files
  • go/vt/vtgate/vtgate.go
  • go/vt/vtgate/executor.go
  • go/vt/vtgate/tabletgateway.go
  • go/vt/vttablet/tabletserver/tabletserver.go
  • go/vt/topo/server.go
  • doc/design-docs/LifeOfAQuery.md

Vitess is a distributed MySQL proxy layer that transparently shards databases across multiple MySQL instances. The architecture consists of four main layers: clients, VTGate (proxy), VTTablet (tablet server), and MySQL backends, coordinated by a topology service.

Core Components

VTGate is the query routing proxy that clients connect to. It receives SQL queries, parses them, builds execution plans, and routes them to the appropriate shards. VTGate maintains an in-memory cache of the cluster topology and uses the Executor to coordinate query execution across multiple tablets.

VTTablet wraps each MySQL instance and provides a standardized query interface. It handles query validation, plan generation, ACL checks, connection pooling, and result caching. VTTablet also manages replication state and coordinates with the topology service to report tablet health.

Topology Server (etcd, Zookeeper, or Consul) stores cluster metadata: keyspace-to-shard mappings, shard-to-tablet assignments, tablet endpoints, and VSchema (virtual schema) definitions. VTGate polls this server to maintain an up-to-date view of the cluster.

Discovery & Health Check continuously monitors tablet health via RPC calls. The TabletGateway uses this information to route queries only to healthy tablets and automatically handle failovers.

Key Subsystems

Plan Builder transforms SQL into an optimized execution plan. It uses the VSchema to determine which shards contain relevant data, applies query rewrites, and generates a tree of engine primitives (Route, Join, Aggregation, etc.). The core philosophy is pushing as much work as possible down to MySQL shards rather than processing in the proxy layer.

ScatterConn executes queries across multiple shards in parallel. It manages per-shard connections, handles transaction state, and coordinates retries on recoverable errors. For transactional queries, it maintains transaction IDs per shard and ensures consistency.

VSchema Manager maintains the virtual schema that defines keyspaces, tables, sharding keys (vindexes), and shard routing rules. It watches for schema changes and rebuilds the routing information dynamically.

Query Cache stores execution plans keyed by normalized SQL. This avoids expensive re-planning for repeated queries and significantly improves throughput.

Sharding & Routing

Vitess uses vindexes (virtual indexes) to determine which shard contains a row. Common vindexes include hash-based (consistent hashing), lookup-based (external mapping table), and range-based routing. The planner uses vindex information to route queries: single-shard queries execute directly, while multi-shard queries scatter across relevant shards and merge results.

Resilience

VTGate retries queries on transient errors (connection timeouts, tablet unavailability). The buffer module temporarily queues queries during planned maintenance or failovers, reducing client-visible errors. Health checks continuously monitor tablet state and automatically remove unhealthy tablets from the routing pool.

Query Routing & Execution

Relevant Files
  • go/vt/vtgate/executor.go
  • go/vt/vtgate/engine/route.go
  • go/vt/vtgate/engine/routing.go
  • go/vt/vtgate/planbuilder/planner.go
  • go/vt/vtgate/planbuilder/operators/sharded_routing.go
  • go/vt/vtgate/executorcontext/vcursor_impl.go

Query routing determines which shard(s) receive a query, while execution coordinates the actual query dispatch and result aggregation. This two-phase process is central to Vitess's distributed query handling.

Query Execution Flow

When a query arrives at VTGate, it follows this pipeline:

  1. Parse & Plan - The SQL is parsed and a query plan is built using the planner
  2. Route - The plan determines which shard(s) need the query
  3. Execute - The query is sent to the identified shard(s)
  4. Aggregate - Results are collected and returned to the client

The Executor in executor.go orchestrates this flow. It calls getCachedOrBuildPlan() to get or build a plan, then executePlan() to run it. Plans are cached by query hash to avoid repeated planning.
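
The sketch below illustrates that cache-then-execute flow in simplified form; every name in it (plan, planCache, buildPlan, runPlan) is a hypothetical stand-in, not the actual Executor code:

// Hypothetical, simplified sketch of the plan-cache-then-execute flow.
type plan struct{ opcode string } // stand-in for the engine primitive tree
type result struct{}              // stand-in for a query result

var planCache = map[string]*plan{} // keyed by normalized query text

func normalizeSQL(sql string) string { return sql } // placeholder normalizer

func buildPlan(sql string) (*plan, error) { return &plan{opcode: "Scatter"}, nil }

func runPlan(p *plan) (*result, error) { return &result{}, nil } // route and merge

func executeQuery(sql string) (*result, error) {
    key := normalizeSQL(sql)
    p, ok := planCache[key]
    if !ok {
        var err error
        if p, err = buildPlan(sql); err != nil { // parse, analyze, plan
            return nil, err
        }
        planCache[key] = p // cache for reuse on repeated queries
    }
    return runPlan(p) // route to shards and aggregate results
}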

Routing Opcodes

The routing layer uses opcodes to describe how a query should be distributed:

  • Unsharded - Send to unsharded keyspace (single destination)
  • EqualUnique - Route to one shard using a unique vindex (e.g., WHERE id = 5)
  • Equal - Route using a non-unique vindex (may hit multiple shards)
  • IN - Route with multiple values (e.g., WHERE id IN (1, 2, 3))
  • Scatter - Send to all shards (e.g., SELECT * FROM users)
  • Reference - Route to any shard (reference tables exist everywhere)
  • DBA - System queries (information_schema, etc.)

The opcode determines the plan type: PlanPassthrough (single shard), PlanMultiShard (multiple shards), or PlanScatter (all shards).

Routing Decision Logic

The planner's ShardedRouting operator analyzes WHERE predicates to find the best vindex to use. It evaluates predicates like:

WHERE user_id = 42          → EqualUnique (if user_id is unique vindex)
WHERE user_id IN (1, 2, 3)  → IN opcode
WHERE name = 'alice'        → Equal (if name is non-unique vindex)

If no vindex predicate is found, the query defaults to Scatter (all shards). The planner picks the lowest-cost routing option available.
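
The selection rule can be sketched as picking the cheapest matching candidate; the types and cost values below are made up for illustration and do not mirror the real operator fields:

// candidate is a hypothetical vindex matched by a WHERE predicate.
type candidate struct {
    name string
    cost int // e.g. numeric < hash < lookup
}

// pickVindex returns the cheapest matching vindex; if none match,
// the route falls back to Scatter across all shards.
func pickVindex(matches []candidate) (candidate, bool) {
    var best candidate
    found := false
    for _, c := range matches {
        if !found || c.cost < best.cost {
            best, found = c, true
        }
    }
    return best, found
}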

Execution & Shard Resolution

The Route primitive executes the plan. Its TryExecute() method:

  1. Calls findRoute() to resolve routing parameters to actual shards
  2. Uses the vindex to map values to ResolvedShard objects
  3. Calls executeShards() to dispatch queries in parallel

The VCursorImpl handles the actual execution via ExecuteMultiShard(), which sends queries to tablets and aggregates results. Buffering retries handle transient shard unavailability.
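
The scatter-and-merge idea behind ExecuteMultiShard can be sketched with a simple goroutine fan-out; this is illustrative only and does not reflect the real signatures:

import "sync"

// queryShard stands in for sending one query to one shard's tablet.
func queryShard(shard, sql string) ([]string, error) {
    return []string{shard + ": row"}, nil
}

// scatter dispatches the query to every resolved shard in parallel and merges rows.
func scatter(shards []string, sql string) ([]string, error) {
    var (
        mu       sync.Mutex
        wg       sync.WaitGroup
        merged   []string
        firstErr error
    )
    for _, shard := range shards {
        wg.Add(1)
        go func(shard string) {
            defer wg.Done()
            rows, err := queryShard(shard, sql)
            mu.Lock()
            defer mu.Unlock()
            if err != nil {
                if firstErr == nil {
                    firstErr = err
                }
                return
            }
            merged = append(merged, rows...)
        }(shard)
    }
    wg.Wait()
    return merged, firstErr
}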

Plan Caching & Optimization

Plans are cached by query hash. For prepared statements, VTGate can build an optimized plan using actual bind variable values, enabling better routing decisions. The PlanSwitcher can dynamically choose between baseline and optimized plans at runtime.

Sharding & Vindexes

Relevant Files
  • proto/vschema.proto
  • go/vt/vtgate/vindexes/vindex.go
  • go/vt/vtgate/vindexes/ (all vindex implementations)
  • go/vt/vtgate/planbuilder/operators/sharded_routing.go
  • examples/region_sharding/

Vitess distributes data across multiple database shards using vindexes (virtual indexes). A vindex is a mapping function that determines which shard contains a given row based on a column value. Together, sharding and vindexes enable horizontal scaling while maintaining query transparency.

Core Concepts

Keyspace ID (KSID): Every row in a sharded keyspace has an associated keyspace ID, a binary value that determines its shard. Vindexes map column values to keyspace IDs.

Shard Ranges: Shards are identified by key ranges (e.g., 80- means all keyspace IDs from 0x80 to 0xFF). The full range of keyspace IDs is partitioned across shards with no overlap.
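
For example, with two shards -80 and 80-, the first byte of the keyspace ID decides ownership. A minimal sketch of that resolution (real clusters may have many shards with arbitrary range boundaries):

// shardFor returns which of the two shards "-80" and "80-" owns a keyspace ID.
func shardFor(ksid []byte) string {
    if len(ksid) > 0 && ksid[0] < 0x80 {
        return "-80" // keyspace IDs from 0x00... up to (not including) 0x80...
    }
    return "80-" // keyspace IDs from 0x80... through 0xFF...
}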

Vindex Types: Vitess provides multiple vindex implementations:

  • Hash-based: hash, xxhash, unicode_loose_xxhash – Hash column values to keyspace IDs. Fast, uniform distribution, but no semantic meaning.
  • Numeric: numeric – Direct bit-pattern mapping for uint64 values. Cost 0 (no computation).
  • Lookup: lookup, lookup_unique, consistent_lookup – Query an external lookup table to map values to keyspace IDs. Higher cost but enables custom routing logic.
  • Region-based: region_json, region_experimental – Multi-column vindexes for geo-partitioning. First column determines region prefix; second is hashed.
  • Binary: binary – Identity function; used for pinned tables with pre-assigned keyspace IDs.

Vindex Configuration

Vindexes are defined in the VSchema (virtual schema) as part of a keyspace definition:

{
  "sharded": true,
  "vindexes": {
    "user_hash": {
      "type": "xxhash"
    },
    "order_lookup": {
      "type": "lookup_unique",
      "params": {
        "table": "order_lookup_tbl",
        "from": "order_id",
        "to": "keyspace_id"
      },
      "owner": "orders"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "user_hash"
        }
      ]
    }
  }
}

Query Routing

When a query arrives, the planner's ShardedRouting operator analyzes WHERE predicates to select the best vindex:

WHERE user_id = 42          → EqualUnique (single shard)
WHERE user_id IN (1, 2, 3)  → IN (multiple shards)
WHERE name = 'alice'        → Equal (if name is a vindex)
SELECT * FROM users         → Scatter (all shards)

The planner picks the lowest-cost vindex that matches the query predicates. If no vindex predicate exists, the query scatters to all shards.

Execution Flow

  1. Map: Vindex Map() function converts column values to keyspace IDs (or key ranges).
  2. Resolve: ResolveDestinations() maps keyspace IDs to actual shards using shard key ranges.
  3. Execute: Query is dispatched to resolved shards in parallel; results are merged.

Unique vs. Non-Unique Vindexes

Unique vindexes guarantee one keyspace ID per value (e.g., user_id). Queries with equality predicates on unique vindexes route to a single shard.

Non-unique vindexes may map one value to multiple keyspace IDs. Queries scatter across multiple shards and results are merged.

Reversible Vindexes

Some vindexes (e.g., hash, numeric) support reverse mapping: given a keyspace ID, recover the original column value. This enables VTGate to fill missing column values during INSERT operations.
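
For the numeric vindex both directions are simple bit-pattern conversions; the sketch below assumes the keyspace ID is the 8-byte big-endian encoding of the uint64 value:

import "encoding/binary"

// Forward: uint64 column value -> 8-byte keyspace ID.
func numericMap(v uint64) []byte {
    ksid := make([]byte, 8)
    binary.BigEndian.PutUint64(ksid, v)
    return ksid
}

// Reverse: keyspace ID -> original column value, used to backfill
// a missing column value during INSERT.
func numericReverse(ksid []byte) uint64 {
    return binary.BigEndian.Uint64(ksid)
}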

Resharding

When splitting or merging shards, Vitess uses vreplication workflows to copy data from source shards to target shards. The vindex function remains consistent; only the shard key ranges change. Queries automatically route to the new shards after the migration completes.

Replication & Binlog Streaming

Relevant Files
  • go/vt/binlog/binlog_connection.go
  • go/vt/binlog/binlog_streamer.go
  • go/vt/vttablet/tabletserver/vstreamer/vstreamer.go
  • go/vt/vttablet/tabletserver/vstreamer/engine.go
  • go/vt/vttablet/tabletmanager/vreplication/engine.go
  • go/vt/vttablet/tabletmanager/vreplication/controller.go
  • go/vt/vttablet/tabletmanager/vreplication/vreplicator.go
  • go/vt/vttablet/tabletmanager/vreplication/vplayer.go
  • go/vt/vttablet/tabletmanager/vreplication/vcopier.go
  • proto/binlogdata.proto

Vitess replication uses a multi-layered architecture to stream binlog events from source tablets to target tablets. This enables workflows like MoveTables, Reshard, and cross-cluster replication.

Core Components

BinlogConnection establishes a MySQL replication connection to a source server. It assigns a unique server ID, sends a binlog dump command, and streams raw MySQL binlog events through channels. The connection handles position-based and timestamp-based streaming.

VStreamer (in tabletserver) transforms raw MySQL binlog events into filtered, schema-aware VEvents. It parses binlog events, applies table filters, extracts primary keys, and groups events into transactions. VStreamer runs on the source tablet and serves as the streaming endpoint for replication consumers.

VReplication Engine (in tabletmanager) manages replication workflows on target tablets. It maintains a _vt.vreplication table storing workflow metadata and state. Each workflow has a controller that manages the replication lifecycle.

Replication Workflow Phases

Init Phase: Workflow starts with no position. The system inserts table names into _vt.copy_state to mark which tables need copying.

Copy Phase: VCopier reads table data from the source using snapshot connections, applying filters and transformations. Data is inserted into target tables in parallel batches. After each table completes, it's removed from copy_state. The copy phase respects throttling based on replication lag and InnoDB history list length.

Replicate Phase: VPlayer streams binlog events from VStreamer, applies them to the target, and updates the replication position. Events are grouped into transactions and applied with proper ordering guarantees.

Key Concepts

Filtering: Rules in binlogdata.Filter specify which tables and columns to replicate. VStreamer applies these filters before sending events, reducing network overhead.
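
For instance, a filter that streams the whole customer table plus a column-projected view of orders might look like the following; treat the snippet as illustrative, with field names taken from binlogdata.proto:

import binlogdatapb "vitess.io/vitess/go/vt/proto/binlogdata"

// Only rows matching these rules are streamed by VStreamer.
filter := &binlogdatapb.Filter{
    Rules: []*binlogdatapb.Rule{
        {Match: "customer"},                                               // whole table
        {Match: "orders", Filter: "select order_id, customer_id from orders"}, // projection
    },
}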

Position Tracking: Replication positions use GTIDs (Global Transaction IDs) or file:offset notation. The system tracks both the source position and target position to detect lag.

Throttling: Replication respects configured limits on transactions-per-second and replication lag. If lag exceeds thresholds, the copy phase pauses to allow the target to catch up.

Atomicity: Transactions are applied atomically on the target. If a transaction fails, the entire transaction is rolled back and retried.

Workflow State Management

Workflows transition through states stored in _vt.vreplication: Init, Copying, Running, Stopped, or Error. Controllers monitor state changes and react accordingly. Errors trigger automatic retries with exponential backoff.

Backup & Restore

Relevant Files
  • go/vt/mysqlctl/backup.go
  • go/vt/mysqlctl/backupengine.go
  • go/vt/mysqlctl/builtinbackupengine.go
  • go/vt/mysqlctl/xtrabackupengine.go
  • go/vt/mysqlctl/mysqlshellbackupengine.go
  • go/vt/mysqlctl/backupstorage/interface.go
  • go/vt/vttablet/tabletmanager/rpc_backup.go
  • go/vt/vttablet/tabletmanager/restore.go

Vitess provides a pluggable backup and restore system that supports multiple backup engines and storage backends. This enables flexible disaster recovery and tablet initialization strategies.

Backup Engines

Vitess supports three backup engines, each with different characteristics:

Builtin Engine (builtin) - The default engine that uses MySQL's native capabilities. It supports both full and incremental backups, can run online (without shutting down MySQL), and uses file-level parallelization for performance.

Xtrabackup Engine (xtrabackup) - Uses Percona's xtrabackup tool for physical backups. Provides fast, consistent backups with minimal locking. Supports streaming to various storage backends.

MySQL Shell Engine (mysqlsh) - Uses MySQL Shell's dump and load utilities. Useful for logical backups and cross-version compatibility.

Backup Flow

The backup process starts with a request to the tablet manager, which selects the appropriate backup engine. The engine collects database files, writes them to the configured backup storage (file, S3, GCS, etc.), and creates a MANIFEST file containing metadata like the replication position and backup timestamp.

Restore Flow

Restore operations locate the appropriate backup, read its MANIFEST to determine which engine created it, and use that engine to restore the data. For incremental backups, the system also applies binary logs to reach a specific point in time.

Backup Storage Backends

The BackupStorage interface abstracts storage implementation. Supported backends include:

  • File - Local filesystem or NFS
  • S3 - Amazon S3 or S3-compatible services
  • GCS - Google Cloud Storage
  • Azure Blob - Azure Blob Storage
  • Ceph - Ceph object storage

Each backup is organized as keyspace/shard/tablet-timestamp with a MANIFEST file containing replication position, backup method, and file checksums.
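
Conceptually, a storage backend only needs a handful of operations. The interface below is a simplified sketch of that idea, not the actual definition in backupstorage/interface.go:

import "io"

// BackupHandle is a trimmed-down, hypothetical view of one backup in storage.
type BackupHandle interface {
    AddFile(name string) (io.WriteCloser, error) // write one backup file
    ReadFile(name string) (io.ReadCloser, error) // read one file during restore
}

// BackupStorage is a hypothetical sketch of a pluggable storage backend.
type BackupStorage interface {
    // StartBackup creates a handle under keyspace/shard/tablet-timestamp.
    StartBackup(dir, name string) (BackupHandle, error)
    // ListBackups returns available backups for a keyspace/shard.
    ListBackups(dir string) ([]string, error)
    RemoveBackup(dir, name string) error
}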

Key Parameters

Concurrency - Controls parallel file processing during backup and restore. Higher values speed up operations but increase resource usage.

IncrementalFromPos - When set, triggers an incremental backup from the specified replication position instead of a full backup.

DeleteBeforeRestore - If true, removes existing data before restoring. Used when restoring to an already-running tablet.

RestoreToPos / RestoreToTimestamp - Enable point-in-time recovery by applying binary logs up to a specific position or timestamp.

Backup Result States

Backups return one of three states: BackupUsable (successful), BackupEmpty (no data to backup), or BackupUnusable (failed). Empty backups are cleaned up automatically to avoid cluttering storage.

Schema Management & DDL

Relevant Files
  • go/vt/schemadiff - Schema diffing and analysis
  • go/vt/schemamanager - Schema change orchestration
  • go/vt/vttablet/onlineddl - Online DDL execution engine
  • go/vt/vttablet/tabletserver/schema/engine.go - Schema tracking and caching
  • go/vt/schema/ddl_strategy.go - DDL strategy configuration

Vitess provides a sophisticated schema management system that handles DDL (Data Definition Language) operations safely and efficiently across distributed clusters. The system supports multiple execution strategies and includes intelligent schema diffing to detect and apply changes safely.

Core Components

Schema Engine (schema/engine.go) maintains an in-memory cache of table schemas on each tablet. It tracks table metadata, columns, indexes, and partitions, and automatically reloads the schema when changes are detected via replication. The engine notifies subscribers of schema changes, enabling dependent systems to react appropriately.

Schema Diff (schemadiff/) compares two schemas and generates a rich diff that includes dependency analysis. It detects table renames heuristically, identifies foreign key constraints, and orders DDL statements to maintain schema validity at each step. The SchemaDiff type provides OrderedDiffs() which returns a safe execution sequence.

Schema Manager (schemamanager/) orchestrates schema changes across a keyspace. It implements a Controller-Executor pattern where Controllers read schema changes from various sources (files, UI, etc.) and Executors apply them to tablets. The system validates changes before execution and tracks results per shard.

DDL Strategies

Vitess supports four DDL execution strategies, configured via @@ddl_strategy:

  1. Direct - Immediate MySQL ALTER TABLE on all tablets (blocking, no replication)
  2. Online/Vitess - VReplication-based migration (non-blocking, safe for large tables)
  3. MySQL - Managed migration queued by the scheduler, runs as direct ALTER TABLE
  4. Instant - MySQL 8.0.12+ ALGORITHM=INSTANT when available (zero-copy, instant)

The executor analyzes each ALTER statement and may choose a special plan: instant DDL for compatible changes, range partition rotation for partitioned tables, or standard vreplication for general cases.
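
Clients typically pick a strategy per session before issuing DDL. A sketch using the Go driver from earlier (the table and column here are made up; with the vitess strategy the ALTER returns a migration UUID rather than blocking):

// Choose the VReplication-based online strategy for this session, then issue DDL.
if _, err := db.Exec("SET @@ddl_strategy = 'vitess'"); err != nil {
    log.Fatal(err)
}
// The migration is queued and tracked in _vt.schema_migrations.
if _, err := db.Exec("ALTER TABLE users ADD COLUMN nickname VARCHAR(64)"); err != nil {
    log.Fatal(err)
}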

Online DDL Execution

The onlineddl/Executor is a state machine that manages long-running migrations. It:

  • Submits migrations to the _vt.schema_migrations table
  • Monitors progress via vreplication streams
  • Handles throttling based on replica lag
  • Supports cut-over thresholds and postponement
  • Manages artifact cleanup (shadow tables, triggers)
  • Supports revert operations for rollback

Migrations run concurrently when safe (CREATE/DROP operations, or ALTERs with compatible strategies). The executor auto-adopts orphaned migrations from failed tablets.

Key Features

  • Safe Ordering - Dependencies between DDL statements are analyzed and respected
  • Revertible Migrations - Online DDL supports rollback via REVERT statements
  • Throttling - Migrations respect replica lag thresholds to prevent replication delays
  • Concurrent Execution - Compatible operations run in parallel for efficiency
  • Artifact Management - Shadow tables and triggers are automatically cleaned up
  • Cross-Shard Coordination - Changes propagate consistently across all shards

Topology & Service Discovery

Relevant Files
  • go/vt/topo/server.go - Core topology server interface and factory pattern
  • go/vt/srvtopo/resilient_server.go - Caching layer for topology queries
  • go/vt/discovery/topology_watcher.go - Periodic topology polling and tablet discovery
  • go/vt/discovery/healthcheck.go - Health monitoring and tablet status tracking
  • proto/topodata.proto - Core topology data structures

Vitess maintains a distributed topology service that tracks cluster metadata: keyspaces, shards, tablets, and their health status. This system enables VTGate to route queries intelligently and handle failovers automatically.

Topology Server Architecture

The topology service uses a pluggable backend pattern supporting etcd, Zookeeper, and Consul. The topo.Server maintains two connections: one to the global topology (for cluster-wide metadata) and one per cell (for local tablet information). Implementations register themselves via RegisterFactory(), allowing runtime selection through the --topo-implementation flag.

// Global topology stores keyspace definitions, shard mappings, and VSchema
// Local cell topology stores tablet records and health information
topo.Server {
  globalCell       // Consistent read/write to global metadata
  globalReadOnlyCell // Optional read-only replicas (e.g., Zookeeper observers)
  cellConns        // Per-cell connections for local data
}

Resilient Caching Layer

The ResilientServer wraps the base topology server with intelligent caching to reduce load and improve resilience. It maintains three watchers:

  • SrvKeyspaceWatcher - Watches serving graph changes (shard-to-tablet mappings per cell)
  • SrvVSchemaWatcher - Watches virtual schema updates
  • SrvKeyspaceNamesQuery - Caches the list of available keyspaces

Each uses a dual-TTL strategy: entries refresh every srv-topo-cache-refresh (default 1s), but remain valid for srv-topo-cache-ttl (default 1s) even if the backend is unavailable. This allows queries to proceed during brief topology service outages.
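
The dual-TTL behavior can be pictured as a cache entry with one timestamp and two windows: refresh once the entry is older than the refresh interval, but keep serving the stale value until the TTL expires if the backend errors. A hypothetical sketch:

import "time"

// cacheEntry sketches the dual-TTL idea; it is not the ResilientServer code.
type cacheEntry struct {
    value     any
    fetchedAt time.Time
}

func (e *cacheEntry) get(refreshInterval, ttl time.Duration, fetch func() (any, error)) (any, error) {
    age := time.Since(e.fetchedAt)
    if age < refreshInterval {
        return e.value, nil // fresh enough, no topology round trip
    }
    v, err := fetch()
    if err == nil {
        e.value, e.fetchedAt = v, time.Now()
        return v, nil
    }
    if age < ttl {
        return e.value, nil // backend unavailable: serve the stale value within TTL
    }
    return nil, err // stale beyond TTL and refresh failed
}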

Tablet Discovery & Health Checking

The TopologyWatcher polls the topology periodically (configurable interval) to discover tablets in a cell. When changes are detected, it notifies the HealthCheck interface:

TopologyWatcher {
  refreshInterval      // How often to poll topology
  refreshKnownTablets  // Whether to re-fetch existing tablets
  tablets              // Map of alias -> tablet info
}

The HealthCheck maintains a cache of tablet health by keyspace/shard/type. For each tablet, it:

  1. Establishes a gRPC connection
  2. Streams health updates via StreamHealth RPC
  3. Tracks replication lag, serving status, and primary term
  4. Notifies subscribers of state changes (used by VTGate buffer for failover)

HealthCheck {
  healthByAlias    // All tablets by alias
  healthData       // Tablets grouped by keyspace.shard.type
  healthy          // Only healthy tablets per target
  subscribers      // Listeners for primary changes
}

Key Concepts

Cells - Logical groupings of tablets (datacenters). Each cell has its own topology server connection for low-latency access.

SrvKeyspace - Serving graph entry mapping a keyspace to its shards and tablet types in a specific cell. Updated when shards are added/removed or tablets change type.

TabletAlias - Globally unique tablet identifier: cell-uid (e.g., us-east-0-100).

Tablet Types - PRIMARY (accepts writes), REPLICA (serves reads), BATCH/RDONLY (long-running queries), SPARE, BACKUP, RESTORE, DRAINED.

The topology system is eventually consistent: changes propagate through watchers within seconds, but clients may briefly see stale data. VTGate handles this gracefully by retrying failed queries against updated tablet lists.

SQL Parsing & Analysis

Relevant Files
  • go/vt/sqlparser/parser.go
  • go/vt/sqlparser/analyzer.go
  • go/vt/sqlparser/normalizer.go
  • go/vt/sqlparser/token.go
  • go/vt/sqlparser/ast.go
  • go/vt/vtgate/semantics/analyzer.go

SQL parsing and analysis in Vitess transforms raw SQL strings into structured representations that can be routed, optimized, and executed. This process happens in multiple stages, each adding semantic understanding to the query.

Parsing: String to AST

The parser converts SQL text into an Abstract Syntax Tree (AST) using a YACC-based grammar. The main entry point is Parser.Parse2(), which returns both the AST and any bind variables found in the query.

stmt, bindVars, err := parser.Parse2("SELECT * FROM users WHERE id = :id")

The parser uses a tokenizer to scan the SQL string and a pooled parser implementation for efficiency. It handles multiple statements, partial DDL, and various SQL dialects. The resulting AST contains typed nodes like Select, Insert, Update, representing the query structure.
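
Once parsed, calling code usually switches on the concrete node type. A small sketch (the describe helper is hypothetical; parser construction and error handling are omitted):

import "vitess.io/vitess/go/vt/sqlparser"

// describe reports what kind of statement a parsed AST node represents.
func describe(stmt sqlparser.Statement) string {
    switch stmt.(type) {
    case *sqlparser.Select:
        return "select: " + sqlparser.String(stmt)
    case *sqlparser.Insert:
        return "insert: " + sqlparser.String(stmt)
    case *sqlparser.Update, *sqlparser.Delete:
        return "dml: " + sqlparser.String(stmt)
    default:
        return "other: " + sqlparser.String(stmt)
    }
}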

Statement Classification

Before full parsing, Vitess can quickly classify statements using Preview(), which examines only the first few tokens. This enables fast routing decisions without full parsing overhead.

stmtType := sqlparser.Preview(sql)
// Returns: StmtSelect, StmtInsert, StmtUpdate, StmtDDL, etc.

Normalization: Query Standardization

The normalizer transforms the AST to enable efficient query planning and execution. It serves two key purposes:

  1. Parameterization - Converts literals to bind variables, allowing plan reuse across queries with different values
  2. Pattern Standardization - Normalizes expressions (e.g., ensuring columns appear on the left side of comparisons)

result, err := sqlparser.Normalize(
    ast,
    reservedVars,
    bindVars,
    true, // parameterize
    "keyspace",
    0,    // selectLimit
    "",   // setVarComment
    sysVars,
    fkChecksState,
    views,
)

The normalizer also rewrites special functions like LAST_INSERT_ID(), DATABASE(), and FOUND_ROWS() into bind variables for vtgate handling.

Semantic Analysis

The semantic analyzer in go/vt/vtgate/semantics builds a semantic table that tracks table dependencies, column types, and routing information. It validates that the query is executable and determines if it can be routed to a single shard.

semTable, err := semantics.Analyze(stmt, currentDb, schemaInfo)

The analyzer performs multiple passes: early table collection, binding column references to tables, type inference, and foreign key validation. It produces routing errors if the query cannot be executed on a single shard.

AST Traversal and Rewriting

Vitess provides utilities for traversing and modifying ASTs. The Rewrite() function applies visitor functions during tree traversal, enabling transformations like predicate pushdown or expression simplification.

newAST := sqlparser.Rewrite(ast, walkDown, walkUp)
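
For example, a traversal that collects every table name referenced in a statement might look like the following sketch; the exact Cursor semantics can vary between Vitess versions:

import "vitess.io/vitess/go/vt/sqlparser"

// collectTables walks the AST and records each TableName node it visits.
func collectTables(stmt sqlparser.Statement) []string {
    var tables []string
    sqlparser.Rewrite(stmt, func(cursor *sqlparser.Cursor) bool {
        if tbl, ok := cursor.Node().(sqlparser.TableName); ok {
            tables = append(tables, tbl.Name.String())
        }
        return true // keep descending into children
    }, nil)
    return tables
}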

This architecture enables Vitess to parse, validate, optimize, and route queries efficiently while maintaining compatibility with MySQL syntax.

Administration & Monitoring

Relevant Files
  • go/vt/vtctl - Legacy command-line tool for cluster management
  • go/vt/vtctld - gRPC server exposing vtctl commands
  • go/cmd/vtctldclient - Modern CLI client for vtctld
  • go/vt/vtadmin - Web UI and API for multi-cluster management
  • go/vt/wrangler - Core library for topology operations
  • proto/vtctlservice.proto - gRPC service definitions

Vitess provides a layered administration stack for managing clusters, keyspaces, shards, and tablets. The tools range from low-level command-line utilities to high-level web interfaces.

Core Components

Wrangler is the foundation layer. It manages complex topology operations like reparenting, resharding, and backups. The Wrangler coordinates with the topology server and tablet managers to execute administrative tasks.

vtctld exposes Wrangler functionality via gRPC. It implements the Vtctld service with endpoints for keyspace, shard, tablet, and workflow operations. Multiple vtctld instances can run for redundancy.

vtctldclient is the modern CLI tool. It communicates with vtctld via gRPC and provides commands organized by resource type: tablets, keyspaces, shards, workflows, and backups.

VTAdmin is the unified management interface. It discovers and manages multiple Vitess clusters simultaneously, providing both a web UI and REST/gRPC API. It uses pluggable discovery mechanisms (Consul, static files, or dynamic configuration).

Key Operations

Tablet Management: Ping tablets, refresh state, reload schemas, execute queries as DBA or app user, manage replication.

Keyspace & Shard Operations: Create/delete keyspaces and shards, validate consistency, rebuild serving graphs, manage VSchema routing rules.

Workflows: Execute MoveTables and Reshard workflows with actions like Create, SwitchTraffic, ReverseTraffic, Complete, and Cancel.

Backups & Recovery: Trigger backups, restore from backups, manage backup storage, verify backup integrity.

Schema Migrations: Launch online schema migrations, track progress, cancel or retry migrations.

VTAdmin Cluster Discovery

VTAdmin discovers cluster components through pluggable discovery implementations:

  • Consul Discovery: Queries Consul service catalog for vtctld and vtgate instances
  • Static File Discovery: Reads cluster topology from JSON configuration files
  • Dynamic Discovery: Accepts cluster configuration via HTTP requests at runtime

Each cluster in VTAdmin maintains a Cluster object containing a discovery service, database connection, and vtctld client proxy. The API aggregates results across clusters for multi-cluster operations.

Command Categories

Tablets: InitTablet, GetTablet, Ping, RefreshState, RunHealthCheck, ExecuteFetchAsApp, ExecuteFetchAsDba

Shards: CreateShard, GetShard, ValidateShard, DeleteShard, EmergencyFailoverShard, PlannedFailoverShard

Keyspaces: CreateKeyspace, GetKeyspaces, ValidateKeyspace, RebuildKeyspaceGraph, ApplyRoutingRules

Workflows: MoveTables, Reshard, Materialize, Migrate with full lifecycle management

Generic: Validate (global consistency check), ListAllTablets, GetCellsAliases, GetTopologyPath