pingcap/tidb

TiDB: Distributed SQL Database

Last updated on Dec 18, 2025 (Commit: 865730d)

Overview

Relevant Files
  • README.md
  • cmd/tidb-server/main.go
  • pkg/config/config.go
  • pkg/domain/domain.go
  • pkg/server/server.go
  • pkg/store/driver/tikv_driver.go

TiDB is an open-source, cloud-native, distributed SQL database designed for high availability, horizontal and vertical scalability, strong consistency, and high performance. It provides MySQL 8.0 compatibility while supporting modern distributed database features.

Core Architecture

TiDB follows a three-layer architecture:

  1. TiDB Server (SQL Layer) - Handles MySQL protocol connections, SQL parsing, optimization, and execution
  2. TiKV (Storage Layer) - Distributed key-value store providing ACID transactions and Raft-based replication
  3. PD (Placement Driver) - Cluster management and metadata service

This design separates compute from storage, enabling each layer to scale independently.

Key Features

  • Distributed Transactions: Two-phase commit protocol ensures ACID compliance across multiple nodes
  • HTAP Support: Dual storage engines—TiKV (row-based) and TiFlash (columnar)—for transactional and analytical workloads
  • High Availability: Raft consensus protocol with automatic failover and multi-replica data replication
  • Horizontal Scalability: Add nodes without downtime; data automatically rebalances
  • MySQL Compatibility: Supports MySQL 8.0 protocol and most SQL syntax

Server Startup Flow

main() 
  → registerStores() [TiKV, MockTiKV, UniStore]
  → setupMetrics() & setupTracing()
  → createStoreDDLOwnerMgrAndDomain()
    → kvstore.MustInitStorage() [connects to TiKV/PD]
    → ddl.StartOwnerManager() [DDL coordination]
    → session.BootstrapSession() [initializes domain & schema]
  → createServer() [MySQL protocol server]
  → svr.Run() [starts network listeners]

Domain and Session Management

The Domain manages cluster-wide state including:

  • Information schema (table metadata)
  • Statistics and query optimization
  • DDL operations and schema synchronization
  • Binding cache for SQL optimization
  • Global configuration syncing with PD

Sessions represent client connections and execute SQL statements within transaction contexts.

Storage Layer Integration

TiDB abstracts storage through the kv.Storage interface, supporting:

  • TiKV Driver: Production distributed storage via PD coordination
  • MockTiKV: In-memory testing backend
  • UniStore: Embedded single-node storage for standalone deployments

The store handles transactions, key-value operations, and coprocessor pushdown for distributed computation.

Configuration

Configuration is centralized in pkg/config/config.go and loaded from TOML files. Key settings include:

  • Server port (default 4000) and status port (default 10080)
  • Storage backend and connection parameters
  • Performance tuning (memory limits, transaction sizes)
  • Security (TLS, user privileges)
  • Logging and metrics collection
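
A minimal sketch of reading a few of these settings at runtime; config.GetGlobalConfig() and the field names shown reflect pkg/config at the time of writing and may differ across versions:

import (
    "fmt"

    "github.com/pingcap/tidb/pkg/config"
)

func printKeySettings() {
    cfg := config.GetGlobalConfig()
    fmt.Println(cfg.Port)              // SQL port, 4000 by default
    fmt.Println(cfg.Status.StatusPort) // HTTP status port, 10080 by default
    fmt.Println(cfg.Store)             // storage backend, e.g. "tikv" or "unistore"
    fmt.Println(cfg.Path)              // PD endpoints when the backend is TiKV
}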

Execution Pipeline

  1. Parser (pkg/parser/) - Converts SQL text to AST
  2. Planner (pkg/planner/) - Optimizes logical and physical plans
  3. Executor (pkg/executor/) - Executes operators (SELECT, INSERT, DDL, etc.)
  4. Expression (pkg/expression/) - Evaluates expressions and built-in functions
  5. Storage (pkg/store/, pkg/kv/) - Reads/writes data via TiKV

This modular design enables independent optimization of each layer while maintaining clean interfaces.

Architecture & Core Components

Relevant Files
  • pkg/server/server.go
  • pkg/server/conn.go
  • pkg/session/session.go
  • pkg/store/store.go
  • pkg/kv/kv.go
  • pkg/infoschema/infoschema.go
  • pkg/executor/adapter.go
  • pkg/planner/core/optimizer.go

TiDB's architecture follows a layered design that separates concerns across network handling, session management, query planning, execution, and storage. Understanding these core components is essential for contributing to the codebase.

Network & Connection Layer

The pkg/server package handles all MySQL protocol communication. The Server struct manages TCP listeners and maintains a map of active client connections. When a client connects, the server creates a clientConn that handles the MySQL handshake, authentication, and command dispatching. Each connection is processed in its own goroutine via onConn(), which reads packets, parses commands, and routes them to the appropriate handler.

The IDriver interface abstracts the database engine. TiDBDriver implements this interface and creates a new Session for each connection via OpenCtx(). This design allows TiDB to support different storage backends.
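
From the client side, the protocol layer looks like any MySQL server: a standard driver can connect to the SQL port (4000 by default). A hedged sketch using database/sql with the go-sql-driver/mysql driver; the DSN values are examples:

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql"
)

func connect() (*sql.DB, error) {
    // user, host, port, and database name are examples; TLS and auth options omitted
    return sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
}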

Session Management

Sessions (pkg/session/session.go) represent user connections and maintain execution context. A session holds:

  • Transaction state and isolation level
  • Variable bindings (system variables, user variables)
  • Current database context
  • Statement cache for prepared statements

The session is responsible for parsing SQL, building execution plans, and executing statements. It acts as the bridge between the network layer and the query execution engine.

Query Execution Pipeline

When a SQL statement arrives, it flows through these stages:

  1. Parsing (pkg/parser) – SQL text is parsed into an AST
  2. Planning (pkg/planner/core) – The optimizer builds a logical plan, then transforms it into a physical plan using cost-based optimization
  3. Execution (pkg/executor) – The physical plan is compiled into an executor tree that processes data

The ExecStmt adapter bridges the planner and executor, managing transaction semantics and retry logic.

Information Schema

The InfoSchema (pkg/infoschema) provides metadata about databases, tables, columns, and indexes. It is versioned and cached to support concurrent reads. The schema is loaded from storage via the Domain and synchronized across the cluster.

Storage Layer

The pkg/store package provides a unified interface to the underlying KV store. The kv.Storage interface abstracts the storage engine, supporting both TiKV (distributed) and mock stores for testing.

Key KV abstractions:

  • Retriever – Read-only access (Get, Iter, IterReverse)
  • Mutator – Write operations (Set, Delete)
  • Transaction – ACID transactions with snapshot isolation
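
A minimal sketch of these abstractions in use, assuming store is any kv.Storage implementation (TiKV driver, MockTiKV, or UniStore); the key and value are arbitrary:

import (
    "context"

    "github.com/pingcap/tidb/pkg/kv"
)

func kvRoundTrip(store kv.Storage) error {
    ctx := context.Background()
    txn, err := store.Begin() // kv.Transaction with snapshot isolation
    if err != nil {
        return err
    }
    if err := txn.Set(kv.Key("k1"), []byte("v1")); err != nil { // Mutator
        return err
    }
    if _, err := txn.Get(ctx, kv.Key("k1")); err != nil { // Retriever
        return err
    }
    return txn.Commit(ctx) // runs the commit protocol against the backing store
}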

Domain & Cluster Coordination

The Domain (pkg/domain) is a singleton that manages cluster-wide state:

  • InfoSchema cache and synchronization
  • Statistics for query optimization
  • DDL ownership and coordination
  • Resource groups and scheduling

Key Design Patterns

Layered Abstraction: Each layer exposes a clean interface, allowing components to be tested and modified independently.

Context Propagation: Go contexts flow through the entire stack, enabling cancellation, timeouts, and tracing.

Concurrent Safety: Shared state (InfoSchema, Domain) uses RWMutex for safe concurrent access.

Pluggable Storage: The KV interface allows different storage backends without changing upper layers.

SQL Execution Pipeline

Relevant Files
  • pkg/parser/parser.go
  • pkg/planner/optimize.go
  • pkg/executor/compiler.go
  • pkg/executor/builder.go
  • pkg/executor/internal/exec/executor.go
  • pkg/expression/expression.go

A SQL query in TiDB flows through four main stages: parsing, planning, compilation, and execution. Each stage transforms the query into a more concrete representation until it becomes executable operations.

Parsing Stage

The parser converts raw SQL text into an Abstract Syntax Tree (AST). The pkg/parser package provides a MySQL-compatible parser that produces ast.StmtNode objects representing the parsed statement structure. This stage validates SQL syntax and creates the initial representation for downstream processing.

// Example: Parse SQL to AST
p := parser.New()
stmts, warns, err := p.ParseSQL(sql)

Planning Stage

The planner transforms the AST into an optimized execution plan. This involves two sub-stages:

Logical Planning: The PlanBuilder in pkg/planner/core/logical_plan_builder.go converts the AST into a logical plan tree. Logical plans represent the semantic structure of the query (e.g., LogicalSelection, LogicalJoin, LogicalAggregation) without considering physical execution details.

Optimization: The Optimize() function in pkg/planner/optimize.go applies transformation rules to the logical plan, producing a physical plan. This includes cost-based optimization, index selection, join reordering, and predicate pushdown. The optimizer uses statistics to estimate row counts and choose the most efficient execution strategy.

// Optimize converts logical plan to physical plan
finalPlan, cost, err := core.DoOptimize(ctx, sctx, optFlag, logicalPlan)

Compilation Stage

The Compiler in pkg/executor/compiler.go bridges planning and execution. It:

  1. Preprocesses the statement (resolves table names, validates permissions)
  2. Calls the planner to generate a physical plan
  3. Builds an executor tree from the physical plan

The executorBuilder in pkg/executor/builder.go implements a visitor pattern, recursively converting each physical plan node into a corresponding executor. For example, PhysicalTableScan becomes TableReaderExec, PhysicalHashJoin becomes HashJoinExec.

Execution Stage

Executors implement the Volcano iterator model with an Open-Next-Close protocol:

  • Open(): Initializes the executor and its children
  • Next(): Fetches the next batch of rows (not single rows) into a Chunk
  • Close(): Releases resources

// Executor interface
type Executor interface {
    Open(context.Context) error
    Next(ctx context.Context, req *chunk.Chunk) error
    Close() error
    Schema() *expression.Schema
}

Executors form a tree where parent executors pull data from children. The root executor is driven by the session, which repeatedly calls Next() until all rows are consumed. This design enables efficient batch processing and memory management through chunk-based data flow.
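
A sketch of that driving loop, built on the Executor interface shown above; newFirstChunk stands in for the helpers TiDB uses to allocate a chunk that matches the executor's output schema:

func drainExecutor(ctx context.Context, e Executor, newFirstChunk func(Executor) *chunk.Chunk) error {
    if err := e.Open(ctx); err != nil {
        return err
    }
    defer e.Close()

    req := newFirstChunk(e)
    for {
        req.Reset()
        if err := e.Next(ctx, req); err != nil {
            return err
        }
        if req.NumRows() == 0 {
            return nil // all rows consumed
        }
        // rows in req are consumed by the parent operator or the client protocol layer
    }
}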

Key Components

Expressions: The pkg/expression package evaluates expressions during execution. Expressions are compiled into evaluators that operate on chunks for vectorized performance.

Statistics: The planner uses table statistics from pkg/statistics to estimate cardinality and choose optimal plans. Statistics are collected via ANALYZE statements.

Distributed Execution: For distributed queries, the planner generates CopTask (coprocessor tasks on TiKV) and MppTask (MPP tasks on TiFlash). The executor coordinates these distributed tasks through the pkg/distsql interface.

Query Optimization & Planning

Relevant Files
  • pkg/planner/core/logical_plan_builder.go
  • pkg/planner/core/find_best_task.go
  • pkg/planner/cardinality/selectivity.go
  • pkg/statistics/table.go
  • pkg/planner/core/plan_cost_ver1.go
  • pkg/planner/core/plan_cost_ver2.go

Query optimization in TiDB transforms SQL statements into efficient execution plans through a multi-stage process: logical plan construction, rule-based optimization, cardinality estimation, and physical plan selection.

Logical Plan Construction

The PlanBuilder in logical_plan_builder.go converts parsed SQL AST nodes into logical plans. It processes SELECT statements by building operators for each clause:

  • Selection – WHERE conditions filter rows early
  • Aggregation – GROUP BY and aggregate functions
  • Sort – ORDER BY operations
  • Join – Table joins with reordering optimization
  • Projection – SELECT column selection

Each operator is built sequentially, with optimization flags set to enable specific rule applications. For example, buildSelection() sets FlagPredicatePushDown to allow pushing filters closer to data sources.

Cardinality Estimation

Accurate row count predictions are critical for cost-based optimization. The Selectivity() function in cardinality/selectivity.go estimates how many rows pass through filters:

// Selectivity = (rows after filter) / (rows before filter)
// Uses histogram statistics and column NDV (distinct values)
selectivity, _, err := cardinality.Selectivity(
    ctx, tableStats.HistColl, conditions, accessPaths)

When statistics are unavailable, the optimizer falls back to pseudo-selectivity estimates. For complex multi-column predicates, it uses exponential backoff estimation to balance accuracy and computation time.

Cost-Based Plan Selection

The findBestTask() function in find_best_task.go drives the core optimization workflow. It enumerates candidate physical plans and selects the lowest-cost option:

  1. Generate physical plan candidates (table scan, index scan, joins)
  2. Calculate cost for each candidate using CPU, I/O, and memory factors
  3. Compare costs and select the best task

Cost models (v1 and v2) account for:

  • CPU cost – Expression evaluation and row processing
  • I/O cost – Network and disk access
  • Memory cost – Sorting and aggregation buffers
  • Concurrency – Parallel execution benefits
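
The sketch below only illustrates how weighted factors are combined and compared; the struct, weights, and function names are hypothetical and do not reproduce the v1/v2 formulas in plan_cost_ver1.go and plan_cost_ver2.go:

type planCost struct {
    cpu, io, mem float64 // per-factor estimates for one candidate plan
}

func totalCost(c planCost) float64 {
    // hypothetical weights standing in for the tunable cost factors
    const cpuFactor, ioFactor, memFactor = 3.0, 1.5, 0.001
    return c.cpu*cpuFactor + c.io*ioFactor + c.mem*memFactor
}

func cheaperPlan(a, b planCost) planCost {
    if totalCost(a) <= totalCost(b) {
        return a
    }
    return b
}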

Optimization Rules

After logical plan construction, TiDB applies transformation rules:

  • Predicate Push Down – Move filters to table scans
  • Join Reordering – Reorder joins to minimize intermediate results
  • Aggregation Push Down – Push GROUP BY to storage layer
  • Column Pruning – Remove unused columns early
  • Constant Propagation – Simplify expressions with known values

Task Types

Physical plans are wrapped in tasks representing execution contexts:

  • Root Task – TiDB layer execution
  • Cop Task – TiKV coprocessor execution
  • MPP Task – Distributed TiFlash execution

The optimizer selects task types based on data distribution, query complexity, and system configuration.

Distributed SQL & Execution

Relevant Files
  • pkg/distsql/distsql.go
  • pkg/distsql/request_builder.go
  • pkg/distsql/select_result.go
  • pkg/executor/distsql.go
  • pkg/store/copr/coprocessor.go
  • pkg/kv/kv.go

Distributed SQL execution in TiDB separates query planning from execution by pushing computation to storage nodes (TiKV/TiFlash). This architecture enables horizontal scaling and reduces data movement across the network.

Request Building Pipeline

The RequestBuilder in pkg/distsql/request_builder.go constructs distributed requests before sending them to storage. It uses a fluent builder pattern to configure:

  • DAG Request: Serialized query plan with filters, projections, and aggregations
  • Key Ranges: Regions to scan based on table partitioning
  • Execution Parameters: Isolation level, transaction scope, replica read strategy, concurrency settings

The builder validates transaction scope, sets default concurrency levels, and optimizes for simple scans. For example, single-executor DAGs with KeepOrder are assigned concurrency of 2 to balance throughput with client protocol overhead.
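
A hedged sketch of the fluent style, assuming a serialized tipb.DAGRequest and precomputed key ranges are already available; real callers also set session variables, memory trackers, paging, and replica-read options:

import (
    "github.com/pingcap/tidb/pkg/distsql"
    "github.com/pingcap/tidb/pkg/kv"
    "github.com/pingcap/tipb/go-tipb"
)

func buildCopRequest(dagReq *tipb.DAGRequest, ranges []kv.KeyRange) (*kv.Request, error) {
    var builder distsql.RequestBuilder
    return builder.
        SetDAGRequest(dagReq). // serialized executors pushed to the storage layer
        SetKeyRanges(ranges).  // key ranges derived from table/index metadata
        SetKeepOrder(true).    // preserve ordering across ranges
        SetConcurrency(2).     // modest concurrency for ordered scans
        Build()
}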

Coprocessor Task Execution

The CopClient in pkg/store/copr/coprocessor.go manages task distribution to TiKV regions. Key responsibilities:

  • Task Building: Maps key ranges to regions using the region cache
  • Batching: Groups multiple tasks for efficient RPC transmission
  • Retry Logic: Handles region splits and transient failures with exponential backoff
  • Paging: Supports streaming large result sets in batches to control memory usage

The coprocessor supports both legacy single-task and modern batch modes. Batch mode reduces RPC overhead by sending multiple tasks in one request, with configurable batch sizes.

Result Streaming

The SelectResult interface in pkg/distsql/select_result.go provides an iterator abstraction over coprocessor responses:

  • NextRaw(): Returns raw protobuf bytes for low-level processing
  • Next(): Decodes responses into vectorized chunks for efficient processing
  • IntoIter(): Converts results to row-by-row iteration when needed

Multiple result types handle different scenarios: selectResult for single-region responses, serialSelectResults for sequential merging, and sortedSelectResults for merge-sort operations across regions.
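
A sketch of draining a SelectResult into chunks, assuming result came from distsql.Select() and retTypes describes the columns returned by the pushed-down plan:

import (
    "context"

    "github.com/pingcap/tidb/pkg/distsql"
    "github.com/pingcap/tidb/pkg/types"
    "github.com/pingcap/tidb/pkg/util/chunk"
)

func drainSelectResult(ctx context.Context, result distsql.SelectResult, retTypes []*types.FieldType) error {
    defer result.Close()
    chk := chunk.New(retTypes, 64, 1024) // capacity and max chunk size here are illustrative
    for {
        chk.Reset()
        if err := result.Next(ctx, chk); err != nil {
            return err
        }
        if chk.NumRows() == 0 {
            return nil // all regions exhausted
        }
        // decoded rows in chk feed the executor that issued the request
    }
}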

Executor Integration

Executors in pkg/executor/distsql.go (TableReaderExecutor, IndexReaderExecutor, IndexLookUpExecutor) orchestrate distributed execution:

  1. Build key ranges from table/index metadata
  2. Create RequestBuilder with DAG plan and execution parameters
  3. Send request via distsql.Select() to get SelectResult
  4. Iterate results and apply post-processing (filtering, sorting, aggregation)

For index lookups, the executor performs a two-phase operation: first fetch index entries to get row handles, then fetch actual rows from the table. This is coordinated through worker goroutines with result channels.

Performance Optimizations

Paging: Large result sets are fetched in configurable batches (default 256 rows) to reduce memory pressure and enable early termination.

Replica Read: Supports reading from follower replicas or closest replicas to reduce latency and load on leaders.

Cop Cache: Caches coprocessor responses for identical requests within a transaction to avoid redundant computation.

Merge Sort: When multiple regions return ordered data, results are merged using a heap-based algorithm to maintain global order without re-sorting.

DDL & Schema Management

Relevant Files
  • pkg/ddl/ddl.go - Core DDL engine and job lifecycle management
  • pkg/ddl/column.go - Column modification operations
  • pkg/ddl/job_scheduler.go - Job scheduling and execution coordination
  • pkg/ddl/schema_version.go - Schema versioning and synchronization
  • pkg/meta/meta.go - Metadata storage and retrieval
  • pkg/infoschema/builder.go - Schema information building and caching
  • pkg/table/table.go - Table abstraction layer
  • pkg/ddl/backfilling.go - Data reorganization during schema changes

Overview

TiDB's DDL (Data Definition Language) system manages schema changes through a distributed, asynchronous job execution model. The system ensures consistency across a cluster while supporting concurrent DDL operations on different tables. Schema changes progress through multiple states, with each state transition synchronized across all TiDB nodes.

Job Lifecycle

DDL jobs follow a well-defined state machine:

  1. Queueing - Job submitted and waiting to be picked up
  2. Running - Job actively executing on the DDL owner
  3. Done - Job completed locally, awaiting cluster synchronization
  4. Synced - All nodes have synchronized the schema change
  5. Cancelled/Rollingback - Job cancelled or rolling back due to errors

Each job progresses through schema states: StateNone → StateDeleteOnly → StateWriteOnly → StateWriteReorganization → StatePublic. This F1-inspired approach ensures that at most two adjacent schema states exist cluster-wide at any time.

Schema Versioning

Schema versions are global, monotonically increasing integers stored in metadata. When a DDL job transitions a schema element's state, it:

  1. Acquires a schema version lock (prevents concurrent version increments)
  2. Generates a new schema version via GenSchemaVersion()
  3. Creates a SchemaDiff describing the change
  4. Persists both to the KV store in a single transaction

The SchemaDiff contains the job type, affected table/schema IDs, and state transitions. All TiDB nodes watch for schema version changes via etcd and reload affected metadata.

Job Scheduling

The jobScheduler loads pending jobs from mysql.tidb_ddl_job and dispatches them to worker pools. Key responsibilities:

  • Load and deliver: Fetches runnable jobs, checks dependencies (prevents concurrent modifications to same object)
  • Execute: Runs job steps via workers, persisting state changes
  • Synchronize: Waits for all nodes to reach the new schema version before proceeding
  • Finalize: Moves completed jobs to history table

The scheduler uses etcd for cross-node coordination and supports pausing/cancelling jobs mid-execution.
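
For illustration, jobs can be observed and controlled from SQL with ADMIN statements; db is a *sql.DB connected to TiDB and the job ID 123 is hypothetical:

import "database/sql"

func controlDDLJobs(db *sql.DB) error {
    rows, err := db.Query("ADMIN SHOW DDL JOBS") // queued, running, and recently finished jobs
    if err != nil {
        return err
    }
    rows.Close()

    if _, err := db.Exec("ADMIN PAUSE DDL JOBS 123"); err != nil { // pause mid-execution
        return err
    }
    if _, err := db.Exec("ADMIN RESUME DDL JOBS 123"); err != nil { // resume later
        return err
    }
    _, err = db.Exec("ADMIN CANCEL DDL JOBS 123") // cancel and roll back
    return err
}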

Backfilling

Large schema changes (adding indexes, modifying columns) require backfilling existing data. The system:

  • Splits table key ranges into tasks distributed to worker goroutines
  • Processes tasks concurrently within transactions
  • Supports both traditional transaction-based and fast ingest-based approaches
  • Tracks progress in mysql.tidb_ddl_reorg for crash recovery

Backfilling happens during the StateWriteReorganization phase, allowing concurrent reads/writes on the table.

Metadata Management

Metadata is stored hierarchically in TiKV:

NextGlobalID → int64
SchemaVersion → int64
DBs → {DB:1 → metadata, DB:2 → metadata}
DB:1 → {Table:1 → metadata, Table:2 → metadata}

The meta.Mutator interface provides transactional access to metadata. Schema diffs are keyed by version for efficient incremental schema loading.
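
A hedged sketch of reading this metadata inside a KV transaction; the constructor and accessor names follow the current pkg/meta API and may differ across versions:

import (
    "context"

    "github.com/pingcap/tidb/pkg/kv"
    "github.com/pingcap/tidb/pkg/meta"
)

func readSchemaMeta(ctx context.Context, store kv.Storage) error {
    return kv.RunInNewTxn(ctx, store, false, func(ctx context.Context, txn kv.Transaction) error {
        m := meta.NewMutator(txn)
        ver, err := m.GetSchemaVersion() // global schema version counter
        if err != nil {
            return err
        }
        dbs, err := m.ListDatabases() // database definitions stored under the DBs key
        if err != nil {
            return err
        }
        _, _ = ver, dbs
        return nil
    })
}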

InfoSchema Caching

The infoschema.Builder constructs in-memory schema snapshots by applying SchemaDiff entries. This enables:

  • Fast schema lookups without KV access
  • Incremental updates for most DDL operations
  • Full reloads when diffs span too large a version gap

Each session maintains its own schema version, allowing concurrent transactions to see consistent snapshots.

Data Migration & Backup Tools

Relevant Files
  • br/pkg/backup/backup.go - Backup client and orchestration
  • br/pkg/restore/restorer.go - Restore logic and file handling
  • lightning/pkg/importer/import.go - Lightning import controller
  • dumpling/export/dump.go - Data export and dumping
  • pkg/lightning/backend/backend.go - Backend abstraction layer
  • pkg/lightning/backend/local/local.go - Local backend implementation
  • pkg/lightning/backend/tidb/tidb.go - TiDB backend implementation

TiDB provides a comprehensive suite of tools for data migration, backup, and restoration. These tools work together to enable efficient data movement and disaster recovery across different deployment scenarios.

Core Components

BR (Backup & Restore) handles cluster-level backup and restore operations. It supports multiple backup types: full backups, database backups, table backups, raw KV backups, and transaction log backups. BR can backup to various storage backends (S3, GCS, local filesystem) and includes features like encryption, compression, and rate limiting.

Dumpling exports data from TiDB/MySQL to SQL or CSV files. It handles concurrent table dumping with configurable chunk sizes, supports consistency modes (snapshot, lock, none), and can export specific databases or tables using filters.

Lightning imports data from files (SQL, CSV, Parquet) into TiDB at high speed. It uses two backend strategies: the local backend writes sorted KV pairs directly to TiKV, while the TiDB backend uses SQL INSERT statements for compatibility.

Backend Strategies

Local Backend:
  - Writes KV pairs to local disk, sorts them
  - Directly ingests into TiKV regions
  - Fastest performance, requires disk space
  - Supports duplicate detection and conflict resolution

TiDB Backend:
  - Uses SQL INSERT statements
  - Works through TiDB server
  - Slower but more compatible
  - Useful for online imports without downtime

Key Features

Checkpoint & Resume: All tools support checkpoints, allowing interrupted operations to resume from the last successful point without reprocessing data.

Concurrency Control: Configurable worker pools and rate limiting prevent overwhelming the target cluster. Lightning supports table-level and index-level concurrency tuning.

Conflict Resolution: Lightning handles duplicate keys via strategies like replace, ignore, or error modes. The local backend can record conflicts for later analysis.

Consistency Modes: Dumpling offers snapshot consistency (using TiDB snapshots), lock-based consistency, or eventual consistency depending on requirements.

Typical Workflows

For full cluster migration: Export with Dumpling → Import with Lightning → Verify with checksums.

For disaster recovery: Regular BR backups → Store in S3/GCS → Restore to new cluster on failure.

For incremental updates: Use transaction log backups (PITR) to restore to any point in time.

Transaction Management & Isolation

Relevant Files
  • pkg/sessiontxn/txn_manager.go
  • pkg/sessiontxn/isolation/isolation.go
  • pkg/sessiontxn/isolation/base.go
  • pkg/kv/txn.go
  • pkg/session/txn.go
  • pkg/sessiontxn/staleread/staleread.go

TiDB's transaction management system provides a flexible, multi-isolation-level architecture that supports both optimistic and pessimistic concurrency control. The system is built around the concept of transaction context providers that abstract away isolation-level-specific behavior.

Core Architecture

The transaction lifecycle is managed by the TxnManager interface, which coordinates transaction state across a session. Key responsibilities include:

  • Transaction initialization via EnterNewTxn() with configurable modes (optimistic/pessimistic)
  • Statement-level hooks for error handling, retry logic, and state transitions
  • Timestamp management for read and write operations at different isolation levels
  • Lazy transaction activation to defer expensive operations until needed

// Transaction states in LazyTxn
// Invalid: No transaction or future
// Pending: Future allocated, waiting for StartTS
// Valid: Actual transaction object ready

Isolation Levels

TiDB implements three primary isolation levels through specialized context providers:

Optimistic Transactions (OptimisticTxnContextProvider)

  • All reads use the transaction's start timestamp
  • Conflicts detected at commit time
  • Automatic retry enabled for auto-commit statements
  • Optimized for read-heavy workloads with low contention

Pessimistic Repeatable Read (PessimisticRRTxnContextProvider)

  • Acquires locks immediately on read/write operations
  • Uses consistent ForUpdateTS for all DML statements
  • Optimizes point-get operations by reusing cached timestamps
  • Prevents phantom reads through early lock acquisition

Pessimistic Read Committed (PessimisticRCTxnContextProvider)

  • Each statement sees the latest committed data
  • Fetches new timestamps per statement for non-transactional reads
  • Supports RCCheckTS flag for consistent point-write operations
  • Balances consistency with concurrency
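
For illustration, the transaction mode and isolation level can be chosen per transaction from SQL (shown here through the TestKit helper described in the testing section; any MySQL client works the same way, and the table and values are examples):

tk.MustExec("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
tk.MustExec("BEGIN PESSIMISTIC") // pessimistic RC: fresh read timestamp per statement
tk.MustExec("UPDATE t SET v = v + 1 WHERE id = 1")
tk.MustExec("COMMIT")

tk.MustExec("BEGIN OPTIMISTIC") // conflicts detected at commit time
tk.MustExec("INSERT INTO t VALUES (2, 0)")
tk.MustExec("COMMIT")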

Timestamp Management

Timestamps are critical for isolation guarantees:

  • StartTS: Transaction's snapshot version, allocated once at transaction start
  • ForUpdateTS: Used for DML operations in pessimistic mode, may be refreshed per statement
  • StmtReadTS: Read timestamp for SELECT statements (differs by isolation level)
  • SnapshotTS: Explicit snapshot for point-in-time reads

The system uses lazy timestamp allocation—timestamps are fetched from PD only when actually needed, reducing latency for read-only operations.

Transaction Context Providers

All providers inherit from baseTxnContextProvider and implement the TxnContextProvider interface:

type TxnContextProvider interface {
    GetTxnInfoSchema() infoschema.InfoSchema
    GetStmtReadTS() (uint64, error)
    GetStmtForUpdateTS() (uint64, error)
    ActivateTxn() (kv.Transaction, error)
    OnStmtStart(ctx context.Context, node ast.StmtNode) error
    OnStmtErrorForNextAction(ctx context.Context, point StmtErrorHandlePoint, err error) (StmtErrorAction, error)
}

Providers handle:

  • Initialization: Setting up transaction context with isolation-specific defaults
  • Statement lifecycle: Hooks before/after statement execution
  • Error recovery: Determining retry eligibility based on error type and isolation level
  • Plan optimization: Adjusting timestamps based on query plans (e.g., point-get optimization)

Stale Read Support

TiDB supports reading historical snapshots via START TRANSACTION READ ONLY AS OF TIMESTAMP. The stale read provider wraps other providers to enforce read-only semantics and use explicit historical timestamps instead of current ones.
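
A minimal example of the stale-read syntax; the interval and table are illustrative:

tk.MustExec("START TRANSACTION READ ONLY AS OF TIMESTAMP NOW() - INTERVAL 5 SECOND")
tk.MustQuery("SELECT * FROM t") // served from the historical snapshot
tk.MustExec("COMMIT")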

Fair Locking

Pessimistic transactions support fair locking to prevent starvation in high-contention scenarios. When enabled, the system coordinates lock acquisition across transactions to ensure forward progress.

Metadata, Statistics & Information Schema

Relevant Files
  • pkg/infoschema/infoschema.go
  • pkg/infoschema/cache.go
  • pkg/statistics/handle/handle.go
  • pkg/statistics/handle/cache/statscache.go
  • pkg/meta/meta.go
  • pkg/domain/domain.go
  • pkg/domain/infosync/info.go

TiDB maintains three interconnected systems for managing schema metadata and table statistics: the Information Schema, the Statistics System, and the Metadata Layer. These components work together to provide query optimization and schema consistency across the cluster.

Information Schema & InfoCache

The InfoSchema is the in-memory representation of all database schemas, tables, columns, indexes, and constraints. It is immutable and versioned—each DDL operation increments the schema version.

The InfoCache (pkg/infoschema/cache.go) manages multiple InfoSchema versions in a bounded cache (typically 16 versions). It supports three lookup patterns:

  • By Version: GetByVersion(version) retrieves a specific schema version
  • By Timestamp: GetBySnapshotTS(ts) finds the schema valid at a given timestamp
  • Latest: GetLatest() returns the newest schema

The cache is automatically garbage-collected when it exceeds capacity, keeping only the most recent versions. Schema versions are tracked with commit timestamps to support time-travel queries.
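
A sketch of the three lookup patterns, assuming ic is the Domain's *infoschema.InfoCache and startTS is a snapshot timestamp obtained elsewhere; the return types are illustrative:

latest := ic.GetLatest()                             // newest InfoSchema
byVer := ic.GetByVersion(latest.SchemaMetaVersion()) // a specific cached version
bySnap := ic.GetBySnapshotTS(startTS)                // schema valid at a historical timestamp
_, _, _ = latest, byVer, bySnap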

Statistics System

The Statistics Handle (pkg/statistics/handle/handle.go) manages table statistics used by the query optimizer. It consists of several components:

  • StatsCache: In-memory cache of table statistics (row counts, column histograms, index statistics)
  • StatsReadWriter: Persists statistics to mysql.stats_* tables
  • StatsAnalyze: Manages auto-analyze jobs and ANALYZE statements
  • StatsSyncLoad: Asynchronously loads detailed statistics (histograms, TopN values) on-demand

Statistics are loaded in two phases: metadata-only initialization (InitStatsLite) loads basic counts, while full initialization (InitStats) loads histograms and indexes.

Metadata Storage Layer

The Meta system (pkg/meta/meta.go) is the persistent storage abstraction for schema metadata. It stores:

  • Global IDs and schema versions
  • Database and table definitions (serialized as protobuf)
  • Sequence and auto-increment information
  • DDL history and schema diffs

Metadata is stored in TiKV using a hierarchical key structure: m/DB:id/Table:id/...

Domain Integration

The Domain (pkg/domain/domain.go) orchestrates all three systems:

type Domain struct {
    infoCache    *infoschema.InfoCache
    statsHandle  *handle.Handle
    info         *infosync.InfoSyncer
    isSyncer     *issyncer.Syncer
    // ...
}

The Domain manages:

  • InfoSyncer: Synchronizes server info and topology with PD etcd
  • Syncer: Continuously reloads schema from storage and broadcasts DDL changes
  • Statistics Handle: Periodically updates statistics and triggers auto-analyze

Schema Synchronization

When a DDL operation completes, the schema version is incremented and persisted. The Syncer (pkg/infoschema/issyncer/syncer.go) runs a background loop that:

  1. Detects new schema versions from etcd
  2. Loads schema diffs or performs full reload
  3. Updates the InfoCache
  4. Notifies all sessions of schema changes

This ensures all TiDB instances eventually converge to the same schema version.

Key Design Patterns

  • Immutable Snapshots: Each InfoSchema version is immutable, enabling safe concurrent reads
  • Lazy Loading: Statistics are loaded on-demand to reduce startup time
  • Version Tracking: Schema and statistics versions enable point-in-time queries
  • Distributed Consistency: etcd coordination ensures all nodes see consistent schema versions

Testing Infrastructure & Utilities

Relevant Files
  • pkg/testkit/testkit.go - Core test utilities and TestKit struct
  • pkg/testkit/dbtestkit.go - Database connection test utilities
  • pkg/testkit/result.go - Result assertion and validation
  • pkg/testkit/mockstore.go - Mock storage backend for testing
  • pkg/util/chunk/chunk.go - Columnar data structure for tests
  • pkg/util/codec/codec.go - Encoding/decoding utilities
  • tests/integrationtest/run-tests.sh - Integration test runner
  • pkg/testkit/testutil/loghook.go - Log capture for testing

TiDB provides a comprehensive testing infrastructure designed to support unit tests, integration tests, and end-to-end validation. The framework enables developers to write tests at multiple levels of abstraction, from low-level component tests to full SQL execution scenarios.

Core Testing Components

TestKit is the primary testing utility for SQL execution tests. It wraps a KV storage backend and session, providing convenient methods for executing SQL statements and validating results:

tk := testkit.NewTestKit(t, store)
tk.MustExec("CREATE TABLE t (id INT PRIMARY KEY)")
result := tk.MustQuery("SELECT * FROM t")
result.Check([][]any{})

DBTestKit provides database connection-level testing using standard Go database/sql interfaces, useful for testing MySQL protocol compatibility and connection handling:

dbTk := testkit.NewDBTestKit(t, db)
stmt := dbTk.MustPrepare("SELECT * FROM t WHERE id = ?")
rows := dbTk.MustQueryPrepared(stmt, 1)

Result Validation

The Result type provides flexible assertion methods for query results. The Check() method validates exact row matches, while CheckWithFunc() allows custom comparison logic:

result.Check([][]any{{"1", "2"}, {"3", "4"}})
result.CheckWithFunc(expected, func(row []string, exp []any) bool {
    return len(row) == len(exp)
})

Mock Storage Backend

MockStore enables testing without a real TiKV cluster. It simulates the KV layer, allowing rapid unit test execution. The RunTestUnderCascades() helper runs tests under different planner modes:

testkit.RunTestUnderCascades(t, func(t *testing.T, tk *TestKit, cascades, caller string) {
    tk.MustExec("SELECT * FROM t")
}, opts...)

Integration Testing

Integration tests use .test files in tests/integrationtest/t/ with corresponding .result files for expected outputs. The run-tests.sh script orchestrates test execution:

./run-tests.sh -t select              # Run specific test
./run-tests.sh -r select              # Record new results
./run-tests-next-gen.sh -t select     # Run with real TiKV

Utilities

LogHook captures logs during testing for validation. Chunk provides columnar data structures, modeled on the Apache Arrow layout, for efficient batch processing. Codec utilities handle encoding and decoding of SQL values to the KV format.

Best Practices

  • Use --tags=intest when running tests to enable assertions
  • Reuse existing test data and table structures
  • Keep unit test files under 50 tests per package
  • Use integration tests for end-to-end SQL validation
  • Capture logs with LogHook when testing error handling