Overview
Relevant Files
- README.md
- src/DEVELOPERS
- src/backend/main/main.c
- src/backend/postmaster/postmaster.c
- src/backend/tcop/postgres.c
PostgreSQL is an advanced object-relational database management system supporting transactions, foreign keys, subqueries, triggers, user-defined types, and functions. This repository contains the complete source code distribution, organized into a modular backend architecture with multiple specialized subsystems.
Architecture Overview
The PostgreSQL backend is organized into distinct functional layers:
Core Subsystems
Postmaster & Process Management (src/backend/postmaster/)
- Central coordinator that manages all server processes
- Handles client connections, forking backend processes
- Manages auxiliary processes: autovacuum, background workers, WAL writer, checkpointer
Query Processing Pipeline (src/backend/parser/, src/backend/optimizer/, src/backend/executor/)
- Parser: Converts SQL text into parse trees using Yacc/Lex
- Analyzer: Performs semantic analysis and builds query trees
- Planner: Generates optimal execution plans using cost-based optimization
- Executor: Executes plans via node-based evaluation engine
Storage & Access Methods (src/backend/storage/, src/backend/access/)
- Buffer management for efficient disk I/O
- Multiple index types: B-tree, Hash, GiST, GIN, BRIN, SP-GiST
- Heap table access and sequence management
- Transaction management and MVCC implementation
Catalog & Metadata (src/backend/catalog/)
- System catalogs storing schema definitions
- Namespace and object management
- Dependency tracking and access control
Replication & WAL (src/backend/replication/, src/backend/access/transam/)
- Write-Ahead Logging for crash recovery
- Logical and physical replication support
- Slot management for replication streams
Entry Points
The main entry point (src/backend/main/main.c) dispatches to different modes:
- Postmaster: Normal server operation (PostmasterMain)
- Single-user: Direct database access (PostgresSingleUserMain)
- Bootstrap: Database initialization (BootstrapModeMain)
- Check mode: Configuration validation
Build System
PostgreSQL uses both traditional Makefiles and Meson build system. Key build files:
- GNUmakefile.in / Makefile: Traditional build configuration
- meson.build: Modern build system files throughout the tree
- configure.ac: Autoconf configuration
Development Resources
For detailed development information, see the PostgreSQL wiki at http://wiki.postgresql.org/wiki/Development_information. Developer tools are located in src/tools/.
Architecture & Query Processing Pipeline
Relevant Files
- src/backend/parser/README
- src/backend/optimizer/README
- src/backend/executor/README
- src/backend/tcop/postgres.c
PostgreSQL processes queries through a well-defined pipeline: Parse → Analyze → Optimize → Execute. Each stage transforms the query representation, ultimately producing executable plan trees that the executor runs.
Query Processing Pipeline
Parser Stage
The parser (src/backend/parser/) tokenizes SQL and builds a raw parse tree. Key components:
- scan.l - Lexer that breaks queries into tokens
- gram.y - Grammar rules that produce parse trees
- analyze.c - Transforms raw parse trees into Query structures with semantic information
- parse_*.c - Specialized handlers for clauses (WHERE, GROUP BY), expressions, functions, operators, and type coercion
The parser also handles complex features like Common Table Expressions (CTEs), aggregates, and MERGE statements. Output is a Query node ready for optimization.
Optimizer Stage
The optimizer (src/backend/optimizer/) generates efficient execution plans by exploring alternative join orders and access methods. It uses a dynamic programming algorithm to build join trees:
- Path Generation - For each base relation, create Paths representing different scan methods (sequential, index, bitmap)
- Join Enumeration - Recursively combine relations using nested-loop, merge, or hash joins
- Cost Estimation - Evaluate each Path's cost and select the cheapest
- Plan Conversion - Transform the optimal Path tree into a Plan tree
Key concepts:
- RelOptInfo - Represents a relation or join; contains all Paths for that relation
- Path - One way to generate a RelOptInfo (e.g., IndexPath, HashPath, NestPath)
- EquivalenceClass - Groups expressions known to be equal (e.g., a.x = b.y = 42)
- PathKey - Represents sort ordering to enable merge joins without explicit sorts
Executor Stage
The executor (src/backend/executor/) runs the plan tree as a demand-pull pipeline. Each plan node produces tuples when called:
- Plan Tree - Read-only tree of Plan nodes from the optimizer
- State Tree - Parallel tree of PlanState nodes holding runtime state
- Expression Evaluation - ExprState nodes flatten expression trees into linear steps for efficient interpretation
Key execution features:
- Tuple-at-a-time - Nodes pull tuples from children on demand
- Memory Management - Per-query and per-tuple contexts for efficient cleanup
- Rescan Support - Nodes can reset and regenerate output (used in nested loops)
- ModifyTable - Handles INSERT, UPDATE, DELETE, and MERGE operations
Data Flow Example
For SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.x > 5:
- Parser creates a Query with join and WHERE clause information
- Optimizer builds Paths for t1 (seq scan, index scans), t2 (seq scan, index scans), then join Paths (nested loop, hash, merge)
- Optimizer selects cheapest Path (e.g., index scan on t1 + hash join with t2)
- Executor runs the plan: scan t1 with filter, build hash table from t2, probe and return matching rows
Parser & Query Analysis
Relevant Files
- src/backend/parser/gram.y
- src/backend/parser/scan.l
- src/backend/parser/analyze.c
- src/backend/parser/parse_expr.c
- src/backend/parser/parser.c
The parser transforms raw SQL text into structured query trees through three distinct phases: lexical analysis, grammatical parsing, and semantic analysis.
Lexical Analysis (scan.l)
The lexer tokenizes input using Flex. It recognizes keywords, identifiers, operators, and literals while maintaining position tracking for error reporting. Keywords are looked up against a static keyword table, and identifiers are normalized to lowercase. The scanner handles complex constructs like dollar-quoted strings and Unicode escapes without backtracking, optimizing performance.
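The keyword-vs-identifier decision described above can be sketched in a few lines of C. This is an illustrative miniature, not the scanner's actual code: the real keyword list lives in src/include/parser/kwlist.h and uses perfect-hash lookup in recent releases; the table and function names below are invented for the example.

```c
#include <ctype.h>
#include <string.h>

/* Toy keyword table, sorted for binary search.  The real list is far
 * larger and is generated from kwlist.h. */
static const char *keywords[] = { "all", "and", "group", "select", "where" };

/* Fold an identifier to lower case, as the scanner does for unquoted
 * identifiers. */
static void downcase(char *dst, const char *src, size_t cap)
{
    size_t i;
    for (i = 0; src[i] != '\0' && i + 1 < cap; i++)
        dst[i] = (char) tolower((unsigned char) src[i]);
    dst[i] = '\0';
}

/* Return the keyword's table index, or -1 for a plain identifier. */
static int keyword_lookup(const char *token)
{
    int lo = 0, hi = (int) (sizeof(keywords) / sizeof(keywords[0])) - 1;
    char buf[64];

    downcase(buf, token, sizeof(buf));
    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;
        int cmp = strcmp(buf, keywords[mid]);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Because folding happens before lookup, `SELECT`, `Select`, and `select` all resolve to the same keyword, while `My_Table` falls through as an ordinary identifier.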
Grammatical Parsing (gram.y)
Bison processes tokens according to grammar rules defined in gram.y, producing a raw parse tree. The grammar supports multiple parse modes (SQL statements, type names, PL/pgSQL expressions) via the parse_toplevel production. The parser uses location tracking to record token positions for error messages. Importantly, the grammar performs minimal semantic analysis—it focuses on structural validation only.
Semantic Analysis (analyze.c & parse_expr.c)
The raw parse tree undergoes transformation into a Query object through parse_analyze_fixedparams() and related functions. This phase:
- Validates references: Resolves table and column names against the catalog
- Type checking: Determines expression types and applies implicit coercions
- Expression transformation: Converts raw expression nodes (A_Expr, FuncCall, etc.) into typed expression trees
Expression Transformation Pipeline
transformExpr() in parse_expr.c recursively processes expressions via transformExprRecurse(). This dispatcher handles:
- Operators: Binary, unary, comparison, and special operators (IN, BETWEEN, DISTINCT)
- Functions: Function calls with argument type coercion
- Subqueries: SubLink nodes that embed SELECT statements
- Aggregates: Aggregate functions with GROUP BY validation
- Complex constructs: CASE expressions, array constructors, row expressions, JSON operations
Each expression type has a dedicated transformation function that validates context (e.g., aggregates only in SELECT lists) and applies type coercion rules.
Data Flow
Key Structures
- ParseState: Maintains context during analysis (namespace, parameter types, expression kind)
- RawStmt: Wraps raw parse tree nodes with location information
- Query: Final analyzed query object with resolved types and validated semantics
Query Optimizer & Planning
Relevant Files
- src/backend/optimizer/plan/planner.c - Main planner entry point
- src/backend/optimizer/plan/planmain.c - Core join planning logic
- src/backend/optimizer/util/pathnode.c - Path node creation and management
- src/backend/optimizer/util/relnode.c - Relation node construction
- src/backend/optimizer/geqo/geqo_main.c - Genetic query optimizer
- src/backend/optimizer/path/allpaths.c - Access path generation
The query optimizer transforms parsed SQL into an efficient execution plan by exploring alternative join orders and access methods. It uses a cost-based approach to select the cheapest plan.
Core Architecture
The optimizer operates in two main phases:
- Path Generation - Explores all feasible ways to execute the query (different join orders, scan methods, join algorithms)
- Plan Selection - Chooses the cheapest path and converts it to an executable plan
Key data structures:
- RelOptInfo - Represents a relation (base table or join result) with its possible access paths
- Path - Represents one way to scan or join a relation, including cost estimates
- PlannerInfo - Global state for a single query optimization
Join Tree Construction
The optimizer uses dynamic programming to build join trees:
Level 1: {A}, {B}, {C} (base relations)
Level 2: {A,B}, {B,C}, {A,C} (2-way joins)
Level 3: {A,B,C} (final join)
At each level, the planner considers all legal join combinations and generates paths for each using different join methods (nested loop, hash join, merge join).
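The level-by-level search can be sketched with relation sets encoded as bitmasks. This is a toy model, not the planner's code (the real enumeration is in src/backend/optimizer/path/joinrels.c and prunes illegal combinations); here every pair of relations is assumed joinable, and we merely count how many (left, right) splits the search would consider.

```c
/* Toy dynamic-programming join enumeration: each relation set is a
 * bitmask, and every set is built by combining two disjoint, non-empty
 * subsets.  Counting the splits shows how the search space grows. */
static int count_join_combinations(int nrels)
{
    int considered = 0;
    int full = (1 << nrels) - 1;

    for (int set = 1; set <= full; set++)
    {
        /* Enumerate all non-empty proper subsets 'left' of 'set'. */
        for (int left = (set - 1) & set; left > 0; left = (left - 1) & set)
        {
            int right = set & ~left;        /* the complementary side */

            if (right != 0)
                considered++;               /* candidate: left JOIN right */
        }
    }
    return considered;
}
```

For two relations this considers 2 splits; for three, 12. The exponential growth of this count is exactly why GEQO (below) takes over for large join problems.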
Path Selection Strategy
For each relation, the optimizer maintains a pathlist containing the cheapest paths for different sort orderings and parameterizations. When adding a new path, it compares against existing paths using:
- Total cost - Cost to retrieve all rows
- Startup cost - Cost to retrieve first row
- Sort order - Useful orderings for upper query levels
Paths that are dominated on all criteria are discarded.
GEQO: Genetic Query Optimizer
For queries with many joins (>= geqo_threshold, default 12), the standard exhaustive search becomes prohibitively expensive. GEQO uses a genetic algorithm to explore the join space probabilistically:
- Represents join orders as chromosomes
- Evolves a population across generations using selection, crossover, and mutation
- Evaluates fitness by computing path costs for candidate join orders
- Returns the best solution found within the generation limit
GEQO trades optimality for speed on complex queries, making it practical for large join problems.
Cost Estimation
The planner estimates costs for all operations (scans, joins, sorts) using:
- Table statistics (row counts, page counts, column distributions)
- Index information
- Selectivity estimates from WHERE clauses
- Cardinality estimates for intermediate results
These estimates guide the choice between sequential scans, index scans, and different join algorithms.
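A drastically simplified version of the sequential-scan cost model illustrates how these inputs combine. The real formula lives in cost_seqscan() in src/backend/optimizer/path/costsize.c and accounts for more factors; the constants below mirror the defaults of the seq_page_cost and cpu_tuple_cost GUCs, and the function names are illustrative.

```c
/* Simplified planner cost arithmetic: disk cost scales with pages read,
 * CPU cost with tuples processed.  Constants match the GUC defaults. */
static const double seq_page_cost = 1.0;
static const double cpu_tuple_cost = 0.01;

/* Estimated total cost of a sequential scan. */
static double seq_scan_cost(double pages, double tuples)
{
    return pages * seq_page_cost + tuples * cpu_tuple_cost;
}

/* Output row estimate for a filtered scan: input rows scaled by the
 * WHERE-clause selectivity estimate (a fraction between 0 and 1). */
static double estimate_rows(double tuples, double selectivity)
{
    return tuples * selectivity;
}
```

A 100-page, 1000-tuple table costs about 110 cost units to scan sequentially; an index scan is chosen only when its own estimate (random page accesses plus per-tuple CPU) comes out cheaper.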
Executor & Runtime Execution
Relevant Files
- src/backend/executor/execMain.c
- src/backend/executor/execExpr.c
- src/backend/executor/nodeSeqscan.c
- src/backend/executor/nodeHashjoin.c
- src/backend/executor/README
The executor is PostgreSQL's runtime engine that transforms a plan tree into actual query results. It operates as a demand-pull pipeline where each plan node produces tuples on demand, creating a tree of operations that flow data upward from leaf nodes (scans) to the root.
Execution Architecture
The executor maintains two parallel tree structures:
- Plan Tree - Read-only tree of Plan nodes from the optimizer, representing the query strategy
- State Tree - Parallel tree of PlanState nodes holding runtime state, allowing plan reuse across queries
Each PlanState node has an ExecProcNode function pointer that returns the next tuple when called. This design separates immutable query plans from mutable execution state.
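The function-pointer, demand-pull design can be sketched in miniature. This is an illustrative model, not PostgreSQL's PlanState (the struct layout and names below are invented): each node exposes an ExecProcNode-style callback that yields one value per call, with -1 standing in for "no more tuples".

```c
/* Minimal demand-pull node tree in the spirit of PlanState. */
typedef struct Node Node;
struct Node
{
    int   (*exec) (Node *self);  /* return next tuple, -1 = exhausted */
    Node   *child;               /* lower plan node, if any */
    int     pos, limit;          /* scan state */
    int     addend;              /* projection state */
};

/* Leaf "scan" node: emits 0, 1, ..., limit-1. */
static int scan_exec(Node *self)
{
    if (self->pos >= self->limit)
        return -1;
    return self->pos++;
}

/* "Projection" node: pulls from its child and adds a constant. */
static int proj_exec(Node *self)
{
    int v = self->child->exec(self->child);

    return (v < 0) ? -1 : v + self->addend;
}

/* Drive the root node until exhaustion, as ExecutorRun drives the top
 * of the real state tree; here we just sum the output. */
static int run_plan(Node *root)
{
    int sum = 0, v;

    while ((v = root->exec(root)) >= 0)
        sum += v;
    return sum;
}
```

Stacking a projection over a five-row scan and running the plan pulls each tuple upward on demand; no node materializes its full output unless it must (as Sort or Hash do).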
Query Execution Lifecycle
ExecutorStart initializes the executor state and recursively builds the state tree. ExecutorRun drives tuple production through the plan tree. ExecutorFinish handles post-processing (e.g., AFTER triggers). ExecutorEnd releases resources.
Expression Evaluation
Expressions are compiled into flat arrays of ExprEvalStep instructions during initialization via ExecInitExpr(). This linear representation avoids tree-walk overhead during evaluation:
- Each step has an opcode (e.g., EEOP_VAR, EEOP_FUNCEXPR) and stores results in resvalue/resnull
- Steps chain together: a + operator step receives inputs from prior VAR steps
- The array ends with EEOP_DONE_RETURN or EEOP_DONE_NO_RETURN
- ExecReadyExpr() selects the evaluation method (interpreted via ExecInterpExpr or JIT-compiled)
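The flat-array idea can be shown with a toy interpreter. The opcodes and struct fields below are invented for illustration (the real steps are ExprEvalStep in src/include/executor/execExpr.h and carry far more state); the point is that evaluation is a linear sweep over steps rather than a recursive tree walk.

```c
/* Toy flattened-expression interpreter in the style of ExecInterpExpr. */
typedef enum { OP_CONST, OP_VAR, OP_ADD, OP_DONE } OpCode;

typedef struct
{
    OpCode  op;
    int     constval;   /* used by OP_CONST */
    int     varno;      /* used by OP_VAR: index into the input row */
} Step;

/* Evaluate a step array against an input "row".  OP_CONST and OP_VAR
 * load a current value; OP_ADD folds it into an accumulator, mimicking
 * how real steps feed results forward; OP_DONE returns. */
static int eval_steps(const Step *steps, const int *row)
{
    int acc = 0, cur = 0;

    for (int i = 0;; i++)
    {
        switch (steps[i].op)
        {
            case OP_CONST: cur = steps[i].constval; break;
            case OP_VAR:   cur = row[steps[i].varno]; break;
            case OP_ADD:   acc += cur; break;
            case OP_DONE:  return acc;
        }
    }
}
```

An expression like col0 + 7 compiles to the step sequence VAR(0), ADD, CONST(7), ADD, DONE; evaluating it against a row is then a tight loop over small structs, which is also what makes JIT compilation of the same step array straightforward.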
Memory Management
Execution uses hierarchical memory contexts:
- Per-query context - Created by CreateExecutorState(), holds plan/state trees
- Per-tuple contexts - Reset after each tuple to prevent memory leaks
- ExprContext nodes manage per-tuple memory for expression evaluation
Plan Node Types
Common node types include:
- Scan nodes (SeqScan, IndexScan) - Fetch tuples from tables
- Join nodes (HashJoin, NestLoop, MergeJoin) - Combine tuples from child nodes
- Aggregate nodes (Agg, GroupAggregate) - Compute aggregates
- Sort/Limit nodes - Order or restrict output
- ModifyTable - Handles INSERT/UPDATE/DELETE/MERGE operations
Hash Join Example
Hash joins use a hybrid algorithm with multi-batch support. The build phase hashes inner relation tuples into buckets; the probe phase streams outer tuples and looks up matches. If the hash table exceeds work_mem, batches are spilled to disk and processed iteratively. Parallel hash joins coordinate multiple backends via barrier synchronization.
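The single-batch build/probe core can be sketched as follows. This is a deliberately simplified model, not nodeHashjoin.c: multi-batch spilling, skew handling, and parallelism are all omitted, and the structure names are invented. It uses the classic chained-bucket layout, where each bucket heads a linked list of tuple slots.

```c
/* Single-batch hash join sketch: build a chained hash table over the
 * inner relation's join keys, then probe with outer keys. */
#define NBUCKETS  8
#define MAXTUPLES 64

typedef struct
{
    int keys[MAXTUPLES];     /* stored inner join keys */
    int next[MAXTUPLES];     /* per-slot chain link, -1 ends a chain */
    int nused;
    int buckets[NBUCKETS];   /* head slot per bucket, -1 = empty */
} HashTable;

static void ht_init(HashTable *ht)
{
    ht->nused = 0;
    for (int i = 0; i < NBUCKETS; i++)
        ht->buckets[i] = -1;
}

/* Build phase: push each inner key onto its bucket's chain. */
static void ht_insert(HashTable *ht, int key)
{
    int b = ((unsigned) key) % NBUCKETS;

    ht->keys[ht->nused] = key;
    ht->next[ht->nused] = ht->buckets[b];
    ht->buckets[b] = ht->nused++;
}

/* Probe phase: for each outer key, walk the matching bucket chain and
 * count equal inner keys (a real join would emit joined tuples). */
static int hash_join_count(const int *inner, int ni, const int *outer, int no)
{
    HashTable ht;
    int matches = 0;

    ht_init(&ht);
    for (int i = 0; i < ni; i++)
        ht_insert(&ht, inner[i]);
    for (int i = 0; i < no; i++)
        for (int p = ht.buckets[((unsigned) outer[i]) % NBUCKETS];
             p != -1; p = ht.next[p])
            if (ht.keys[p] == outer[i])
                matches++;
    return matches;
}
```

When the build side exceeds work_mem, the real implementation partitions both inputs by hash value into batches on disk and runs this same build/probe cycle once per batch.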
Rescan and Parameter Handling
Nodes support ExecReScan() to reset and regenerate output. When query parameters change, affected nodes rescan. Intelligent schemes avoid unnecessary rescans (e.g., Sort reuses cached sorted data if input parameters unchanged).
Asynchronous Execution
For I/O-bound operations (e.g., ForeignScan), nodes can indicate they cannot produce tuples immediately. Append nodes coordinate async-capable children via ExecAsyncRequest, ExecAsyncConfigureWait, and ExecAsyncNotify callbacks, enabling concurrent execution of multiple branches.
Storage Management & I/O
Relevant Files
- src/backend/storage/buffer/bufmgr.c
- src/backend/storage/smgr/smgr.c
- src/backend/storage/page/bufpage.c
- src/backend/storage/lmgr/lock.c
- src/backend/storage/buffer/freelist.c
- src/backend/storage/buffer/buf_init.c
PostgreSQL's storage layer manages physical data access through a multi-layered architecture combining buffer management, storage abstraction, page formatting, and concurrency control. This section covers how data flows from disk to memory and back.
Buffer Manager Architecture
The buffer manager (bufmgr.c) maintains a shared pool of in-memory buffers that cache disk pages. Key concepts:
- Pinning: Processes must "pin" a buffer (increment its reference count) before accessing it. Unpinned buffers can be evicted at any time.
- Buffer Tags: Each buffer is identified by a BufferTag containing the relation locator, fork number, and block number.
- Shared vs. Local Buffers: Shared buffers are accessible to all backends; local buffers are per-backend and used for temporary relations.
The buffer pool is initialized in shared memory with NBuffers slots, each containing a BufferDesc structure (padded to 64 bytes to avoid false sharing) and a corresponding data page.
Clock Sweep Replacement Algorithm
When a new page is needed and no free buffers exist, the buffer manager uses a clock sweep algorithm to find a victim:
- A "clock hand" (nextVictimBuffer) circularly advances through all buffers
- Each buffer has a usage_count that increments when pinned
- The algorithm skips buffers with non-zero usage counts, decrementing their counts
- The first unpinned buffer with zero usage count becomes the victim
This approximates least-recently-used replacement cheaply, without the bookkeeping of an explicit LRU list.
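The victim-selection loop can be sketched in a few lines. This is a toy model of the idea behind StrategyGetBuffer() in freelist.c, not its code: the real version uses atomics, a free list, and ring-buffer strategies, and the struct below is invented. The sketch assumes at least one unpinned buffer exists, otherwise the loop would not terminate (the real code errors out in that case).

```c
/* Clock-sweep victim selection over a toy buffer array. */
#define NBUF 4

typedef struct
{
    int pin_count;      /* > 0 means some process is using the buffer */
    int usage_count;    /* recency credit, bumped on each pin */
} BufDesc;

static int next_victim = 0;     /* the "clock hand" */

static int clock_sweep(BufDesc *buf)
{
    for (;;)
    {
        int i = next_victim;

        next_victim = (next_victim + 1) % NBUF;
        if (buf[i].pin_count > 0)
            continue;                   /* in use: never evict */
        if (buf[i].usage_count > 0)
            buf[i].usage_count--;       /* recently used: one more lap */
        else
            return i;                   /* unpinned and cold: victim */
    }
}
```

Frequently pinned buffers accumulate usage credit and survive several sweeps, while cold buffers are reclaimed within one revolution of the hand.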
Storage Manager Interface
The storage manager (smgr.c) abstracts physical file operations through a function pointer table (f_smgr). Currently, only magnetic disk storage is implemented (md.c), but the interface supports alternative storage backends.
Key operations:
- Read/Write: smgr_readv(), smgr_writev() for vectored I/O
- Extend: smgr_extend() adds new blocks to relations
- Prefetch: smgr_prefetch() initiates asynchronous reads
- Sync: smgr_immedsync() forces immediate disk synchronization
SMgrRelation objects are cached in a hash table and pinned by the relation cache to prevent premature closure of file descriptors.
Page Structure and Management
Pages are fixed-size blocks (typically 8KB, BLCKSZ) with a standardized header (PageHeader). The bufpage.c module provides:
- PageInit(): Initializes a new page with header and free space tracking
- PageAddItem(): Inserts items (tuples) into pages, managing line pointers
- Checksums: Optional page checksums computed at write time for corruption detection
Pages track free space using pd_lower (end of header/line pointers) and pd_upper (start of item data), allowing efficient space management.
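The two-pointers-growing-toward-the-middle scheme is easy to model. This sketch is illustrative only: the real PageHeaderData, PageInit(), and PageAddItem() in bufpage.c track checksums, LSNs, special space, and line-pointer flags, all omitted here, and the header/line-pointer sizes below are assumed round numbers.

```c
#include <stdint.h>

/* Simplified page: free space is the gap between pd_lower (end of the
 * line-pointer array) and pd_upper (start of tuple data). */
#define BLCKSZ            8192
#define HEADER_SIZE       24      /* illustrative fixed-header size */
#define LINE_POINTER_SIZE 4

typedef struct
{
    uint16_t pd_lower;
    uint16_t pd_upper;
} MiniPage;

static void page_init(MiniPage *p)
{
    p->pd_lower = HEADER_SIZE;    /* no line pointers yet */
    p->pd_upper = BLCKSZ;         /* no tuple data yet */
}

static int page_free_space(const MiniPage *p)
{
    return p->pd_upper - p->pd_lower;
}

/* Add an item of 'size' bytes: a line pointer grows pd_lower upward,
 * the item data grows pd_upper downward.  Returns 0 if it won't fit. */
static int page_add_item(MiniPage *p, int size)
{
    if (page_free_space(p) < size + LINE_POINTER_SIZE)
        return 0;
    p->pd_lower += LINE_POINTER_SIZE;
    p->pd_upper -= size;
    return 1;
}
```

Each insertion consumes the item's size plus one line pointer, and the page is full exactly when the two pointers would cross.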
Concurrency Control
Buffer access is protected by multiple lock levels:
- BufMappingLock: Partitioned locks protect the buffer tag hash table
- Buffer Header Spinlock: Protects buffer state modifications
- Content Lock: Shared/exclusive locks for data access
- Pin Count: Prevents buffer eviction while in use
Lock Manager (lock.c) handles relation-level locks and transaction locks, preventing conflicts at a higher level than buffer locks.
I/O and Writeback
Dirty buffers are written back through:
- Checkpoint: Periodic flush of all dirty buffers
- Background Writer: Continuous flushing to reduce checkpoint stalls
- Writeback Context: Coalesces multiple flush requests to reduce system calls
The WritebackContext structure batches pending flushes, improving I/O efficiency by combining multiple page writes into single system calls.
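The benefit of batching comes from merging writes to adjacent blocks into contiguous ranges. The sketch below shows that coalescing step in isolation; it is not the WritebackContext code (which tracks requests per relation fork and issues them via sync_file_range-style calls), and the function names are invented.

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}

/* Sort the pending block numbers and merge adjacent blocks into ranges;
 * each range could then be issued as one flush request.  Returns the
 * number of distinct contiguous ranges. */
static int coalesce_ranges(int *blocks, int n)
{
    int ranges;

    if (n == 0)
        return 0;
    qsort(blocks, n, sizeof(int), cmp_int);
    ranges = 1;
    for (int i = 1; i < n; i++)
        if (blocks[i] > blocks[i - 1] + 1)  /* gap: start a new range */
            ranges++;
    return ranges;
}
```

Pending blocks {7, 3, 4, 5, 10} collapse into three ranges (3-5, 7, 10), turning five potential flush requests into three.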
Access Methods & Indexes
Relevant Files
- src/backend/access/heap/heapam.c
- src/backend/access/nbtree/nbtree.c
- src/backend/access/hash/hash.c
- src/backend/access/gin/gininsert.c
- src/include/access/amapi.h
- src/backend/access/index/indexam.c
PostgreSQL supports multiple index types through a pluggable access method (AM) interface. Each index type implements a standardized API defined by IndexAmRoutine, allowing the query executor to work uniformly with different index structures.
Index Access Method Interface
The IndexAmRoutine struct defines the contract between the core system and index implementations. Every index AM must provide:
- Build operations: ambuild (create index), ambuildempty (create empty index)
- Modification: aminsert (add entries), ambulkdelete (remove entries), amvacuumcleanup (post-vacuum cleanup)
- Scanning: ambeginscan, amrescan, amgettuple, amgetbitmap, amendscan
- Metadata: amcostestimate (query planner hints), amoptions (parse index options), amvalidate (verify opclass definitions)
Each AM also declares capabilities via boolean flags: whether it supports unique indexes, multi-column indexes, backward scans, parallel scans, and index-only scans.
Built-in Index Types
B-tree (nbtree) is the default general-purpose index. It implements Lehman & Yao's high-concurrency algorithm with right-sibling pointers and high keys, so searches can descend the tree without holding multiple page locks at once. B-trees support equality and range queries, provide ordering, are clusterable, and handle unique constraints and multi-column indexes.
Hash indexes use dynamic hashing with bucket splitting. Tuples hash to buckets, which chain overflow pages as needed. Hash indexes are compact (storing only hash codes, not values) and support equality queries only. They cannot be used for unique constraints or multi-column indexes.
GIN (Generalized Inverted Index) is optimized for composite values like arrays and full-text search. It stores (key, posting list) pairs where posting lists are heap row references. GIN uses a B-tree over keys with posting trees for large posting lists. Fast-update pending lists accelerate bulk inserts. GIN excels at "contains" and "overlaps" queries on arrays.
GiST (not covered by the files listed above) is a generalized search tree supporting custom operators and spatial data types.
Index Scan Flow
Scans begin with index_beginscan, which calls the AM's ambeginscan to initialize an IndexScanDesc. The executor repeatedly calls index_getnext_tid to retrieve heap tuple IDs. For each ID, index_fetch_heap fetches the actual tuple from the heap using the table AM interface. This separation allows the index AM to release locks before heap access.
Heap Access Method
The heap is the default table storage format. heapam.c implements sequential scans, tuple insertion/deletion/update, and MVCC visibility checks. The heap AM integrates with indexes through index_fetch_begin/end callbacks, allowing indexes to prepare for tuple fetching. HOT (Heap-Only Tuple) optimization reduces index entries when updated columns aren't indexed.
Index Maintenance
During INSERT, UPDATE, or DELETE, the executor calls index_insert for each modified index. The index AM's aminsert callback adds or removes entries. Vacuum calls ambulkdelete with a callback to identify dead tuples, then amvacuumcleanup for final cleanup. GIN's pending list is merged into the main B-tree during vacuum to maintain search performance.
Catalog & Metadata Management
Relevant Files
- src/backend/catalog/catalog.c
- src/backend/catalog/heap.c
- src/backend/catalog/index.c
- src/backend/catalog/namespace.c
PostgreSQL stores all schema metadata in system catalogs—regular tables that describe tables, columns, types, indexes, and other database objects. The catalog system is the foundation for schema management and object lifecycle operations.
Core System Catalogs
The primary catalogs are:
- pg_class - Describes all relations (tables, indexes, views, sequences, etc.). Each relation has an OID, name, namespace, owner, and storage metadata.
- pg_attribute - Describes columns within relations. Each attribute references its parent relation and stores type, collation, and constraint information.
- pg_type - Defines data types, including built-in types and composite types created from relations.
- pg_namespace - Organizes objects into schemas. Provides namespace isolation and search path support.
- pg_index - Metadata for indexes, including key columns, uniqueness constraints, and index expressions.
- pg_depend - Tracks object dependencies for cascading operations (e.g., dropping a table drops its indexes).
System catalogs are pinned (OID < FirstUnpinnedObjectId) and protected from accidental modification.
Relation Creation Pipeline
Creating a cataloged relation involves multiple coordinated steps:
- Validation - CheckAttributeNamesTypes() validates column definitions and types.
- Storage Creation - heap_create() allocates physical disk storage and creates a relation descriptor.
- Type Registration - AddNewRelationType() creates a composite type for the relation's row type.
- Catalog Registration - AddNewRelationTuple() inserts a row into pg_class.
- Attribute Registration - AddNewAttributeTuples() inserts column metadata into pg_attribute.
- Dependency Recording - Dependencies on types and collations are recorded in pg_depend.
The main entry point is heap_create_with_catalog(), which orchestrates these steps and returns the new relation's OID.
Namespace Search Path
Object lookup uses a configurable search path defined by the search_path GUC variable. The path is a list of namespace OIDs searched in order:
- Implicit namespaces - The temp namespace (if initialized) and pg_catalog are always searched.
- Explicit namespaces - User-specified schemas from search_path.
- Creation target - The first namespace in the path is the default for new objects.
The search path is cached and recomputed only when invalidated by syscache changes or GUC updates. Functions like recomputeNamespacePath() and fetch_search_path() manage this state. Visibility functions (RelationIsVisible(), TypeIsVisible(), etc.) check if objects are accessible in the current search path.
Index Metadata
Indexes are tracked in pg_index, which stores:
- Index and heap relation OIDs
- Key columns and included columns
- Uniqueness and constraint flags
- Index expressions and predicates (for partial indexes)
The index_create() function creates indexes by constructing a tuple descriptor, registering the index in pg_index, and building the index structure via the access method API.
Catalog Access Patterns
Catalog access is optimized through:
- System cache - Frequently accessed catalogs (e.g., pg_class, pg_type) are cached in memory for fast lookups.
- Catalog indexes - Unique indexes on pg_class (by OID and by name/namespace) enable efficient searches.
- Pinned OIDs - System catalog OIDs are hardcoded to avoid catalog lookups during bootstrap.
All catalog modifications are transactional and trigger invalidation messages to keep caches consistent across backends.
Replication & Write-Ahead Logging
Relevant Files
- src/backend/access/transam/xlog.c
- src/backend/replication/walsender.c
- src/backend/replication/walreceiver.c
- src/backend/replication/slot.c
- src/backend/replication/syncrep.c
PostgreSQL's replication system ensures data durability and high availability through Write-Ahead Logging (WAL) and streaming replication. All database changes are first written to WAL before being applied to data pages, enabling crash recovery and replication to standby servers.
Write-Ahead Logging (WAL)
WAL is the core durability mechanism. Every transaction writes its changes to the WAL buffer before modifying the actual data pages. The xlog.c module manages WAL buffers, insertion, and flushing to disk.
Key WAL Operations:
- WAL Insertion - XLogInsertRecord() appends records to the in-memory WAL buffer
- WAL Flushing - XLogFlush() writes buffered records to disk, ensuring durability
- WAL Writing - XLogWrite() handles the actual I/O operations to WAL files
- Checkpointing - Periodic snapshots that allow old WAL to be discarded
The WAL buffer is organized as a circular ring, with insertion and write positions tracked separately. This allows concurrent insertions while background processes flush data to disk.
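The separate insert and write positions can be modelled with a tiny ring buffer. This is a sketch of the idea only: the real WAL buffer uses per-page headers, insertion locks, and 64-bit LSN arithmetic, and the struct and function names below are invented. Byte positions grow monotonically (like LSNs) and the modulo maps them into the ring.

```c
/* Toy circular WAL buffer with distinct insert and write positions. */
#define WALBUF_SIZE 64

typedef struct
{
    char          buf[WALBUF_SIZE];
    unsigned long insert_pos;   /* next byte to fill (advanced by inserts) */
    unsigned long write_pos;    /* next byte to flush (trails behind) */
} WalBuffer;

/* Append a record if the ring has room; refuses to overwrite bytes that
 * have not yet been flushed.  Returns 1 on success. */
static int wal_insert(WalBuffer *w, const char *rec, int len)
{
    if (w->insert_pos + len - w->write_pos > WALBUF_SIZE)
        return 0;
    for (int i = 0; i < len; i++)
        w->buf[(w->insert_pos + i) % WALBUF_SIZE] = rec[i];
    w->insert_pos += len;
    return 1;
}

/* Flush everything up to the current insert position, as XLogFlush()
 * would; here the I/O is merely simulated.  Returns the flushed LSN. */
static unsigned long wal_flush(WalBuffer *w)
{
    w->write_pos = w->insert_pos;
    return w->write_pos;
}
```

Because inserts only need the gap between the two positions, backends can keep appending while a background process advances write_pos, which is the concurrency property the circular design buys.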
Streaming Replication Architecture
Primary Side (WAL Sender):
- Spawned when a standby connects requesting replication
- Reads WAL records from disk and sends them via the COPY protocol
- Tracks replication progress from standbys
- Handles both physical and logical replication
Standby Side (WAL Receiver):
- Connects to primary and requests WAL streaming
- Receives WAL records and writes them to the pg_wal directory
- Updates WalRcvData->flushedUpto to signal the startup process
- Sends periodic status updates to the primary
Replication Slots
Replication slots prevent premature WAL removal and maintain state about replication streams. Each slot tracks the restart LSN (Log Sequence Number) needed by its consumer.
Slot Types:
- Physical Slots - For streaming replication; prevent WAL removal until replicated
- Logical Slots - For logical decoding; maintain snapshot and transaction state
Slots are stored on disk in $PGDATA/pg_replslot/ with state files containing metadata. The slot state is protected by ReplicationSlotControlLock and a per-slot mutex, allowing safe concurrent access by multiple backends.
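The WAL-retention rule that slots enforce is simple: the oldest restart LSN across all active slots bounds which WAL segments may be recycled. The sketch below models that computation in the spirit of ReplicationSlotsComputeRequiredLSN() in slot.c; real LSNs are 64-bit positions, here reduced to plain integers, and the names are illustrative.

```c
/* WAL retention in miniature: find the minimum restart LSN that any
 * slot still needs.  0 stands in for InvalidXLogRecPtr (no reservation). */
#define INVALID_LSN 0UL

static unsigned long min_required_lsn(const unsigned long *restart_lsn,
                                      int nslots)
{
    unsigned long min = INVALID_LSN;

    for (int i = 0; i < nslots; i++)
    {
        if (restart_lsn[i] == INVALID_LSN)
            continue;               /* slot holds no WAL reservation */
        if (min == INVALID_LSN || restart_lsn[i] < min)
            min = restart_lsn[i];
    }
    return min;                     /* WAL before this LSN is still needed */
}
```

A slow or disconnected consumer therefore pins WAL indefinitely, which is why orphaned slots are a common cause of pg_wal filling up.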
Synchronous Replication
Synchronous replication ensures transactions wait for acknowledgment from standbys before committing. The syncrep.c module implements this on the primary.
Modes:
- FIRST - Priority-based; wait for N highest-priority standbys
- ANY - Quorum-based; wait for any N standbys from the list
Backends waiting for replication are queued and released when walsenders confirm WAL has been written, flushed, or applied on standbys. This is controlled by synchronous_standby_names GUC parameter.
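The release decision for a waiting backend can be sketched for the quorum (ANY) case. This is an illustrative model, not syncrep.c: the real code tracks written, flushed, and applied LSNs separately, and FIRST mode restricts the count to the N highest-priority standbys, which is simplified away here.

```c
/* Quorum check in the spirit of synchronous replication's ANY mode:
 * a commit at 'commit_lsn' may be released once at least 'num_sync'
 * standbys report a flush position at or beyond it. */
static int quorum_reached(const unsigned long *standby_flush_lsn,
                          int nstandbys,
                          unsigned long commit_lsn, int num_sync)
{
    int acks = 0;

    for (int i = 0; i < nstandbys; i++)
        if (standby_flush_lsn[i] >= commit_lsn)
            acks++;
    return acks >= num_sync;
}
```

With standby flush positions {100, 250, 300}, a commit at LSN 200 is releasable under ANY 2 (two standbys are past it) but not under ANY 3, so the backend would keep waiting in the latter configuration.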
WAL Lifecycle
- Generation - Transactions generate WAL records during execution
- Buffering - Records accumulate in the WAL buffer
- Flushing - The WAL writer process or an explicit XLogFlush() call writes records to disk
- Replication - WAL sender streams to standbys
- Archival - Archive command copies WAL files for long-term storage
- Recycling - Old WAL files are recycled when no longer needed
The system maintains careful coordination between these stages to ensure durability while managing disk space efficiently.
Utilities & Command-Line Tools
Relevant Files
- src/bin/psql/mainloop.c
- src/bin/pg_dump/pg_dump.c
- src/bin/initdb/initdb.c
- src/bin/pg_ctl/pg_ctl.c
- src/bin/pg_basebackup/pg_basebackup.c
- src/bin/pgbench/pgbench.c
- src/bin/scripts/ (createdb, dropdb, vacuumdb, etc.)
PostgreSQL provides a comprehensive suite of command-line utilities in src/bin/ for database administration, client interaction, and server management. These tools are essential for day-to-day operations and cluster maintenance.
Client & Interactive Tools
psql (src/bin/psql/) is the primary interactive terminal for PostgreSQL. Its architecture centers on a re-entrant main loop (MainLoop() in mainloop.c) that processes user input from stdin or files. The loop uses a flex-based lexer (psqlscanslash.l) to parse both SQL queries and backslash commands (\ prefix). Key components include:
- Query Accumulation: Buffers (query_buf, previous_buf) collect multi-line queries until a semicolon terminates them
- Command Processing: HandleSlashCmds() routes backslash commands (e.g., \d, \i, \e) to specialized handlers
- Conditional Execution: A conditional stack manages \if/\else/\endif blocks for scripting
- Signal Handling: Uses sigsetjmp() to handle Ctrl-C interrupts gracefully
The lexer maintains state across interactive and file-based input, supporting variable expansion and nested file inclusion via \i (include) commands.
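The semicolon-driven accumulation behavior can be sketched as a tiny loop body. This is illustrative only: real psql delegates statement-end detection to its flex lexer so that semicolons inside quotes, dollar-quoted strings, and comments are ignored, all of which this sketch omits; the function name is invented.

```c
#include <string.h>

#define QUERY_BUF_SIZE 1024

/* Append one input line to query_buf (joining lines with a space) and
 * report whether the buffer now ends a complete statement, i.e. it
 * ends with a semicolon.  Quote/comment awareness is omitted. */
static int accumulate_line(char *query_buf, const char *line)
{
    size_t len;

    if (query_buf[0] != '\0')
        strncat(query_buf, " ", QUERY_BUF_SIZE - strlen(query_buf) - 1);
    strncat(query_buf, line, QUERY_BUF_SIZE - strlen(query_buf) - 1);
    len = strlen(query_buf);
    return len > 0 && query_buf[len - 1] == ';';
}
```

Feeding "SELECT 1," and then "2;" returns incomplete, then complete, at which point the real MainLoop() would send the buffer to the server and rotate it into previous_buf.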
Backup & Restore Tools
pg_dump (src/bin/pg_dump/) serializes database schemas and data into SQL scripts or custom binary formats. It queries system catalogs to enumerate all dumpable objects (tables, indexes, functions, etc.) and emits them in dependency order. Compression support includes gzip, LZ4, and Zstandard formats.
pg_basebackup (src/bin/pg_basebackup/) performs physical backups by streaming WAL and data files from a running server using the replication protocol. It supports incremental backups and manifest verification.
pg_restore reconstructs databases from pg_dump's custom or directory formats, with parallel restore capability.
Server Management Tools
pg_ctl (src/bin/pg_ctl/) controls the PostgreSQL server lifecycle. It dispatches commands (init, start, stop, restart, reload, status, promote) to appropriate handlers. The tool manages PID files, log rotation, and shutdown modes (smart, fast, immediate).
initdb (src/bin/initdb/) initializes a new cluster by creating the data directory structure, bootstrap databases (template0, template1, postgres), and configuration files. It handles locale and encoding setup.
Maintenance & Diagnostic Tools
pgbench (src/bin/pgbench/) is a benchmarking tool that generates synthetic workloads. It supports custom SQL scripts, variable substitution, and concurrent client simulation.
vacuumdb and reindexdb (src/bin/scripts/) wrap SQL VACUUM and REINDEX commands with connection management and parallel execution options.
pg_upgrade performs in-place cluster version upgrades by comparing old and new schemas, transferring data, and validating consistency.
Architecture Pattern
All utilities follow a common pattern: parse command-line options, establish a libpq connection (if needed), execute operations, and report results. They use shared frontend utilities (src/fe_utils/) for connection management, option parsing, and output formatting.
repository: "https://github.com/postgres/postgres" generatedAt: "2025-12-17T19:10:28.378291"