hashicorp/consul

Consul - Distributed Service Networking

Last updated on Dec 12, 2025 (Commit: 187756b)

Overview

Relevant Files
  • main.go
  • agent/agent.go
  • agent/consul/server.go
  • agent/consul/client.go
  • README.md

Consul is a distributed, highly available solution for service discovery, health checking, and dynamic configuration across datacenters. It operates as a cluster of agents that can run in either server or client mode, providing multi-datacenter awareness and service mesh capabilities.

Core Components

Agent (agent/agent.go) is the central long-running process on every machine. It exposes RPC, HTTP, DNS, and gRPC interfaces and can operate in two modes:

  • Server Mode - Runs a full Consul server with Raft consensus, state management, and leadership election
  • Client Mode - Forwards requests to Consul servers via RPC, maintaining local service and check state

Consul Server (agent/consul/server.go) manages cluster state using Raft consensus, maintains the state store, handles service registration, and coordinates with other servers across datacenters.

Consul Client (agent/consul/client.go) maintains a connection pool to servers, routes RPC requests, and manages local agent state without participating in consensus.

Key Features

  • Service Discovery - Services register themselves; clients discover them via DNS or the HTTP API (see the sketch after this list)
  • Health Checking - Monitors service health; prevents routing to unhealthy instances
  • Service Mesh - Enables secure service-to-service communication with automatic TLS and identity-based authorization
  • Multi-Datacenter - Servers federate across datacenters; clients forward to local servers
  • Dynamic Configuration - HTTP API for storing indexed configuration and metadata
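
As a minimal illustration of the service discovery and health checking features above, the official Go api client can register a service with an HTTP check and then look up its healthy instances. This is a sketch that assumes a local agent on the default HTTP address; the service name, port, and health endpoint are illustrative.

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    // Connect to the local agent (default 127.0.0.1:8500).
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // Register a service with an HTTP health check on the local agent.
    err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
        Name: "web",
        Port: 8080,
        Check: &api.AgentServiceCheck{
            HTTP:     "http://127.0.0.1:8080/health",
            Interval: "10s",
            Timeout:  "2s",
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // Discover healthy instances of the service.
    entries, _, err := client.Health().Service("web", "", true, nil)
    if err != nil {
        log.Fatal(err)
    }
    for _, e := range entries {
        fmt.Println(e.Node.Node, e.Service.Address, e.Service.Port)
    }
}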

Communication Protocols

Consul uses multiple protocols for different purposes:

  • Serf Gossip - LAN and WAN membership management and failure detection
  • Raft - Server consensus for state replication (servers only)
  • RPC - Client-server and server-server communication
  • gRPC - Modern API for proxies and external services
  • DNS - Service discovery queries
  • HTTP - REST API for all operations

Startup Flow

When the agent starts, it initializes configuration, creates either a Server or Client delegate based on mode, sets up local state tracking, and starts listening on HTTP, DNS, and gRPC ports. The agent then begins service synchronization and retry join logic to connect with the cluster.

Architecture & Core Components

Relevant Files
  • agent/consul/server.go
  • agent/consul/client.go
  • agent/consul/rpc.go
  • agent/consul/fsm/fsm.go
  • agent/consul/state/state_store.go
  • agent/pool/pool.go

Consul's architecture is built on a distributed consensus model with clear separation between server and client components. The system uses Raft for strong consistency and Serf for gossip-based cluster membership.

Core Components

Server (agent/consul/server.go) is the primary stateful component that manages the cluster. Each server maintains:

  • A Raft instance for distributed consensus across the datacenter
  • A Finite State Machine (FSM) that applies committed log entries to the state store
  • A State Store (in-memory MemDB) holding all cluster data (nodes, services, ACLs, etc.)
  • Multiple Serf pools for cluster membership (LAN for local DC, WAN for cross-DC)
  • RPC servers for handling both traditional net/rpc and gRPC requests

Client (agent/consul/client.go) is a lightweight agent that runs on every node. Clients:

  • Do not participate in Raft consensus
  • Use a connection pool to communicate with servers
  • Maintain a router to discover and select healthy servers
  • Apply rate limiting to outbound RPC requests
  • Listen to Serf events for cluster membership changes

Consensus & State Management

The FSM (agent/consul/fsm/fsm.go) implements Raft's state machine interface. When Raft commits a log entry:

  1. The FSM receives the log entry via Apply()
  2. It dispatches to a registered command handler based on message type
  3. The handler modifies the State Store (MemDB)
  4. Changes are published to event subscribers for real-time updates

The State Store (agent/consul/state/state_store.go) uses MemDB for fast, queryable in-memory storage with MVCC semantics. It supports:

  • Blocking queries (clients wait for state changes)
  • Snapshots for Raft recovery
  • Transaction-based updates for consistency

RPC & Communication

RPC Layer (agent/consul/rpc.go) handles all request routing:

  • Clients forward requests to servers via the connection pool
  • Servers accept connections and route to appropriate handlers
  • Forwarding logic sends requests to the leader if needed, or to other datacenters
  • Rate limiting prevents overload on both client and server sides

Connection Pool (agent/pool/pool.go) manages persistent connections:

  • Multiplexes multiple RPC streams over single TCP connections using Yamux
  • Caches connections for reuse
  • Supports TLS encryption and certificate verification
  • Handles connection timeouts and cleanup
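
The multiplexing pattern is easiest to see with the underlying hashicorp/yamux library in isolation. The sketch below is not Consul's actual wire protocol (real RPC connections also negotiate a protocol byte and optional TLS first); the address and payloads are illustrative.

package main

import (
    "fmt"
    "log"
    "net"

    "github.com/hashicorp/yamux"
)

func main() {
    // Dial a single TCP connection to a hypothetical RPC listener.
    conn, err := net.Dial("tcp", "127.0.0.1:8300")
    if err != nil {
        log.Fatal(err)
    }

    // Wrap it in a yamux session so many logical streams share one connection.
    session, err := yamux.Client(conn, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    // Each Open() returns a net.Conn-like stream that could carry one RPC.
    for i := 0; i < 3; i++ {
        stream, err := session.Open()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Fprintf(stream, "request %d\n", i)
        stream.Close()
    }
}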

Key Design Patterns

  • Strong Consistency: Write operations go through Raft; reads can be stale or consistent
  • Blocking Queries: Clients can wait for state changes without polling
  • Gossip Membership: Serf maintains cluster topology; Raft manages state
  • Multiplexing: Single TCP connection carries multiple concurrent RPC streams
  • Rate Limiting: Protects servers from client overload and clients from server limits

State Management & Persistence

Relevant Files
  • agent/consul/state/state_store.go
  • agent/consul/state/catalog.go
  • agent/consul/fsm/fsm.go
  • agent/consul/fsm/snapshot.go
  • agent/consul/server.go

Consul's state management system is built on a Raft-based finite state machine (FSM) that ensures strong consistency across the cluster. All state is stored in-memory using MemDB, a fast in-memory database, and is reconstructed from Raft logs through the FSM.

State Store (MemDB)

The Store struct in state_store.go is the core in-memory database containing all Consul state:

  • MemDB: A thread-safe, in-memory database with MVCC (Multi-Version Concurrency Control) semantics
  • Tables: Organized into logical tables for nodes, services, checks, KV pairs, ACLs, sessions, and more
  • Transactions: Read and write transactions provide isolation and consistency guarantees
  • Change Tracking: All writes are tracked and published as events for subscribers

The state store is entirely reconstructed from the Raft log through the FSM, ensuring it can be rebuilt on any server.
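
To make the MemDB model concrete, here is a small standalone sketch built on the same underlying library (hashicorp/go-memdb). The table name and record type are illustrative rather than Consul's actual schema; the watch channel returned by FirstWatch is the primitive that blocking queries build on.

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/go-memdb"
)

// Service is a toy record; Consul's real tables hold nodes, services, checks, and so on.
type Service struct {
    ID   string
    Name string
    Port int
}

func main() {
    // Minimal schema: every go-memdb table needs an "id" index.
    schema := &memdb.DBSchema{
        Tables: map[string]*memdb.TableSchema{
            "services": {
                Name: "services",
                Indexes: map[string]*memdb.IndexSchema{
                    "id": {
                        Name:    "id",
                        Unique:  true,
                        Indexer: &memdb.StringFieldIndex{Field: "ID"},
                    },
                },
            },
        },
    }

    db, err := memdb.NewMemDB(schema)
    if err != nil {
        log.Fatal(err)
    }

    // Write transaction: insert and commit atomically.
    txn := db.Txn(true)
    if err := txn.Insert("services", &Service{ID: "web-1", Name: "web", Port: 8080}); err != nil {
        log.Fatal(err)
    }
    txn.Commit()

    // Read transaction: FirstWatch also returns a channel that closes when the
    // matching entry changes, which is what index-based blocking builds on.
    read := db.Txn(false)
    watchCh, raw, err := read.FirstWatch("services", "id", "web-1")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("found: %+v\n", raw.(*Service))
    _ = watchCh // a caller would select on this channel to wait for changes
}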

Finite State Machine (FSM)

The FSM in fsm.go applies Raft log entries to the state store:

  • Apply: Processes each Raft log entry by dispatching to registered command handlers
  • Command Registry: Commands are registered at package init time and mapped by message type
  • Atomic Updates: Each log entry updates the state store within a single transaction
  • Event Publishing: Changes trigger events that are published to subscribers

Snapshots & Persistence

Snapshots enable fast recovery and cluster bootstrap:

Persist (snapshot.go):

  • Captures a point-in-time snapshot of the entire state store
  • Encodes all tables (nodes, services, ACLs, KV, etc.) using msgpack
  • Includes a header with the last Raft index for consistency tracking
  • Persisted to disk by Raft for recovery

Restore:

  • Reads snapshot data and reconstructs the state store
  • Replaces the entire state store atomically to prevent inconsistency
  • Restores chunking state and resource storage separately
  • Signals watchers that the state has changed
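
Consul's FSM dispatches typed, msgpack-encoded messages to registered handlers, so the real code is considerably richer, but the Apply/Snapshot/Restore shape can be shown with a toy key-value FSM built on the hashicorp/raft library (illustrative only, not Consul's FSM).

package kvfsm

import (
    "encoding/json"
    "io"
    "sync"

    "github.com/hashicorp/raft"
)

// command stands in for Consul's typed Raft messages.
type command struct {
    Op    string `json:"op"`
    Key   string `json:"key"`
    Value string `json:"value"`
}

// KVFSM is a toy raft.FSM: committed log entries mutate an in-memory map.
type KVFSM struct {
    mu   sync.Mutex
    data map[string]string
}

func New() *KVFSM { return &KVFSM{data: map[string]string{}} }

// Apply is called by Raft for every committed log entry.
func (f *KVFSM) Apply(l *raft.Log) interface{} {
    var c command
    if err := json.Unmarshal(l.Data, &c); err != nil {
        return err
    }
    f.mu.Lock()
    defer f.mu.Unlock()
    switch c.Op {
    case "set":
        f.data[c.Key] = c.Value
    case "delete":
        delete(f.data, c.Key)
    }
    return nil
}

// Snapshot captures a point-in-time copy used for log compaction and recovery.
func (f *KVFSM) Snapshot() (raft.FSMSnapshot, error) {
    f.mu.Lock()
    defer f.mu.Unlock()
    copied := make(map[string]string, len(f.data))
    for k, v := range f.data {
        copied[k] = v
    }
    return &snapshot{data: copied}, nil
}

// Restore rebuilds the state from a previously persisted snapshot.
func (f *KVFSM) Restore(rc io.ReadCloser) error {
    defer rc.Close()
    m := map[string]string{}
    if err := json.NewDecoder(rc).Decode(&m); err != nil {
        return err
    }
    f.mu.Lock()
    f.data = m
    f.mu.Unlock()
    return nil
}

type snapshot struct{ data map[string]string }

// Persist writes the snapshot to the sink that Raft stores on disk.
func (s *snapshot) Persist(sink raft.SnapshotSink) error {
    if err := json.NewEncoder(sink).Encode(s.data); err != nil {
        sink.Cancel()
        return err
    }
    return sink.Close()
}

func (s *snapshot) Release() {}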

Consistency Model

  • Strong Consistency: All writes go through Raft consensus before applying to state
  • Read Consistency: Reads from the state store reflect all committed writes
  • Blocking Queries: Clients can watch for changes using index-based blocking
  • Snapshot Consistency: Snapshots capture a consistent view at a specific Raft index

Key Operations

Write Path: RPC request → Raft leader → FSM.Apply() → State Store update → Event published

Read Path: Query → State Store snapshot → MemDB transaction → Results returned

Recovery Path: Snapshot restored → State Store rebuilt → Raft logs replayed from snapshot index
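
From a client's point of view, the read path can also be driven as an index-based blocking query by echoing back the last observed index. A sketch with the official api package; the key name is illustrative.

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    var lastIndex uint64
    for {
        // Passing the last observed index turns this read into a blocking query:
        // it returns when the index advances or the wait time elapses.
        pair, meta, err := client.KV().Get("config/app", &api.QueryOptions{
            WaitIndex: lastIndex,
            WaitTime:  5 * time.Minute,
        })
        if err != nil {
            log.Fatal(err)
        }
        lastIndex = meta.LastIndex
        if pair != nil {
            fmt.Printf("config/app = %s (index %d)\n", pair.Value, lastIndex)
        }
    }
}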

Service Discovery & Catalog

Relevant Files
  • agent/consul/catalog_endpoint.go
  • agent/consul/health_endpoint.go
  • agent/consul/state/catalog.go
  • agent/structs/catalog.go
  • agent/dns.go
  • api/catalog.go

The service catalog is Consul's core registry that tracks all nodes, services, and their health status. It enables service discovery by maintaining a queryable database of what services are available and where they run.

Core Architecture

The catalog system has three main layers:

  1. Endpoints (catalog_endpoint.go, health_endpoint.go) - HTTP/RPC API handlers that accept registration and query requests
  2. State Store (agent/consul/state/catalog.go) - In-memory database using memdb that stores and indexes catalog data
  3. Data Structures (agent/structs/catalog.go, api/catalog.go) - Request/response types and core entities

Registration Flow

Services register through the Catalog.Register RPC endpoint:

// Register a service and/or check(s) in a node
func (c *Catalog) Register(args *structs.RegisterRequest, reply *struct{}) error

The registration process:

  1. Validates ACL permissions and enterprise metadata
  2. Pre-applies validation rules to node, service, and check data
  3. Stores the node, service, and health checks in the state store
  4. Triggers replication to other servers via Raft

A RegisterRequest can include:

  • Node information (name, address, metadata)
  • Service definition (name, port, tags, metadata)
  • Health checks (HTTP, TCP, script-based, TTL)
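
The same fields are exposed to external clients through the api package's CatalogRegistration type, which mirrors RegisterRequest. A hedged sketch that registers a node, service, and check in one request; names, addresses, and IDs are illustrative.

package main

import (
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // External registration directly against the catalog.
    reg := &api.CatalogRegistration{
        Node:    "db-node-1",
        Address: "10.0.0.20",
        Service: &api.AgentService{
            ID:      "postgres-1",
            Service: "postgres",
            Tags:    []string{"primary"},
            Port:    5432,
        },
        Check: &api.AgentCheck{
            Node:      "db-node-1",
            CheckID:   "service:postgres-1",
            Name:      "postgres liveness",
            Status:    api.HealthPassing,
            ServiceID: "postgres-1",
        },
    }
    if _, err := client.Catalog().Register(reg, nil); err != nil {
        log.Fatal(err)
    }
}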

Query Operations

The Health endpoint provides multiple query patterns:

  • Health.ServiceNodes - Returns all instances of a service with their health status
  • Health.ServiceChecks - Returns checks for a specific service
  • Health.NodeChecks - Returns all checks for a node
  • Health.ChecksInState - Returns checks matching a health state (passing, warning, critical)

Queries support:

  • Tag filtering to find service instances with specific tags
  • Connect-aware queries for service mesh proxies
  • Ingress gateway queries for external traffic routing
  • Blocking queries for real-time updates

DNS Integration

The DNS server (agent/dns.go) queries the catalog to resolve service names:

// Service lookup queries the catalog for service instances
args := structs.ServiceSpecificRequest{
    ServiceName: lookup.Service,
    ServiceTags: serviceTags,
    // ...
}
out, _, err := d.agent.rpcClientHealth.ServiceNodes(context.TODO(), args)

DNS queries like redis.service.consul are resolved by:

  1. Parsing the service name from the DNS query
  2. Calling Health.ServiceNodes to get available instances
  3. Returning A/AAAA records for healthy instances
  4. Supporting SRV records for port information
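
The same resolution can be exercised against the agent's DNS interface (port 8600 by default) using Go's standard resolver. A sketch assuming a local agent and an illustrative service name.

package main

import (
    "context"
    "fmt"
    "log"
    "net"
    "time"
)

func main() {
    // Point a resolver at the local Consul agent's DNS port.
    r := &net.Resolver{
        PreferGo: true,
        Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
            d := net.Dialer{Timeout: 2 * time.Second}
            return d.DialContext(ctx, network, "127.0.0.1:8600")
        },
    }

    // A/AAAA lookup returns addresses of healthy instances.
    ips, err := r.LookupIPAddr(context.Background(), "redis.service.consul")
    if err != nil {
        log.Fatal(err)
    }
    for _, ip := range ips {
        fmt.Println(ip.String())
    }

    // SRV lookup also returns the registered port.
    _, srvs, err := r.LookupSRV(context.Background(), "", "", "redis.service.consul")
    if err != nil {
        log.Fatal(err)
    }
    for _, s := range srvs {
        fmt.Println(s.Target, s.Port)
    }
}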

State Storage

The state store maintains multiple indexes for efficient queries:

  • Nodes table - Indexed by node ID and name
  • Services table - Indexed by node, service name, and tags
  • Checks table - Indexed by node and service
  • Service virtual IPs - Maps services to their assigned VIPs

Queries use memdb watch sets for blocking query support, allowing clients to wait for changes without polling.

Health Status Management

Every node has a built-in serfHealth check that reflects cluster membership status. Services can have multiple checks:

  • HTTP checks - Periodic HTTP requests to a health endpoint
  • TCP checks - TCP connection attempts
  • Script checks - Custom scripts executed by the agent
  • TTL checks - Manual status updates with expiration

The catalog aggregates check statuses to determine if a service instance is passing, warning, or critical.

Service Mesh & Connect

Relevant Files
  • agent/proxycfg/state.go - Proxy configuration state management
  • agent/xds/server.go - XDS gRPC server for Envoy configuration
  • connect/resolver.go - Service discovery and resolution
  • connect/tls.go - TLS certificate verification and mTLS setup
  • agent/consul/leader_connect_ca.go - Certificate Authority management

Overview

Consul's service mesh (Connect) provides secure service-to-service communication using mutual TLS (mTLS) encryption, identity-based authentication, and explicit service authorization. The architecture consists of a control plane (Consul servers and agents) that manages configuration and a data plane (Envoy sidecar proxies) that enforces policies.

Core Architecture

The service mesh operates through three main layers:

  1. Certificate Authority (CA) - Issues and manages SPIFFE X.509 certificates for service identity
  2. Proxy Configuration - Generates and distributes Envoy proxy configurations
  3. XDS Server - Delivers configuration updates to proxies via gRPC

Certificate Authority (CA)

The CA subsystem manages the PKI infrastructure for service mesh. Key components:

  • CAManager (agent/consul/leader_connect_ca.go) - Runs on the leader and manages CA state, certificate rotation, and provider lifecycle
  • CA Providers - Pluggable implementations (built-in, Vault, etc.) that handle certificate signing
  • Root Certificates - Distributed to all agents and proxies for trust chain validation
  • Leaf Certificates - Issued per service instance with SPIFFE URIs for identity

The CA maintains state through caState transitions: UNINITIALIZED → INITIALIZING → INITIALIZED → RENEWING/RECONFIGURING.

Proxy Configuration Management

The proxycfg package coordinates data fetching for proxy configuration:

  • Manager - Tracks registered proxies and coordinates state updates
  • State - Maintains configuration for a single proxy, watching multiple data sources (roots, leaf certs, intentions, upstreams, discovery chains)
  • ConfigSnapshot - Immutable snapshot of all configuration needed by a proxy at a point in time

The state machine watches for updates from the catalog, ACL system, and configuration entries, coalescing changes into snapshots that are pushed to consumers.

XDS Server & Envoy Integration

The XDS server (agent/xds/server.go) implements Envoy's Aggregated Discovery Service (ADS) protocol:

  • DeltaAggregatedResources - Primary gRPC endpoint for Envoy proxy connections
  • Resource Types - Listeners, Routes, Clusters, Endpoints, Secrets (certificates)
  • Authorization - Validates that proxy tokens have service:write permission for their service
  • Streaming - Long-lived gRPC streams push configuration updates to proxies in real-time

Service Discovery & Resolution

The connect package provides client-side service discovery:

  • Resolver Interface - Abstracts service discovery mechanisms
  • ConsulResolver - Queries Consul catalog for healthy service instances
  • StaticResolver - For known endpoints without discovery
  • Service - High-level API for establishing mTLS connections with automatic certificate management
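
For connect-native Go applications, the connect package ties these pieces together. Below is a minimal server-side sketch using the public connect.NewService API; the service name, port, and handler are illustrative.

package main

import (
    "log"
    "net/http"

    "github.com/hashicorp/consul/api"
    "github.com/hashicorp/consul/connect"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // Register as a connect-native service; leaf certificates and roots are
    // fetched and rotated automatically in the background.
    svc, err := connect.NewService("web", client)
    if err != nil {
        log.Fatal(err)
    }
    defer svc.Close()

    // Serve plain HTTP behind an mTLS listener whose TLS config enforces
    // SPIFFE identity verification and intention-based authorization.
    server := &http.Server{
        Addr:      ":8443",
        TLSConfig: svc.ServerTLSConfig(),
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("hello over mTLS\n"))
        }),
    }
    log.Fatal(server.ListenAndServeTLS("", ""))
}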

mTLS & Certificate Verification

TLS configuration in connect/tls.go enforces security:

  • Minimum TLS 1.2 with strong cipher suites (ECDHE + AES/ChaCha20)
  • Client Certificate Verification - Both sides verify peer certificates
  • SPIFFE URI Validation - Certificates must contain correct service identity URIs
  • Custom Verifiers - Server-side verifier checks authorization; client-side verifier validates chain

The verifyServerCertMatchesURI function ensures the peer certificate identity matches the expected service URI, preventing man-in-the-middle attacks.

Data Flow

  1. Service registers with Consul; proxycfg Manager creates state object
  2. Envoy proxy connects to XDS server via gRPC
  3. XDS server watches proxycfg state for configuration changes
  4. proxycfg state watches CA for certificate updates and catalog for service topology
  5. Configuration snapshots are serialized to Envoy resources (Listeners, Routes, Clusters, Secrets)
  6. Envoy applies configuration and enforces mTLS, authorization, and routing policies

ACL & Security

Relevant Files
  • acl/acl.go
  • acl/authorizer.go
  • acl/policy.go
  • acl/policy_authorizer.go
  • acl/chained_authorizer.go
  • agent/consul/acl.go
  • agent/acl_endpoint.go

Consul's ACL system provides role-based access control (RBAC) for authenticating and authorizing access to HTTP API and RPC operations. The system is built on a foundation of tokens, policies, and roles that work together to enforce fine-grained permissions across the cluster.

Core Components

Tokens are the primary authentication mechanism. Each token has a secret ID used for authentication and an accessor ID for logging. Tokens can be associated with policies, roles, or service/node identities. The system includes special tokens like the anonymous token (used when no token is provided) and agent recovery tokens (for emergency access).

Policies define sets of rules that grant or deny access to resources. Rules are organized by resource type (agent, key, node, service, session, event, query, keyring, operator, mesh, peering) and support both exact-match and prefix-based matching. Each rule specifies an access level: deny, read, list, or write.

Roles group policies and identities together, allowing administrators to manage permissions at a higher level of abstraction. Service identities and node identities are synthetic policies that automatically grant permissions for specific services or nodes.
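
Policies and tokens are normally managed through the ACL HTTP API, and the Go api client exposes the same operations. A sketch assuming a token with acl:write permission; the rules, names, and token placeholder are illustrative.

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    // ACL management calls require a suitably privileged token.
    cfg := api.DefaultConfig()
    cfg.Token = "<management-token>"
    client, err := api.NewClient(cfg)
    if err != nil {
        log.Fatal(err)
    }

    // A policy with an exact-match service rule and a key prefix rule.
    policy, _, err := client.ACL().PolicyCreate(&api.ACLPolicy{
        Name: "web-service",
        Rules: `
service "web" { policy = "write" }
key_prefix "config/web/" { policy = "read" }
`,
    }, nil)
    if err != nil {
        log.Fatal(err)
    }

    // A token linked to the policy; callers present the SecretID.
    token, _, err := client.ACL().TokenCreate(&api.ACLToken{
        Description: "token for the web service",
        Policies:    []*api.ACLTokenPolicyLink{{Name: policy.Name}},
    }, nil)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("accessor:", token.AccessorID, "secret:", token.SecretID)
}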

Authorization Flow

The ACLResolver handles token and policy resolution. It maintains caches for identities, policies, and roles to minimize RPC calls. When a token is presented, the resolver:

  1. Checks if ACLs are enabled
  2. Attempts local resolution (agent recovery tokens, server management tokens)
  3. Consults the cache if available
  4. Falls back to remote RPC resolution if needed

Policy Enforcement

The Authorizer interface defines methods for checking permissions on each resource type. The PolicyAuthorizer implements this interface using radix trees for efficient prefix matching. Each resource type maintains its own radix tree for exact and prefix-based rules.
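
The longest-prefix idea can be sketched with the go-radix library on its own. This is a simplification rather than Consul's authorizer (the real PolicyAuthorizer also separates exact-match from prefix rules); keys and rules are illustrative.

package main

import (
    "fmt"

    radix "github.com/armon/go-radix"
)

func main() {
    // One tree per resource type; values are the access levels from policy rules.
    keyRules := radix.New()
    keyRules.Insert("config/", "read")           // key_prefix "config/" { policy = "read" }
    keyRules.Insert("config/app/secret", "deny") // key "config/app/secret" { policy = "deny" }

    check := func(key string) {
        // The longest matching prefix decides which rule applies.
        prefix, access, ok := keyRules.LongestPrefix(key)
        if !ok {
            fmt.Printf("%-20s -> no rule, defer to the default policy\n", key)
            return
        }
        fmt.Printf("%-20s -> %v (rule %q)\n", key, access, prefix)
    }

    check("config/app/port")   // read (prefix "config/")
    check("config/app/secret") // deny (the longer prefix wins)
    check("other/key")         // no rule, default policy applies
}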

The ChainedAuthorizer combines multiple authorizers in sequence, allowing the first non-default decision to take precedence. This enables layering of authorization logic.

Access Control Decisions

Three enforcement decisions are possible:

  • Allow: A matching rule explicitly grants access
  • Deny: A matching rule explicitly denies access
  • Default: No matching rule found; decision deferred to default policy

The default policy is configurable via acl_default_policy (typically deny for secure deployments). When no rule matches, the system falls back to this default.

Token Resolution Strategies

The ACL down policy determines behavior when the ACL datacenter is unavailable:

  • allow: Permit all requests (unsafe)
  • deny: Deny all requests (conservative)
  • extend-cache: Use cached values indefinitely
  • async-cache: Use cached values while fetching updates asynchronously

This enables graceful degradation during network partitions while maintaining security posture.

Cluster Membership & Gossip

Relevant Files
  • agent/consul/server_serf.go
  • agent/consul/client_serf.go
  • agent/consul/leader_registrator_v1.go
  • agent/consul/merge.go
  • internal/gossip/libserf/serf.go

Consul uses the Serf gossip protocol to manage cluster membership and detect node failures. This distributed protocol enables all agents to maintain a consistent view of the cluster without requiring a central authority.

Gossip Protocol Overview

Serf implements a modified SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol. Each node periodically exchanges membership information with random peers, allowing state changes to propagate exponentially through the cluster. This approach scales to thousands of nodes with minimal overhead.

Key characteristics:

  • Decentralized: No single point of failure for membership management
  • Probabilistic: Uses random peer selection for efficient propagation
  • Failure detection: Detects node failures within seconds
  • Event broadcasting: Disseminates custom events and queries across the cluster

LAN vs WAN Gossip Pools

Consul maintains separate gossip pools for different network topologies:

LAN Pool (serfLAN):

  • Connects agents within a single datacenter
  • Handles local node discovery and health monitoring
  • Supports segments for logical grouping within a datacenter
  • Processes member join, leave, fail, and reap events

WAN Pool (serfWAN):

  • Connects servers across multiple datacenters
  • Uses mesh gateway transport for cross-datacenter federation
  • Validates that joining nodes are servers (not clients)
  • Prevents datacenter mismatches during cluster merges

Member Lifecycle

When a node joins the cluster, Serf broadcasts an EventMemberJoin event. The event handler processes this through several stages:

  1. Join Detection (lanNodeJoin): Identifies Consul servers and updates the server lookup table
  2. Reconciliation (localMemberEvent): Leaders reconcile Serf state with the catalog
  3. Registration (HandleAliveMember): Registers the node in the catalog with a passing health check

When a node fails or leaves, similar handlers (lanNodeFailed, HandleFailedMember) mark it critical or deregister it.

Merge Delegates

Merge delegates validate cluster merges when partitioned networks rejoin:

LAN Merge Delegate:

  • Checks for conflicting node IDs
  • Validates all nodes are in the same datacenter
  • Prevents duplicate node IDs across the cluster

WAN Merge Delegate:

  • Ensures only servers join the WAN pool
  • Validates server metadata consistency
  • Can disable federation if misconfiguration is detected

Event Handling

Serf events flow through dedicated event channels:

// Serf events arriving on the LAN event channel are dispatched by type.
switch e.EventType() {
case serf.EventMemberJoin:
    s.lanNodeJoin(e.(serf.MemberEvent))
case serf.EventMemberLeave, serf.EventMemberFailed, serf.EventMemberReap:
    s.lanNodeFailed(e.(serf.MemberEvent))
case serf.EventUser:
    s.localEvent(e.(serf.UserEvent))
case serf.EventMemberUpdate:
    s.lanNodeUpdate(e.(serf.MemberEvent))
}

User events enable custom workflows like remote execution and cluster-wide notifications.
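
Outside of Consul, the same Serf machinery can be wired up directly. A standalone sketch using the hashicorp/serf library; the node name, tags, and join address are illustrative.

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/serf/serf"
)

func main() {
    events := make(chan serf.Event, 16)

    conf := serf.DefaultConfig()
    conf.NodeName = "node-a"
    conf.Tags = map[string]string{"role": "node", "dc": "dc1"} // Consul-style tags
    conf.EventCh = events

    s, err := serf.Create(conf)
    if err != nil {
        log.Fatal(err)
    }
    defer s.Leave()

    // Join an existing cluster member to start gossiping.
    if _, err := s.Join([]string{"10.0.0.10:8301"}, false); err != nil {
        log.Printf("join failed: %v", err)
    }

    // Handle membership and user events, mirroring the switch shown above.
    for e := range events {
        switch ev := e.(type) {
        case serf.MemberEvent:
            for _, m := range ev.Members {
                fmt.Printf("%s: %s (role=%s)\n", e.EventType(), m.Name, m.Tags["role"])
            }
        case serf.UserEvent:
            fmt.Printf("user event %s: %s\n", ev.Name, string(ev.Payload))
        }
    }
}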

Bootstrap Coordination

During cluster bootstrap, servers use gossip to discover peers and coordinate Raft initialization. The maybeBootstrap function:

  1. Scans LAN members for expected server count
  2. Queries each peer for existing Raft state
  3. Initializes Raft cluster configuration if no existing state is found
  4. Prevents spurious elections by ensuring only one bootstrap occurs

Configuration & Tuning

Consul-specific Serf defaults in libserf.DefaultConfig():

  • MinQueueDepth: 4096 (dynamically sized based on cluster size)
  • LeavePropagateDelay: 3 seconds (allows graceful leave propagation)
  • QueueDepthWarning: 1,000,000 (effectively disabled)

These settings optimize for large clusters while maintaining responsiveness.

Member Metadata

Serf tags encode critical node information:

  • role: "consul" (server) or "node" (client)
  • dc: Datacenter name
  • id: Unique node ID
  • vsn: Protocol version
  • port: RPC port
  • grpc_port, grpc_tls_port: gRPC endpoints
  • bootstrap, expect: Bootstrap configuration
  • read_replica: Read-only server flag

This metadata enables intelligent routing and version compatibility checks.

HTTP API & Endpoints

Relevant Files
  • agent/http.go
  • agent/http_register.go
  • agent/health_endpoint.go
  • agent/catalog_endpoint.go
  • api/api.go

Consul exposes a comprehensive HTTP API for service discovery, health checks, configuration management, and cluster operations. The HTTP server is built on Go's standard net/http package with a custom routing and middleware layer.

Endpoint Registration System

Endpoints are registered at package initialization time using the registerEndpoint() function in http_register.go. Each endpoint maps a URL pattern to an HTTP method set and a handler function:

registerEndpoint("/v1/catalog/services", []string{"GET"}, (*HTTPHandlers).CatalogServices)
registerEndpoint("/v1/agent/service/register", []string{"PUT"}, (*HTTPHandlers).AgentRegisterService)

The registration system maintains two global maps:

  • endpoints: Maps URL patterns to unbound endpoint handler functions
  • allowedMethods: Maps patterns to supported HTTP methods (GET, PUT, DELETE, POST)

Request Handling Pipeline

When a request arrives, it flows through multiple layers:

  1. Routing: The http.ServeMux matches the request path to a registered pattern
  2. Middleware: Gzip compression, metrics collection, and ACL authorization
  3. Handler Execution: The endpoint handler processes the request and returns (interface{}, error)
  4. Response Encoding: Results are JSON-encoded and sent to the client

The wrap() function standardizes response handling by converting endpoint results into HTTP responses with proper status codes, headers, and error formatting.
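
The idea behind wrap() can be sketched in isolation. The adapter below is not Consul's implementation (which also layers gzip, metrics, ACL resolution, and its richer error types); it only shows how handlers returning (interface{}, error) become http.HandlerFuncs. The route and payload are illustrative.

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

// endpointFn mirrors the shape of Consul's endpoint handlers: they return a
// result object and an error instead of writing the response themselves.
type endpointFn func(resp http.ResponseWriter, req *http.Request) (interface{}, error)

// wrap adapts an endpointFn into a standard http.HandlerFunc by JSON-encoding
// the result and translating errors into HTTP status codes.
func wrap(handler endpointFn) http.HandlerFunc {
    return func(resp http.ResponseWriter, req *http.Request) {
        obj, err := handler(resp, req)
        if err != nil {
            http.Error(resp, err.Error(), http.StatusInternalServerError)
            return
        }
        if obj == nil {
            return
        }
        resp.Header().Set("Content-Type", "application/json")
        if err := json.NewEncoder(resp).Encode(obj); err != nil {
            http.Error(resp, err.Error(), http.StatusInternalServerError)
        }
    }
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/v1/example/services", wrap(func(resp http.ResponseWriter, req *http.Request) (interface{}, error) {
        return map[string][]string{"web": {"primary"}}, nil
    }))
    log.Fatal(http.ListenAndServe(":8080", mux))
}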

Key Endpoint Categories

  • ACL: Token management, policy creation, role-based access control
  • Agent: Service registration, health checks, node information
  • Catalog: Service discovery, node listing, datacenter information
  • Health: Health status queries, service health checks
  • Connect: mTLS certificate management, intentions, authorization
  • KV Store: Key-value storage operations
  • Operator: Raft configuration, autopilot, keyring management
  • Session: Session creation and management for distributed locks

Error Handling

Three error types provide flexible HTTP response control:

  • HTTPError: Returns custom status code with plain text reason
  • CodeWithPayloadError: Returns non-200 status with custom content type
  • MethodNotAllowedError: Handles unsupported HTTP methods

Performance Features

  • Gzip Compression: Automatically applied to responses (configurable minimum size)
  • Metrics: All requests tracked with method and path labels via Prometheus
  • Caching: Client-side caching via blocking queries and watch mechanisms
  • Rate Limiting: Configurable per-endpoint rate limits via middleware

Configuration & Agent Setup

Relevant Files
  • agent/config/builder.go
  • agent/consul/config.go
  • command/agent/agent.go
  • agent/agent.go

Consul's configuration system is built on a multi-layered approach that merges configuration from multiple sources with a well-defined precedence order. The agent startup process loads and validates configuration, then initializes all necessary components.

Configuration Loading Pipeline

The configuration builder processes sources in this order:

  1. Default configuration – Built-in defaults for all settings
  2. Config files – HCL or JSON files in alphabetical order
  3. Command-line flags – Override file-based settings
  4. Overrides – Final programmatic overrides

The LoadOpts struct in builder.go controls this process. It accepts ConfigFiles (paths to HCL/JSON files), FlagValues (command-line arguments), and optional Overrides for testing or special cases. The builder validates file extensions and skips non-HCL/JSON files in directories with a warning.

RuntimeConfig Construction

The RuntimeConfig struct represents the fully resolved configuration after all sources are merged. Key sections include:

  • Network Configuration – Bind addresses, advertise addresses, ports for RPC, DNS, HTTP, gRPC
  • Cluster Settings – Datacenter, node name, bootstrap mode, Raft parameters
  • ACL Configuration – Token settings, policy TTLs, default policies
  • TLS Configuration – Certificate paths, verification modes, minimum TLS versions
  • Gossip Protocol – Serf LAN/WAN settings, probe intervals, suspicion multipliers
  • Service Discovery – DNS settings, service TTLs, recursors
  • Connect/Service Mesh – CA provider, virtual IP CIDRs, mesh gateway settings

Agent Initialization

The Agent struct in agent/agent.go orchestrates startup through the New() and Start() methods:

New() creates the agent instance and registers cache types. It initializes:

  • Token store for ACL tokens
  • Service manager for proxy configuration
  • RPC clients for health, config entries, and peering
  • File watcher for auto-reload capability

Start() brings up all agent subsystems:

  • Creates local state and anti-entropy synchronizer
  • Initializes Consul server or client based on ServerMode
  • Starts DNS, HTTP, HTTPS, and gRPC listeners
  • Launches proxy configuration manager
  • Begins retry join attempts and watch plan execution

Consul Server/Client Configuration

The newConsulConfig() function translates RuntimeConfig into consul.Config. This includes:

  • Raft configuration (election timeout, heartbeat timeout, snapshot settings)
  • Serf LAN/WAN configuration (bind addresses, gossip parameters)
  • ACL resolver settings (token TTLs, default policies)
  • Autopilot configuration (dead server cleanup, stabilization time)
  • Connect CA configuration (provider type, certificate TTLs)

Configuration Validation

The builder validates:

  • Port ranges (DNS, HTTP, gRPC, Serf, RPC)
  • Address formats (IPv4/IPv6 compatibility)
  • Raft multiplier bounds (1 to 10)
  • Virtual IP CIDR blocks for Connect
  • Deprecated configuration keys with warnings

Invalid configurations return errors during the Load() call, preventing agent startup with broken settings.

Dynamic Configuration Reload

Agents support configuration reload via SIGHUP signal. The ReloadConfig() method updates:

  • Request rate limits
  • RPC timeouts and burst settings
  • Raft snapshot thresholds
  • Config entry bootstrap entries
  • Reporting settings

File watchers can trigger automatic reloads when TLS certificates or config files change, controlled by the AutoReloadConfig setting.

Health Checking System

Relevant Files
  • agent/checks/check.go
  • agent/consul/health_endpoint.go
  • agent/agent.go
  • agent/structs/check_type.go

Consul's health checking system enables agents to monitor service and node health through periodic checks. The system supports multiple check types, each suited for different monitoring scenarios.

Check Types

Consul supports nine distinct check types, each with specific use cases:

  • Script: Executes a custom script at regular intervals. Exit code 0 = passing, 1 = warning, other = critical.
  • HTTP: Makes periodic HTTP requests. Status codes 2xx = passing, 429 = warning, others = critical.
  • TCP: Attempts TCP connections to verify service availability.
  • UDP: Sends UDP datagrams and validates responses.
  • gRPC: Sends gRPC health check requests following the standard gRPC health protocol.
  • H2PING: Performs HTTP/2 ping operations to verify connectivity.
  • TTL: Client-driven checks where the client must periodically update status. Automatically marks critical if TTL expires.
  • Docker: Executes scripts inside Docker containers using the Docker API.
  • OS Service: Monitors Windows services or systemd services on Linux.

Check Lifecycle

Each check type (except TTL) runs in its own goroutine with a configurable interval. The lifecycle follows this pattern:

  1. Initialization: Check is created with configuration (interval, timeout, target URL/address, etc.)
  2. Start: Start() method launches the monitoring goroutine
  3. Periodic Execution: Check runs at specified intervals with randomized initial delay to prevent thundering herd
  4. Status Update: Results are passed to StatusHandler which applies threshold logic
  5. Stop: Stop() method gracefully terminates the goroutine

Status Handling and Thresholds

The StatusHandler implements failure and success thresholds to prevent flapping:

  • Success Before Passing: Number of consecutive passing checks before status changes to passing
  • Failures Before Warning: Threshold for transitioning to warning state
  • Failures Before Critical: Threshold for transitioning to critical state

This prevents temporary network glitches from immediately marking services as unhealthy.
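
Thresholds are set per check at registration time. A sketch with the api client that registers an HTTP check with anti-flapping thresholds plus a TTL check the application refreshes itself; IDs, URLs, and values are illustrative.

package main

import (
    "log"
    "time"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }
    agent := client.Agent()

    // An HTTP check with anti-flapping thresholds.
    err = agent.CheckRegister(&api.AgentCheckRegistration{
        ID:   "web-http",
        Name: "web HTTP health",
        AgentServiceCheck: api.AgentServiceCheck{
            HTTP:                   "http://127.0.0.1:8080/health",
            Interval:               "10s",
            Timeout:                "2s",
            SuccessBeforePassing:   3,
            FailuresBeforeCritical: 2,
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // A TTL check that must be refreshed before it expires.
    err = agent.CheckRegister(&api.AgentCheckRegistration{
        ID:   "web-heartbeat",
        Name: "web heartbeat",
        AgentServiceCheck: api.AgentServiceCheck{
            TTL: "30s",
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    for {
        if err := agent.UpdateTTL("web-heartbeat", "ok", api.HealthPassing); err != nil {
            log.Println("heartbeat update failed:", err)
        }
        time.Sleep(10 * time.Second)
    }
}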

Health Endpoint

The Health RPC endpoint (agent/consul/health_endpoint.go) provides query capabilities:

  • ChecksInState: Retrieve all checks in a specific state (passing, warning, critical)
  • NodeChecks: Get all checks for a specific node
  • ServiceChecks: Get all checks for a specific service
  • ServiceNodes: Get healthy nodes running a service with health information

All queries support ACL filtering, bexpr filtering, and node metadata filtering.
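
These queries map directly onto the api package's Health client. A brief sketch; the service name is illustrative.

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }
    health := client.Health()

    // All checks currently in the critical state (ChecksInState).
    critical, _, err := health.State(api.HealthCritical, nil)
    if err != nil {
        log.Fatal(err)
    }
    for _, c := range critical {
        fmt.Printf("%s/%s is critical: %s\n", c.Node, c.CheckID, c.Output)
    }

    // Healthy instances of a service (ServiceNodes with passing-only filtering).
    entries, _, err := health.Service("web", "", true, nil)
    if err != nil {
        log.Fatal(err)
    }
    for _, e := range entries {
        fmt.Printf("%s:%d on %s\n", e.Service.Address, e.Service.Port, e.Node.Node)
    }
}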

Output Management

Check output is captured in circular buffers with configurable maximum size (default 4KB) to prevent excessive memory consumption. Output is truncated with a notice if it exceeds the limit.

Timeout and Execution Safety

  • Minimum interval enforced to prevent fork bombing (1 second minimum)
  • Configurable timeouts prevent hung checks from blocking the system
  • Script checks use process subtrees to ensure proper cleanup
  • Concurrent check execution is prevented through synchronization primitives