MongoDB Server Architecture

Last updated on Dec 17, 2025 (Commit: a013280)

Overview

Relevant Files
  • README.md
  • src/mongo/db/README.md
  • docs/README.md
  • src/mongo/db/mongod_main.cpp
  • src/mongo/s/mongos_main.cpp

MongoDB is a document-oriented NoSQL database system that stores data in flexible JSON-like documents. This repository contains the complete source code for MongoDB Server, including both the database engine (mongod) and the sharding router (mongos).

Core Architecture

MongoDB's architecture is organized into several major subsystems:

Main Components

Database Server (mongod) - The core database process that handles data storage, queries, and replication. Located in src/mongo/db/, it manages:

  • Command execution and query processing
  • Data persistence through the storage engine
  • Replica set coordination
  • Shard server operations

Sharding Router (mongos) - Routes queries to appropriate shards in a distributed cluster. Located in src/mongo/s/, it provides:

  • Query routing and aggregation
  • Shard discovery and topology management
  • Distributed transaction coordination

Key Subsystems

Query Execution (src/mongo/db/exec/) - Three execution engines process queries:

  • SBE (Slot-Based Engine) - Modern, optimized execution engine
  • Classic Engine - Traditional execution framework
  • Document Sources - Aggregation pipeline execution

Storage (src/mongo/db/storage/) - Abstraction layer over WiredTiger with support for:

  • CRUD operations on collections
  • Index management
  • Snapshot isolation and transactions

Replication (src/mongo/db/repl/) - Maintains data consistency across replica sets:

  • Oplog (operation log) management
  • Primary/secondary synchronization
  • Failover coordination

Sharding (src/mongo/db/s/) - Distributes data across multiple servers:

  • Chunk management and balancing
  • Shard key routing
  • Distributed transactions

BSON (src/mongo/bson/) - Binary JSON serialization format for data representation and network communication.

RPC & Transport (src/mongo/rpc/, src/mongo/transport/) - Network communication layer supporting multiple protocols and connection types.

Build System

The project uses Bazel for building. Configuration files are in bazel/ and BUILD.bazel files throughout the codebase. Tests are organized in jstests/ (JavaScript tests) and src/mongo/dbtests/ (C++ unit tests).

Development Resources

Comprehensive documentation is available in docs/ covering topics like building, testing, architecture patterns, and internal systems. The buildscripts/ directory contains automation for compilation, testing, and deployment.

Architecture & Core Components

Relevant Files
  • src/mongo/db/service_context.h
  • src/mongo/db/operation_context.h
  • src/mongo/db/client.h
  • src/mongo/transport/session_workflow.h
  • src/mongo/transport/session_workflow.cpp
  • src/mongo/transport/service_entry_point.h
  • src/mongo/db/commands.h

MongoDB's server architecture is built on a hierarchical context model that manages state from the process level down to individual operations. Understanding these core components is essential for working with the codebase.

The Context Hierarchy

MongoDB uses a four-level context hierarchy to manage server state (a minimal usage sketch follows the list):

  1. ServiceContext - The root singleton representing the entire server process (mongod or mongos). It owns all Clients and manages global resources like the storage engine, transport layer, and periodic task runners.

  2. Service - A grouping of Clients that determines their ClusterRole (shard or router). A ServiceContext can own multiple Services, allowing a single process to act as both shard and router.

  3. Client - Represents a logical connection to the database. Each Client can maintain at most one active OperationContext at a time and is associated with a transport Session.

  4. OperationContext - Encapsulates the state of a single operation from dispatch until completion. It tracks transaction state, deadlines, locks, and recovery units.
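
The sketch below shows how these pieces relate in code, assuming a process-global ServiceContext has already been initialized. Names follow service_context.h and client.h; exact factory signatures (in particular how the Service is looked up) vary between server versions.

// Hedged sketch: obtain the process-wide ServiceContext, create a Client,
// and open an OperationContext for one unit of work.
ServiceContext* serviceContext = getGlobalServiceContext();
Service* service = serviceContext->getService(ClusterRole::ShardServer);  // role lookup is version-dependent
ServiceContext::UniqueClient client = service->makeClient("example-client");
ServiceContext::UniqueOperationContext opCtx = client->makeOperationContext();
// A Client may hold at most one active OperationContext at a time;
// the OperationContext is destroyed before its owning Client.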

Request Flow: From Network to Execution

The request flow follows these steps:

  1. Transport Layer accepts a client connection and creates a Session
  2. SessionWorkflow manages the session lifecycle, parsing incoming messages into work items
  3. ServiceEntryPoint receives the work item and creates an OperationContext
  4. Command Handler executes the command using the OperationContext
  5. ReplyBuilder constructs the response message
  6. SessionWorkflow sends the response back through the Session

Key Components

SessionWorkflow organizes network messages into MongoDB protocol requests and responses. It handles special message types like exhaust commands (multiple responses) and fire-and-forget commands (no response). Each SessionWorkflow manages one active work item at a time, ensuring sequential processing of requests from a single client.

ServiceEntryPoint is the entry point from the transport layer into command execution. It receives messages from SessionWorkflow, creates OperationContexts, and dispatches commands. Different implementations exist for shard-role and router-role servers.

OperationContext is heavily decorated (inheriting from Decorable), making it dynamically extensible. It manages operation-specific state including locks, recovery units, deadlines, and cancellation tokens. Each operation has a unique OperationId for tracking and killing.

Client represents a logical connection and acts as a factory for OperationContexts. It maintains a reference to the transport Session and enforces the invariant that only one OperationContext can be active at a time.

Decorable Pattern

All context classes inherit from Decorable, allowing subsystems to attach arbitrary data without modifying core classes. This enables loose coupling between components and makes the architecture highly extensible.
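
For example, a subsystem can declare a decoration on OperationContext at namespace scope and look up its per-operation state through it. A minimal sketch; MySubsystemState and getMySubsystemState are hypothetical names:

// Hypothetical per-operation state attached via the Decorable mechanism.
struct MySubsystemState {
    int retryCount = 0;
};

// Registered once during static initialization; every OperationContext
// then carries a default-constructed MySubsystemState.
const auto getMySubsystemState = OperationContext::declareDecoration<MySubsystemState>();

void noteRetry(OperationContext* opCtx) {
    getMySubsystemState(opCtx).retryCount++;   // access this operation's copy
}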

Concurrency and Synchronization

The architecture uses several synchronization primitives:

  • RWMutex for reader-writer locks on shared resources
  • SpinLock for protecting Client state
  • Condition Variables for signaling state changes
  • Atomic Words for lock-free counters and flags
  • Cancellation Tokens for cooperative operation cancellation

This design allows multiple operations to proceed concurrently while maintaining consistency through fine-grained locking.
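
As a small illustration of the lock-free primitives above, a hedged sketch of tracking in-flight operations with AtomicWord:

// Hedged sketch: a lock-free gauge of in-flight operations.
AtomicWord<long long> inFlightOps{0};

void onOperationStart() {
    inFlightOps.fetchAndAdd(1);        // atomically increment
}

void onOperationEnd() {
    inFlightOps.fetchAndSubtract(1);   // atomically decrement
}

long long currentInFlightOps() {
    return inFlightOps.load();         // lock-free read
}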

Query Engine & Optimization

Relevant Files
  • src/mongo/db/query/README.md
  • src/mongo/db/query/README_QO.md
  • src/mongo/db/query/README_explain.md
  • src/mongo/db/exec/README.md
  • src/mongo/db/query/plan_cache/README.md

MongoDB's query engine transforms user requests into optimized execution plans. The system handles find, aggregate, count, distinct, and write commands through a unified pipeline of parsing, optimization, planning, and execution.

Core Architecture

The query system follows a multi-stage pipeline:

  1. Parsing & Canonicalization: User queries are parsed into a CanonicalQuery object containing the filter (as a MatchExpression), projection, and sort specifications.

  2. Logical Optimization: The MatchExpression is optimized through heuristic rewrites. For aggregation pipelines, stages are analyzed to determine which can be pushed down to the find layer.

  3. Query Planning: The plan enumerator generates candidate QuerySolution trees representing different execution strategies. Each solution is a tree of QuerySolutionNodes (e.g., CollectionScanNode, IndexScanNode, FetchNode).

  4. Plan Selection: Candidate plans are ranked using either the classic multiplanner (trial execution) or the cost-based ranker (cardinality estimation & cost model). The winning plan is selected.

  5. Execution: The PlanExecutor executes the winning plan through either the classic engine (PlanStage tree) or the Slot-Based Execution (SBE) engine.

Plan Cache

The plan cache stores winning execution plans to avoid redundant planning for recurring query shapes. Two implementations exist:

  • Classic Plan Cache: Per-collection, stores SolutionCacheData with index tags that reconstruct QuerySolution trees.
  • SBE Plan Cache: Process-wide, stores complete sbe::PlanStage trees with auto-parameterized expressions.

Cache entries transition from inactive (unproven) to active (proven efficient). Replanning occurs if a cached plan's work exceeds the eviction ratio threshold (10x by default).

Explain Command

The explain command provides query observability with three verbosity modes:

  • queryPlanner: Shows the winning and rejected plans without execution.
  • executionStats: Executes the winning plan and reports statistics.
  • allPlansExecution: Executes all candidate plans and compares their performance.

Different PlanExplainer implementations handle classic, SBE, express, and pipeline executors, each providing stage-level execution statistics and plan metadata.

Execution Models

Classic Yielding: Plan stages use interrupt-style yielding. When a yield is needed, NEED_YIELD unwinds the call stack, allowing the executor to release and reacquire storage resources.

SBE Yielding: Stages perform cooperative yielding in-place without unwinding. Each stage must check for interrupts/yields regularly and avoid unbounded delays between checks.

Storage Engine & Data Management

Relevant Files
  • src/mongo/db/storage/storage_engine.h
  • src/mongo/db/storage/recovery_unit.h
  • src/mongo/db/storage/write_unit_of_work.h
  • src/mongo/db/storage/record_store.h
  • src/mongo/db/storage/kv/kv_engine.h
  • src/mongo/db/storage/README.md

MongoDB's storage engine is a pluggable abstraction layer that manages how data is persisted to disk. The architecture supports multiple storage engine implementations through a well-defined API, with WiredTiger being the default production engine.

Core Architecture

The storage engine stack consists of several key layers:

  • StorageEngine - Top-level interface defining the contract for any storage engine implementation
  • KVEngine - Key-value engine abstraction for engines that use key-value storage (like WiredTiger)
  • RecordStore - Manages collections as ordered sets of documents with unique RecordIds
  • SortedDataInterface - Implements indexes as sorted key-value structures
  • RecoveryUnit - Manages transaction semantics and snapshot isolation
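
As an illustration of the RecordStore layer, the sketch below iterates a collection's records through a forward cursor. Cursor-creation signatures have shifted across versions (newer code passes the RecoveryUnit explicitly), so treat this as illustrative only:

// Illustrative sketch: scan every record in a collection's RecordStore.
void scanAll(OperationContext* opCtx, RecordStore* rs) {
    auto cursor = rs->getCursor(opCtx, /*forward=*/true);
    while (boost::optional<Record> record = cursor->next()) {
        RecordId id = record->id;              // unique identifier within the store
        BSONObj doc = record->data.toBson();   // raw record bytes viewed as BSON
        // ... process doc ...
    }
}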

Transactions & Recovery Units

A RecoveryUnit represents a storage transaction with snapshot isolation guarantees. Each operation context holds a recovery unit for its lifetime. Key properties:

  • Snapshot Isolation - All reads see a consistent view of data at a specific point in time
  • Atomicity - All writes commit together or roll back together
  • Timestamps - Optional point-in-time reads using commit timestamps
  • Lazy Initialization - Storage transactions start implicitly on first read/write

Write Units of Work

WriteUnitOfWork is an RAII wrapper that manages transactional writes:

{
    WriteUnitOfWork wuow(opCtx);
    recordStore->insertRecord(opCtx, data);
    index->insert(opCtx, key, recordId);
    wuow.commit();  // All writes become visible atomically
}  // If commit() not called, transaction rolls back

Writes outside a WriteUnitOfWork are illegal. Nested WUOWs are supported, but only the top-level one commits to the storage engine.

Concurrency & Conflicts

MongoDB uses optimistic concurrency control. Write conflicts can occur when multiple operations modify the same data:

  • WriteConflictException - Transient failure; operation should retry
  • TemporarilyUnavailableException - Cache pressure; retry with backoff
  • TransactionTooLargeForCacheException - Operation exceeds cache capacity

The writeConflictRetry helper transparently handles retries with exponential backoff.
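
A minimal sketch of the retry idiom, combining writeConflictRetry with the WriteUnitOfWork pattern above (the NamespaceString argument and exact helper signature may differ between versions):

writeConflictRetry(opCtx, "exampleInsert", nss, [&] {
    WriteUnitOfWork wuow(opCtx);
    recordStore->insertRecord(opCtx, data);   // may throw WriteConflictException
    wuow.commit();                            // reached only if no conflict occurred
});
// On WriteConflictException the lambda is re-invoked after a backoff.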

Data Organization

Collections and indexes map to storage engine idents (unique identifiers):

  • Collection idents: collection-<uuid>
  • Index idents: index-<uuid>

Each ident corresponds to a separate storage structure (e.g., WiredTiger table with .wt file).

Durability & Checkpoints

MongoDB ensures durability through two mechanisms:

  1. Checkpoints - Periodic snapshots of all data; frequency controlled by storage.syncPeriodSecs
  2. Journaling - Write-ahead log for replicated writes; enables recovery between checkpoints

On startup, the storage engine recovers to the last checkpoint, then replays journaled writes.

Oplog & Replication

The oplog (local.oplog.rs) is a capped collection storing all replicated writes. Key considerations:

  • Oplog holes can occur with concurrent writes at different timestamps
  • oplogReadTimestamp tracks the no-holes point to prevent secondaries from missing entries
  • Oplog entries are written in the same transaction as the write they log (on primary)

Replication & High Availability

Relevant Files
  • src/mongo/db/repl/README.md
  • src/mongo/db/repl/replication_coordinator.h
  • src/mongo/db/repl/oplog.h
  • src/mongo/db/repl/initial_sync/initial_syncer.h
  • src/mongo/db/repl/bgsync.h
  • src/mongo/db/repl/oplog_fetcher.h
  • src/mongo/db/repl/topology_coordinator.h

MongoDB achieves high availability through replica sets, which consist of multiple nodes with one primary and multiple secondaries. The primary handles all writes, while secondaries continuously replicate data from the primary or other secondaries through a pull-based mechanism.

Steady State Replication

Primary Operations: When a write occurs on the primary, it is applied to the database and an entry is written to the oplog (operation log), a capped collection in the local database. The oplog contains idempotent descriptions of all write operations, allowing secondaries to replay them exactly.

Secondary Synchronization: Secondaries pull oplog entries from their sync source (typically the primary or another secondary) via the OplogFetcher. This fetcher establishes an exhaust cursor to continuously receive batches of oplog entries without requiring additional getMore commands. Entries are buffered in an OplogBuffer before being applied.

Oplog Processing Pipeline

Secondaries process fetched oplog entries through three stages:

  1. OplogWriter: Persists fetched entries to the oplog collection and updates oplog visibility
  2. OplogApplierBatcher: Groups entries into batches respecting operation dependencies (e.g., commands must be applied alone)
  3. OplogApplier: Applies batches in parallel when possible, serializing operations on the same document

Sync Source Selection

The SyncSourceResolver and TopologyCoordinator work together to select an optimal sync source. Selection criteria include:

  • Node must be ahead of the secondary (have newer oplog entries)
  • Node's oplog must not be more than maxSyncSourceLagSecs behind the primary (default: 30 seconds)
  • Closest node by ping time is preferred (within changeSyncSourceThresholdMillis, default: 5 ms)
  • Chaining can be disabled to force syncing from the primary

Cluster Communication

Nodes communicate through three primary mechanisms:

  • Heartbeats: Every 2 seconds, nodes exchange replSetHeartbeat commands to check liveness and propagate configuration changes
  • Oplog Fetching: Secondaries continuously fetch oplog entries from their sync source
  • Update Position Commands: Secondaries send replSetUpdatePosition to inform the primary of their replication progress

Write Concern & Commit Point

Write concern specifies how many nodes must acknowledge a write before returning. The commit point is the OpTime such that all earlier entries have been replicated to a majority. The primary advances the commit point by checking the highest lastWritten or lastDurable OpTime on a majority of nodes. Secondaries learn the commit point via heartbeats and oplog fetcher metadata.

Initial Sync

When a new node joins the replica set, it performs initial sync to copy all data:

  1. Sync Source Selection: Choose a healthy replica set member
  2. Database Cloning: Clone all databases and collections from the sync source
  3. Oplog Fetching: Fetch oplog entries that occurred during cloning
  4. Oplog Application: Apply fetched entries to reach consistency with the primary

The InitialSyncer orchestrates this process, using AllDatabaseCloner and CollectionCloner to copy data, then applying buffered oplog entries.

Elections & Failover

When the primary becomes unavailable, secondaries initiate an election using the Raft-based protocol (PV1). Each node has a term counter; only one primary can be elected per term. Elections consider node priority, oplog recency, and voting eligibility. The ReplicationCoordinator manages election state and coordinates with the TopologyCoordinator for topology decisions.

Read Concern Levels

  • Local: Reads the most recent data on the node
  • Majority: Reads from the stable snapshot (committed to a majority)
  • Linearizable: Blocks to ensure the node remains primary, preventing stale reads
  • Snapshot: Reads from a specific point-in-time snapshot
  • Available: Like local, but does not wait for shard filtering metadata; may return orphaned documents in sharded clusters

Sharding & Distributed Execution

Relevant Files
  • src/mongo/s/mongos_main.cpp
  • src/mongo/s/query/README.md
  • src/mongo/s/query/planner/README.md
  • src/mongo/s/write_ops/batch_write_exec.h
  • src/mongo/s/transaction_router.h

MongoDB sharding distributes data across multiple servers (shards) to horizontally scale the database. Each shard owns a subset of data based on the shard key, enabling parallel processing and improved performance. The router (mongos) coordinates all distributed operations.

Query Routing & Targeting

When a query arrives at mongos, it must determine which shards own the relevant data. The router consults the routing table, which maps chunks (contiguous ranges of shard keys) to shard IDs. If the query includes the shard key, mongos targets only the shards owning those key ranges. Without a shard key, the query broadcasts to all shards.

The targeting process extracts the shard key from the query predicate and uses the chunk manager to identify intersecting chunks. For example, if a collection is sharded on location and the query filters by location: "5th Avenue", only the shard owning that location range is targeted.

Distributed Query Execution

Queries on sharded collections follow a two-phase execution model:

  1. Shards Part: Filters and operations that can run in parallel on each shard (e.g., $match, partial $group)
  2. Merge Part: Global aggregation on a single node (mongos or a designated shard)

The router splits the query pipeline, dispatches the shards part to targeted shards with consistent versioning information, collects partial results via cursors, then executes the merge part globally.

Write Operations & Batch Execution

Write operations use BatchWriteExec to target documents to appropriate shards based on the shard key. For multi-shard writes, the executor:

  • Targets each document to its owning shard
  • Sends batches to multiple shards in parallel
  • Handles retries on stale routing information
  • Aggregates responses and reports errors

Distributed Transactions

The TransactionRouter manages multi-shard transactions. It tracks which shards participate in the transaction and coordinates commit:

  • Single-shard transactions: Commit directly to the participant
  • Multi-shard transactions: Use two-phase commit via a coordinator
  • Stale errors: Refresh routing table and retry if safe

Versioning & Consistency

MongoDB uses placement versioning to ensure routing correctness. Each request includes the router's known placement version. Shards validate this version and reject requests if their version is newer, signaling a stale routing table. The router then refreshes from the config server and retries, maintaining consistency across distributed operations.

Transactions & Session Management

Relevant Files
  • src/mongo/db/transaction/transaction_participant.h
  • src/mongo/db/session/session_catalog.h
  • src/mongo/db/session/logical_session_id.h
  • src/mongo/db/session/session.h

MongoDB provides two mechanisms for ensuring data consistency across multiple operations: retryable writes for single operations and multi-document transactions for coordinated changes across multiple documents and collections.

Logical Sessions

Every client connection establishes a logical session identified by a LogicalSessionId (UUID). Sessions enable:

  • Retryable writes: Automatic retry of write operations on network failures
  • Multi-document transactions: ACID guarantees across multiple operations
  • Session state tracking: Persistent record of executed operations in config.transactions

The SessionCatalog maintains runtime state for all active sessions on a server instance. It uses a parent-child session model where internal transactions for retryable writes create child sessions linked to parent sessions.

Retryable Writes

Retryable writes allow drivers to safely retry non-idempotent operations. Each write operation receives:

  • Transaction Number (txnNumber): Identifies the logical write sequence
  • Statement ID (stmtId): Unique identifier for each operation within a batch
  • Oplog Entry: Records the operation with lsid, txnNumber, and stmtId for reconstruction

When a retry arrives, the server checks if the stmtId was already executed. If so, it returns the cached result from the oplog instead of re-executing.

Multi-Document Transactions

Transactions follow a strict state machine managed by TransactionParticipant:

None → InProgress → Prepared → Committed
       InProgress → AbortedWithoutPrepare
       Prepared   → AbortedWithPrepare

Key states:

  • InProgress: Operations are collected and buffered
  • Prepared: Two-phase commit phase 1; the transaction is durably recorded but its effects are not yet visible
  • Committed: Changes are applied and visible
  • Aborted: Transaction rolled back; no changes persisted

Transaction Lifecycle

  1. Begin: beginOrContinue() starts a transaction with autocommit: false
  2. Execute: Operations accumulate in TransactionOperations
  3. Prepare (optional): prepareTransaction durably records intent; drops replication state locks
  4. Commit: commitTransaction applies changes; for prepared transactions, requires a commitTimestamp
  5. Abort: abortTransaction rolls back all changes

Resource Management

Transactions stash and unstash resources (TxnResources) between network operations:

  • Stash: Saves locks, recovery unit, and read concern when yielding
  • Unstash: Restores resources when resuming; validates consistency

This enables long-running transactions without blocking other operations.

Internal Transactions for Retryable Writes

When a retryable write targets multiple shards, MongoDB creates an internal transaction (child session) to coordinate the write atomically. These transactions:

  • Share the parent session's txnNumber
  • Execute with autocommit: false internally
  • Automatically commit after all writes complete
  • Enable retry semantics across distributed systems

Session Checkout and Concurrency

The SessionCatalog enforces single-threaded access per logical session:

  • Only one OperationContext can hold a session at a time
  • Concurrent requests block until the session becomes available
  • Sessions are reaped when idle to free resources
  • Kill tokens enable safe session termination

Aggregation Pipeline

Relevant Files
  • src/mongo/db/pipeline/README.md
  • src/mongo/db/pipeline/pipeline.h
  • src/mongo/db/pipeline/document_source.h
  • src/mongo/db/pipeline/expression.h
  • src/mongo/db/pipeline/optimization/optimize.cpp

The aggregation pipeline is MongoDB's framework for transforming and analyzing data through a series of stages. Each stage processes documents sequentially, passing results to the next stage.

Core Architecture

The pipeline is built on two fundamental abstractions:

DocumentSource - Represents a single stage in the pipeline (e.g., $match, $group, $project). Each DocumentSource subclass implements stage-specific logic for filtering, transforming, or aggregating documents. Stages are chained together in a container and executed sequentially.

Expression - A stateless component that evaluates to a value without mutating inputs. Expressions are used within stages to compute field values, apply operators, and reference document fields or variables. Examples include $add, $sum, $cond, and field path expressions like "$inventory.total".

Key Pipeline Stages

Common stages include:

  • $match - Filters documents using query predicates
  • $project - Reshapes documents by including/excluding fields or computing new ones
  • $group - Groups documents by a key and applies accumulators
  • $sort - Orders documents by specified fields
  • $limit / $skip - Restricts result set size and position
  • $lookup - Performs left outer joins with other collections
  • $unwind - Deconstructs array fields into separate documents
  • $facet - Processes multiple sub-pipelines in parallel
  • $merge / $out - Writes results to a collection
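
As an illustration, a two-stage $match + $group pipeline can be expressed as raw BSON stages and parsed into DocumentSources with Pipeline::parse. A hedged sketch; expCtx is an ExpressionContext assumed to be available:

// Hedged sketch: parse [{$match: {status: "active"}},
//                       {$group: {_id: "$region", total: {$sum: "$qty"}}}].
std::vector<BSONObj> rawPipeline = {
    BSON("$match" << BSON("status" << "active")),
    BSON("$group" << BSON("_id" << "$region" << "total" << BSON("$sum" << "$qty"))),
};
auto pipeline = Pipeline::parse(rawPipeline, expCtx);   // one DocumentSource per stage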

Pipeline Optimization

After parsing, pipelines undergo two-phase optimization via pipeline_optimization::optimizePipeline():

Inter-stage Optimization - Reorders and combines stages to improve efficiency. Examples include pushing $match filters earlier to reduce documents processed by expensive stages, coalescing $sort with $limit, and combining consecutive $match stages.

Stage-specific Optimization - Optimizes individual stages independently. This includes removing no-op stages (e.g., empty $match), constant folding (evaluating expressions with constant values), and stage-specific rewrites.

Dependency Tracking

The optimizer uses dependency analysis to determine which stages can be safely reordered. Dependencies include:

  • Field dependencies - Specific document fields required by a stage
  • Computed fields - New fields generated by stages like $addFields
  • Variable references - Dependencies on scoped variables ($$CURRENT, user-defined)
  • Metadata - Non-field contextual information like text search scores

A stage that depends on fields produced by an earlier stage must remain after it, while independent stages can be moved earlier (pushed down toward the data source) during optimization.

Execution Model

Pipelines execute through a pull-based model where each stage requests documents from its source. The pipeline is converted to an execution engine representation (agg::Stage) for efficient processing. Stages can be distributed across shards in sharded clusters, with partial aggregation on shards and final merging on the router.

Security & Authentication

Relevant Files
  • src/mongo/db/auth/README.md
  • src/mongo/db/auth/authorization_manager.h
  • src/mongo/db/auth/sasl_commands.cpp
  • src/mongo/crypto/README.JWT.md
  • src/mongo/crypto/jws_validated_token.h

MongoDB implements a comprehensive security model combining authentication, authorization, and encryption. The system validates client identity, grants appropriate permissions, and protects data in transit and at rest.

Authentication

Authentication verifies client identity through multiple mechanisms:

SASL Mechanisms handle most client authentication. Supported mechanisms include:

  • SCRAM-SHA-256 (preferred): Salted Challenge Response with SHA-256 hashing, provides mutual authentication without transmitting passwords
  • SCRAM-SHA-1: Legacy variant using SHA-1
  • PLAIN: Simple username/password exchange, typically used with LDAP
  • GSSAPI: Kerberos-based authentication for enterprise environments
  • MONGODB-X509: Certificate-based authentication using X.509 certificates from TLS handshakes

Speculative Authentication reduces connection overhead by embedding authentication in the initial hello command, potentially completing authentication in a single round trip.

Cluster Authentication uses either X.509 certificates or keyfile-based SCRAM-SHA-256 for server-to-server communication. X.509 supports zero-downtime certificate rotation through override mechanisms.

Authorization

Authorization determines what authenticated users can do:

Users and Roles form a hierarchical privilege model. Users are assigned roles, which grant privileges on specific resources. Roles can inherit from other roles, creating a privilege tree.

Privileges combine a ResourcePattern (what can be accessed) with ActionTypes (what operations are allowed). Resource patterns support fine-grained scoping:

  • Cluster-wide operations (cluster: true)
  • Database-level access (db: 'name')
  • Collection-specific access (collection: 'name')
  • System collections and time-series buckets

Authorization Caching uses a ReadThroughCache for performance. User information is cached and invalidated when user management commands execute. On sharded clusters, mongos instances periodically check a cache generation counter to stay synchronized.

Authentication Restrictions limit where users can connect from using CIDR notation for IP ranges and server addresses.

Cryptography & JWT

JSON Web Tokens (JWT) provide secure claims exchange for OIDC and multitenancy.

JWSValidatedToken validates signed tokens through:

  1. Parsing the header to extract the Key ID
  2. Retrieving the corresponding public key from JWKManager
  3. Verifying the signature using platform-specific crypto (OpenSSL on Linux)
  4. Validating expiration (exp) and not-before (nbf) claims
  5. Parsing the token body as a JWT object

Callers must independently validate issuer (iss) and audience (aud) claims.

Authorization Flow

When a client executes a command:

  1. AuthorizationSession checks if the user has required privileges
  2. AuthorizationManager looks up user information (cached via ReadThroughCache)
  3. Privileges are computed from the user's roles and their subordinate roles
  4. If authorization succeeds, the command executes; otherwise, an error is returned

Authorization Backends support multiple sources:

  • Local: Users stored in admin.system.users collection
  • LDAP: External LDAP server for credential validation
  • X.509: Certificate-based authorization from external databases

Security Considerations

  • Passwords are never stored in plaintext; SCRAM mechanisms use salted hashes
  • Cluster authentication uses strong mechanisms (X.509 or SCRAM-SHA-256)
  • Localhost auth bypass allows initial setup but is disabled once users exist
  • Multitenancy isolates users and roles per tenant with prefixed collections
  • JWT validation is platform-specific; signature validation only available on Linux

Utilities & Infrastructure

Relevant Files
  • src/mongo/util/README.md
  • src/mongo/bson/README
  • src/mongo/executor/README.md
  • src/mongo/logv2/log.h
  • src/mongo/util/cancellation.h
  • src/mongo/util/future.h
  • src/mongo/util/concurrency/

MongoDB's infrastructure layer provides essential utilities for asynchronous execution, data serialization, and system-level operations. These components form the foundation upon which higher-level features are built.

BSON: Binary JSON Format

BSON (Binary JSON) is MongoDB's binary storage format, inspired by JSON but with additional types like Date and ObjectId. The C++ implementation in src/mongo/bson/ provides efficient serialization and deserialization. Key components include:

  • BSONObj & BSONElement: Core types for representing BSON documents and individual fields
  • BSONObjBuilder: Fluent API for constructing BSON documents
  • BSONColumn: Columnar compression format for efficient storage of repeated values
  • Validation: BSON validation and integrity checking utilities
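
A minimal sketch of document construction and field access with these types:

// Build {"item": "notebook", "qty": 25} and read a field back.
BSONObjBuilder builder;
builder.append("item", "notebook");
builder.append("qty", 25);
BSONObj doc = builder.obj();

int qty = doc["qty"].numberInt();                            // BSONElement access by field name
BSONObj same = BSON("item" << "notebook" << "qty" << 25);    // equivalent macro form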

Executors: Asynchronous Task Scheduling

Executors are the backbone of MongoDB's asynchronous execution model. They schedule and execute work in FIFO order without blocking the caller.

Core Types:

  • OutOfLineExecutor: Base class declaring schedule(Task task) for non-blocking task submission
  • TaskExecutor: Extends OutOfLineExecutor with event management and remote command scheduling
  • ThreadPoolTaskExecutor: Implements TaskExecutor using a thread pool
  • PinnedConnectionTaskExecutor: Ensures all operations run over the same transport connection
  • TaskExecutorPool: Distributes work across multiple executors
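
A minimal sketch of non-blocking submission through the base interface; the Status passed to the task reports whether the executor accepted the work:

// Hedged sketch: hand a task to any OutOfLineExecutor without blocking the caller.
void submitWork(OutOfLineExecutor* executor) {
    executor->schedule([](Status status) {
        if (!status.isOK()) {
            return;   // executor rejected the task (e.g. shutdown in progress)
        }
        // ... runs later, on a thread owned by the executor ...
    });
}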

Cancellation: Hierarchical Cancellation Tokens

The cancellation system enables clean shutdown and resource cleanup through CancellationSource and CancellationToken pairs:

  • CancellationSource: Manages cancellation state; call cancel() to trigger cancellation
  • CancellationToken: Obtained from a source via token(); check isCanceled() or chain continuations via onCancel()
  • Hierarchies: Child sources inherit parent cancellation, enabling cascading cancellation of related operations

Example: When a parent operation is canceled, all child operations automatically cancel without manual tracking.
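
A minimal sketch of that parent/child relationship using the CancellationSource and CancellationToken types described above:

CancellationSource parentSource;
CancellationSource childSource(parentSource.token());   // child linked to parent

CancellationToken childToken = childSource.token();
parentSource.cancel();                                   // cancel the parent...

invariant(childToken.isCanceled());                      // ...and the child follows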

Logging: Structured Logging with LOGV2

MongoDB uses structured logging via the LOGV2 system for observability:

#define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault
LOGV2(1234500, "Operation completed", "duration_ms"_attr=elapsed);

Features include severity levels, component filtering, and attribute-based structured output for easier parsing and analysis.
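
Severity- and debug-level variants follow the same shape; for example (log IDs and attribute names here are illustrative):

LOGV2_WARNING(1234501, "Operation exceeded threshold", "duration_ms"_attr = elapsed);
LOGV2_DEBUG(1234502, 2, "Retrying after write conflict", "attempt"_attr = attemptNumber);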

Concurrency Utilities

The src/mongo/util/concurrency/ directory provides thread-safe primitives:

  • ThreadPool: Manages worker threads for task execution
  • TicketHolder: Admission control mechanism limiting concurrent operations
  • SpinLock: Low-latency synchronization for short critical sections
  • Notification: One-shot signaling that lets threads wait until an event or value becomes available

Futures and Promises

MongoDB's Future/Promise implementation enables composable asynchronous operations:

  • Future<T>: Represents a value that will be available in the future
  • Promise<T>: Sets the value for an associated Future
  • SemiFuture<T>: Cannot have continuations attached until it is transferred onto an executor (for example via thenRunOn())
  • SharedSemiFuture<T>: Lets multiple consumers copy and wait on the same result

Futures integrate seamlessly with cancellation tokens via future_util::withCancellation().
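
A minimal sketch of the promise/future pairing (makePromiseFuture links the two ends; continuation and get() usage is simplified here):

auto pf = makePromiseFuture<int>();
Future<int> doubled = std::move(pf.future).then([](int value) {
    return value * 2;                       // continuation runs once the value arrives
});

pf.promise.emplaceValue(21);                // fulfill the promise
invariant(std::move(doubled).get() == 42);  // blocks until the chained result is ready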

Memory and Performance

  • Allocator: Custom memory allocation with tcmalloc integration for performance
  • LRU Cache: Efficient caching with automatic eviction
  • String utilities: Fast string manipulation and formatting
  • Timer & Clock: High-resolution timing for performance monitoring