Overview
Relevant Files
README.md, src/mongo/db/README.md, docs/README.md, src/mongo/db/mongod_main.cpp, src/mongo/s/mongos_main.cpp
MongoDB is a document-oriented NoSQL database system that stores data in flexible JSON-like documents. This repository contains the complete source code for MongoDB Server, including both the database engine (mongod) and the sharding router (mongos).
Core Architecture
MongoDB's architecture is organized into several major subsystems:
Main Components
Database Server (mongod) - The core database process that handles data storage, queries, and replication. Located in src/mongo/db/, it manages:
- Command execution and query processing
- Data persistence through the storage engine
- Replica set coordination
- Shard server operations
Sharding Router (mongos) - Routes queries to appropriate shards in a distributed cluster. Located in src/mongo/s/, it provides:
- Query routing and aggregation
- Shard discovery and topology management
- Distributed transaction coordination
Key Subsystems
Query Execution (src/mongo/db/exec/) - Three execution engines process queries:
- SBE (Slot-Based Engine) - Modern, optimized execution engine
- Classic Engine - Traditional execution framework
- Document Sources - Aggregation pipeline execution
Storage (src/mongo/db/storage/) - Abstraction layer over WiredTiger with support for:
- CRUD operations on collections
- Index management
- Snapshot isolation and transactions
Replication (src/mongo/db/repl/) - Maintains data consistency across replica sets:
- Oplog (operation log) management
- Primary/secondary synchronization
- Failover coordination
Sharding (src/mongo/db/s/) - Distributes data across multiple servers:
- Chunk management and balancing
- Shard key routing
- Distributed transactions
BSON (src/mongo/bson/) - Binary JSON serialization format for data representation and network communication.
RPC & Transport (src/mongo/rpc/, src/mongo/transport/) - Network communication layer supporting multiple protocols and connection types.
Build System
The project uses Bazel for building. Configuration files are in bazel/ and BUILD.bazel files throughout the codebase. Tests are organized in jstests/ (JavaScript tests) and src/mongo/dbtests/ (C++ unit tests).
Development Resources
Comprehensive documentation is available in docs/ covering topics like building, testing, architecture patterns, and internal systems. The buildscripts/ directory contains automation for compilation, testing, and deployment.
Architecture & Core Components
Relevant Files
src/mongo/db/service_context.h, src/mongo/db/operation_context.h, src/mongo/db/client.h, src/mongo/transport/session_workflow.h, src/mongo/transport/session_workflow.cpp, src/mongo/transport/service_entry_point.h, src/mongo/db/commands.h
MongoDB's server architecture is built on a hierarchical context model that manages state from the process level down to individual operations. Understanding these core components is essential for working with the codebase.
The Context Hierarchy
MongoDB uses a four-level context hierarchy to manage server state:
- ServiceContext - The root singleton representing the entire server process (mongod or mongos). It owns all Clients and manages global resources like the storage engine, transport layer, and periodic task runners.
- Service - A grouping of Clients that determines their ClusterRole (shard or router). A ServiceContext can own multiple Services, allowing a single process to act as both shard and router.
- Client - Represents a logical connection to the database. Each Client can maintain at most one active OperationContext at a time and is associated with a transport Session.
- OperationContext - Encapsulates the state of a single operation from dispatch until completion. It tracks transaction state, deadlines, locks, and recovery units.
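For illustration, the sketch below walks the hierarchy top-down. It is not production code: it assumes the global ServiceContext has already been initialized during startup, and the exact accessor and factory names (getGlobalServiceContext, ServiceContext::getService, Service::makeClient, Client::makeOperationContext) can vary between server versions.

// Illustrative sketch of the context hierarchy; assumes server startup has
// already created the global ServiceContext.
ServiceContext* serviceContext = getGlobalServiceContext();        // process-wide singleton

// A Service groups Clients under a ClusterRole (shard or router).
Service* service = serviceContext->getService(ClusterRole::ShardServer);

// A Client models one logical connection; here it is not bound to a network session.
ServiceContext::UniqueClient client = service->makeClient("example-client");

// Each Client may own at most one active OperationContext at a time.
ServiceContext::UniqueOperationContext opCtx = client->makeOperationContext();
opCtx->setDeadlineAfterNowBy(Seconds(30), ErrorCodes::MaxTimeMSExpired);  // per-operation state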
Request Flow: From Network to Execution
The request flow follows these steps:
- Transport Layer accepts a client connection and creates a Session
- SessionWorkflow manages the session lifecycle, parsing incoming messages into work items
- ServiceEntryPoint receives the work item and creates an OperationContext
- Command Handler executes the command using the OperationContext
- ReplyBuilder constructs the response message
- SessionWorkflow sends the response back through the Session
Key Components
SessionWorkflow organizes network messages into MongoDB protocol requests and responses. It handles special message types like exhaust commands (multiple responses) and fire-and-forget commands (no response). Each SessionWorkflow manages one active work item at a time, ensuring sequential processing of requests from a single client.
ServiceEntryPoint is the entry point from the transport layer into command execution. It receives messages from SessionWorkflow, creates OperationContexts, and dispatches commands. Different implementations exist for shard-role and router-role servers.
OperationContext is heavily decorated (inheriting from Decorable), making it dynamically extensible. It manages operation-specific state including locks, recovery units, deadlines, and cancellation tokens. Each operation has a unique OperationId for tracking and killing.
Client represents a logical connection and acts as a factory for OperationContexts. It maintains a reference to the transport Session and enforces the invariant that only one OperationContext can be active at a time.
Decorable Pattern
All context classes inherit from Decorable, allowing subsystems to attach arbitrary data without modifying core classes. This enables loose coupling between components and makes the architecture highly extensible.
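For example, a subsystem can attach its own per-operation state to OperationContext roughly as follows. This is a minimal sketch; the decoration name and payload type are hypothetical.

// Hypothetical subsystem state attached to every OperationContext via the
// Decorable pattern; no changes to operation_context.h are required.
struct MySubsystemState {
    int retryCount = 0;
};

// Registered once at static-initialization time.
const auto getMySubsystemState = OperationContext::declareDecoration<MySubsystemState>();

void recordRetry(OperationContext* opCtx) {
    // Look up this subsystem's slice of the given OperationContext.
    MySubsystemState& state = getMySubsystemState(opCtx);
    ++state.retryCount;
}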
Concurrency and Synchronization
The architecture uses several synchronization primitives:
- RWMutex for reader-writer locks on shared resources
- SpinLock for protecting Client state
- Condition Variables for signaling state changes
- Atomic Words for lock-free counters and flags
- Cancellation Tokens for cooperative operation cancellation
This design allows multiple operations to proceed concurrently while maintaining consistency through fine-grained locking.
Query Engine & Optimization
Relevant Files
src/mongo/db/query/README.md, src/mongo/db/query/README_QO.md, src/mongo/db/query/README_explain.md, src/mongo/db/exec/README.md, src/mongo/db/query/plan_cache/README.md
MongoDB's query engine transforms user requests into optimized execution plans. The system handles find, aggregate, count, distinct, and write commands through a unified pipeline of parsing, optimization, planning, and execution.
Core Architecture
The query system follows a multi-stage pipeline:
- Parsing & Canonicalization: User queries are parsed into a CanonicalQuery object containing the filter (as a MatchExpression), projection, and sort specifications.
- Logical Optimization: The MatchExpression is optimized through heuristic rewrites. For aggregation pipelines, stages are analyzed to determine which can be pushed down to the find layer.
- Query Planning: The plan enumerator generates candidate QuerySolution trees representing different execution strategies. Each solution is a tree of QuerySolutionNodes (e.g., CollectionScanNode, IndexScanNode, FetchNode).
- Plan Selection: Candidate plans are ranked using either the classic multiplanner (trial execution) or the cost-based ranker (cardinality estimation & cost model). The winning plan is selected.
- Execution: The PlanExecutor executes the winning plan through either the classic engine (PlanStage tree) or the Slot-Based Execution (SBE) engine.
Plan Cache
The plan cache stores winning execution plans to avoid redundant planning for recurring query shapes. Two implementations exist:
- Classic Plan Cache: Per-collection, stores SolutionCacheData with index tags that reconstruct QuerySolution trees.
- SBE Plan Cache: Process-wide, stores complete sbe::PlanStage trees with auto-parameterized expressions.
Cache entries transition from inactive (unproven) to active (proven efficient). Replanning occurs if a cached plan's work exceeds the eviction ratio threshold (10x by default).
Explain Command
The explain command provides query observability with three verbosity modes:
- queryPlanner: Shows the winning and rejected plans without execution.
- executionStats: Executes the winning plan and reports statistics.
- allPlansExecution: Executes all candidate plans and compares their performance.
Different PlanExplainer implementations handle classic, SBE, express, and pipeline executors, each providing stage-level execution statistics and plan metadata.
Execution Models
Classic Yielding: Plan stages use interrupt-style yielding. When a yield is needed, NEED_YIELD unwinds the call stack, allowing the executor to release and reacquire storage resources.
SBE Yielding: Stages perform cooperative yielding in-place without unwinding. Each stage must check for interrupts/yields regularly and avoid unbounded delays between checks.
Storage Engine & Data Management
Relevant Files
src/mongo/db/storage/storage_engine.h, src/mongo/db/storage/recovery_unit.h, src/mongo/db/storage/write_unit_of_work.h, src/mongo/db/storage/record_store.h, src/mongo/db/storage/kv/kv_engine.h, src/mongo/db/storage/README.md
MongoDB's storage engine is a pluggable abstraction layer that manages how data is persisted to disk. The architecture supports multiple storage engine implementations through a well-defined API, with WiredTiger being the default production engine.
Core Architecture
The storage engine stack consists of several key layers:
- StorageEngine - Top-level interface defining the contract for any storage engine implementation
- KVEngine - Key-value engine abstraction for engines that use key-value storage (like WiredTiger)
- RecordStore - Manages collections as ordered sets of documents with unique RecordIds
- SortedDataInterface - Implements indexes as sorted key-value structures
- RecoveryUnit - Manages transaction semantics and snapshot isolation
Transactions & Recovery Units
A RecoveryUnit represents a storage transaction with snapshot isolation guarantees. Each operation context holds a recovery unit for its lifetime. Key properties:
- Snapshot Isolation - All reads see a consistent view of data at a specific point in time
- Atomicity - All writes commit together or roll back together
- Timestamps - Optional point-in-time reads using commit timestamps
- Lazy Initialization - Storage transactions start implicitly on first read/write
Write Units of Work
WriteUnitOfWork is an RAII wrapper that manages transactional writes:
{
    WriteUnitOfWork wuow(opCtx);
    recordStore->insertRecord(opCtx, data);
    index->insert(opCtx, key, recordId);
    wuow.commit();  // All writes become visible atomically
}  // If commit() is not called, the transaction rolls back
Writes outside a WriteUnitOfWork are illegal. Nested WUOWs are supported, but only the top-level one commits to the storage engine.
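The nesting rule can be pictured with the following sketch, which extends the example above:

{
    WriteUnitOfWork outer(opCtx);
    {
        WriteUnitOfWork inner(opCtx);   // nested: piggybacks on the outer unit of work
        // ... writes ...
        inner.commit();                 // no storage-engine commit happens yet
    }
    outer.commit();                     // the single storage transaction commits here
}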
Concurrency & Conflicts
MongoDB uses optimistic concurrency control. Write conflicts can occur when multiple operations modify the same data:
- WriteConflictException - Transient failure; operation should retry
- TemporarilyUnavailableException - Cache pressure; retry with backoff
- TransactionTooLargeForCacheException - Operation exceeds cache capacity
The writeConflictRetry helper transparently handles these retries, backing off between attempts.
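A typical call site looks roughly like the sketch below; the operation name, namespace, and insertDocument helper are placeholders, not real call sites.

// Sketch: retry a transactional write when a WriteConflictException is thrown.
writeConflictRetry(opCtx, "doInsert", nss, [&] {
    WriteUnitOfWork wuow(opCtx);
    insertDocument(opCtx, collection, doc);  // may throw WriteConflictException
    wuow.commit();                           // the lambda is re-run from the top on conflict
});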
Data Organization
Collections and indexes map to storage engine idents (unique identifiers):
- Collection idents: collection-<uuid>
- Index idents: index-<uuid>
Each ident corresponds to a separate storage structure (e.g., WiredTiger table with .wt file).
Durability & Checkpoints
MongoDB ensures durability through two mechanisms:
- Checkpoints - Periodic snapshots of all data; frequency controlled by storage.syncPeriodSecs
- Journaling - Write-ahead log for replicated writes; enables recovery between checkpoints
On startup, the storage engine recovers to the last checkpoint, then replays journaled writes.
Oplog & Replication
The oplog (local.oplog.rs) is a capped collection storing all replicated writes. Key considerations:
- Oplog holes can occur with concurrent writes at different timestamps
- oplogReadTimestamp tracks the no-holes point to prevent secondaries from missing entries
- Oplog entries are written in the same transaction as the write they log (on the primary)
Replication & High Availability
Relevant Files
src/mongo/db/repl/README.md, src/mongo/db/repl/replication_coordinator.h, src/mongo/db/repl/oplog.h, src/mongo/db/repl/initial_sync/initial_syncer.h, src/mongo/db/repl/bgsync.h, src/mongo/db/repl/oplog_fetcher.h, src/mongo/db/repl/topology_coordinator.h
MongoDB achieves high availability through replica sets, which consist of multiple nodes with one primary and multiple secondaries. The primary handles all writes, while secondaries continuously replicate data from the primary or other secondaries through a pull-based mechanism.
Core Architecture
Steady State Replication
Primary Operations: When a write occurs on the primary, it is applied to the database and an entry is written to the oplog (operation log), a capped collection in the local database. The oplog contains idempotent descriptions of all write operations, allowing secondaries to replay them exactly.
Secondary Synchronization: Secondaries pull oplog entries from their sync source (typically the primary or another secondary) via the OplogFetcher. This fetcher establishes an exhaust cursor to continuously receive batches of oplog entries without requiring additional getMore commands. Entries are buffered in an OplogBuffer before being applied.
Oplog Processing Pipeline
Secondaries process fetched oplog entries through three stages:
- OplogWriter: Persists fetched entries to the oplog collection and updates oplog visibility
- OplogApplierBatcher: Groups entries into batches respecting operation dependencies (e.g., commands must be applied alone)
- OplogApplier: Applies batches in parallel when possible, serializing operations on the same document
Sync Source Selection
The SyncSourceResolver and TopologyCoordinator work together to select an optimal sync source. Selection criteria include:
- Node must be ahead of the secondary (have newer oplog entries)
- Node's oplog must not be more than maxSyncSourceLagSecs behind the primary (default: 30 seconds)
- Closest node by ping time is preferred (within changeSyncSourceThresholdMillis, default: 5 ms)
- Chaining can be disabled to force syncing from the primary
Cluster Communication
Nodes communicate through three primary mechanisms:
- Heartbeats: Every 2 seconds, nodes exchange replSetHeartbeat commands to check liveness and propagate configuration changes
- Oplog Fetching: Secondaries continuously fetch oplog entries from their sync source
- Update Position Commands: Secondaries send replSetUpdatePosition to inform the primary of their replication progress
Write Concern & Commit Point
Write concern specifies how many nodes must acknowledge a write before returning. The commit point is the OpTime such that all earlier entries have been replicated to a majority. The primary advances the commit point by checking the highest lastWritten or lastDurable OpTime on a majority of nodes. Secondaries learn the commit point via heartbeats and oplog fetcher metadata.
Initial Sync
When a new node joins the replica set, it performs initial sync to copy all data:
- Sync Source Selection: Choose a healthy replica set member
- Database Cloning: Clone all databases and collections from the sync source
- Oplog Fetching: Fetch oplog entries that occurred during cloning
- Oplog Application: Apply fetched entries to reach consistency with the primary
The InitialSyncer orchestrates this process, using AllDatabaseCloner and CollectionCloner to copy data, then applying buffered oplog entries.
Elections & Failover
When the primary becomes unavailable, secondaries initiate an election using the Raft-based protocol (PV1). Each node has a term counter; only one primary can be elected per term. Elections consider node priority, oplog recency, and voting eligibility. The ReplicationCoordinator manages election state and coordinates with the TopologyCoordinator for topology decisions.
Read Concern Levels
- Local: Reads the most recent data on the node
- Majority: Reads from the stable snapshot (committed to a majority)
- Linearizable: Blocks to ensure the node remains primary, preventing stale reads
- Snapshot: Reads from a specific point-in-time snapshot
- Available: Like local, but faster on secondaries (may return orphan data in sharded clusters)
Sharding & Distributed Execution
Relevant Files
src/mongo/s/mongos_main.cpp, src/mongo/s/query/README.md, src/mongo/s/query/planner/README.md, src/mongo/s/write_ops/batch_write_exec.h, src/mongo/s/transaction_router.h
MongoDB sharding distributes data across multiple servers (shards) to horizontally scale the database. Each shard owns a subset of data based on the shard key, enabling parallel processing and improved performance. The router (mongos) coordinates all distributed operations.
Architecture Overview
Query Routing & Targeting
When a query arrives at mongos, it must determine which shards own the relevant data. The router consults the routing table, which maps chunks (contiguous ranges of shard keys) to shard IDs. If the query includes the shard key, mongos targets only the shards owning those key ranges. Without a shard key, the query broadcasts to all shards.
The targeting process extracts the shard key from the query predicate and uses the chunk manager to identify intersecting chunks. For example, if a collection is sharded on location and the query filters by location: "5th Avenue", only the shard owning that location range is targeted.
Distributed Query Execution
Queries on sharded collections follow a two-phase execution model:
- Shards Part: Filters and operations that can run in parallel on each shard (e.g., $match, partial $group)
- Merge Part: Global aggregation on a single node (mongos or a designated shard)
The router splits the query pipeline, dispatches the shards part to targeted shards with consistent versioning information, collects partial results via cursors, then executes the merge part globally.
Write Operations & Batch Execution
Write operations use BatchWriteExec to target documents to appropriate shards based on the shard key. For multi-shard writes, the executor:
- Targets each document to its owning shard
- Sends batches to multiple shards in parallel
- Handles retries on stale routing information
- Aggregates responses and reports errors
Distributed Transactions
The TransactionRouter manages multi-shard transactions. It tracks which shards participate in the transaction and coordinates commit:
- Single-shard transactions: Commit directly to the participant
- Multi-shard transactions: Use two-phase commit via a coordinator
- Stale errors: Refresh routing table and retry if safe
Versioning & Consistency
MongoDB uses placement versioning to ensure routing correctness. Each request includes the router's known placement version. Shards validate this version and reject requests if their version is newer, signaling a stale routing table. The router then refreshes from the config server and retries, maintaining consistency across distributed operations.
Transactions & Session Management
Relevant Files
src/mongo/db/transaction/transaction_participant.h, src/mongo/db/session/session_catalog.h, src/mongo/db/session/logical_session_id.h, src/mongo/db/session/session.h
MongoDB provides two mechanisms for ensuring data consistency across multiple operations: retryable writes for single operations and multi-document transactions for coordinated changes across multiple documents and collections.
Logical Sessions
Every client connection establishes a logical session identified by a LogicalSessionId (UUID). Sessions enable:
- Retryable writes: Automatic retry of write operations on network failures
- Multi-document transactions: ACID guarantees across multiple operations
- Session state tracking: Persistent record of executed operations in config.transactions
The SessionCatalog maintains runtime state for all active sessions on a server instance. It uses a parent-child session model where internal transactions for retryable writes create child sessions linked to parent sessions.
Retryable Writes
Retryable writes allow drivers to safely retry non-idempotent operations. Each write operation receives:
- Transaction Number (txnNumber): Identifies the logical write sequence
- Statement ID (stmtId): Unique identifier for each operation within a batch
- Oplog Entry: Records the operation with lsid, txnNumber, and stmtId for reconstruction
When a retry arrives, the server checks if the stmtId was already executed. If so, it returns the cached result from the oplog instead of re-executing.
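As a rough illustration, the retry bookkeeping fields on a simplified insert oplog entry might be assembled like this. The values are fabricated and real entries carry additional fields (timestamps, wall-clock time, and so on).

// Simplified illustration of the retryable-write fields on an oplog entry;
// all values are made up, and real entries include more fields.
BSONObj entry = BSON("op" << "i"                                    // insert
                          << "ns" << "test.orders"                  // target namespace
                          << "o" << BSON("_id" << 1 << "qty" << 5)  // the inserted document
                          << "lsid" << BSON("id" << "<session UUID>")
                          << "txnNumber" << 7LL
                          << "stmtId" << 0);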
Multi-Document Transactions
Transactions follow a strict state machine managed by TransactionParticipant:
None → InProgress → Prepared → Committed
       InProgress → AbortedWithoutPrepare
       Prepared → AbortedWithPrepare
Key states:
- InProgress: Operations are collected and buffered
- Prepared: Two-phase commit phase 1; transaction is durable but not yet applied
- Committed: Changes are applied and visible
- Aborted: Transaction rolled back; no changes persisted
Transaction Lifecycle
- Begin: beginOrContinue() starts a transaction with autocommit: false
- Execute: Operations accumulate in TransactionOperations
- Prepare (optional): prepareTransaction durably records intent; drops replication state locks
- Commit: commitTransaction applies changes; for prepared transactions, requires a commitTimestamp
- Abort: abortTransaction rolls back all changes
Resource Management
Transactions stash and unstash resources (TxnResources) between network operations:
- Stash: Saves locks, recovery unit, and read concern when yielding
- Unstash: Restores resources when resuming; validates consistency
This enables long-running transactions without blocking other operations.
Internal Transactions for Retryable Writes
When a retryable write targets multiple shards, MongoDB creates an internal transaction (child session) to coordinate the write atomically. These transactions:
- Share the parent session's txnNumber
- Execute with autocommit: false internally
- Automatically commit after all writes complete
- Enable retry semantics across distributed systems
Session Checkout and Concurrency
The SessionCatalog enforces single-threaded access per logical session:
- Only one OperationContext can hold a session at a time
- Concurrent requests block until the session becomes available
- Sessions are reaped when idle to free resources
- Kill tokens enable safe session termination
Aggregation Pipeline
Relevant Files
src/mongo/db/pipeline/README.md, src/mongo/db/pipeline/pipeline.h, src/mongo/db/pipeline/document_source.h, src/mongo/db/pipeline/expression.h, src/mongo/db/pipeline/optimization/optimize.cpp
The aggregation pipeline is MongoDB's framework for transforming and analyzing data through a series of stages. Each stage processes documents sequentially, passing results to the next stage.
Core Architecture
The pipeline is built on two fundamental abstractions:
DocumentSource - Represents a single stage in the pipeline (e.g., $match, $group, $project). Each DocumentSource subclass implements stage-specific logic for filtering, transforming, or aggregating documents. Stages are chained together in a container and executed sequentially.
Expression - A stateless component that evaluates to a value without mutating inputs. Expressions are used within stages to compute field values, apply operators, and reference document fields or variables. Examples include $add, $sum, $cond, and field path expressions like "$inventory.total".
Key Pipeline Stages
Common stages include:
- $match - Filters documents using query predicates
- $project - Reshapes documents by including/excluding fields or computing new ones
- $group - Groups documents by a key and applies accumulators
- $sort - Orders documents by specified fields
- $limit / $skip - Restricts result set size and position
- $lookup - Performs left outer joins with other collections
- $unwind - Deconstructs array fields into separate documents
- $facet - Processes multiple sub-pipelines in parallel
- $merge / $out - Writes results to a collection
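In the C++ layer (for example in unit tests), a pipeline specification is just a sequence of BSON stage documents. The sketch below shows one way such a specification might be built with the BSON macro; the collection fields and names are invented for illustration.

// Hypothetical pipeline: keep active orders, sum quantities per region,
// then sort regions by total. Field names are illustrative only.
std::vector<BSONObj> rawPipeline = {
    BSON("$match" << BSON("status" << "active")),
    BSON("$group" << BSON("_id" << "$region"
                                << "totalQty" << BSON("$sum" << "$qty"))),
    BSON("$sort" << BSON("totalQty" << -1)),
};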
Pipeline Optimization
After parsing, pipelines undergo two-phase optimization via pipeline_optimization::optimizePipeline():
Inter-stage Optimization - Reorders and combines stages to improve efficiency. Examples include pushing $match filters earlier to reduce documents processed by expensive stages, coalescing $sort with $limit, and combining consecutive $match stages.
Stage-specific Optimization - Optimizes individual stages independently. This includes removing no-op stages (e.g., empty $match), constant folding (evaluating expressions with constant values), and stage-specific rewrites.
Dependency Tracking
The optimizer uses dependency analysis to determine which stages can be safely reordered. Dependencies include:
- Field dependencies - Specific document fields required by a stage
- Computed fields - New fields generated by stages like $addFields
- Variable references - Dependencies on scoped variables ($$CURRENT, user-defined)
- Metadata - Non-field contextual information like text search scores
Stages that depend on the output of earlier stages must remain after them, while independent stages can be reordered or pushed down for optimization.
Execution Model
Pipelines execute through a pull-based model where each stage requests documents from its source. The pipeline is converted to an execution engine representation (agg::Stage) for efficient processing. Stages can be distributed across shards in sharded clusters, with partial aggregation on shards and final merging on the router.
Security & Authentication
Relevant Files
src/mongo/db/auth/README.md, src/mongo/db/auth/authorization_manager.h, src/mongo/db/auth/sasl_commands.cpp, src/mongo/crypto/README.JWT.md, src/mongo/crypto/jws_validated_token.h
MongoDB implements a comprehensive security model combining authentication, authorization, and encryption. The system validates client identity, grants appropriate permissions, and protects data in transit and at rest.
Authentication
Authentication verifies client identity through multiple mechanisms:
SASL Mechanisms handle most client authentication. Supported mechanisms include:
- SCRAM-SHA-256 (preferred): Salted Challenge Response with SHA-256 hashing, provides mutual authentication without transmitting passwords
- SCRAM-SHA-1: Legacy variant using SHA-1
- PLAIN: Simple username/password exchange, typically used with LDAP
- GSSAPI: Kerberos-based authentication for enterprise environments
- MONGODB-X509: Certificate-based authentication using X.509 certificates from TLS handshakes
Speculative Authentication reduces connection overhead by embedding authentication in the initial hello command, potentially completing authentication in a single round trip.
Cluster Authentication uses either X.509 certificates or keyfile-based SCRAM-SHA-256 for server-to-server communication. X.509 supports zero-downtime certificate rotation through override mechanisms.
Authorization
Authorization determines what authenticated users can do:
Users and Roles form a hierarchical privilege model. Users are assigned roles, which grant privileges on specific resources. Roles can inherit from other roles, creating a privilege tree.
Privileges combine a ResourcePattern (what can be accessed) with ActionTypes (what operations are allowed). Resource patterns support fine-grained scoping:
- Cluster-wide operations (cluster: true)
- Database-level access (db: 'name')
- Collection-specific access (collection: 'name')
- System collections and time-series buckets
Authorization Caching uses a ReadThroughCache for performance. User information is cached and invalidated when user management commands execute. On sharded clusters, mongos instances periodically check a cache generation counter to stay synchronized.
Authentication Restrictions limit where users can connect from using CIDR notation for IP ranges and server addresses.
Cryptography & JWT
JSON Web Tokens (JWT) provide secure claims exchange for OIDC and multitenancy:
JWSValidatedToken validates signed tokens through:
- Parsing the header to extract the Key ID
- Retrieving the corresponding public key from the JWKManager
- Verifying the signature using platform-specific crypto (OpenSSL on Linux)
- Validating expiration (exp) and not-before (nbf) claims
- Parsing the token body as a JWT object
Callers must independently validate issuer (iss) and audience (aud) claims.
Authorization Flow
When a client executes a command:
- AuthorizationSession checks whether the user has the required privileges
- AuthorizationManager looks up user information (cached via ReadThroughCache)
- Privileges are computed from the user's roles and their subordinate roles
- If authorization succeeds, the command executes; otherwise, an error is returned
Authorization Backends support multiple sources:
- Local: Users stored in the admin.system.users collection
- LDAP: External LDAP server for credential validation
- X.509: Certificate-based authorization from external databases
Security Considerations
- Passwords are never stored in plaintext; SCRAM mechanisms use salted hashes
- Cluster authentication uses strong mechanisms (X.509 or SCRAM-SHA-256)
- Localhost auth bypass allows initial setup but is disabled once users exist
- Multitenancy isolates users and roles per tenant with prefixed collections
- JWT validation is platform-specific; signature validation only available on Linux
Utilities & Infrastructure
Relevant Files
src/mongo/util/README.md, src/mongo/bson/README, src/mongo/executor/README.md, src/mongo/logv2/log.h, src/mongo/util/cancellation.h, src/mongo/util/future.h, src/mongo/util/concurrency/
MongoDB's infrastructure layer provides essential utilities for asynchronous execution, data serialization, and system-level operations. These components form the foundation upon which higher-level features are built.
BSON: Binary JSON Format
BSON (Binary JSON) is MongoDB's binary storage format, inspired by JSON but with additional types like Date and ObjectId. The C++ implementation in src/mongo/bson/ provides efficient serialization and deserialization. Key components include:
- BSONObj & BSONElement: Core types for representing BSON documents and individual fields
- BSONObjBuilder: Fluent API for constructing BSON documents
- BSONColumn: Columnar compression format for efficient storage of repeated values
- Validation: BSON validation and integrity checking utilities
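A minimal construction and inspection example (field names and values are invented):

// Building and reading a BSON document with BSONObjBuilder and BSONElement.
BSONObjBuilder builder;
builder.append("name", "alice");
builder.append("qty", 42);
builder.append("createdAt", Date_t::now());
BSONObj doc = builder.obj();            // finalizes the underlying buffer

BSONElement qty = doc["qty"];           // field lookup returns a BSONElement view
invariant(qty.numberInt() == 42);

// The BSON() macro offers a terser equivalent for simple documents.
BSONObj same = BSON("name" << "alice" << "qty" << 42);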
Executors: Asynchronous Task Scheduling
Executors are the backbone of MongoDB's asynchronous execution model. They schedule and execute work in FIFO order without blocking the caller.
Core Types:
- OutOfLineExecutor: Base class declaring schedule(Task task) for non-blocking task submission
- TaskExecutor: Extends OutOfLineExecutor with event management and remote command scheduling
- ThreadPoolTaskExecutor: Implements TaskExecutor using a thread pool
- PinnedConnectionTaskExecutor: Ensures all operations run over the same transport connection
- TaskExecutorPool: Distributes work across multiple executors
Cancellation: Hierarchical Cancellation Tokens
The cancellation system enables clean shutdown and resource cleanup through CancellationSource and CancellationToken pairs:
- CancellationSource: Manages cancellation state; call cancel() to trigger cancellation
- CancellationToken: Obtained from a source via token(); check isCanceled() or chain continuations via onCancel()
- Hierarchies: Child sources inherit parent cancellation, enabling cascading cancellation of related operations
Example: When a parent operation is canceled, all child operations automatically cancel without manual tracking.
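A minimal sketch of that parent/child relationship (no real subsystem implied):

// Canceling the parent source cancels every token minted from the child source.
CancellationSource parentSource;
CancellationSource childSource(parentSource.token());   // child inherits parent cancellation

CancellationToken childToken = childSource.token();
childToken.onCancel().unsafeToInlineFuture().getAsync([](Status status) {
    // Invoked when cancellation is requested anywhere up the chain
    // (or with a non-OK status if the sources are destroyed without canceling).
});

parentSource.cancel();
invariant(childToken.isCanceled());   // the child token observes the parent's cancellation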
Logging: Structured Logging with LOGV2
MongoDB uses structured logging via the LOGV2 system for observability:
#define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault
LOGV2(1234500, "Operation completed", "duration_ms"_attr=elapsed);
Features include severity levels, component filtering, and attribute-based structured output for easier parsing and analysis.
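Severity-specific macros follow the same pattern; for example (log IDs and attribute names are invented):

// Debug-level message at verbosity 2 with two structured attributes.
LOGV2_DEBUG(1234501, 2, "Batch applied", "numOps"_attr = numOps, "durationMillis"_attr = elapsedMillis);

// Warning with a Status attached as an attribute.
LOGV2_WARNING(1234502, "Sync source unreachable", "error"_attr = status);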
Concurrency Utilities
The src/mongo/util/concurrency/ directory provides thread-safe primitives:
- ThreadPool: Manages worker threads for task execution
- TicketHolder: Admission control mechanism limiting concurrent operations
- SpinLock: Low-latency synchronization for short critical sections
- Notification: Lock-free signaling between threads
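As an example, a ThreadPool might be set up roughly as follows; the pool name and sizing below are arbitrary:

// Minimal ThreadPool usage sketch.
ThreadPool::Options options;
options.poolName = "ExampleWorkers";
options.minThreads = 0;
options.maxThreads = 4;

ThreadPool pool(options);
pool.startup();

pool.schedule([](Status status) {
    if (!status.isOK())
        return;               // the pool is shutting down; the task did not run normally
    // ... do work on a pool thread ...
});

pool.shutdown();
pool.join();                  // waits for queued tasks to drain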
Futures and Promises
MongoDB's Future/Promise implementation enables composable asynchronous operations:
- Future<T>: Represents a value that will be available in the future
- Promise<T>: Sets the value for an associated Future
- SemiFuture<T>: Requires explicit executor for continuations
- SharedSemiFuture<T>: Shareable across threads
Futures integrate seamlessly with cancellation tokens via future_util::withCancellation().
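A small sketch of the promise/future handshake (names and values are illustrative):

// One side fulfills the Promise; the other chains continuations on the Future.
auto pf = makePromiseFuture<int>();

std::move(pf.future)
    .then([](int value) { return value * 2; })        // continuation on fulfillment
    .getAsync([](StatusWith<int> swResult) {
        if (swResult.isOK()) {
            // swResult.getValue() == 84 once the promise below is fulfilled.
        }
    });

pf.promise.emplaceValue(42);   // fulfills the future and runs the chained continuations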
Memory and Performance
- Allocator: Custom memory allocation with tcmalloc integration for performance
- LRU Cache: Efficient caching with automatic eviction
- String utilities: Fast string manipulation and formatting
- Timer & Clock: High-resolution timing for performance monitoring