opensearch-project/OpenSearch

OpenSearch Wiki

Last updated on Dec 18, 2025 (Commit: fc01431)

Overview

Relevant Files
  • README.md
  • server/src/main/java/org/opensearch/node/Node.java
  • server/src/main/java/org/opensearch/bootstrap/Bootstrap.java
  • server/src/main/java/org/opensearch/bootstrap/OpenSearch.java

OpenSearch is an open-source, enterprise-grade search and observability suite designed to bring order to unstructured data at scale. It is a distributed search and analytics engine built on top of Apache Lucene, providing powerful full-text search, real-time analytics, and observability capabilities for large-scale data workloads.

Core Purpose

OpenSearch enables organizations to index, search, and analyze massive volumes of structured and unstructured data in real-time. It powers use cases including log and event data analysis, application performance monitoring, security analytics, and full-text search across diverse data sources.

Architecture Overview


Startup Process

The system initializes through a well-defined sequence:

  1. Entry Point (OpenSearch.java): The main entry point parses command-line arguments and configures logging.

  2. Bootstrap Phase (Bootstrap.java): Performs critical initialization including:

    • Loading secure settings and keystores
    • Validating the Java environment and jar dependencies
    • Installing the security manager and running FIPS compliance checks
    • Initializing system probes (JVM, OS, filesystem)
  3. Node Creation (Node.java): Constructs the core node instance with:

    • Plugin loading and initialization
    • Service instantiation (cluster, search, indexing, transport)
    • Thread pool configuration
    • Module binding (action, cluster, search, script)
  4. Node Startup: Activates all services in dependency order, starts discovery and cluster coordination, and begins accepting incoming requests; a condensed sketch of the whole sequence follows.
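
A condensed, hedged sketch of that sequence (real startup involves CLI parsing, security initialization, and many more services; parseArgumentsAndLoadSettings is a hypothetical helper):

// Conceptual sketch only; see OpenSearch.java, Bootstrap.java, and Node.java
// for the real entry points.
public static void main(String[] args) throws Exception {
    // 1. OpenSearch.java: parse command-line arguments, configure logging
    Environment environment = parseArgumentsAndLoadSettings(args);   // hypothetical helper

    // 2. Bootstrap.java: keystore loading, environment checks, security manager, probes
    Bootstrap.init(environment);                                     // simplified call

    // 3-4. Node.java: construct services and modules, then start them
    Node node = new Node(environment);
    node.start();   // discovery, cluster coordination, request handling begin here
}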

Key Components

  • Cluster Service: Manages cluster state, node discovery, and coordination
  • Search Service: Handles query execution and result aggregation
  • Indices Service: Manages index lifecycle, shards, and segment storage
  • Transport Service: Enables inter-node communication and request routing
  • Action Module: Registers and dispatches all cluster actions
  • Plugin System: Extensible architecture for custom functionality
  • Thread Pool: Manages concurrent task execution across multiple executor types

Module Organization

The codebase is organized into logical modules under server/src/main/java/org/opensearch/:

  • action/ - Request handlers and transport actions
  • cluster/ - Cluster coordination and state management
  • index/ - Index operations and shard management
  • search/ - Query parsing, execution, and aggregations
  • transport/ - Network communication layer
  • plugins/ - Plugin interfaces and lifecycle management
  • node/ - Core node lifecycle and service orchestration

Architecture

Relevant Files
  • server/src/main/java/org/opensearch/cluster/ClusterModule.java
  • server/src/main/java/org/opensearch/index/IndexService.java
  • server/src/main/java/org/opensearch/indices/IndicesService.java
  • server/src/main/java/org/opensearch/action/ActionModule.java

OpenSearch follows a modular, layered architecture designed for distributed search and analytics. The system is organized into distinct layers that handle cluster coordination, index management, request processing, and data storage.

Core Layers

Cluster Coordination Layer manages distributed consensus and state propagation. The ClusterModule configures cluster-wide services including AllocationService, which orchestrates shard placement decisions. Allocation deciders (disk threshold, awareness, concurrent recovery, etc.) enforce constraints on where shards can be placed. The BalancedShardsAllocator makes final placement decisions based on cluster balance metrics.

Index Management Layer operates through IndicesService and IndexService. IndicesService manages the lifecycle of all indices on a node, handling index creation, deletion, and shard allocation. IndexService manages individual index operations including shard creation, field data caching, and query execution context. Both services coordinate with the cluster state to maintain consistency.

Request Processing Layer routes incoming requests through REST handlers and transport actions. The ActionModule registers all available actions and their corresponding transport implementations. REST handlers parse HTTP requests and delegate to transport actions via the NodeClient. Transport actions execute on the local node or coordinate across the cluster.

Storage Layer manages Lucene indices, translog operations, and segment storage. Each shard maintains an Engine (typically InternalEngine) that handles indexing, searching, and recovery operations.

Request Flow


Shard Allocation Pipeline

When cluster state changes, AllocationService executes a multi-phase allocation process:

  1. Existing Shards: GatewayAllocator recovers primary shards from disk
  2. Unassigned Shards: Allocation deciders filter candidate nodes
  3. Balanced Allocation: ShardsAllocator assigns shards to minimize imbalance
  4. Rebalancing: Shards relocate if cluster balance improves

Each decision passes through AllocationDeciders, a composite that chains individual deciders. Early rejection (NO decision) short-circuits evaluation for performance.
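
The chaining idea can be shown with a hedged sketch (the real AllocationDeciders also collects every decision when debug output is requested):

// Sketch of the composite decider loop: any NO vote ends evaluation early.
public Decision canAllocate(ShardRouting shard, RoutingNode node, RoutingAllocation allocation) {
    Decision.Multi result = new Decision.Multi();
    for (AllocationDecider decider : deciders) {
        Decision decision = decider.canAllocate(shard, node, allocation);
        if (decision.type() == Decision.Type.NO && allocation.debugDecision() == false) {
            return decision;   // short-circuit: remaining deciders are never consulted
        }
        result.add(decision);
    }
    return result;
}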

Cluster State Management

ClusterState is an immutable snapshot containing:

  • Metadata: Index settings, mappings, templates, aliases
  • RoutingTable: Shard assignments to nodes
  • DiscoveryNodes: Active cluster members
  • Customs: Plugin-specific state (snapshots, ingest pipelines, etc.)

State updates are published through a quorum-based consensus protocol. All nodes apply updates in the same order, ensuring consistency.
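
On any node, the current snapshot can be inspected through ClusterService. A hedged sketch (the index name is hypothetical):

// Inspecting an immutable ClusterState snapshot; "logs-000001" is illustrative.
ClusterState state = clusterService.state();
IndexMetadata indexMetadata = state.metadata().index("logs-000001");   // settings, mappings, aliases
DiscoveryNodes nodes = state.nodes();                                  // active cluster members
logger.info("term={} version={} nodes={} indexExists={}",
    state.term(), state.version(), nodes.getSize(), indexMetadata != null);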

Module Composition

The architecture uses dependency injection (Guice) to wire components. ClusterModule, ActionModule, and IndicesModule configure their respective services as eager singletons. This enables loose coupling and testability while maintaining strict initialization order.
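
A hedged sketch of what such a module looks like with the bundled Guice fork (the real ClusterModule, ActionModule, and IndicesModule bind far more services):

// Hypothetical module for illustration only.
public class ExamplePluginModule extends AbstractModule {
    private final ClusterService clusterService;

    public ExamplePluginModule(ClusterService clusterService) {
        this.clusterService = clusterService;
    }

    @Override
    protected void configure() {
        // Eager singletons are constructed when the injector is created,
        // which pins the initialization order at node startup.
        bind(ClusterService.class).toInstance(clusterService);
        bind(MyPluginService.class).asEagerSingleton();   // hypothetical service
    }
}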

Search & Query Execution

Relevant Files
  • server/src/main/java/org/opensearch/search/SearchService.java
  • server/src/main/java/org/opensearch/search/SearchModule.java
  • server/src/main/java/org/opensearch/action/search/SearchTransportService.java
  • server/src/main/java/org/opensearch/search/query/QueryPhase.java
  • server/src/main/java/org/opensearch/search/fetch/FetchPhase.java

Overview

Search and query execution in OpenSearch follows a distributed, multi-phase architecture. When a search request arrives, it is decomposed into phases that execute across shards and then aggregate results. The main components are:

  • SearchService: Core orchestrator managing shard-level search execution
  • SearchTransportService: Handles inter-node communication for search operations
  • SearchModule: Registers query types, aggregations, and search plugins at startup

Search Phases

OpenSearch executes searches through distinct phases:

  1. DFS Phase (optional): Distributed frequency search collects term statistics across shards for accurate scoring
  2. Query Phase: Executes the query on each shard, returning top-scoring document IDs and scores
  3. Fetch Phase: Retrieves full document content for the top results identified in the query phase

For single-shard searches, query and fetch can be combined into a single phase for efficiency.

Query Phase Execution

The query phase is initiated by SearchService.executeQueryPhase(), which:

  • Retrieves the target shard and rewrites the request
  • Checks if the shard can match documents using canMatch() to short-circuit empty results
  • Forks execution to the search thread pool
  • Calls QueryPhase.execute() to run the actual query against the Lucene index
// Query phase can use cached results if available
private void loadOrExecuteQueryPhase(ShardSearchRequest request, SearchContext context) {
    boolean canCache = indicesService.canCache(request, context);
    if (canCache) {
        indicesService.loadIntoContext(request, context, queryPhase);
    } else {
        queryPhase.execute(context);
    }
}

The query phase processes aggregations, applies scoring, and handles rescoring if configured.

Fetch Phase Execution

After query results are reduced at the coordinator level, the fetch phase retrieves actual documents:

  • SearchService.executeFetchPhase() loads the stored fields and document content
  • FetchPhase.execute() iterates through document IDs and applies fetch sub-phases
  • Sub-phases include highlighting, field retrieval, and script fields
private QueryFetchSearchResult executeFetchPhase(ReaderContext reader, SearchContext context, long afterQueryTime) {
    shortcutDocIdsToLoad(context);
    fetchPhase.execute(context);
    return new QueryFetchSearchResult(context.queryResult(), context.fetchResult());
}

Transport Layer

SearchTransportService registers request handlers for each phase:

  • QUERY_ACTION_NAME: Query phase execution on shards
  • FETCH_ID_ACTION_NAME: Fetch phase execution on shards
  • DFS_ACTION_NAME: DFS phase for term statistics
  • FREE_CONTEXT_ACTION_NAME: Cleanup of reader contexts

These handlers are registered via registerRequestHandler() and delegate to corresponding SearchService methods.
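
A hedged sketch of what one such registration looks like (the real wiring lives in SearchTransportService.registerRequestHandlers and uses the actual SearchService signatures; executeQueryPhaseAndRespond is a hypothetical helper):

transportService.registerRequestHandler(
    QUERY_ACTION_NAME,
    ThreadPool.Names.SAME,              // SearchService forks to the search pool itself
    ShardSearchRequest::new,            // deserializes the request off the wire
    (request, channel, task) -> {
        // delegate to SearchService.executeQueryPhase(...) and send the
        // QuerySearchResult back through the transport channel
        executeQueryPhaseAndRespond(request, task, channel);
    });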

Reader Context Management

Search contexts are maintained in the activeReaders map to support pagination, scrolling, and point-in-time (PIT) searches (a conceptual sketch of the keep-alive bookkeeping follows this list). Contexts are:

  • Created per shard when a search begins
  • Kept alive via keep-alive timeouts (bounded by the search.default_keep_alive and search.max_keep_alive settings)
  • Freed explicitly or when they expire
  • Reused across multiple fetch requests for the same search
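
A conceptual sketch of that keep-alive bookkeeping (the real ReaderContext also handles reference counting, PIT ids, and scroll state; keepAlive and isExpired are illustrative method names):

private final Map<Long, ReaderContext> activeReaders = new ConcurrentHashMap<>();

void touch(long readerId, long keepAliveMillis) {
    ReaderContext reader = activeReaders.get(readerId);
    if (reader != null) {
        reader.keepAlive(keepAliveMillis);   // push the expiry forward
    }
}

void reapExpiredReaders(long nowMillis) {
    // invoked periodically by a scheduled reaper task
    activeReaders.values().removeIf(reader -> reader.isExpired(nowMillis));
}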

Caching and Optimization

OpenSearch caches query results at the shard level when:

  • The same query is executed multiple times
  • The index has not been modified
  • Cache settings permit caching

The loadOrExecuteQueryPhase() method checks cache eligibility before executing queries, reducing redundant computation.

Indexing & Storage

Relevant Files
  • server/src/main/java/org/opensearch/index/shard/IndexShard.java
  • server/src/main/java/org/opensearch/index/engine/InternalEngine.java
  • server/src/main/java/org/opensearch/index/IndexModule.java
  • server/src/main/java/org/opensearch/index/store/Store.java
  • server/src/main/java/org/opensearch/index/translog/Translog.java

OpenSearch uses a multi-layered architecture for indexing and storage, combining Lucene's segment-based indexing with a transaction log for durability. Each shard maintains its own engine, store, and translog to ensure data consistency and recovery capabilities.

Core Components

IndexShard is the primary entry point for all indexing operations. It coordinates document indexing, manages the engine lifecycle, and handles shard-level operations like flushing and recovery. The shard delegates actual indexing to the Engine, which manages the Lucene index writer and maintains version tracking.

InternalEngine is the default engine implementation that orchestrates document indexing into Lucene. It uses Lucene's IndexWriter to add, update, or delete documents. The engine maintains a version map to track document versions and sequence numbers, enabling conflict detection and optimistic concurrency control.

Store provides low-level access to Lucene's Directory abstraction, which represents the file system interface for reading and writing index files. The Store manages metadata snapshots of committed segments, handles file checksums, and supports recovery operations. It uses reference counting to ensure safe concurrent access.

Indexing Pipeline

Document Request
    ↓
IndexShard.index()
    ↓
Engine.prepareIndex() [Parse & Map]
    ↓
Engine.index() [Version Check]
    ↓
indexIntoLucene() [Add/Update in Lucene]
    ↓
Translog.add() [Durability]
    ↓
IndexResult [Return to Client]

When a document arrives, the shard prepares an Engine.Index operation that includes parsing, mapping, and version resolution. The engine then executes the indexing strategy—either optimized append-only for new documents or update-based for existing ones. Documents are written to Lucene's index writer and simultaneously logged to the translog for durability.
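
A heavily simplified, conceptual sketch of that engine-side step (InternalEngine.index() additionally resolves an indexing strategy, sequence numbers, and failure handling):

Engine.IndexResult index(Engine.Index op) throws IOException {
    // 1. write the parsed documents into Lucene
    indexWriter.addDocuments(op.docs());
    // 2. record the operation in the translog so it survives a crash
    Engine.IndexResult result = new Engine.IndexResult(op.version(), op.primaryTerm(), op.seqNo(), true);
    translog.add(new Translog.Index(op, result));
    return result;
}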

Transaction Log (Translog)

The Translog records all non-committed index operations in a durable manner. Each shard has one translog instance per engine. Operations are written sequentially to a current translog file, and the translog is synced to disk based on the durability setting: ASYNC (time-based) or REQUEST (per operation).

Translog files are organized by generation ID. When a file reaches capacity or during periodic maintenance, a new generation is rolled. Old generations are retained based on the TranslogDeletionPolicy, which ensures operations needed for recovery or replication are preserved. The translog UUID is stored with each Lucene commit to prevent accidental recovery from mismatched logs.
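
Durability is typically configured per index. A hedged sketch using the Java admin client API (the index name is hypothetical):

Settings indexSettings = Settings.builder()
    .put("index.translog.durability", "async")     // fsync on a timer instead of per request
    .put("index.translog.sync_interval", "10s")    // how often ASYNC mode syncs
    .build();

CreateIndexRequest request = new CreateIndexRequest("logs-000001").settings(indexSettings);
client.admin().indices().create(request, ActionListener.wrap(
    response -> logger.info("created {}", response.index()),
    e -> logger.error("index creation failed", e)));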

Storage Layer

The Store wraps Lucene's Directory and provides:

  • MetadataSnapshot: A snapshot of committed segment files with checksums and metadata
  • Reference Counting: Prevents segment deletion while readers hold references (for point-in-time queries)
  • Corruption Detection: Marks and detects corrupted indices
  • File Verification: Validates checksums during recovery and replication

The Store uses a StoreDirectory wrapper that caches directory size information and logs delete operations for debugging.

Durability & Recovery

OpenSearch ensures durability through:

  1. Lucene Commits: Periodic flushes write segments to disk and create a commit point
  2. Translog Syncing: Operations are synced to disk based on durability settings
  3. Sequence Numbers: Track operation order for consistent recovery
  4. Local Checkpoint Tracker: Maintains the highest sequence number of persisted operations

During recovery, the engine replays translog operations up to the global checkpoint, restoring any uncommitted writes. Segment replication can skip translog replay by copying committed segments directly from the primary.

Key Settings

  • index.translog.durability: Controls translog sync frequency (ASYNC or REQUEST)
  • index.store.type: Selects the directory implementation (niofs, mmapfs, etc.)
  • index.store.preload: Pre-loads specific file extensions into memory
  • index.store.hybrid.nio.extensions: Files to load with NIO instead of mmap

Cluster Coordination & Discovery

Relevant Files
  • server/src/main/java/org/opensearch/cluster/coordination/Coordinator.java
  • server/src/main/java/org/opensearch/discovery/DiscoveryModule.java
  • server/src/main/java/org/opensearch/gateway/GatewayService.java
  • server/src/main/java/org/opensearch/cluster/coordination/CoordinationState.java
  • server/src/main/java/org/opensearch/cluster/coordination/JoinHelper.java
  • server/src/main/java/org/opensearch/discovery/PeerFinder.java
  • server/src/main/java/org/opensearch/cluster/coordination/LeaderChecker.java
  • server/src/main/java/org/opensearch/cluster/coordination/FollowersChecker.java

OpenSearch uses a Raft-inspired consensus algorithm called Zen2 to coordinate cluster state and elect a cluster manager. The coordination system ensures all nodes agree on cluster state changes and maintains a single leader.

Core Architecture

The Coordinator class is the central orchestrator that manages cluster formation, leader election, and state publication. It operates in three modes:

  • CANDIDATE - Node is seeking election or waiting for a leader
  • LEADER - Node has won election and publishes cluster state
  • FOLLOWER - Node has joined a leader and applies published state

The DiscoveryModule bootstraps the coordination system by instantiating the Coordinator with seed hosts providers, election strategies, and join validators. For single-node clusters, it uses simplified discovery logic.

Discovery & Peer Finding

The PeerFinder discovers other cluster-manager-eligible nodes by:

  1. Resolving configured seed hosts (from settings or files)
  2. Probing addresses to establish connections
  3. Exchanging peer information via PeersRequest/PeersResponse
  4. Tracking the current leader and found peers

When a candidate node discovers enough peers to form a quorum, it triggers the election scheduler.

Leader Election

Election follows a two-phase process:

Pre-voting Phase: The PreVoteCollector gathers pre-votes from peers without incrementing the term. This prevents unnecessary term bumps when a node cannot win.

Voting Phase: Once pre-voting quorum is reached, the node sends StartJoinRequest to all discovered nodes, incrementing the term. Nodes respond with Join votes. The CoordinationState tracks join votes and determines if an election quorum is achieved using the ElectionStrategy.

The JoinHelper manages join requests and responses. When a candidate accumulates enough votes, it transitions to LEADER mode and begins publishing cluster state.
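
The quorum rule itself is simple: a candidate wins once joins arrive from a majority of the voting configuration. A conceptual sketch (the real check lives in CoordinationState and the configured ElectionStrategy):

boolean isElectionQuorum(Set<String> votingConfigNodeIds, Set<String> joinedNodeIds) {
    long votesFromConfig = joinedNodeIds.stream()
        .filter(votingConfigNodeIds::contains)     // only voting-configuration members count
        .count();
    return votesFromConfig * 2 > votingConfigNodeIds.size();   // strict majority
}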

State Publication & Consistency

Once elected, the leader publishes cluster state updates via PublicationTransportHandler. The CoordinationState ensures:

  • Only the leader can publish new state
  • State updates include term and version numbers
  • Followers acknowledge publication before applying state
  • A publish quorum (majority of voting nodes) must acknowledge before committing

Health Monitoring

The LeaderChecker allows followers to detect leader failures by periodically sending health checks. If checks fail, the follower becomes a candidate and restarts election.

The FollowersChecker allows the leader to detect unhealthy followers and remove them from the cluster.

Gateway & State Recovery

The GatewayService manages cluster state recovery on startup. When the cluster manager is elected, it waits for a configurable number of data nodes to join before removing the STATE_NOT_RECOVERED_BLOCK, allowing normal operations to begin.

// Example: Coordinator mode transitions
if (mode == Mode.CANDIDATE) {
    startElection(); // Send StartJoinRequest to peers
} else if (mode == Mode.LEADER) {
    publishClusterState(newState); // Publish to followers
}

Key Concepts

  • Term - Monotonically increasing epoch number; higher term always wins
  • Voting Configuration - Set of nodes whose votes count toward quorum
  • Last Accepted State - Latest state persisted locally
  • Committed State - State acknowledged by a publish quorum

REST API & Actions

Relevant Files
  • server/src/main/java/org/opensearch/rest/RestController.java
  • server/src/main/java/org/opensearch/rest/BaseRestHandler.java
  • server/src/main/java/org/opensearch/rest/RestHandler.java
  • server/src/main/java/org/opensearch/action/ActionType.java
  • server/src/main/java/org/opensearch/action/support/TransportAction.java

OpenSearch exposes functionality through two complementary layers: the REST API for HTTP clients and the Action framework for internal node-to-node communication. These layers work together to handle user requests and coordinate cluster operations.

REST API Layer

The REST API is the primary interface for external clients. The RestController acts as the central dispatcher, routing HTTP requests to appropriate handlers based on method and path.

Request Flow:

  1. HTTP request arrives at RestController.dispatchRequest()
  2. Controller matches the request path and method against registered handlers using a PathTrie data structure
  3. Matched RestHandler processes the request and sends a response through the RestChannel

Handler Registration:

Handlers are registered via RestController.registerHandler() with route definitions:

public void registerHandler(final RestHandler restHandler) {
    restHandler.routes().forEach(route -> 
        registerHandler(route.getMethod(), route.getPath(), restHandler));
    restHandler.deprecatedRoutes().forEach(route -> 
        registerAsDeprecatedHandler(route.getMethod(), route.getPath(), 
            restHandler, route.getDeprecationMessage()));
}

Routes support three types: active routes, deprecated routes, and replaced routes (deprecated with a new replacement).

BaseRestHandler:

Most REST handlers extend BaseRestHandler, which provides the following (a minimal handler sketch follows this list):

  • Parameter validation and consumption tracking
  • Automatic usage counting for monitoring
  • Request preparation via prepareRequest() returning a RestChannelConsumer
  • Response parameter filtering to exclude format-control parameters from strict validation
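
A minimal, hedged handler sketch (the route path, handler name, and parameter are hypothetical):

public class RestHelloAction extends BaseRestHandler {

    @Override
    public String getName() {
        return "hello_action";   // used for usage counting
    }

    @Override
    public List<Route> routes() {
        return List.of(new Route(RestRequest.Method.GET, "/_hello"));
    }

    @Override
    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) throws IOException {
        String name = request.param("name", "world");   // consumed parameters are tracked
        return channel -> channel.sendResponse(new BytesRestResponse(RestStatus.OK, "hello " + name));
    }
}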

Action Framework

Actions represent internal operations that can be executed via transport (node-to-node) or REST layers. Each action is defined by an ActionType with a unique name and response reader.

ActionType Structure:

public class ActionType<Response extends ActionResponse> {
    private final String name;
    private final Writeable.Reader<Response> responseReader;
}
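
Defining a new action follows the same small pattern. A hedged sketch (the action name and HelloResponse are hypothetical; HelloResponse would need a StreamInput constructor):

public class HelloAction extends ActionType<HelloResponse> {
    public static final HelloAction INSTANCE = new HelloAction();
    public static final String NAME = "cluster:admin/hello";   // unique, namespaced action name

    private HelloAction() {
        super(NAME, HelloResponse::new);   // reader used to deserialize the response from the wire
    }
}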

Execution Pipeline:

  1. NodeClient.execute(ActionType, Request, ActionListener) initiates action execution
  2. TransportAction validates the request and registers a task
  3. doExecute() implements the actual operation logic
  4. Response or exception is delivered via ActionListener callback

Transport Registration:

Actions are registered in ActionModule and mapped to TransportAction implementations. The transport layer handles serialization, network communication, and thread pool execution.

Request-Response Cycle


Circuit Breaker Integration

The REST layer integrates with circuit breakers to prevent resource exhaustion. Handlers declare via canTripCircuitBreaker() whether they should count against in-flight request limits. The controller tracks request content length and releases resources when responses are sent.

Error Handling

The controller provides standardized error responses:

  • 400 Bad Request: No matching handler or invalid parameters
  • 405 Method Not Allowed: Valid path but unsupported HTTP method
  • 406 Not Acceptable: Invalid Content-Type for streaming handlers

Errors include detailed messages with suggestions for similar parameter names using Levenshtein distance matching.

Ingest Pipeline & Processing

Relevant Files
  • server/src/main/java/org/opensearch/ingest/IngestService.java
  • server/src/main/java/org/opensearch/ingest/Pipeline.java
  • server/src/main/java/org/opensearch/ingest/CompoundProcessor.java
  • server/src/main/java/org/opensearch/ingest/Processor.java

The ingest pipeline is OpenSearch's data transformation framework that processes documents before indexing. It enables enrichment, filtering, and modification of data through a composable chain of processors.

Architecture Overview


Core Components

Pipeline is the top-level container that holds a sequence of processors and optional on-failure handlers. Each pipeline has a unique ID and tracks execution metrics (latency, success/failure counts). When a document is processed, the pipeline delegates to its CompoundProcessor and measures the total time spent in ingest.

CompoundProcessor orchestrates sequential execution of multiple processors. It maintains two lists: regular processors and on-failure processors. If any processor throws an exception and ignoreFailure is false, the compound processor invokes the on-failure chain. On-failure processors receive the original document plus metadata about the error (message, processor type, tag).

Processor is the base interface for all transformation logic. Implementations can override either the synchronous execute(IngestDocument) method or the asynchronous execute(IngestDocument, BiConsumer) method for operations requiring async calls (e.g., HTTP enrichment). Processors return null to drop a document or the modified document to continue.

IngestService manages the lifecycle of all pipelines. It stores pipeline definitions in cluster state, validates processor configurations, and coordinates execution across nodes. It also handles pipeline resolution (default, final, and system pipelines) and batch processing for bulk operations.
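
To make the Processor contract concrete, here is a hedged sketch of a synchronous processor (the type name and field are hypothetical; a real processor also ships a Factory that parses its configuration):

public class UppercaseNoteProcessor extends AbstractProcessor {
    public static final String TYPE = "uppercase_note";

    public UppercaseNoteProcessor(String tag, String description) {
        super(tag, description);
    }

    @Override
    public IngestDocument execute(IngestDocument document) {
        if (document.hasField("note")) {
            String note = document.getFieldValue("note", String.class);
            document.setFieldValue("note", note.toUpperCase(Locale.ROOT));
        }
        return document;   // returning null would drop the document
    }

    @Override
    public String getType() {
        return TYPE;
    }
}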

Execution Flow

  1. Resolution: IngestService resolves which pipeline(s) to apply based on request parameters, index settings, and cluster defaults.
  2. Sequential Processing: CompoundProcessor executes each processor in order, passing the document through the chain.
  3. Error Handling: If a processor fails and on-failure handlers exist, they execute with failure metadata injected into the document.
  4. Metrics: Pipeline tracks execution time and failure counts for monitoring and debugging.
  5. Batch Optimization: For bulk operations, processors can implement batchExecute() to process multiple documents efficiently.

Key Features

  • Composability: Pipelines can invoke other pipelines via the pipeline processor, enabling reusable pipeline templates.
  • Conditional Processing: Processors support conditional execution based on document state.
  • Error Recovery: On-failure handlers allow graceful degradation or alternative processing paths.
  • Metrics & Observability: Per-pipeline and per-processor metrics track latency, throughput, and failure rates.
  • Batch Processing: Bulk indexing benefits from batch-aware processors that optimize for multiple documents.
  • Processor Limits: Configurable maximum processor count per pipeline prevents runaway complexity.

Plugin System & Extensions

Relevant Files
  • server/src/main/java/org/opensearch/plugins/Plugin.java
  • server/src/main/java/org/opensearch/plugins/SearchPlugin.java
  • server/src/main/java/org/opensearch/plugins/ActionPlugin.java
  • server/src/main/java/org/opensearch/plugins/PluginsService.java
  • server/src/main/java/org/opensearch/plugins/IngestPlugin.java
  • server/src/main/java/org/opensearch/plugins/AnalysisPlugin.java

OpenSearch provides a comprehensive plugin system that allows developers to extend core functionality without modifying the main codebase. Plugins are loaded at startup and can hook into multiple subsystems including search, actions, ingest pipelines, and analysis.

Plugin Architecture

The plugin system is built on a base Plugin class that serves as the foundation for all extensions. Plugins can implement specialized interfaces to extend specific functionality areas. The PluginsService is responsible for discovering, loading, and managing all plugins and modules at runtime.

Key lifecycle methods in the base Plugin class include:

  • createComponents() - Register custom services and components with dependency injection
  • createGuiceModules() - Define Guice modules for dependency management
  • getSettings() - Add custom configuration settings
  • onIndexModule() - Hook into index creation to register index-level extensions
  • getBootstrapChecks() - Enforce startup validation rules

Extension Points

Plugins can implement multiple specialized interfaces to extend different subsystems:

Search Extensions (SearchPlugin): Register custom queries, aggregations, suggesters, score functions, and rescore implementations. Plugins can also provide custom fetch sub-phases and highlighters.

Action Extensions (ActionPlugin): Add custom REST handlers, transport actions, and action filters. Plugins can wrap REST requests for authentication, logging, or request validation.

Ingest Extensions (IngestPlugin): Define custom ingest processors for data transformation during indexing. Supports both regular and system-level processors.

Analysis Extensions (AnalysisPlugin): Contribute custom tokenizers, token filters, and character filters for text analysis.

Other Extensions: Additional interfaces support cluster coordination, discovery, repositories, scripting, mappers, and network transport customization.
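
For example, an ingest extension exposes its processors through a factory map. A hedged sketch wiring up a processor like the one sketched in the ingest section (configuration parsing is omitted):

public class MyIngestPlugin extends Plugin implements IngestPlugin {
    @Override
    public Map<String, Processor.Factory> getProcessors(Processor.Parameters parameters) {
        return Map.of(
            UppercaseNoteProcessor.TYPE,
            (factories, tag, description, config) -> new UppercaseNoteProcessor(tag, description));
    }
}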

Registration Pattern

Plugins follow a consistent registration pattern using specification classes. For example, SearchPlugin.QuerySpec wraps a custom query builder with its parser and serialization reader:

public List<QuerySpec<?>> getQueries() {
    return Arrays.asList(
        new QuerySpec<>(
            "my_custom_query",
            MyCustomQueryBuilder::new,
            MyCustomQueryParser::parse
        )
    );
}

Each spec includes:

  • A name (used for serialization and REST parsing)
  • A reader (deserializes from wire protocol)
  • A parser (converts from JSON/YAML to builder objects)

Component Lifecycle

Plugins can register components that participate in the node lifecycle. Components implementing LifecycleComponent are automatically started when the node starts and stopped during shutdown. This enables plugins to manage resources like thread pools, connections, or caches.

public Collection<Object> createComponents(
    Client client,
    ClusterService clusterService,
    ThreadPool threadPool,
    // ... other services
) {
    return Arrays.asList(new MyCustomService(client, threadPool));
}

Best Practices

  • Use ParseField for query names to support deprecated aliases
  • Register result readers for aggregations to enable serialization
  • Implement proper error handling in REST handlers
  • Use bootstrap checks to validate required configurations
  • Avoid blocking operations in hot paths
  • Document custom settings with descriptions and defaults

Transport & Networking

Relevant Files
  • server/src/main/java/org/opensearch/transport/TransportService.java
  • server/src/main/java/org/opensearch/transport/Transport.java
  • server/src/main/java/org/opensearch/transport/TcpTransport.java
  • server/src/main/java/org/opensearch/transport/ConnectionManager.java
  • server/src/main/java/org/opensearch/http/AbstractHttpServerTransport.java
  • server/src/main/java/org/opensearch/http/HttpServerTransport.java
  • server/src/main/java/org/opensearch/common/network/NetworkService.java

OpenSearch uses a layered networking architecture to handle both inter-node communication (transport) and client-facing HTTP requests. Understanding these layers is essential for debugging network issues, implementing custom transports, or optimizing cluster communication.

Transport Layer (Inter-Node Communication)

The transport layer enables nodes to communicate with each other using a binary protocol. The architecture follows a clear hierarchy:

Core Components:

  • Transport - Interface defining the contract for sending/receiving messages between nodes
  • TcpTransport - Abstract implementation using TCP sockets for inter-node communication
  • TransportService - High-level service wrapping Transport, managing connections, request routing, and response handling
  • ConnectionManager - Manages persistent connections to remote nodes with connection pooling and lifecycle management

Request/Response Flow:

TransportService.sendRequest()
  → ConnectionManager.getConnection(node)
  → Transport.Connection.sendRequest()
  → OutboundHandler (serializes & sends bytes)
  → TcpChannel (low-level socket)
  → Remote Node receives via InboundHandler
  → ProtocolMessageHandler (deserializes)
  → RequestHandlerRegistry (routes to handler)
  → Handler executes & sends response back
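
From the sending side, a hedged sketch of issuing a request and handling the response (PingRequest, PingResponse, and the action name are hypothetical stand-ins):

transportService.sendRequest(targetNode, "internal:example/ping", new PingRequest(),
    new TransportResponseHandler<PingResponse>() {
        @Override
        public PingResponse read(StreamInput in) throws IOException {
            return new PingResponse(in);          // deserialize the response off the wire
        }

        @Override
        public void handleResponse(PingResponse response) {
            logger.info("pong from {}", targetNode.getName());
        }

        @Override
        public void handleException(TransportException exp) {
            logger.warn("ping failed", exp);
        }

        @Override
        public String executor() {
            return ThreadPool.Names.GENERIC;      // thread pool where handleResponse runs
        }
    });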

HTTP Layer (Client Communication)

The HTTP layer handles REST API requests from clients. It's built on top of Netty for high-performance async I/O.

Request Processing Pipeline:

  1. HttpServerTransport - Binds to HTTP port, manages server channels
  2. Netty4HttpServerTransport - Netty-based implementation with HTTP/1.1 and HTTP/2 support
  3. HttpRequestDecoder - Parses raw HTTP into structured requests
  4. HttpPipeliningHandler - Manages request pipelining for multiple concurrent requests
  5. RestController - Routes HTTP requests to appropriate REST handlers
  6. RestHandler - Executes business logic and sends responses
HTTP Client Request
  → Netty4HttpServerTransport (receives on port 9200)
  → HttpRequestDecoder (parses HTTP)
  → Netty4HttpRequestHandler (creates HttpRequest)
  → AbstractHttpServerTransport.incomingRequest()
  → RestController.dispatchRequest()
  → RestHandler.handleRequest()
  → Response sent back via HttpChannel

Network Configuration

The NetworkService provides centralized network configuration:

  • Bind hosts - Addresses the server listens on (default: localhost)
  • Publish hosts - Addresses advertised to other nodes (for multi-NIC setups)
  • TCP settings - TCP_NO_DELAY, TCP_KEEP_ALIVE, buffer sizes, etc.

Key settings:

network.host              # Bind and publish address
network.bind_host         # Override bind address
network.publish_host      # Override publish address
network.tcp.no_delay      # Disable Nagle's algorithm (default: true)
network.tcp.keep_alive    # Enable TCP keep-alive (default: true)

Connection Profiles

ConnectionProfile defines how many connections to maintain per node and what each connection is used for. Channels are grouped by request type (TransportRequestOptions.Type):

  • REG - Regular requests (most actions)
  • BULK - Bulk indexing traffic
  • STATE - Cluster state publication
  • PING - Leader and follower health checks
  • RECOVERY - Dedicated channels for shard recovery

Each profile also specifies per-type connection counts plus connect and handshake timeouts.

Key Design Patterns

Local Node Optimization - Requests to the local node bypass the network stack entirely, using direct method calls for performance.

Connection Pooling - ConnectionManager maintains a pool of connections per node, reusing them across requests to reduce overhead.

Async I/O - Both transport and HTTP layers use non-blocking I/O with Netty, allowing thousands of concurrent connections.

Protocol Abstraction - The Transport interface allows pluggable implementations (native protocol, gRPC, etc.) without changing upper layers.

Build System & Testing

Relevant Files
  • build.gradle
  • DEVELOPER_GUIDE.md
  • TESTING.md
  • gradle/ (build configuration)
  • buildSrc/ (custom Gradle plugins)
  • test/framework/ (test framework)

OpenSearch uses Gradle as its build system with custom plugins for compilation, testing, and distribution. The build orchestrates unit tests, integration tests, REST API tests, and backwards compatibility (BWC) tests across multiple modules and plugins.

Build System Overview

The root build.gradle applies modular Gradle scripts from the gradle/ directory, including formatting, code coverage, FIPS compliance, and local distribution builds. Custom plugins in buildSrc/ extend Gradle to handle OpenSearch-specific concerns like test clustering, REST API specifications, and plugin packaging.

Key build tasks:

  • ./gradlew assemble - Build all distributions (tar, zip, RPM, DEB)
  • ./gradlew localDistro - Build platform-specific distribution
  • ./gradlew check - Run all verification tasks (precommit, unit tests, integration tests)
  • ./gradlew precommit - Run static checks and formatting validation

Test Framework Architecture

OpenSearch tests are built on JUnit together with the Carrot Search randomizedtesting runner, which randomizes test ordering and data for broader coverage. The test framework (test/framework/) provides base classes for different test scopes:

  • OpenSearchTestCase - Base for unit tests
  • OpenSearchSingleNodeTestCase - Single-node cluster tests
  • OpenSearchIntegTestCase - Multi-node integration tests
  • OpenSearchRestTestCase - REST API tests against external clusters
  • OpenSearchClientYamlSuiteTestCase - YAML-based REST tests

Test Types and Execution

# Unit tests (default test task)
./gradlew :module:test

# Integration tests (in-memory cluster)
./gradlew :module:internalClusterTest

# REST tests (YAML-based)
./gradlew :rest-api-spec:yamlRestTest

# Java REST tests
./gradlew :module:javaRestTest

# Backwards compatibility tests
./gradlew bwcTest

Test filtering and customization:

# Run specific test
./gradlew :server:test --tests "*.ClassName.testMethod"

# Run with custom seed for reproducibility
./gradlew test -Dtests.seed=DEADBEEF

# Repeat test N times
./gradlew test -Dtests.iters=5 -Dtests.class=*.ClassName

# Configure heap size and JVM args
./gradlew test -Dtests.heap.size=4G -Dtests.jvm.argline="-XX:+UseG1GC"

Test Clustering and REST Testing

The TestClustersPlugin manages ephemeral test clusters for integration tests. REST tests can run against:

  • Embedded clusters - Created per test via OpenSearchIntegTestCase
  • External clusters - Via OpenSearchRestTestCase with properties:
    • -Dtests.cluster=localhost:9200
    • -Dtests.rest.cluster=localhost:9200

YAML REST tests support filtering and denylisting:

./gradlew :rest-api-spec:yamlRestTest \
  -Dtests.rest.suite=index,get \
  -Dtests.rest.denylist="index/**/Index document"

Code Quality and Coverage

  • Formatting - Spotless plugin enforces Eclipse JDT formatter (4-space indent, 140-char lines)
  • Code Coverage - JaCoCo generates reports: ./gradlew jacocoTestReport
  • Forbidden APIs - Prevents use of unsafe JDK internals
  • Javadoc - Enforced with custom tags (@opensearch.api, @opensearch.internal)

Test Reliability and Retries

Flaky tests are mitigated via the test-retry plugin. In CI, tests retry up to 3 times with max 10 failures before stopping. Known flaky tests are explicitly listed in build.gradle and tracked with the flaky-test GitHub label.

Backwards Compatibility Testing

BWC tests verify upgrades from previous versions. Tests can run against released versions (downloaded) or unreleased versions (built from source). Environment variables JAVA11_HOME, JAVA17_HOME, etc., are required for multi-version testing.

# Test specific version
./gradlew v5.3.2#bwcTest

# Test with custom distribution
./gradlew bwcTest -PcustomDistributionDownloadType=bundle

Running OpenSearch Locally

# Start OpenSearch from source
./gradlew run

# With custom settings
./gradlew run -Dtests.opensearch.http.host=0.0.0.0 -Dtests.heap.size=4G

# Debug mode
./gradlew run --debug-jvm