jaegertracing/jaeger

Jaeger - Distributed Tracing Platform

Last updated on Dec 18, 2025 (Commit: b934656)

Overview

Relevant Files
  • README.md
  • doc.go
  • cmd/jaeger/main.go
  • cmd/jaeger/internal/command.go
  • cmd/jaeger/internal/components.go
  • internal/storage/

Jaeger is a distributed tracing platform created by Uber Technologies and donated to the Cloud Native Computing Foundation (CNCF). It is used to monitor and troubleshoot microservices-based distributed systems by collecting, storing, and visualizing trace data from applications.

What is Jaeger?

Jaeger helps developers understand the behavior of complex distributed systems by tracking requests as they flow through multiple services. Each request generates a trace composed of spans (individual operations), allowing teams to identify performance bottlenecks, debug issues, and understand service dependencies.

Core Components

Jaeger v2 is built on the OpenTelemetry Collector framework and consists of:

  1. Collector - Receives traces from applications via OTLP, Jaeger, Kafka, or Zipkin protocols. Processes traces through configurable pipelines (receivers, processors, exporters).

  2. Storage Backend - Persists trace data. Supports multiple backends: memory, Elasticsearch, Cassandra, ClickHouse, ScyllaDB, and others via plugins.

  3. Query Service - Retrieves traces from storage and serves the Jaeger UI. Provides gRPC and HTTP APIs for querying trace data.

  4. Jaeger UI - React-based web interface for searching, visualizing, and analyzing traces.

Key Features

  • Multiple Protocol Support: OTLP, Jaeger (Thrift), Zipkin, Kafka
  • Flexible Processing: Batch processing, tail sampling, adaptive sampling, filtering, and metrics generation
  • Pluggable Storage: Extensible architecture supporting various backends
  • All-in-One Mode: Single binary with embedded memory storage for quick evaluation
  • Metrics Integration: Span-to-metrics conversion via connectors
  • Multi-Tenancy: Built-in support for tenant isolation

Project Structure

  • cmd/jaeger/ - Main Jaeger v2 binary and CLI
  • cmd/query/ - Legacy query service
  • internal/storage/ - Storage backend implementations
  • internal/ - Core libraries (config, telemetry, sampling, etc.)
  • jaeger-ui/ - Frontend (React, submodule)
  • idl/ - Data models (Protobuf, Thrift, submodule)

Architecture & Core Components

Relevant Files
  • cmd/jaeger/internal/components.go
  • cmd/jaeger/internal/exporters/storageexporter
  • cmd/jaeger/internal/extension/jaegerstorage
  • cmd/jaeger/internal/extension/jaegerquery
  • internal/storage/v2/api/tracestore
  • internal/storage/v1/factory.go

Jaeger v2 is built on the OpenTelemetry Collector framework, combining a modular pipeline architecture with pluggable storage backends. The system separates concerns into three main layers: the collector pipeline, storage abstraction, and query interface.

Collector Pipeline Architecture

Jaeger v2 uses the OpenTelemetry Collector's standard pipeline model with custom Jaeger components:

  • Receivers accept traces in multiple formats (OTLP, Jaeger, Zipkin, Kafka)
  • Processors transform and filter traces (batch, tail sampling, adaptive sampling)
  • Exporters write processed traces to storage backends
  • Connectors bridge pipelines (e.g., span metrics generation)
  • Extensions provide auxiliary services (storage, query UI, health checks)

The components.go file registers all available factories. Jaeger adds custom receivers (Jaeger, Kafka), processors (adaptive sampling), and exporters (storage exporter) to the standard OTEL Collector components.
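
A condensed sketch of that registration pattern, as the OpenTelemetry Collector API expresses it (the factory set and commented entries below are illustrative, not the exact contents of components.go):

package internal

import (
    "go.opentelemetry.io/collector/exporter"
    "go.opentelemetry.io/collector/otelcol"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/otlpreceiver"
)

// components assembles the factory maps the collector builds pipelines
// from; Jaeger appends its custom factories to the standard OTEL set.
func components() (otelcol.Factories, error) {
    var factories otelcol.Factories
    var err error
    factories.Receivers, err = receiver.MakeFactoryMap(
        otlpreceiver.NewFactory(), // standard OTLP receiver
        // jaegerreceiver.NewFactory(), kafkareceiver.NewFactory(), ...
    )
    if err != nil {
        return otelcol.Factories{}, err
    }
    factories.Exporters, err = exporter.MakeFactoryMap(
        // storageexporter.NewFactory(), // Jaeger's storage exporter
    )
    return factories, err
}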

Storage Layer Design

The storage layer uses a factory pattern with two API versions:

V2 API (internal/storage/v2/api/tracestore) defines the modern interface:

type Factory interface {
    CreateTraceReader() (Reader, error)
    CreateTraceWriter() (Writer, error)
}

V1 API (internal/storage/v1) provides legacy span store interfaces. V1 implementations are wrapped by adapters to present the V2 interface, ensuring backward compatibility.

Supported Backends

Jaeger supports multiple storage backends through pluggable factories:

  • Memory – In-memory store for testing and all-in-one deployments
  • Badger – Embedded key-value store for single-node deployments
  • Cassandra – Distributed NoSQL database
  • Elasticsearch/OpenSearch – Full-text search and analytics
  • ClickHouse – Columnar database for high-volume tracing
  • gRPC – Remote storage backend via gRPC protocol

Each backend implements the factory pattern, allowing the system to instantiate readers and writers on demand.
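
To make the pattern concrete, here is a deliberately tiny in-memory factory — a sketch, not the real memory backend (package, types, and names invented for illustration):

package memdemo

import (
    "context"
    "sync"

    "go.opentelemetry.io/collector/pdata/ptrace"
)

// store is a toy shared in-process trace store.
type store struct {
    mu     sync.Mutex
    traces []ptrace.Traces
}

// WriteTraces appends a batch; a real backend would index by trace ID.
func (s *store) WriteTraces(_ context.Context, td ptrace.Traces) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.traces = append(s.traces, td)
    return nil
}

// Factory hands out readers and writers backed by one shared store,
// mirroring the CreateTraceReader/CreateTraceWriter pattern above.
type Factory struct{ s *store }

func NewFactory() *Factory { return &Factory{s: &store{}} }

func (f *Factory) CreateTraceWriter() (*store, error) { return f.s, nil }
func (f *Factory) CreateTraceReader() (*store, error) { return f.s, nil }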

Extension Integration

The jaegerstorage extension manages all configured storage backends. It:

  1. Initializes storage factories during startup
  2. Provides lookup by name for exporters and query services
  3. Handles authentication and metrics collection per backend

The jaeger_storage_exporter connects the pipeline to storage by obtaining a trace writer from the extension and writing sanitized OTEL traces to the backend.

Query Service

The jaegerquery extension implements the traditional Jaeger UI and query API. It depends on the storage extension to access trace readers and provides HTTP endpoints for trace retrieval and UI serving.

Configuration

Storage backends are configured declaratively in YAML. Each backend gets a name and type-specific settings:

extensions:
  jaeger_storage:
    backends:
      primary:
        elasticsearch:
          server_urls: [http://localhost:9200]
      archive:
        memory:
          max_traces: 100000

The exporter and query service reference backends by name, enabling flexible multi-backend deployments (e.g., hot storage for recent traces, archive storage for older data).

Storage Backends & Abstraction

Relevant Files
  • internal/storage/v2/api/tracestore – V2 trace storage interfaces
  • internal/storage/v2/api/depstore – V2 dependency storage interfaces
  • internal/storage/v1/api/spanstore – V1 span storage interfaces (legacy)
  • internal/storage/v2/memory – In-memory backend
  • internal/storage/v2/elasticsearch – Elasticsearch/OpenSearch backend
  • internal/storage/v2/clickhouse – ClickHouse backend
  • internal/storage/v2/cassandra – Cassandra backend
  • internal/storage/v2/badger – Badger embedded store backend
  • internal/storage/v2/grpc – Remote gRPC backend
  • internal/storage/v2/v1adapter – Adapter layer for V1 → V2 migration
  • cmd/internal/storageconfig – Storage factory configuration

Jaeger uses a pluggable storage abstraction to support multiple backends. The architecture separates storage concerns into two API versions: V2 (current) and V1 (legacy), with an adapter layer enabling gradual migration.

Storage API Architecture

The V2 API defines three core interfaces:

  • tracestore.Writer – Writes spans to storage via WriteTraces(ctx, ptrace.Traces)
  • tracestore.Reader – Queries traces using iterators for efficient streaming
  • depstore.Reader – Retrieves service dependency graphs

Each backend implements a Factory interface that creates reader/writer instances. This factory pattern allows configuration-driven backend selection at startup.
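
Sketched as Go interfaces, the shapes look roughly like this (method sets abbreviated, and the exact GetTraces parameters are assumed — only WriteTraces is spelled out above):

import (
    "context"
    "iter"

    "go.opentelemetry.io/collector/pdata/pcommon"
    "go.opentelemetry.io/collector/pdata/ptrace"
)

type Writer interface {
    // WriteTraces persists one batch of OTLP spans.
    WriteTraces(ctx context.Context, td ptrace.Traces) error
}

type Reader interface {
    // Results stream back in chunks via a Go iterator instead of one
    // fully materialized slice (see Key Design Patterns below).
    GetTraces(ctx context.Context, ids ...pcommon.TraceID) iter.Seq2[[]ptrace.Traces, error]
}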

Supported Backends

  • Memory – In-process store for testing and all-in-one deployments
  • Badger – Embedded key-value store for single-node production use
  • Cassandra – Distributed NoSQL for high-availability clusters
  • Elasticsearch/OpenSearch – Full-text search with tag-based filtering
  • ClickHouse – Columnar database optimized for high-volume analytics
  • gRPC – Remote storage backend via gRPC protocol

V1 → V2 Migration Pattern

The v1adapter package bridges legacy V1 implementations to the V2 API:

// V1 factories (Cassandra, Badger, Elasticsearch) wrap via adapters
v1Reader, err := v1Factory.CreateSpanReader()
if err != nil {
    return nil, err // CreateSpanReader can fail, e.g. on connection errors
}
v2Reader := v1adapter.NewTraceReader(v1Reader)

This allows existing V1 backends to function as V2 implementations without rewriting them. New backends like ClickHouse implement V2 directly.

Factory Configuration

The storageconfig package provides unified configuration:

type TraceBackend struct {
    Memory        *memory.Configuration
    Badger        *badger.Config
    Cassandra     *cassandra.Options
    Elasticsearch *escfg.Configuration
    ClickHouse    *clickhouse.Configuration
    GRPC          *grpc.Config
}

Each named backend entry must set exactly one of these fields; at startup the corresponding factory is instantiated. The factory creates reader/writer instances on demand, enabling lazy initialization and resource pooling.

Key Design Patterns

Iterator-Based Streaming – V2 readers return Go iterators (iter.Seq2, standard since Go 1.23) for memory-efficient pagination of large result sets.
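
For illustration, consuming such an iterator looks like this (function name invented; requires Go 1.23+):

import (
    "iter"

    "go.opentelemetry.io/collector/pdata/ptrace"
)

// drain shows the consumption pattern: chunks arrive lazily through
// the iterator, so memory stays bounded even for huge result sets.
func drain(seq iter.Seq2[[]ptrace.Traces, error]) error {
    for chunk, err := range seq {
        if err != nil {
            return err // iteration ends at the first error
        }
        for _, td := range chunk {
            _ = td.SpanCount() // process one chunk at a time
        }
    }
    return nil
}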

Metrics Decoration – Readers are wrapped with tracestoremetrics.ReadMetricsDecorator to collect latency and error metrics per operation.

Idempotent Writes – Writers support idempotent span ingestion; partial failures return errors without atomic guarantees.

Dependency Tracking – Separate depstore interface handles service dependency graphs, enabling independent optimization.

Trace Processing & Conversion

Relevant Files
  • internal/jptrace - Core trace utilities and aggregation
  • internal/jptrace/sanitizer - Trace data sanitization and normalization
  • internal/converter/thrift/jaeger - Thrift format conversions
  • internal/storage/v2/v1adapter - OTLP to Jaeger model translation
  • internal/uimodel/converter/v1/json - Domain model to UI JSON conversion
  • cmd/query/app/otlp_translator.go - Query API OTLP translation

Jaeger processes traces through multiple conversion layers, transforming data from wire formats (OTLP, Thrift) into internal domain models and finally into UI-consumable JSON. This pipeline ensures data consistency, validates integrity, and enriches traces with metadata.

Conversion Pipeline

Traces flow through three primary representations:

  1. Wire Format (OTLP/Thrift) - Raw incoming data from instrumented applications
  2. Domain Model (model.Trace, model.Span) - Jaeger's internal canonical representation
  3. UI Model (uimodel.Trace, uimodel.Span) - JSON-serializable format for frontend consumption

The v1adapter package bridges OTLP and domain models using OpenTelemetry's translator, while sanitizers ensure data quality at each stage.

OTLP to Domain Conversion

The V1BatchesFromTraces() function converts OpenTelemetry Protocol traces to Jaeger batches:

// Converts ptrace.Traces to []*model.Batch
batches := v1adapter.V1BatchesFromTraces(otlpTraces)

// Reverse conversion
traces := v1adapter.V1BatchesToTraces(batches)

This leverages the OpenTelemetry Collector's jaegertranslator package, then applies warning transfer to preserve metadata about transformations.

Trace Aggregation

The AggregateTraces() function in jptrace combines trace chunks into complete traces:

// Aggregates iter.Seq2[[]ptrace.Traces, error] into individual traces
aggregated := jptrace.AggregateTraces(tracesSeq)

Storage backends may return traces in chunks; this function merges spans by trace ID, yielding complete traces for processing.

Sanitization Pipeline

Three standard sanitizers clean and normalize trace data:

  1. Empty Service Name - Replaces missing or empty service names with placeholders
  2. UTF-8 Validation - Fixes invalid UTF-8 in span names and attributes
  3. Negative Duration - Corrects spans where end time < start time

// Apply all standard sanitizers
sanitized := sanitizer.Sanitize(traces)

Sanitizers are composable; custom chains can be created via NewChainedSanitizer().
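
A sketch of the composition idea behind NewChainedSanitizer(), using an assumed Sanitizer function type; it shows why chaining avoids nested calls like a(b(c(traces))):

import "go.opentelemetry.io/collector/pdata/ptrace"

// Sanitizer is the composable unit: a pure function over traces.
type Sanitizer func(ptrace.Traces) ptrace.Traces

// NewChainedSanitizer composes sanitizers left-to-right.
func NewChainedSanitizer(sanitizers ...Sanitizer) Sanitizer {
    return func(td ptrace.Traces) ptrace.Traces {
        for _, s := range sanitizers {
            td = s(td)
        }
        return td
    }
}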

Domain to UI Conversion

The FromDomain() function transforms domain spans into UI format:

// Converts *model.Trace to *uimodel.Trace
uiTrace := json.FromDomain(domainTrace)

This handles:

  • Deduplicating processes and assigning process IDs
  • Converting timestamps to microseconds since epoch
  • Normalizing span references
  • Preserving warnings and validation errors

Warning System

Warnings track data quality issues during transformations. They're stored as span attributes using the @jaeger@warnings key:

// Add warnings to a span
jptrace.AddWarnings(span, "invalid-utf8-detected")

// Retrieve warnings
warnings := jptrace.GetWarnings(span)

Warnings propagate through conversions, allowing the UI to display data integrity notes.

Thrift Format Support

Legacy Thrift spans convert via converter/thrift/jaeger:

// Thrift to domain model
spans := jaeger.ToDomain(thriftSpans, thriftProcess)

// Single span conversion
span := jaeger.ToDomainSpan(thriftSpan, thriftProcess)

Errors during conversion are embedded as span tags rather than failing the entire batch, ensuring partial data is preserved.

Query API Translation

The Query API accepts OTLP JSON and converts it to domain traces:

// Unmarshal OTLP JSON, convert to batches, aggregate by trace ID
traces, err := otlp2traces(otlpJsonBytes)

This enables the Query API to accept traces in OpenTelemetry format while maintaining internal consistency.

Key Design Patterns

  • Composable Sanitizers - Chain multiple validation functions without nesting
  • Warning Preservation - Metadata about transformations survives conversions
  • Graceful Degradation - Errors become tags/warnings rather than failures
  • Lazy Aggregation - Traces aggregate on-demand via iterators, reducing memory overhead

Query Service & API

Relevant Files
  • cmd/query/app/server.go
  • cmd/query/app/querysvc/query_service.go
  • cmd/query/app/grpc_handler.go
  • cmd/query/app/http_handler.go
  • cmd/query/app/apiv3/grpc_handler.go
  • cmd/query/app/apiv3/http_gateway.go

The Query Service is the core component that exposes Jaeger's trace data through multiple APIs. It handles trace retrieval, service discovery, and dependency analysis, supporting both gRPC and HTTP protocols with multiple API versions.

Server Setup

The Server struct in server.go manages both HTTP and gRPC listeners. It initializes separate servers for each protocol, with support for TLS and multi-port configurations. The server requires separate ports when TLS is enabled to avoid conflicts.

Key initialization steps:

  1. Create gRPC server with interceptors for authentication and tenancy
  2. Register gRPC handlers for API v2 and v3
  3. Create HTTP server with router and middleware
  4. Register HTTP routes for all API versions

Query Service Core

The QueryService (in querysvc/query_service.go) is the business logic layer that:

  • Retrieves traces by ID or search criteria
  • Fetches service and operation metadata
  • Applies trace adjustments (clock skew correction)
  • Falls back to archive storage when traces aren't found in primary storage

Main methods:

  • GetTrace() - Fetch a single trace by ID
  • FindTraces() - Search traces by service, operation, tags, duration
  • GetServices() - List all services
  • GetOperations() - List operations for a service
  • GetDependencies() - Retrieve service dependency graph
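
A hedged example of driving the search method (the parameter struct follows the legacy spanstore.TraceQueryParameters shape; exact v2 types may differ):

// Find recent error traces for one service.
query := &spanstore.TraceQueryParameters{
    ServiceName:  "checkout",
    Tags:         map[string]string{"error": "true"},
    StartTimeMin: time.Now().Add(-time.Hour),
    StartTimeMax: time.Now(),
    NumTraces:    20,
}
traces, err := queryService.FindTraces(ctx, query)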

API Versions

API v2 (Legacy): Implemented via GRPCHandler and APIHandler. Uses Jaeger's native span model. Supports streaming responses for large traces.

API v3 (Current): Implemented via apiv3.Handler and HTTPGateway. Uses OpenTelemetry Protocol (OTLP) format internally. Provides both gRPC and HTTP endpoints with consistent interfaces.

HTTP Endpoints

The HTTP API exposes RESTful endpoints under /api/ prefix:

  • GET /api/traces/{traceID} - Retrieve trace by ID
  • GET /api/traces?service=...&operation=... - Search traces
  • GET /api/services - List services
  • GET /api/operations?service=... - List operations
  • GET /api/dependencies - Service dependencies
  • POST /api/archive/{traceID} - Archive a trace
  • GET /api/metrics/latencies - Latency metrics
  • GET /api/metrics/calls - Call rate metrics
  • GET /api/metrics/errors - Error rate metrics

API v3 endpoints are available under /api/v3/ with similar functionality but using OTLP format.

Request Flow

  1. Request arrives at HTTP or gRPC server
  2. Handler validates request parameters and converts to internal format
  3. QueryService executes the query against primary storage
  4. Archive fallback triggered if trace not found (when configured)
  5. Adjustments applied (clock skew, span ordering) unless raw traces requested
  6. Response streamed back to client in chunks (for large traces)

Storage Integration

The query service abstracts storage through two interfaces:

  • Primary storage: Fast, recent trace data (e.g., Elasticsearch, Cassandra)
  • Archive storage: Long-term retention (optional, separate backend)

When a trace isn't found in primary storage, the service automatically queries archive storage. This enables cost-effective retention policies where hot data stays in fast storage and cold data moves to cheaper backends.
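
The fallback reduces to a few lines; this sketch assumes simplified field and error names:

// getTrace tries primary storage first; only a definitive "not found"
// (not a storage error) triggers the archive lookup.
func (q *QueryService) getTrace(ctx context.Context, id string) (*Trace, error) {
    trace, err := q.primary.GetTrace(ctx, id)
    if errors.Is(err, ErrTraceNotFound) && q.archive != nil {
        return q.archive.GetTrace(ctx, id)
    }
    return trace, err
}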

Error Handling

  • Trace not found: Returns codes.NotFound (gRPC) or 404 (HTTP)
  • Invalid parameters: Returns codes.InvalidArgument (gRPC) or 400 (HTTP)
  • Storage errors: Returns codes.Internal (gRPC) or 500 (HTTP)
  • Nil requests: Rejected with validation errors

Performance Considerations

  • Streaming responses: Large traces are sent in chunks (max 10 spans per chunk) to avoid memory overhead
  • Raw traces option: Skips adjustment processing for faster responses when clock skew correction isn't needed
  • Concurrent requests: Both HTTP and gRPC servers run concurrently with independent listeners
  • Tenancy support: Multi-tenant deployments use interceptors to enforce data isolation

Sampling & Adaptive Sampling

Relevant Files
  • internal/sampling/samplingstrategy
  • internal/sampling/samplingstrategy/adaptive
  • internal/sampling/http/handler.go
  • internal/sampling/grpc/grpc_handler.go
  • cmd/jaeger/internal/processors/adaptivesampling
  • cmd/jaeger/internal/extension/remotesampling

Jaeger supports two sampling strategies: file-based (static) and adaptive (dynamic). Both are served via HTTP and gRPC endpoints that SDKs query to determine which traces to sample.

File-Based Sampling

File-based sampling uses a static JSON configuration file that defines sampling probabilities for services and operations. The Provider periodically reloads the file at a configurable interval, allowing updates without restarting the collector.

Configuration example:

{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.5
  },
  "service_strategies": [
    {
      "service": "my-service",
      "type": "probabilistic",
      "param": 0.8,
      "operation_strategies": [
        {
          "operation": "/health",
          "type": "probabilistic",
          "param": 0.0
        }
      ]
    }
  ]
}

Supported strategy types: probabilistic (sampling rate 0-1) and ratelimiting (max traces per second).

Adaptive Sampling

Adaptive sampling dynamically adjusts sampling probabilities based on observed traffic patterns. It aims to maintain a target number of samples per second across all services and operations.

Three main components:

  1. Aggregator — Runs in the trace processing pipeline. Observes root spans, counts traces per service/operation, and periodically flushes throughput metrics to storage.

  2. Post-Aggregator — Loads throughput data from storage (aggregated across all collector instances), calculates optimal sampling probabilities to meet the target QPS, and writes probabilities back to storage. Uses leader-follower election to ensure only one instance performs calculations.

  3. Provider — Periodically reads computed probabilities from storage and converts them into SamplingStrategyResponse objects served to SDKs. Followers refresh probabilities at a shorter interval than leaders.

Key Configuration

Adaptive sampling options:

  • target_samples_per_second — Global target QPS (e.g., 100 traces/sec)
  • initial_sampling_probability — Probability for new services/operations (default 0.001)
  • min_samples_per_second — Minimum QPS per operation
  • aggregation_interval — How often aggregator flushes throughput (default 1 second)
  • calculation_interval — How often post-aggregator recalculates probabilities (default 1 minute)

Important: Adaptive sampling does not perform sampling itself. The Jaeger backend calculates probabilities and exposes them via the Remote Sampling protocol. OpenTelemetry SDKs query this endpoint and perform the actual sampling based on returned probabilities.
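
What an SDK's periodic poll boils down to, sketched against the HTTP endpoint (response struct abbreviated to the probabilistic fields of the Remote Sampling protocol):

import (
    "encoding/json"
    "net/http"
    "net/url"
)

// pollStrategy fetches the current strategy for one service from the
// remote sampling endpoint (default port 5778, path /sampling).
func pollStrategy(service string) (float64, error) {
    resp, err := http.Get("http://localhost:5778/sampling?service=" + url.QueryEscape(service))
    if err != nil {
        return 0, err
    }
    defer resp.Body.Close()
    var out struct {
        ProbabilisticSampling struct {
            SamplingRate float64 `json:"samplingRate"`
        } `json:"probabilisticSampling"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return 0, err
    }
    return out.ProbabilisticSampling.SamplingRate, nil
}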

Remote Sampling Extension

The remotesampling extension manages both file-based and adaptive providers. It exposes HTTP and gRPC endpoints for SDKs to query sampling strategies. Configuration specifies either file or adaptive (not both):

extensions:
  remotesampling:
    http:
      endpoint: 0.0.0.0:5778
    adaptive:
      sampling_store: badger
      target_samples_per_second: 100

The extension integrates with the jaegerstorage extension to access the sampling store backend (memory, Cassandra, Badger, Elasticsearch, OpenSearch).

Authentication & Multi-Tenancy

Relevant Files
  • internal/auth/transport.go
  • internal/auth/bearertoken/http.go
  • internal/auth/bearertoken/grpc.go
  • internal/auth/bearertoken/context.go
  • internal/auth/apikey/apikey-context.go
  • internal/tenancy/manager.go
  • internal/tenancy/grpc.go
  • internal/tenancy/http.go
  • internal/tenancy/context.go
  • internal/tenancy/flags.go

Overview

Jaeger implements a flexible authentication and multi-tenancy system that supports bearer tokens, API keys, and tenant isolation. Authentication mechanisms propagate credentials across HTTP and gRPC transports, while the tenancy system enables data isolation in multi-tenant deployments.

Authentication Architecture

The authentication system uses a pluggable design with two primary mechanisms:

Bearer Tokens extract credentials from HTTP Authorization headers or gRPC metadata. The PropagationHandler middleware parses bearer tokens from incoming requests and stores them in the request context for downstream propagation. It supports both the standard Authorization: Bearer <token> format and a fallback to X-Forwarded-Access-Token headers.

API Keys provide an alternative authentication method stored directly in context. The GetAPIKey() and ContextWithAPIKey() functions manage API key lifecycle within request contexts.

The RoundTripper wrapper injects authentication headers into outbound HTTP requests by extracting tokens from the request context or using fallback token functions. This enables seamless credential propagation across service boundaries.
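
The mechanics are roughly as follows (a sketch; the real wrapper also supports fallback token functions):

import "net/http"

// tokenRoundTripper copies a bearer token from the request context
// into the outgoing Authorization header before delegating.
type tokenRoundTripper struct {
    next http.RoundTripper
}

func (rt tokenRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    if token, ok := GetBearerToken(req.Context()); ok && token != "" {
        req = req.Clone(req.Context()) // never mutate the caller's request
        req.Header.Set("Authorization", "Bearer "+token)
    }
    return rt.next.RoundTrip(req)
}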

Bearer Token Propagation

Bearer tokens propagate through both HTTP and gRPC layers:

  • HTTP: PropagationHandler extracts tokens from request headers and injects them into context
  • gRPC Server: NewUnaryServerInterceptor() and NewStreamServerInterceptor() extract tokens from gRPC metadata
  • gRPC Client: NewUnaryClientInterceptor() and NewStreamClientInterceptor() inject tokens into outgoing request metadata

// Extract from HTTP request
authHeaderValue := r.Header.Get("Authorization")
token := strings.TrimPrefix(authHeaderValue, "Bearer ") // header format: "Bearer <token>"
ctx = ContextWithBearerToken(ctx, token)

// Propagate to gRPC metadata
ctx = metadata.AppendToOutgoingContext(ctx, "bearer.token", token)

Multi-Tenancy System

The tenancy system isolates data by tenant when enabled. The Manager validates tenant headers against a configured list of allowed tenants.

Configuration uses command-line flags:

  • --multi-tenancy.enabled: Enable tenant isolation
  • --multi-tenancy.header: HTTP header name (default: x-tenant)
  • --multi-tenancy.tenants: Comma-separated list of allowed tenant values

Tenant Extraction follows a priority order:

  1. Check if tenant is already in context (via GetTenant())
  2. Check OpenTelemetry client metadata
  3. Extract from gRPC incoming metadata or HTTP headers

Validation ensures exactly one tenant value per request. Multiple or missing tenant headers return PermissionDenied or Unauthenticated gRPC errors.

Context Propagation Pattern

Both authentication and tenancy use Go's context.Context for request-scoped storage:

// Bearer token context
ctx = ContextWithBearerToken(ctx, token)
token, ok := GetBearerToken(ctx)

// Tenant context
ctx = WithTenant(ctx, "tenant-id")
tenant := GetTenant(ctx)

// API key context
ctx = ContextWithAPIKey(ctx, apiKey)
apiKey, ok := GetAPIKey(ctx)

Integration Points

  • HTTP Handlers: Wrap with PropagationHandler and ExtractTenantHTTPHandler
  • gRPC Servers: Register interceptors from bearertoken and tenancy packages
  • gRPC Clients: Use client interceptors to inject credentials into outbound calls
  • Storage Backends: Access credentials via context for per-tenant data isolation

Utilities & Tools

Relevant Files
  • cmd/tracegen
  • cmd/anonymizer
  • cmd/remote-storage
  • cmd/es-index-cleaner
  • cmd/es-rollover
  • cmd/esmapping-generator
  • internal/metrics

Jaeger provides several command-line utilities and tools to support operational tasks, testing, and observability. These tools extend the core platform with specialized functionality for trace generation, data management, and metrics collection.

Trace Generation & Testing

tracegen generates a steady flow of synthetic traces for performance testing and tuning. It creates traces concurrently from multiple worker goroutines, allowing you to simulate realistic trace patterns without requiring actual application instrumentation.

Key features:

  • Configurable number of workers and trace duration
  • Support for multiple services and child spans
  • Integration with OpenTelemetry exporters (OTLP, stdout)
  • Adaptive sampling support via remote sampling endpoints
  • Available as Docker image: jaegertracing/jaeger-tracegen

Example usage:

docker run jaegertracing/jaeger-tracegen -service myapp -traces 1000 -workers 4

Data Anonymization

anonymizer queries Jaeger for a specific trace and anonymizes sensitive fields using hashing. This utility is useful for sharing trace data for debugging without exposing production information.

The tool:

  • Fetches traces from Jaeger query service
  • Hashes standard tags, custom tags, logs, and process information
  • Outputs original, anonymized, and UI-compatible JSON files
  • Generates mapping files to track anonymization transformations

Remote Storage

remote-storage exposes single-node storage backends (memory, Badger) over gRPC, implementing the Jaeger Remote Storage API. This enables distributed deployments where multiple Jaeger components share a centralized storage backend.

Configuration supports:

  • Memory storage with configurable trace limits
  • Badger persistent storage with TTL policies
  • Multi-tenancy with tenant-specific storage backends
  • gRPC endpoint configuration and TLS

Elasticsearch Index Management

es-index-cleaner removes old Jaeger indices from Elasticsearch to manage storage costs and retention policies. It calculates a cutoff date and deletes indices older than the specified number of days.

es-rollover manages Elasticsearch index lifecycle through three operations:

  • init: Creates initial indices and aliases
  • rollover: Transitions to new write indices when size or age thresholds are met
  • lookback: Removes old indices from read aliases

Both tools support:

  • Index prefix customization
  • Archive and dependency index handling
  • Elasticsearch authentication and TLS
  • Index Lifecycle Management (ILM) policies

esmapping-generator generates Elasticsearch mappings for Jaeger indices, ensuring proper field types and analysis configurations.

Metrics Infrastructure

The internal/metrics package provides an abstraction layer for metrics collection with multiple backend implementations:

Core Interfaces:

  • Counter: Tracks event occurrences
  • Gauge: Records instantaneous measurements
  • Timer: Measures operation duration
  • Histogram: Tracks value distributions

Implementations:

  • Prometheus: Default backend using Prometheus client library
  • OpenTelemetry: OTEL metrics integration
  • Local: In-memory metrics for testing via metricstest
  • Null: No-op implementation for disabling metrics

Metrics are initialized via reflection using struct tags:

type MyMetrics struct {
    RequestCount metrics.Counter `metric:"requests.count"`
    Duration     metrics.Timer   `metric:"request.duration"`
}

var m MyMetrics
metrics.Init(&m, factory, globalTags)

The metricsbuilder package provides CLI flag support for selecting backends and configuring HTTP scrape endpoints.

Configuration & Deployment

Relevant Files
  • cmd/jaeger/config.yaml
  • cmd/jaeger/internal/all-in-one.yaml
  • cmd/jaeger/internal/command.go
  • cmd/internal/storageconfig/config.go
  • cmd/jaeger/Dockerfile
  • docker-compose/

Jaeger v2 uses YAML-based configuration built on the OpenTelemetry Collector framework. The system supports multiple deployment modes, from all-in-one development setups to distributed production architectures with various storage backends.

Configuration System

Jaeger v2 configuration is managed through YAML files that define the complete pipeline: receivers, processors, exporters, extensions, and telemetry settings. The configuration system uses Viper for loading and environment variable substitution.

Configuration Loading:

  • Configurations are loaded via the --config flag pointing to a YAML file
  • Environment variables can override values using ${env:VAR_NAME:-default} syntax
  • If no config is provided, Jaeger defaults to an embedded all-in-one configuration with memory storage
  • Multiple configuration providers are supported: file, HTTP, HTTPS, environment, and YAML

Core Configuration Structure:

service:
  extensions: [jaeger_storage, jaeger_query]
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [batch]
      exporters: [jaeger_storage_exporter]

extensions:
  jaeger_storage:
    backends:
      primary-store:
        memory:
          max_traces: 100000
  jaeger_query:
    storage:
      traces: primary-store

Storage Backends

Jaeger v2 supports multiple storage backends configured under jaeger_storage.backends. Each backend can be independently configured and referenced by name.

Supported Backends:

  • Memory - In-process storage, ideal for development and testing
  • Badger - Embedded key-value store with TTL support for single-node deployments
  • Cassandra - Distributed NoSQL database for high-scale deployments
  • Elasticsearch/OpenSearch - Full-text search and analytics capabilities
  • ClickHouse - Columnar database optimized for trace analytics
  • gRPC - Remote storage via the Jaeger Remote Storage API

Example: Elasticsearch Configuration

jaeger_storage:
  backends:
    main-storage:
      elasticsearch:
        server_urls:
          - http://localhost:9200
        indices:
          index_prefix: "jaeger-main"
          spans:
            date_layout: "2006-01-02"
            rollover_frequency: "day"

Deployment Modes

All-in-One (Default): Runs without a config file using embedded all-in-one.yaml. Includes memory storage, query service, and sampling endpoints. Suitable for development and demos.

Distributed: Multiple Jaeger instances with separate collector and query components, each configured independently. Enables horizontal scaling and high availability.

Remote Storage: Uses the jaeger-remote-storage binary to expose single-node backends (Memory, Badger) over gRPC, allowing multiple Jaeger instances to share storage.

Docker Deployment

The Dockerfile exposes standard Jaeger ports:

  • 4317 - OTLP gRPC receiver
  • 4318 - OTLP HTTP receiver
  • 14250 - Jaeger gRPC receiver
  • 14268 - Jaeger HTTP receiver
  • 9411 - Zipkin receiver
  • 16686 - Web UI
  • 5778 - Sampling config HTTP
  • 5779 - Sampling config gRPC
  • 13133 - Health check HTTP

Docker Compose Example:

services:
  jaeger:
    image: jaegertracing/jaeger:latest
    volumes:
      - "./config.yaml:/etc/jaeger/config.yml"
    command: ["--config", "/etc/jaeger/config.yml"]
    ports:
      - "16686:16686"
      - "4317:4317"
      - "4318:4318"

Configuration Validation

All configuration components implement validation through the Validate() method. Storage backends, query settings, and exporters are validated at startup to catch configuration errors early. The system uses govalidator for struct validation with required field checks.
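
A backend's Validate() typically amounts to structural checks like the sketch below (field names assumed, not the actual Elasticsearch config):

import (
    "errors"
    "fmt"
)

// Validate fails fast on impossible configurations before any
// pipeline is built, so errors surface at startup rather than runtime.
func (c *Config) Validate() error {
    if len(c.ServerURLs) == 0 {
        return errors.New("elasticsearch: at least one server URL is required")
    }
    if c.BulkSize < 0 {
        return fmt.Errorf("elasticsearch: bulk size must be non-negative, got %d", c.BulkSize)
    }
    return nil
}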

Environment Variable Substitution

Configuration values support environment variable expansion:

jaeger_storage:
  backends:
    main:
      elasticsearch:
        server_urls:
          - "${env:ELASTICSEARCH_URL:-http://localhost:9200}"

This enables flexible deployment across different environments without modifying configuration files.

Testing & Integration

Relevant Files
  • internal/storage/integration – Unit-mode storage integration tests
  • cmd/jaeger/internal/integration – End-to-end integration tests for Jaeger v2
  • internal/testutils – Shared testing utilities (leak detection, logging)
  • internal/metricstest – Metrics testing helpers

Jaeger uses a two-tier testing strategy: unit-mode tests that directly exercise storage APIs, and end-to-end (E2E) tests that validate the full Jaeger v2 collector pipeline.

Unit-Mode Storage Integration Tests

Located in internal/storage/integration, these tests write and read span data directly to storage backends without going through the network layer. The StorageIntegration struct provides a reusable framework:

type StorageIntegration struct {
    TraceWriter      tracestore.Writer
    TraceReader      tracestore.Reader
    DependencyWriter depstore.Writer
    DependencyReader depstore.Reader
    SamplingStore    samplingstore.Store
    CleanUp          func(t *testing.T)
}

Each storage backend (Elasticsearch, Cassandra, Badger, etc.) implements its own test file that instantiates this struct and calls RunAll() or RunSpanStoreTests(). Tests are conditionally skipped using SkipUnlessEnv() based on environment variables like STORAGE=elasticsearch.
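
A per-backend test file reduces to roughly this pattern (newESFactory and its fields are hypothetical stand-ins for the backend's setup code):

func TestElasticsearchStorage(t *testing.T) {
    integration.SkipUnlessEnv(t, "elasticsearch") // honors STORAGE=elasticsearch
    f := newESFactory(t) // hypothetical helper that connects to the test cluster

    s := &integration.StorageIntegration{
        TraceWriter: f.writer,
        TraceReader: f.reader,
        CleanUp:     func(t *testing.T) { f.purge(t) },
    }
    s.RunSpanStoreTests(t)
}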

End-to-End Integration Tests

The cmd/jaeger/internal/integration package tests the complete Jaeger v2 OtelCol binary. These tests:

  1. Start the Jaeger v2 binary with a specific storage backend configuration
  2. Write spans to the collector’s receiver via an OTLP RPC client
  3. Read the spans back via an RPC client against the jaeger_query extension
  4. Verify results match expected data

The E2EStorageIntegration struct extends StorageIntegration and manages binary lifecycle:

type E2EStorageIntegration struct {
    integration.StorageIntegration
    ConfigFile      string
    BinaryName      string
    HealthCheckPort int
    EnvVarOverrides map[string]string
}

Storage Cleaner Extension

Integration tests require a clean state between runs. The storagecleaner extension is automatically injected into collector configs and exposes an HTTP endpoint (POST /purge) that calls the storage backend’s Purge() method.
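
Between test runs, resetting state is then a single HTTP call; a sketch (host and port come from the test configuration):

import (
    "net/http"
    "testing"

    "github.com/stretchr/testify/require"
)

// purgeStorage resets the backend via the storagecleaner endpoint.
func purgeStorage(t *testing.T, addr string) {
    resp, err := http.Post("http://"+addr+"/purge", "", nil)
    require.NoError(t, err)
    defer resp.Body.Close()
    require.Equal(t, http.StatusOK, resp.StatusCode)
}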

Test Utilities

Leak Detection (internal/testutils/leakcheck.go):

  • VerifyGoLeaks() detects goroutine leaks in TestMain
  • Ignores expected leaks from glog, go-metrics, and HTTP transports
  • Call via defer testutils.VerifyGoLeaksOnce(t) for specific tests
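
Typical wiring in a package's TestMain, based on the helper named above (exact signature assumed):

package mypackage_test

import (
    "testing"

    "github.com/jaegertracing/jaeger/internal/testutils"
)

// TestMain runs the package's tests, then fails if goroutines leaked.
func TestMain(m *testing.M) {
    testutils.VerifyGoLeaks(m)
}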

Logging (internal/testutils/logger.go):

  • NewLogger() creates a zap logger backed by a test buffer
  • Useful for capturing and asserting log output in tests

Metrics Testing (internal/metricstest):

  • AssertCounterMetrics() and AssertGaugeMetrics() verify metric values
  • Snapshot metrics and compare against expected values

Running Tests

# Unit tests with memory storage
make test

# Integration tests for a specific backend
STORAGE=elasticsearch make jaeger-v2-storage-integration-test

# Coverage report
STORAGE=memory make cover

# Format and lint
make fmt
make lint

Tests use go test with build tags like memory_storage_integration to conditionally compile storage implementations. The Makefile orchestrates test execution with proper environment setup and colorized output.