Does continuous profiling replace distributed tracing?

Continuous profiling does not replace distributed tracing. Distributed tracing identifies which service and request path is slow, while continuous profiling identifies which function within that service consumes the resources. Linking the two through trace ID correlation provides the full diagnostic path from symptom to root cause.

What overhead does eBPF-based profiling add in production?

eBPF-based profiling adds bounded production overhead. Peer-reviewed studies report low runtime overhead for some eBPF-based tracing approaches, though specific figures vary by benchmark. eBPF profilers sample stack traces at high frequency and use BPF maps to pass data to user space for symbolization, keeping the in-kernel path minimal.

How does tail sampling differ from head sampling, and when should teams use each?

Tail sampling makes the sampling decision after trace completion, while head sampling decides at trace start with no buffering required. Tail sampling waits for the complete trace, enabling outcome-based policies that catch partial failures affecting small request percentages; head sampling works well during full outages where most requests fail. Most production deployments combine both: a low head sampling baseline rate with tail sampling policies that retain 100% of error and high-latency traces.

What is the relationship between APM and observability?

APM sits within the broader observability landscape. APM focuses on application-layer performance with predefined metrics and known failure modes, while observability supports open-ended exploration of system state from telemetry outputs without re-instrumentation. Gartner renamed its Magic Quadrant to "Application Performance Monitoring and Observability" by 2023.

How does continuous profiling support performance tools?

Continuous profiling adds a time dimension to resource measurement, enabling correlation between performance changes and specific deployments, traffic patterns, or configuration changes. Where performance testing tools measure aggregate throughput and latency, profiling identifies the exact functions responsible for regressions, including code in third-party libraries and kernel paths invisible to application-level instrumentation.

Application Performance Monitoring: The 2026 Guide

Is the OpenTelemetry profiling signal production-ready?

The OpenTelemetry profiling signal is in Public Alpha as of early 2026 and should be treated as pre-production. The Profiling Data Model v2 remains compatible with pprof through conversion while reducing wire size by roughly 40% through string dictionaries. For production continuous profiling, use Pyroscope or Parca directly until the specification reaches Stable status.

The modern application performance monitoring (APM) approach in 2026 is a unified observability layer that correlates traces, metrics, logs, real user telemetry, synthetic checks, and continuous profiling against a shared OpenTelemetry resource model, so engineering teams can follow a production symptom from a user-facing latency spike down to the specific function consuming CPU, memory, or lock time.

TL;DR

Production latency propagates across services, and metrics or traces alone often stop short of the responsible function. In 2026, OpenTelemetry-based APM commonly centers on traces, metrics, and logs, with continuous profiling emerging as a fourth signal that closes the code-level diagnostic gap, alongside real user monitoring and synthetic monitoring on the user-experience side. These six signals (distributed tracing, metrics, structured logs, real user monitoring, synthetic monitoring, and continuous profiling) define the modern APM stack by connecting service-level symptoms to code-level causes. The CNCF Annual Cloud Native Survey reports that 82% of container users now run Kubernetes in production, which is the deployment substrate most of these signals run on.

Why Cross-Service Latency Hides the Root Cause

A p99 latency spike in one downstream service can quietly become the median response time of a user-facing frontend once each frontend request fans out across many downstream calls. Google SRE materials show that tail latency in downstream dependencies propagates to parent services, so the symptom may appear in one place while the cause sits several hops away.
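The arithmetic behind that propagation is short. The sketch below assumes each downstream call lands in the slow tail independently with probability 1% (the p99 boundary); the function name `prob_hits_tail` and the fan-out values are illustrative and do not come from the Google SRE material.

```python
def prob_hits_tail(fanout, tail_fraction=0.01):
    """Probability that at least one of `fanout` independent calls is in the slow tail."""
    return 1.0 - (1.0 - tail_fraction) ** fanout


# Hypothetical fan-out sizes: how many downstream calls one frontend request makes.
for n in (1, 10, 69, 100):
    share = prob_hits_tail(n)
    print(f"fan-out {n:>3}: {share:.1%} of requests include a p99-or-slower call")
```

Under these assumptions, a request that fans out to about 69 downstream calls has slightly better than even odds of including at least one p99-or-slower call, which is how a downstream tail can turn into a frontend median.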

Aggregate metrics alone cannot expose these cross-service causal chains. CNCF identifies distributed systems and microservices as the shift that pushed organizations beyond basic monitoring into full observability frameworks.

This guide explains what APM includes in 2026, how continuous profiling closes the diagnostic gap that traces and metrics leave open, and which production implementation patterns matter most. Agent-driven workflows sharpen the problem because instrumentation, deployment, and rollback decisions now span build, test, review, and production in parallel. Cosmos, the unified cloud agents platform, runs those agents with shared context and memory across the software development lifecycle, so observability changes stay consistent as services and signals evolve.

The Six Components of APM in 2026

The six APM components define modern application performance monitoring through a shared OpenTelemetry resource model. That shared model creates cross-signal correlation that moves diagnosis from service-level symptoms to code-level causes.

Component	What It Captures	Production Readiness (OTel)
Distributed Tracing	Request path across services as a DAG of spans	Stable across most major languages, with some exceptions such as Kotlin (development) and Rust (beta)
Metrics	Numeric aggregations: error rate, CPU, request rate	Stable in several major languages, but maturity still varies by SDK
Structured Logs	Timestamped records correlated to traces via TraceId/SpanId	Stable in Java, .NET, PHP, C++; Beta in Go; Development in Python and JS
Real User Monitoring	Core Web Vitals, JS exceptions, network requests from browsers	OTel browser SDK experimental
Synthetic Monitoring	Active probes from global locations emulating user behavior	Protocol-level via Prometheus blackbox exporter; scripted via k6
Continuous Profiling	Code-level CPU, memory, and lock contention by function	Public Alpha in OTel

The OpenTelemetry specification defines distributed traces as directed acyclic graphs of spans. Each span carries a parent span ID, trace context, attributes, events, and span links. Span links specifically address asynchronous message-queue scenarios: when an async operation is queued, a parent-child relationship can be semantically incorrect because the downstream operation executes later.

Cross-signal correlation follows specific technical mechanisms. Log records emitted within an active span context automatically receive TraceId and SpanId fields from the OTel SDK. Metric exemplars carry trace context attached to histogram data points, which creates a "click a metric → navigate to the causing trace" path. The W3C Trace Context specification defines the traceparent header propagated across service boundaries, and browser-side FetchInstrumentation injects this header on outbound API calls to link frontend and backend traces into a single distributed trace.
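To make the propagation step concrete, here is a minimal sketch of building and parsing a version `00` traceparent value (`version-traceid-parentid-flags`). The helper names are hypothetical, the sample IDs are placeholders, and real services would normally rely on an OpenTelemetry propagator rather than hand-rolled parsing.

```python
import re

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id, span_id, sampled=True):
    """Format a traceparent header value for an outbound request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Extract trace context from an inbound header, rejecting invalid values."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid under the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
# A downstream call keeps the trace ID and substitutes its own span ID.
outgoing = build_traceparent(ctx["trace_id"], "b7ad6b7169203331", ctx["sampled"])
print(outgoing)
```

Because the trace ID survives every hop unchanged, the same identifier can appear on the browser span, the backend span, the log record, and the metric exemplar.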

Continuous Profiling: From "Which Service" to "Which Function"

Continuous profiling identifies function-level resource consumption by sampling production code paths continuously, which moves diagnosis from the service layer to the specific call stacks consuming CPU, memory, or lock time.

Continuous profiling records code-level resource consumption in production at low enough overhead to remain always-on. Parca states the foundational rationale: "You never know at which point in time you are going to need profiling data, so always collect it at low overhead."

A profiler records the current call stack of a process at a fixed interval. When the same call stack is observed multiple times, the profiler increments a counter for that stack, and after a collection window, the aggregated data is persisted. Brendan Gregg, who created flame graphs, describes the approach: sampling at a fixed rate is a coarse but effective way to identify which code paths are hot. In a flame graph, the width of each bar is proportional to how often that function appeared in the sampled stacks, which approximates the time it consumed, and vertical stacking represents the call hierarchy.
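The counting step can be sketched in a few lines. This is an illustration of the aggregation idea only, not any profiler's implementation; the sample stacks and function names are made up, and the output uses the semicolon-separated folded-stack text form that flame graph tooling commonly consumes.

```python
from collections import Counter


def aggregate(samples):
    """Count how many times each identical call stack was observed."""
    return Counter(tuple(stack) for stack in samples)


def to_folded(counts):
    """Render counts as 'root;child;leaf N' lines."""
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


def frame_widths(counts):
    """Share of all samples in which each function was somewhere on the stack."""
    total = sum(counts.values())
    on_stack = Counter()
    for stack, n in counts.items():
        for fn in set(stack):  # set() avoids double counting recursive frames
            on_stack[fn] += n
    return {fn: n / total for fn, n in on_stack.items()}


# Hypothetical samples, root frame first.
samples = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "gc"),
]
counts = aggregate(samples)
print("\n".join(to_folded(counts)))
print(frame_widths(counts))
```

The width values are the fractions a flame graph would draw: a function that is on the stack in half of the samples gets a bar half as wide as the root.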

What Profiling Catches That Traces Miss

Continuous profiling catches function-level resource behavior through sampled call stacks, and those samples explain slow requests that distributed tracing can only localize to a service or span.

Traces identify which requests were slow at the service level, while profiles identify why they were slow at the code level. Five categories of production issues illustrate this gap:

Memory leaks invisible until OOM. Gradual increases in memory allocations can appear first as aggregate growth in metrics and later as crashes in logs, while profiling can isolate the responsible call stack.
Intermittent CPU spikes drowned in aggregate profiles. Netflix's FlameScope was built after engineer Vadim Filanovsky investigated an intermittent latency issue and found short CPU utilization increases lasting only a few seconds. FlameScope isolates these spikes by letting users examine arbitrary time slices of a captured profile.
CPU time in uninstrumented third-party libraries. Time spent in serialization libraries, TLS implementations, or kernel syscall paths can appear as unexplained span duration with no child spans.
Garbage collector consuming disproportionate CPU. Garbage collection can dominate CPU usage even though GC is a runtime process with no trace instrumentation.
NUMA effects causing hardware-level degradation. Google Research documents 10-20% performance degradation on Gmail backend and web-search frontend services caused by non-uniform memory access topology, discoverable only through continuous profiling correlated with hardware performance counter data.
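The intermittent-spike case in particular comes down to slicing a profile by time before aggregating it. The sketch below is a simplified illustration of that idea and is not FlameScope's algorithm; the sample layout (offset in seconds, call stack), the one-second bucket size, and the 3x-median threshold are all assumptions.

```python
from collections import Counter, defaultdict
from statistics import median


def bucket_samples(samples, bucket_seconds=1.0):
    """Group (offset_seconds, stack) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for offset, stack in samples:
        buckets[int(offset // bucket_seconds)].append(tuple(stack))
    return buckets


def spike_buckets(buckets, factor=3.0):
    """Buckets whose sample count exceeds `factor` times the median bucket."""
    typical = median(len(stacks) for stacks in buckets.values())
    return sorted(b for b, stacks in buckets.items() if len(stacks) > factor * typical)


def hot_stacks(buckets, bucket_ids):
    """Aggregate only the stacks that fall inside the selected time slices."""
    counts = Counter()
    for b in bucket_ids:
        counts.update(buckets[b])
    return counts.most_common()


# Hypothetical capture: steady load for five seconds, then a one-second burst.
samples = [(sec + 0.4 * i, ("main", "serve")) for sec in range(5) for i in range(2)]
samples += [(5 + 0.09 * i, ("main", "compress")) for i in range(10)]

buckets = bucket_samples(samples)
spikes = spike_buckets(buckets)
print(spikes, hot_stacks(buckets, spikes))
```

Aggregated over the whole capture, the burst is only half of the samples; aggregated over the flagged slice, it is all of them.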

Collection Mechanisms and Overhead

Continuous profiling collection mechanisms determine production overhead through sampling and scraping paths. Those collection choices change operational visibility, deployment requirements, and rollout complexity.

Mechanism	How It Works	Overhead	Requirements
eBPF profiling	Kernel-level stack sampling at 100 Hz via BPF maps	Bounded low overhead	No code changes, no restarts
pprof scraping	Application-runtime profiling via annotations	Application-dependent	Code instrumentation or pod annotations

eBPF profiling works on compiled languages without runtime support, covering heterogeneous microservice environments where modifying every service is impractical. For comparison with in-process instrumentation, the ICPE 2026 paper on tracing agent overhead reports overheads ranging from 92 ns to 657 ns per method invocation, with OpenTelemetry at about 315 ns per method invocation.
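Per-invocation figures like these only become meaningful once multiplied by call volume. The sketch below does that multiplication; the rate of one million instrumented invocations per second is a hypothetical workload chosen for illustration, not a number from the paper.

```python
def cpu_fraction(ns_per_call, calls_per_second):
    """Fraction of one CPU core spent on per-invocation instrumentation overhead."""
    return ns_per_call * 1e-9 * calls_per_second


CALLS_PER_SECOND = 1_000_000  # assumption: a hot service instrumenting every method

for ns in (92, 315, 657):
    share = cpu_fraction(ns, CALLS_PER_SECOND)
    print(f"{ns} ns per call -> {share:.1%} of one core")
```

At that assumed rate, 315 ns per call amounts to roughly a third of a core, which is why the instrumented call rate matters as much as the per-call cost.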

Parca uses eBPF and automatically discovers targets via Kubernetes or systemd. Its documentation later added GPU profiling support that uses NVIDIA CUDA-related USDT probes and eBPF.

OpenTelemetry as the Unifying Data Model

OpenTelemetry unifies APM signals through a shared instrumentation framework, collector pipeline, and OTLP wire protocol. That common data model lets teams correlate traces, metrics, logs, and profiling without separate signal silos.

CNCF materials describe OpenTelemetry as the second-largest CNCF project and report a 39% year-over-year rise in commits, with contributors increasing from 1,301 to 1,756.

The OTel profiling signal reached Public Alpha in early 2026 with eBPF agent support, ARM64 Node.js V8 support, and .NET 9/10 support. A specification commit introduced Profiling Data Model v2, which remains compatible with pprof through conversion while reducing wire size by roughly 40% through string dictionaries. Engineers planning continuous profiling implementations should use Pyroscope or Parca directly and treat OTLP-native profiling as pre-production until the specification stabilizes.
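The string-dictionary idea is easy to see in miniature: store each distinct function name once and have every stack refer to it by index. The sketch below illustrates the general technique only and is not the OTLP profiles encoding; reserving index 0 for the empty string follows the pprof string table convention, and the function names are hypothetical.

```python
def encode_with_string_table(stacks):
    """Replace repeated function names with indexes into a shared string table."""
    table = [""]  # index 0 is reserved for the empty string
    index = {"": 0}
    encoded = []
    for stack in stacks:
        row = []
        for name in stack:
            if name not in index:
                index[name] = len(table)
                table.append(name)
            row.append(index[name])
        encoded.append(row)
    return table, encoded


def decode(table, encoded):
    """Rebuild the original stacks from the table and index rows."""
    return [[table[i] for i in row] for row in encoded]


stacks = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "query_db"],
]
table, encoded = encode_with_string_table(stacks)
print(table)
print(encoded)
```

The saving grows with repetition: production profiles repeat the same frame names across many stacks, so each name is paid for once and every further occurrence costs only an integer.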

The OTel Collector serves as the central pipeline, with 65% of users now running more than 10 Collectors in production per the 2026 Collector follow-up survey. Production deployment patterns from case studies follow three distinct topologies.

Organization	Deployment pattern
Mastodon	Single Collector per namespace handling all signals. Zero issues over two years.
Adobe	Three-tier architecture with immutable sidecar config, Helm-configurable routing, and separate Collector deployments per signal type to prevent volume cross-contamination.
Skyscanner	Gateway ReplicaSet for bulk OTLP traffic alongside DaemonSet agents scraping Prometheus endpoints across 24 production clusters.

Each topology is documented in the OpenTelemetry developer experience case studies for Mastodon, Adobe, and Skyscanner, which describe how each team chose its Collector pattern based on volume, signal isolation, and cluster footprint.

Cosmos applies a similar consolidation principle across agents. Agents on Cosmos share an event bus and codebase context, so traces, metrics, and profiles route through one governed path across the software development lifecycle.

Production Implementation: Five Patterns That Matter

Production APM implementation depends on instrumentation and collection patterns matched to real failure modes. Those patterns determine whether signal quality, sampling behavior, SLO alignment, percentile accuracy, and trace-profile correlation hold up under production conditions.

Pattern	Core idea	Production outcome
Hybrid Instrumentation	Combine auto-instrumentation with manual business spans	Preserves baseline visibility while capturing domain events incident analysis depends on
Tail Sampling	Make sampling decisions after trace completion	Captures partial failures that head sampling can miss
SLO-First Design	Start with reliability targets before instrumentation	Keeps telemetry tied to user impact instead of arbitrary dashboards
Metrics-from-Traces	Derive latency histograms before trace sampling discards spans	Keeps p99 and p999 calculations reliable
Link Profiles to Traces	Connect slow spans to CPU or memory-consuming code paths	Shortens diagnosis from service symptoms to specific functions

1. Hybrid Instrumentation: Auto + Manual Business Spans

Hybrid instrumentation combines automatic framework coverage with manual business spans. That combination preserves low-effort baseline visibility while capturing the domain events incident analysis depends on.

Auto-instrumentation covers HTTP, database, and gRPC calls. Manual instrumentation captures domain events like "order created" or "payment authorized" that incident retros require. Skyscanner pre-configures the OTel Java agent in base Docker images and uses opinionated auto-instrumentation to provide HTTP and gRPC span generation out of the box.

2. Tail Sampling for Partial Failures

Tail sampling preserves high-value traces by making the sampling decision after trace completion. That post-completion policy captures partial failures that head sampling can miss.

Head sampling (decision at trace start) can miss partial failures. Grafana identifies the failure mode: "In a partial failure where only a small percentage of requests are affected, a head sampling policy may not capture them." Tail sampling requires that all spans for a given trace reach the same collector instance. Teams typically achieve this with a load-balancing component that routes spans by trace ID, such as an OpenTelemetry Collector with the load-balancing exporter, though tail sampling does not strictly require a two-tier Collector architecture.
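Both halves of that requirement, routing by trace ID and deciding after completion, can be sketched briefly. This is an illustration under stated assumptions and not the Collector's implementation: the span dictionaries, collector names, 500 ms threshold, and 5% baseline are hypothetical, and plain modulo hashing stands in for whatever hashing scheme a real load balancer uses.

```python
import hashlib
import random


def route(trace_id, collectors):
    """Pick a collector from the trace ID so all spans of a trace land together."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


def tail_decision(spans, latency_threshold_ms=500.0, baseline_rate=0.05, rng=random.random):
    """Decide once the whole trace is buffered, based on its outcome."""
    if any(span["error"] for span in spans):
        return "keep:error"
    if max(span["duration_ms"] for span in spans) >= latency_threshold_ms:
        return "keep:slow"
    # Healthy traces fall back to a low probabilistic baseline.
    return "keep:baseline" if rng() < baseline_rate else "drop"


collectors = ["collector-0", "collector-1", "collector-2"]  # hypothetical instances
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
print(route(trace_id, collectors))

failed = [{"error": False, "duration_ms": 12.0}, {"error": True, "duration_ms": 30.0}]
print(tail_decision(failed))
```

A head sampler would have had to decide on the failed trace before the error span existed; here the error policy sees the finished trace and keeps it regardless of the baseline rate.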

3. SLO-First Design

SLO-first APM design starts with reliability targets, and that target-first instrumentation keeps telemetry tied to user impact instead of arbitrary dashboards.

Before proceeding with its Kubernetes migration, Stripe set goals that 99.99% of cron jobs start within 20 minutes and that jobs run to completion 99.99% of the time. The instrumentation served those SLOs directly, anchored to user-impact targets defined before any telemetry choices. Multi-window burn rate alerting, which follows the Google SRE Workbook pattern and is described in Grafana's burn rate notifications docs, replaces static thresholds. A 14.4x burn rate over both 5 minutes and 1 hour triggers a page; 6x over both 30 minutes and 6 hours triggers a page; slower-burn alerts, such as 1x over both 6 hours and 3 days, are typically routed to ticketing or team notifications.
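The thresholds are easier to reason about as error-budget arithmetic. The sketch below assumes a 30-day SLO period and uses hypothetical function names; burn rate is the observed error ratio divided by the error ratio the SLO allows, and an alert fires only when both the long and the short window exceed the threshold.

```python
SLO_PERIOD_HOURS = 30 * 24  # assumption: 30-day SLO period


def burn_rate(error_ratio, slo_target):
    """How many times faster than 'exactly on budget' errors are being spent."""
    return error_ratio / (1.0 - slo_target)


def budget_consumed(rate, window_hours, period_hours=SLO_PERIOD_HOURS):
    """Fraction of the period's error budget spent at `rate` for `window_hours`."""
    return rate * window_hours / period_hours


def should_alert(long_window_rate, short_window_rate, threshold):
    """Multi-window rule: both windows must exceed the threshold."""
    return long_window_rate >= threshold and short_window_rate >= threshold


# (threshold, long window in hours) pairs from the text above.
for rate, hours in ((14.4, 1), (6, 6), (1, 72)):
    print(f"{rate}x for {hours}h spends {budget_consumed(rate, hours):.0%} of the budget")
```

Under the 30-day assumption, the three thresholds correspond to spending 2%, 5%, and 10% of the error budget within their long windows. The short window is what lets the alert clear quickly once the burn stops.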

4. Metrics-from-Traces for Percentile Accuracy

Metrics-from-traces preserves percentile accuracy by deriving latency histograms before trace sampling discards spans. That ingest-time conversion keeps p99 and p999 calculations reliable even when traces are retained selectively.

Accurate p99/p999 latency calculations depend on sufficient sample volume and appropriate histogram-based metric collection, rather than on trace sampling rate alone. Generate histogram metrics from trace data at ingest time, before sampling decisions discard spans. The OTel Collector's spanmetrics connector produces RED (Rate/Errors/Duration) metrics from spans, and Grafana Tempo's TraceQL metrics creates metrics directly from traces without a separate time-series database.
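The ordering matters more than it may seem, and a small example shows why. The sketch below is not the spanmetrics connector; the bucket boundaries, the synthetic durations, and the every-100th-span sampler are assumptions chosen to keep the example deterministic.

```python
import bisect

BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500]  # hypothetical bucket upper bounds


def histogram(durations_ms):
    """Count durations per bucket; the final slot holds values above the last bound."""
    counts = [0] * (len(BOUNDS_MS) + 1)
    for d in durations_ms:
        counts[bisect.bisect_left(BOUNDS_MS, d)] += 1
    return counts


def quantile_upper_bound(counts, q):
    """Upper bound of the bucket that contains quantile `q`."""
    target = q * sum(counts)
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running >= target:
            return BOUNDS_MS[i] if i < len(BOUNDS_MS) else float("inf")
    return float("inf")


# 1,000 spans: 99% fast, 1% slow.
durations = [20.0] * 990 + [800.0] * 10
sampled = durations[::100]  # keep every 100th span, as a stand-in for 1% sampling

print("p99.9 from all spans:    ", quantile_upper_bound(histogram(durations), 0.999))
print("p99.9 from sampled spans:", quantile_upper_bound(histogram(sampled), 0.999))
```

The histogram built before sampling places p99.9 in the 1000 ms bucket, while the one built from the retained spans reports 25 ms because none of the slow spans survived sampling.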

5. Link Profiles to Traces

Linking profiles to traces connects slow spans to the CPU or memory-consuming code paths behind them. That trace-to-profile path shortens diagnosis from service symptoms to specific functions.

eBPF-based profiling captures stack traces linkable to trace spans. When both Datadog tracing and profiler are enabled, Datadog can correlate traces with profiling data, including CPU flame graph views. Parca Agent extracts the current trace ID during stack collection, which links tracing and profiling data streams directly.
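Once stack samples carry a trace ID, the join itself is a filter and a count. The sketch below assumes hypothetical record shapes for spans and samples and does not reflect any vendor's or Parca's storage format.

```python
from collections import Counter


def slowest_span(spans):
    """Pick the span with the longest duration."""
    return max(spans, key=lambda span: span["duration_ms"])


def stacks_for_trace(samples, trace_id):
    """Count the call stacks sampled while the given trace was executing."""
    counts = Counter(
        tuple(sample["stack"]) for sample in samples if sample.get("trace_id") == trace_id
    )
    return counts.most_common()


# Hypothetical data: two requests and the profiler samples taken during them.
spans = [
    {"trace_id": "aaa1", "name": "GET /cart", "duration_ms": 40},
    {"trace_id": "bbb2", "name": "POST /checkout", "duration_ms": 1900},
]
samples = [
    {"trace_id": "bbb2", "stack": ("main", "checkout", "serialize_order")},
    {"trace_id": "bbb2", "stack": ("main", "checkout", "serialize_order")},
    {"trace_id": "bbb2", "stack": ("main", "checkout", "tls_write")},
    {"trace_id": "aaa1", "stack": ("main", "cart", "render")},
    {"trace_id": None, "stack": ("main", "gc")},  # sampled outside any request
]

slow = slowest_span(spans)
top = stacks_for_trace(samples, slow["trace_id"])
print(slow["name"], top)
```

The result is the per-request view that an aggregate profile cannot give: the stacks that were on CPU during the slow checkout request, ranked by how often they were sampled.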

On Cosmos, agent-authored production changes run against a live understanding of the stack across repos, services, and history. Every change stays attributable, reviewable, and revertable, even when multiple agents work in parallel.

AI-Assisted APM: Current Production Boundaries

AI-assisted APM narrows high-volume telemetry through correlation, clustering, and summarization. That narrowing reduces the review scope for operators while keeping escalation and remediation under human ownership.

Correlation, clustering, and summarization have reached production maturity in major APM tools, while escalation and remediation still depend on human ownership. Seasonality-aware anomaly detection and topology-aware root cause correlation are production-ready in several major APM tools. NLP-based log clustering appears in some log-analysis products and research but lacks clear documentation as a mature feature across major APM vendors.

Full autonomous remediation for novel failures remains beyond production viability. Netflix's April 2026 engineering blog indicates that, at very large production scale, Ops and Site Reliability Engineers stay online to ensure rapid escalation and remediation of live incidents. A USENIX Security '26 accepted paper demonstrates that LLM-based AIOps agents can be subverted through deliberate manipulation of the telemetry data they consume, which establishes a concrete security boundary for autonomous operations.

This boundary is where Cosmos draws its human-in-the-loop policies. Teams set the points where human judgment is required, and Cosmos enforces those checkpoints across spec review, deep code review, and deployment, so agents narrow the problem while humans stay accountable for escalation.

Instrument Agent-Driven Workflows Before They Ship

Instrumenting agent-driven workflows before release requires production-grade observability because code can be authored, tested, and deployed across multiple services in parallel. That observability makes regressions traceable before they spread through the delivery pipeline. SLO-first APM implementation starts by defining reliability targets before instrumentation, then linking profiles to traces so regressions can be followed back to the responsible function. The gap between individual agent productivity and operational visibility becomes harder to manage without a unified system across the software development lifecycle.

Cosmos closes that gap by running agents on a shared event bus, shared codebase context, and tenant memory that compounds across the team, so traces, profiles, and reviews stay aligned as agents move work from spec to PR to deploy.

Written by

Ani Galstian
