llvm/llvm-project

LLVM Compiler Infrastructure

Last updated on Dec 19, 2025 (Commit: 6a470bf)

Overview

Relevant Files
  • README.md
  • llvm/README.txt
  • llvm/include/llvm/IR
  • llvm/lib/IR
  • clang/README.md
  • llvm/tools/llc/llc.cpp

LLVM is a modular compiler infrastructure toolkit designed for constructing highly optimized compilers, optimizers, and runtime environments. The project is organized as a monorepo containing multiple interconnected components that work together to transform source code into efficient machine code.

Core Architecture

At a high level, frontends such as Clang lower source languages to LLVM IR, the core libraries analyze and optimize that IR, and target backends emit machine code.

Key Components

LLVM Core (llvm/) is the foundation, containing:

  • IR (Intermediate Representation): A language-independent, low-level representation that serves as the central abstraction. Located in llvm/include/llvm/IR and llvm/lib/IR, it defines fundamental concepts like modules, functions, basic blocks, and instructions.
  • Optimization Passes: Transformations that improve code efficiency (scalar optimizations, vectorization, loop transformations).
  • Code Generation: Converts IR to target-specific machine code through instruction selection, register allocation, and scheduling.
  • Tools: Assembler, disassembler, bitcode analyzer, and optimizer utilities.

Clang (clang/) is the C-family frontend that compiles C, C++, Objective-C, and Objective-C++ into LLVM IR, enabling language-specific parsing and semantic analysis.

Supporting Libraries:

  • libc++: Modern C++ standard library implementation
  • LLD: Fast linker supporting ELF, Mach-O, COFF, and WebAssembly formats
  • Compiler-RT: Runtime support for sanitizers, profiling, and low-level operations
  • MLIR: Multi-Level IR for high-level compiler abstractions and domain-specific optimizations

Compilation Pipeline

The typical flow is: source code → frontend parsing → LLVM IR generation → optimization passes → target-specific code generation → assembly → linking → executable.

Each stage is modular, allowing custom frontends, optimization strategies, and backends. The IR serves as the universal intermediate format, enabling language-agnostic optimizations and cross-platform code generation.

Design Philosophy

LLVM emphasizes modularity, reusability, and extensibility. Components are designed as libraries rather than monolithic tools, allowing developers to embed LLVM in custom applications. The IR is human-readable (text format) and machine-readable (bitcode format), facilitating debugging and serialization.

Architecture & Core Components

Relevant Files
  • llvm/include/llvm/IR/Module.h
  • llvm/include/llvm/IR/Function.h
  • llvm/include/llvm/IR/BasicBlock.h
  • llvm/include/llvm/IR/Instruction.h
  • llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  • llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
  • llvm/lib/CodeGen/RegAllocGreedy.cpp
  • llvm/include/llvm/CodeGen/TargetPassConfig.h

LLVM IR Hierarchy

LLVM's Intermediate Representation is organized in a strict hierarchy. A Module is the top-level container holding all IR objects for a compilation unit. Each Module contains:

  • Functions – executable code units with a signature and body
  • Global Variables – module-level data declarations
  • Aliases and IFuncs – symbol references and indirect functions
  • Metadata – debug info, profiling data, and annotations

Functions contain BasicBlocks: straight-line sequences of instructions with a single entry point. Each BasicBlock must end with a terminator instruction (branch, return, etc.) that transfers control to another block or out of the function. Instructions are the atomic operations: arithmetic, memory access, control flow, and intrinsics.
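
As a small illustrative example (function and value names are hypothetical), this textual IR module contains one function whose three basic blocks each end in a terminator:

```llvm
; A Module containing one Function; each BasicBlock ends in a terminator.
define i32 @abs(i32 %x) {
entry:
  %isneg = icmp slt i32 %x, 0
  br i1 %isneg, label %neg, label %done   ; terminator: conditional branch
neg:
  %negx = sub nsw i32 0, %x
  br label %done                          ; terminator: unconditional branch
done:
  %r = phi i32 [ %negx, %neg ], [ %x, %entry ]
  ret i32 %r                              ; terminator: return
}
```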

IR Representation

The IR uses a graph-based representation where:

  • Values are the fundamental unit – instructions, constants, and arguments all produce values
  • Uses track data dependencies between values
  • Types constrain operations and enable optimization
  • Metadata attaches auxiliary information without affecting semantics

This design enables efficient analysis and transformation passes to reason about program behavior.

CodeGen Pipeline

LLVM transforms IR to machine code through multiple stages:

  1. IR Preparation – Lowering high-level constructs (exceptions, intrinsics)
  2. Instruction Selection – Converting IR to target-specific instructions via SelectionDAG or GlobalISel
  3. Register Allocation – Assigning virtual registers to physical registers
  4. Machine Code Emission – Generating assembly or object code

SelectionDAG vs GlobalISel

SelectionDAG is the traditional approach: IR is converted to a directed acyclic graph of operations, then pattern-matched to machine instructions. It’s mature and handles complex lowering well.

GlobalISel is the modern alternative: IR translates to generic machine IR (gMIR), then passes through legalization, register bank selection, and instruction selection. It’s more modular and target-independent.

Register Allocation

The greedy allocator assigns virtual registers to physical registers by:

  • Building live intervals for each virtual register
  • Iterating through intervals in priority order
  • Attempting direct assignment, splitting, or spilling as needed
  • Recoloring when conflicts arise

This balances compile-time performance with code quality.
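
The steps above can be sketched in miniature. This is a self-contained toy, not the actual RegAllocGreedy implementation: intervals are taken in priority order (longest first here) and assigned to the first conflict-free register, with -1 standing in for a spill:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy greedy allocator: live intervals are taken in priority order
// (longest first) and given the first conflict-free physical register;
// -1 stands in for a spill. A real allocator would try splitting,
// eviction, and recoloring before spilling.
struct Interval { std::string vreg; int start, end; };

std::map<std::string, int> allocate(std::vector<Interval> ivs, int numRegs) {
  std::sort(ivs.begin(), ivs.end(), [](const Interval &a, const Interval &b) {
    return (a.end - a.start) > (b.end - b.start); // priority: longer first
  });
  std::vector<std::vector<Interval>> regUse(numRegs);
  std::map<std::string, int> assignment;
  for (const Interval &iv : ivs) {
    int chosen = -1;
    for (int r = 0; r < numRegs && chosen < 0; ++r) {
      bool conflictFree = true;
      for (const Interval &other : regUse[r])
        if (iv.start < other.end && other.start < iv.end)
          conflictFree = false; // live ranges overlap
      if (conflictFree)
        chosen = r;
    }
    if (chosen >= 0)
      regUse[chosen].push_back(iv);
    assignment[iv.vreg] = chosen;
  }
  return assignment;
}
```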

Pass Infrastructure

LLVM uses a hierarchical pass manager supporting:

  • Function Passes – Operate on individual functions
  • Loop Passes – Iterate over loops with nesting awareness
  • Module Passes – Analyze or transform entire modules
  • Machine Function Passes – Work on machine-level IR

Passes declare analysis dependencies and preservation guarantees, enabling the manager to cache results and invalidate only affected analyses.
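
That caching contract can be sketched with a toy (hypothetical API, not LLVM's real PassManager): an analysis is recomputed only on a cache miss, and a transform pass invalidates every cached analysis it did not declare preserved:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>

// Toy pass manager illustrating analysis caching and invalidation.
struct ToyPassManager {
  std::map<std::string, int> cache; // analysis name -> cached result
  int analysisRuns = 0;             // how often we actually recomputed

  int getAnalysis(const std::string &name,
                  const std::function<int()> &compute) {
    auto it = cache.find(name);
    if (it != cache.end())
      return it->second; // cache hit: no recomputation
    ++analysisRuns;
    return cache[name] = compute();
  }

  void runTransform(const std::set<std::string> &preserved) {
    for (auto it = cache.begin(); it != cache.end();)
      if (preserved.count(it->first))
        ++it;                   // declared preserved: keep cached result
      else
        it = cache.erase(it);   // conservatively invalidate
  }
};
```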

Target Abstraction

The TargetMachine abstracts target-specific behavior:

  • TargetLowering – Defines legal operations and calling conventions
  • TargetInstrInfo – Describes instruction properties and patterns
  • TargetRegisterInfo – Specifies register classes and constraints
  • TargetFrameLowering – Handles stack frame layout

This separation allows backends to customize lowering while reusing common infrastructure.

Clang C/C++ Frontend

Relevant Files
  • clang/lib/Frontend - Frontend orchestration and compilation pipeline
  • clang/lib/Parse - Syntax parsing and AST construction
  • clang/lib/Sema - Semantic analysis and type checking
  • clang/include/clang/Frontend/CompilerInstance.h - Main compiler coordinator
  • clang/include/clang/Parse/Parser.h - Parser interface
  • clang/include/clang/Sema/Sema.h - Semantic analyzer interface

The Clang C/C++ frontend is the core compilation pipeline that transforms source code into an Abstract Syntax Tree (AST). It orchestrates lexical analysis, parsing, and semantic analysis to produce a fully-typed, semantically-valid AST ready for code generation.

Architecture Overview

The frontend follows a classic three-phase compiler design: lexical analysis and preprocessing, parsing, and semantic analysis.

Key Components

CompilerInstance (clang/lib/Frontend/CompilerInstance.cpp) is the central coordinator that manages all compilation objects: the preprocessor, target information, diagnostics engine, and AST context. It owns the lifecycle of these components and provides factory methods for creating them.

Parser (clang/lib/Parse/Parser.cpp) performs recursive descent parsing of tokens produced by the lexer. It builds the initial AST structure by recognizing language grammar rules. The parser handles declarations, expressions, statements, and templates, delegating semantic validation to Sema.

Sema (clang/lib/Sema/Sema.cpp) performs semantic analysis on the parsed AST. It validates type correctness, resolves names, checks access control, instantiates templates, and performs overload resolution. Sema is the largest component, split across 100+ files for different language features (declarations, expressions, templates, etc.).

Compilation Flow

  1. Initialization: CompilerInstance creates the Preprocessor, which handles includes and macros
  2. Parsing: Parser consumes preprocessed tokens and builds an initial AST
  3. Semantic Analysis: Sema validates and enriches the AST with type information
  4. Code Generation: CodeGen converts the typed AST to LLVM IR
  5. Backend: LLVM optimizes and generates machine code

Frontend Actions

FrontendAction is the extensibility point for custom compilation behaviors. Subclasses like ParseSyntaxOnlyAction, EmitLLVMAction, and EmitAssemblyAction define what happens after parsing completes. This enables tools like clang-format, clang-tidy, and the static analyzer to reuse the frontend infrastructure.

Diagnostic System

The DiagnosticsEngine collects and reports errors, warnings, and notes throughout compilation. It integrates with SourceManager to provide precise source locations and context for each diagnostic, enabling helpful error messages with code snippets and fix-it suggestions.

Code Generation & Optimization

Relevant Files
  • llvm/lib/CodeGen - Machine code generation and optimization
  • llvm/lib/Transforms - IR-level transformation passes
  • llvm/lib/Passes - Pass pipeline infrastructure and builders
  • llvm/lib/Analysis - Analysis passes providing optimization data
  • llvm/lib/Target - Target-specific code generation

LLVM's code generation and optimization pipeline transforms high-level IR into efficient machine code through multiple stages. The process combines IR-level optimizations, target-independent transformations, and target-specific lowering.

Optimization Pipeline Architecture

The PassBuilder in llvm/lib/Passes orchestrates the entire optimization pipeline. It constructs different pass sequences based on optimization levels (O0, O1, O2, O3):

  • Module Simplification Pipeline - Early IR canonicalization and cleanup
  • Module Optimization Pipeline - Aggressive optimizations including vectorization
  • Function Simplification Pipeline - Per-function IR improvements
  • CGSCC Pipeline - Call-graph-driven interprocedural optimizations

Each pipeline is composed of individual passes that run in sequence, with analysis results cached and invalidated as needed.

Key Transformation Categories

IR-Level Transforms (llvm/lib/Transforms) include:

  • Scalar Optimizations - Dead code elimination, constant propagation, loop unrolling
  • Vectorization - Loop and SLP vectorization using cost models
  • Interprocedural Optimization (IPO) - Inlining, function specialization, attribute inference
  • Instrumentation - Sanitizers, profiling, coverage instrumentation

Code Generation (llvm/lib/CodeGen) handles:

  • SelectionDAG - IR to DAG lowering with pattern matching and optimization
  • Instruction Selection - DAG to machine instructions via target patterns
  • Register Allocation - Live range analysis and register assignment
  • Machine Scheduling - Instruction ordering for pipeline efficiency
  • Assembly Emission - Final machine code or assembly output

Analysis Infrastructure

Analysis passes provide critical information for optimization decisions:

  • Alias Analysis - Memory dependency tracking
  • Dominance Analysis - Control flow structure
  • Loop Analysis - Loop structure and properties
  • Scalar Evolution - Induction variable analysis
  • Target Transform Info - Target-specific cost modeling

Target-Specific Lowering

Each target in llvm/lib/Target implements:

  • Calling Conventions - ABI-compliant parameter passing
  • Instruction Lowering - Complex operations to target instructions
  • Frame Lowering - Stack frame setup and management
  • Register Info - Register constraints and allocation strategies

The pipeline preserves analysis results where possible to minimize recomputation. Passes declare which analyses they preserve, enabling the pass manager to avoid redundant computation and maintain correctness across the optimization sequence.

Linker & Binary Tools

Relevant Files
  • lld/README.md
  • lld/ELF/Driver.cpp, Writer.cpp, SymbolTable.h
  • lld/COFF/, lld/MachO/, lld/wasm/
  • llvm/lib/Object/ (Binary.cpp, ObjectFile.cpp, ELFObjectFile.cpp, COFFObjectFile.cpp, MachOObjectFile.cpp, WasmObjectFile.cpp)
  • llvm/lib/MC/ (MCStreamer.cpp, MCAssembler.cpp, MCObjectWriter.cpp, ELFObjectWriter.cpp, WinCOFFObjectWriter.cpp, MachObjectWriter.cpp)

LLVM's linker and binary tools infrastructure consists of two main layers: the Object library for reading/writing binary formats, and the MC (Machine Code) layer for assembly and code emission.

Object Library (llvm/lib/Object)

The Object library provides a unified interface for handling multiple binary formats. The core abstraction is the Binary class hierarchy:

  • Binary – Base class for all binary file types
  • ObjectFile – Represents relocatable object files (ELF, COFF, MachO, Wasm, XCOFF, GOFF)
  • Archive – Handles static libraries (.a, .lib files)
  • SymbolicFile – Base for files containing symbols

Each format has specialized classes: ELFObjectFile, COFFObjectFile, MachOObjectFile, WasmObjectFile. These provide format-specific APIs while maintaining a common interface for symbol iteration, section access, and relocation handling.

MC Layer (llvm/lib/MC)

The MC layer handles assembly and object file generation:

  • MCStreamer – Abstract interface for emitting assembly directives and machine code
  • MCObjectStreamer – Concrete implementation that builds an in-memory representation
  • MCAssembler – Manages sections, symbols, and fragments; performs layout and relaxation
  • MCObjectWriter – Format-specific writer (ELFObjectWriter, WinCOFFObjectWriter, MachObjectWriter)
  • MCAsmBackend – Target-specific assembly backend handling fixups and relaxation
  • MCCodeEmitter – Encodes machine instructions to bytes

The pipeline: MCStreamer receives directives → MCAssembler builds fragments → layout phase computes offsets → MCObjectWriter emits final binary.
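
The layout phase can be sketched as a toy (hypothetical names, not the real MCAssembler code): each fragment's offset is the section's running size, padded up to the fragment's alignment:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy layout phase: assigns each fragment an offset by accumulating sizes,
// the way MCAssembler lays out fragments before the object writer emits bytes.
struct Fragment {
  std::string name;
  unsigned size;
  unsigned align;     // must be nonzero
  unsigned offset = 0;
};

unsigned layoutSection(std::vector<Fragment> &frags) {
  unsigned off = 0;
  for (Fragment &f : frags) {
    off = (off + f.align - 1) / f.align * f.align; // pad to alignment
    f.offset = off;
    off += f.size;
  }
  return off; // total section size
}
```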

LLD Linker Architecture

LLD is a modular linker with format-specific drivers:

  • lld/Common/ – Shared utilities (error handling, DWARF support, command-line parsing)
  • lld/ELF/ – ELF linker with architecture-specific support (Arch/ subdirectory)
  • lld/COFF/ – Windows PE/COFF linker
  • lld/MachO/ – macOS Mach-O linker
  • lld/wasm/ – WebAssembly linker

Each driver follows a similar pattern: Driver parses options → InputFiles reads objects → SymbolTable resolves symbols → Writer emits output. The ELF linker includes specialized passes: ICF (Identical Code Folding), MarkLive (garbage collection), Relocations (fixup application), and LTO integration.
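
The symbol-resolution step can be sketched as a toy (hypothetical API, not lld's actual SymbolTable): a definition resolves earlier undefined references, and a second definition of the same name is a duplicate-symbol error:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy linker symbol table: tracks whether each name is still unresolved.
enum class SymState { Undefined, Defined };

struct ToySymbolTable {
  std::map<std::string, SymState> syms;
  std::vector<std::string> errors;

  void addUndefined(const std::string &name) {
    syms.emplace(name, SymState::Undefined); // keeps existing state if present
  }
  void addDefined(const std::string &name) {
    auto [it, inserted] = syms.emplace(name, SymState::Defined);
    if (!inserted) {
      if (it->second == SymState::Defined)
        errors.push_back("duplicate symbol: " + name);
      else
        it->second = SymState::Defined; // resolve the pending reference
    }
  }
  std::vector<std::string> unresolved() const {
    std::vector<std::string> out;
    for (const auto &kv : syms)
      if (kv.second == SymState::Undefined)
        out.push_back(kv.first);
    return out;
  }
};
```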

Key Data Flow

Input objects and archives are parsed by the Object library, the driver's SymbolTable resolves their symbols, and the format-specific Writer lays out sections and emits the final image.

Tools Built on These Libraries

  • llvm-readobj – Inspects binary files using Object library
  • llvm-objdump – Disassembles and displays object file contents
  • llvm-link – Links LLVM bitcode modules
  • llvm-jitlink – JIT linker for runtime code linking
  • lld – Main linker (supports ELF, COFF, MachO, Wasm via format detection)

Runtime Libraries & Standard Library

Relevant Files
  • libcxx/include - C++ Standard Library headers
  • libcxx/src - C++ Standard Library implementations
  • libcxxabi/src - C++ ABI runtime support
  • compiler-rt/lib - Compiler runtime support (builtins, sanitizers)
  • libunwind/src - Stack unwinding and exception handling
  • libc/src - LLVM C Standard Library implementation

Overview

The LLVM project provides a comprehensive suite of runtime libraries that form the foundation for compiled C and C++ programs. These libraries handle everything from basic arithmetic operations to exception handling, memory management, and standards compliance.

Core Components

libc++ (C++ Standard Library)

The C++ Standard Library provides containers, algorithms, iterators, and utilities required by the C++ standard. Located in libcxx/, it includes:

  • Containers: vector, map, set, deque, list, unordered_map, flat_map
  • Algorithms: Sorting, searching, transforming, and numeric operations
  • Utilities: Smart pointers, memory management, type traits, function objects
  • I/O Streams: File and string stream implementations
  • Threading: Mutexes, condition variables, threads, atomic operations
  • Ranges & Views: Modern C++20 range-based abstractions

libcxxabi (C++ ABI Runtime)

Provides low-level C++ runtime support in libcxxabi/src/:

  • Exception handling (cxa_exception.cpp, cxa_personality.cpp)
  • Type information and RTTI (private_typeinfo.cpp, stdlib_typeinfo.cpp)
  • Guard variables for static initialization (cxa_guard.cpp)
  • Name demangling (cxa_demangle.cpp)
  • Memory management for exceptions (fallback_malloc.cpp)
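
The demangler in cxa_demangle.cpp is reachable programmatically through the standard Itanium-ABI entry point abi::__cxa_demangle (also shipped by libstdc++). A small wrapper, assuming an Itanium-ABI toolchain such as Linux or macOS:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-mangled name; on failure, return the input unchanged.
std::string demangle(const char *mangled) {
  int status = 0;
  char *out = abi::__cxa_demangle(mangled, /*buf=*/nullptr, /*len=*/nullptr,
                                  &status);
  std::string result = (status == 0 && out) ? out : mangled; // 0 == success
  std::free(out); // caller owns the malloc'd buffer
  return result;
}
```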

compiler-rt (Compiler Runtime)

Located in compiler-rt/lib/, provides essential compiler support:

  • Builtins: Low-level arithmetic, bit manipulation, floating-point operations
  • Sanitizers: AddressSanitizer (ASan), MemorySanitizer (MSan), ThreadSanitizer (TSan), UndefinedBehaviorSanitizer (UBSan)
  • Profiling: Code coverage and instrumentation profiling
  • CFI: Control Flow Integrity checking

libunwind (Stack Unwinding)

Implements stack unwinding for exception handling and debugging:

  • DWARF-based unwinding (DwarfParser.hpp, DwarfInstructions.hpp)
  • Compact unwinding support (CompactUnwinder.hpp)
  • Platform-specific register handling (Registers.hpp)
  • Exception handling personality routines
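
The unwind API these files implement can be exercised directly; a minimal frame counter using the standard _Unwind_Backtrace entry point (assuming an Itanium-style unwinder with unwind tables, as provided by libunwind or libgcc on Linux):

```cpp
#include <cassert>
#include <unwind.h>

// Walk the current call stack via the Itanium unwind interface.
static _Unwind_Reason_Code countFrame(struct _Unwind_Context *ctx, void *arg) {
  (void)_Unwind_GetIP(ctx);      // program counter of this frame
  ++*static_cast<int *>(arg);
  return _URC_NO_REASON;         // keep walking
}

int backtraceDepth() {
  int frames = 0;
  _Unwind_Backtrace(countFrame, &frames);
  return frames;
}
```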

libc (C Standard Library)

LLVM's implementation of the C Standard Library in libc/src/:

  • Math: Comprehensive math functions with GPU support
  • String/Memory: String manipulation and memory operations
  • I/O: File and stream operations
  • Threading: POSIX threads and synchronization primitives
  • System: File system, process, and signal handling

Architecture & Dependencies

Application code sits on libc++, which relies on libc++abi for exception handling and RTTI and on libunwind for stack unwinding; compiler-rt builtins underpin all of these layers.

Key Design Patterns

Header-Only Components: Much of libc++ is header-only, since templates must be instantiated in the including translation unit; this also gives the optimizer full visibility for inlining.

Platform Abstraction: Runtime libraries use conditional compilation to support multiple platforms (Linux, macOS, Windows, Fuchsia).

Modular Sanitizers: Each sanitizer (ASan, MSan, TSan) can be independently enabled during compilation.

ABI Stability: libcxxabi maintains stable ABI for C++ exception handling across compiler versions.

Integration Points

These libraries integrate seamlessly with the LLVM toolchain:

  • Clang Frontend: Generates calls to runtime functions
  • Code Generation: Emits intrinsics for builtins
  • Linker (lld): Links runtime libraries with user code
  • Optimization Passes: Leverage runtime knowledge for optimizations

MLIR: Multi-Level Intermediate Representation

Relevant Files
  • mlir/include/mlir - Core IR headers
  • mlir/lib/IR - IR implementation
  • mlir/lib/Dialect - All dialect implementations
  • mlir/docs/LangRef.md - Language reference
  • mlir/lib/RegisterAllDialects.cpp - Dialect registration

MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure designed to represent, analyze, and transform computations at multiple abstraction levels. It bridges high-level dataflow graphs (like TensorFlow or PyTorch) down to target-specific machine code, making it ideal for deep learning and high-performance computing.

Core IR Structure

MLIR's fundamental building blocks form a hierarchical graph:

  • Operations - The basic computational units. Every operation has operands (inputs), results (outputs), attributes (metadata), and optional regions (nested code).
  • Values - Edges in the computation graph. Each value is defined exactly once, as either an operation result or a block argument.
  • Blocks - Ordered lists of operations. Blocks can have arguments (like function parameters) and successors for control flow.
  • Regions - Ordered lists of blocks. Operations can contain multiple regions to represent nested structures (e.g., loop bodies, function definitions).
  • Attributes - Compile-time metadata attached to operations (constants, names, configuration).
  • Types - Value types in the type system (integers, floats, tensors, custom types).

Dialects: Extensibility Framework

Dialects are the mechanism for extending MLIR with domain-specific operations, types, and attributes. Each dialect represents a particular abstraction level or domain.

Key Dialects:

  • Builtin - Core types and operations (implicitly loaded in every context)
  • Func - Function definitions and calls
  • Arith - Arithmetic operations (add, mul, div)
  • MemRef - Memory reference operations (allocate, load, store)
  • Affine - Affine loop nests and polyhedral optimizations
  • Linalg - Linear algebra operations (matmul, conv)
  • Vector - SIMD vector operations
  • GPU - GPU-specific operations (kernels, synchronization)
  • LLVM - LLVM IR operations for lowering to machine code
  • SPIRV - SPIR-V operations for GPU portability
  • SCF - Structured control flow (if, for, while)
  • Tensor - Immutable tensor operations
  • Transform - Rewrite and transformation operations

Dialect Initialization

Dialects are registered via the DialectRegistry. The RegisterAllDialects.cpp file registers 40+ dialects including AMDGPU, ArmNeon, ArmSME, Async, Bufferization, Complex, ControlFlow, DLTI, EmitC, Index, IRDL, MLProgram, MPI, Math, NVGPU, OpenACC, OpenMP, PDL, Ptr, Quant, Shape, Shard, SparseTensor, Tosa, UB, WasmSSA, X86Vector, and XeGPU.

Multi-Level Representation

MLIR's power lies in representing code at multiple abstraction levels simultaneously:

  1. High-level - Domain-specific operations (e.g., linalg.matmul)
  2. Mid-level - Affine loops and memory operations
  3. Low-level - LLVM IR and target-specific instructions

Passes and transformations progressively lower code from one level to another, enabling domain-specific optimizations at each stage.

Passes and Transformations

The pass infrastructure enables composable transformations:

  • Analysis passes - Compute properties without modifying IR
  • Transform passes - Rewrite and optimize IR
  • Conversion passes - Lower between dialects
  • Canonicalization - Simplify operations to canonical forms

Interfaces and Traits

Operations can implement interfaces (like BufferizableOpInterface) and traits (like Commutative, IsTerminator) to enable generic transformations and analyses.

%result = arith.addi %lhs, %rhs : i32
%tensor = linalg.matmul ins(%A, %B : tensor<4x4xf32>, tensor<4x4xf32>)
                        outs(%C : tensor<4x4xf32>) -> tensor<4x4xf32>

MLIR's extensibility and multi-level design make it a powerful foundation for building compilers across diverse domains.

Specialized Frontends & Languages

Relevant Files
  • flang/lib/Parser – Fortran parser implementation
  • flang/lib/Semantics – Semantic analysis for Fortran
  • flang/lib/Lower – Lowering to FIR intermediate representation
  • flang/lib/Optimizer – FIR dialect and optimization passes
  • clang-tools-extra/clangd – C++ language server implementation
  • clang-tools-extra/clangd/index – Symbol indexing and lookup

Flang: Fortran Compiler Frontend

Flang is a modern Fortran compiler frontend written in C++, designed to replace the legacy flang project. It provides a complete implementation of Fortran 2018 with support for OpenMP and OpenACC directives.

Compilation Pipeline:

  1. Prescan & Parsing (flang/lib/Parser) – Converts source code to a parse tree using parser combinators. Handles fixed and free-form Fortran, preprocessor directives, and language extensions.

  2. Semantic Analysis (flang/lib/Semantics) – Validates the parse tree, resolves names, performs type checking, and builds symbol tables. Produces a semantically-correct program representation.

  3. Lowering to FIR (flang/lib/Lower) – Transforms the parse tree into Fortran IR (FIR), an MLIR-based intermediate representation. Uses the pre-FIR tree (PFT) to structure the program for lowering.

  4. Optimization (flang/lib/Optimizer) – Applies MLIR passes on FIR to optimize code before final code generation.

FIR: Fortran Intermediate Representation

FIR is a dialect of MLIR specifically designed for Fortran semantics. It provides operations for memory management, array operations, and runtime descriptors.

Key Components:

  • FIR Dialect – Core operations like fir.load, fir.store, fir.call, fir.do_loop for low-level Fortran constructs
  • HLFIR Dialect – High-level operations for expressions and assignments without materializing temporaries, enabling better optimization
  • Type System – Supports Fortran types: !fir.ref<T> (reference), !fir.box<T> (descriptor), !fir.array<...> (arrays)

Clangd: C++ Language Server

Clangd provides IDE features for C++ via the Language Server Protocol (LSP), including code completion, hover information, diagnostics, and refactoring.

Architecture:

  • ClangdLSPServer – Implements LSP protocol, translates JSON-RPC messages to internal operations
  • ClangdServer – Core engine managing AST parsing, indexing, and feature computation
  • TUScheduler – Schedules AST parsing and analysis tasks asynchronously per translation unit
  • Symbol Index – Maintains searchable index of symbols using trigram-based Dex or in-memory MemIndex

Key Features:

  • Code Completion – Fuzzy matching with ranking based on context and usage frequency
  • Go-to Definition/Declaration – Cross-reference lookup via symbol index
  • Diagnostics – Real-time error checking with clang-tidy integration
  • Refactoring – Rename, extract function, and other code transformations
  • Background Indexing – Maintains project-wide symbol index for workspace queries

Indexing Strategy:

Clangd uses a multi-level indexing approach: dynamic index for open files, background index for the entire project, and optional static index for system libraries. Symbol references are tracked with location, kind, and container information for precise cross-reference queries.
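
The trigram idea behind Dex can be sketched with a toy index (not clangd's actual data structures): each symbol is posted under every 3-gram of its lowercased name, and a query must hit on all of its trigrams:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy trigram index: posting lists keyed by lowercased 3-grams.
struct TrigramIndex {
  std::map<std::string, std::set<std::string>> postings;

  static std::vector<std::string> trigrams(std::string s) {
    for (char &c : s)
      c = (char)std::tolower((unsigned char)c);
    std::vector<std::string> out;
    for (size_t i = 0; i + 3 <= s.size(); ++i)
      out.push_back(s.substr(i, 3));
    return out;
  }
  void insert(const std::string &symbol) {
    for (const auto &t : trigrams(symbol))
      postings[t].insert(symbol);
  }
  std::set<std::string> lookup(const std::string &query) {
    std::set<std::string> result;
    bool first = true;
    for (const auto &t : trigrams(query)) { // intersect posting lists
      const auto &bucket = postings[t];
      if (first) { result = bucket; first = false; continue; }
      std::set<std::string> merged;
      for (const auto &s : result)
        if (bucket.count(s))
          merged.insert(s);
      result = std::move(merged);
    }
    return result;
  }
};
```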

Debugging, Profiling & Analysis Tools

Relevant Files
  • lldb/source - LLDB debugger implementation
  • llvm/lib/DebugInfo - Debug information handling (DWARF, CodeView, GSYM)
  • llvm/lib/ProfileData - Instrumentation and profiling data
  • llvm/lib/XRay - XRay function tracing system
  • bolt/lib - BOLT binary optimization framework

LLDB: The Debugger

LLDB is a next-generation, high-performance debugger built as reusable components leveraging LLVM libraries. It provides:

  • Multi-platform support: macOS, iOS, Linux, FreeBSD, Windows
  • Language support: C, C++, Objective-C, Objective-C++
  • Expression evaluation: Uses Clang compiler infrastructure for accurate expression parsing and JIT compilation
  • Data formatters: Custom pretty-printing for complex types
  • Remote debugging: Client-server architecture using gdb-remote protocol

Key components include breakpoints, watchpoints, stack frame inspection, variable tracking, and symbol resolution. LLDB exposes functionality through both CLI and C++ API.

Debug Information Tools

LLVM provides comprehensive debug information support:

  • DWARF: Primary debug format for ELF binaries, handled by llvm/lib/DebugInfo/DWARF
  • CodeView: Microsoft debug format for Windows binaries
  • GSYM: Compact debug format for symbolication and lookup
  • llvm-debuginfo-analyzer: Parses and displays debug info in human-readable logical view format

These tools support parsing, verification, and transformation of debug metadata across compilation units.

Profiling & Performance Analysis

XRay is a function call tracing system combining compiler-inserted instrumentation with runtime control:

  • Nop-sled instrumentation points in binaries
  • Flight Data Recorder (FDR) mode for fixed-memory circular buffering
  • Profiling mode for latency analysis
  • Tools: llvm-xray for trace analysis and graph generation

ProfileData library handles multiple profiling formats:

  • Instrumentation-based PGO: Compiler-inserted counters for profile-guided optimization
  • Sample-based profiling: Integration with Linux perf tool
  • MemProf: Memory allocation profiling
  • Coverage: Code coverage tracking

BOLT: Binary Optimization

BOLT is a post-link optimizer that improves application performance through code layout optimization:

  • Reads execution profiles from perf tool
  • Reorders basic blocks and functions for better cache utilization
  • Supports branch history (LBR/BRBE) for high-quality profiles
  • Key optimizations: function reordering, block splitting, shrink-wrapping, indirect call promotion

Workflow: Collect profile with perf → Convert with perf2bolt → Optimize with llvm-bolt

BOLT Address Translation (BAT) enables profile collection from optimized binaries for iterative optimization.

Integration & Workflow

These tools work together in a complete debugging and optimization pipeline.

Debug information enables source-level debugging, while profiling tools identify optimization opportunities. BOLT applies those optimizations at the binary level without recompilation.

Parallel Execution & Offloading

Relevant Files
  • openmp/runtime/src/kmp_runtime.cpp - Thread team management and fork/join
  • openmp/runtime/src/kmp_tasking.cpp - Task scheduling and execution
  • openmp/runtime/src/kmp_taskdeps.cpp - Task dependency tracking
  • openmp/device/src/Parallelism.cpp - Device-side parallel regions
  • offload/libomptarget/omptarget.cpp - Target offloading orchestration
  • offload/libomptarget/PluginManager.cpp - Plugin initialization and device management
  • offload/plugins-nextgen/common/include/PluginInterface.h - Device abstraction layer

Thread-Level Parallelism

OpenMP achieves thread-level parallelism through a fork-join model managed by the runtime. When encountering a #pragma omp parallel region, the master thread forks a team of worker threads. The runtime maintains thread pools to avoid expensive thread creation/destruction. Each thread has a kmp_info_t structure tracking its state, team membership, and task queue.

Thread synchronization uses barriers at fork and join points. The runtime implements multiple barrier algorithms (distributed, centralized) selectable via environment variables. Threads coordinate through atomic operations and condition variables to minimize busy-waiting.

Task Scheduling & Dependencies

Tasks are queued in per-thread deques managed by __kmp_push_task(). When a thread's deque fills, tasks are either executed immediately or the deque is expanded. The runtime supports task stealing—idle threads can steal work from other threads' deques.

Task dependencies are tracked via a dependency graph (kmp_depnode_t). When a task with dependencies is created, the runtime links it to predecessor tasks. Tasks remain blocked until all dependencies complete. Taskgroups provide synchronization points where the encountering task waits for all child and descendant tasks to finish.

// Task dependency linking
__kmp_depnode_link_successor(gtid, thread, task, node, plist);
// Blocks until dependencies satisfied
__kmpc_omp_task_with_deps(loc, gtid, task, ndeps, dep_list, ...);
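
At the source level, the same ordering is expressed with depend clauses, which the compiler lowers to the entry points above. A small example; built without -fopenmp the pragmas are ignored and the code runs serially with the same result:

```cpp
#include <cassert>

// depend(out: x) / depend(in: x) express the edge the runtime records in
// kmp_depnode_t: the consumer task may not start before the producer
// that writes x has finished.
int pipeline() {
  int x = 0, y = 0;
  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp task depend(out: x)
    x = 21; // producer
    #pragma omp task depend(in: x) depend(out: y)
    y = 2 * x; // consumer: blocked until the producer completes
    #pragma omp taskwait
  }
  return y;
}
```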

Device Offloading Architecture

The offloading system uses a plugin-based architecture. PluginManager loads device-specific plugins (CUDA, Level Zero, AMDGPU, Host) at runtime. Each plugin implements the GenericPluginTy interface, providing device allocation, kernel launch, and data transfer operations.

Data mapping is managed by MappingInfo in libomptarget. Host pointers are mapped to device pointers, with reference counting to track active mappings. The AsyncInfoTy structure wraps asynchronous operations, allowing non-blocking data transfers and kernel launches.

Asynchronous Operations

Offloading operations are queued asynchronously via AsyncInfoTy. Each device maintains a queue (e.g., CUDA stream) for pending operations. The runtime can synchronize explicitly or query completion non-blockingly. Post-processing functions registered on async objects execute after synchronization completes, enabling cleanup and dependent operations.
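
This pattern can be sketched with a toy (hypothetical types, not the real AsyncInfoTy): work is queued, synchronize() drains the queue, and post-processing callbacks run only after everything has completed:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Toy model of the async-info pattern used for device offloading.
struct ToyAsyncInfo {
  std::queue<std::function<void()>> ops;          // pending device operations
  std::vector<std::function<void()>> postProcess; // cleanup hooks

  void queueOp(std::function<void()> op) { ops.push(std::move(op)); }
  void addPostProcess(std::function<void()> fn) {
    postProcess.push_back(std::move(fn));
  }
  void synchronize() {
    while (!ops.empty()) { // stand-in for waiting on a device queue/stream
      ops.front()();
      ops.pop();
    }
    for (auto &fn : postProcess) // runs after synchronization completes
      fn();
    postProcess.clear();
  }
  bool isDone() const { return ops.empty(); }
};
```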

Device-Side Parallelism

On GPU devices, the OpenMP device runtime (openmp/device/src/) implements parallel regions using SPMD (Single Program Multiple Data) execution. All threads in a block execute the same code with different thread IDs. The runtime manages team state, synchronization, and task execution within the device kernel.