Overview
Relevant Files
- `README.md`
- `llvm/README.txt`
- `llvm/include/llvm/IR`
- `llvm/lib/IR`
- `clang/README.md`
- `llvm/tools/llc/llc.cpp`
LLVM is a modular compiler infrastructure toolkit designed for constructing highly optimized compilers, optimizers, and runtime environments. The project is organized as a monorepo containing multiple interconnected components that work together to transform source code into efficient machine code.
Core Architecture
Key Components
LLVM Core (llvm/) is the foundation, containing:
- IR (Intermediate Representation): A language-independent, low-level representation that serves as the central abstraction. Located in `llvm/include/llvm/IR` and `llvm/lib/IR`, it defines fundamental concepts like modules, functions, basic blocks, and instructions.
- Optimization Passes: Transformations that improve code efficiency (scalar optimizations, vectorization, loop transformations).
- Code Generation: Converts IR to target-specific machine code through instruction selection, register allocation, and scheduling.
- Tools: Assembler, disassembler, bitcode analyzer, and optimizer utilities.
Clang (clang/) is the C-family frontend that compiles C, C++, Objective-C, and Objective-C++ into LLVM IR, enabling language-specific parsing and semantic analysis.
Supporting Libraries:
- libc++: Modern C++ standard library implementation
- LLD: Fast linker supporting ELF, Mach-O, COFF, and WebAssembly formats
- Compiler-RT: Runtime support for sanitizers, profiling, and low-level operations
- MLIR: Multi-Level IR for high-level compiler abstractions and domain-specific optimizations
Compilation Pipeline
The typical flow is: source code → frontend parsing → LLVM IR generation → optimization passes → target-specific code generation → assembly → linking → executable.
Each stage is modular, allowing custom frontends, optimization strategies, and backends. The IR serves as the universal intermediate format, enabling language-agnostic optimizations and cross-platform code generation.
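To make the pipeline concrete, here is (approximately) what a trivial C function `int add(int a, int b) { return a + b; }` looks like once a frontend has emitted LLVM IR — the exact output varies with compiler version and flags:

```llvm
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add nsw i32 %a, %b
  ret i32 %sum
}
```

The same textual module can be serialized to bitcode, optimized, and handed to any registered backend without the frontend's involvement.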
Design Philosophy
LLVM emphasizes modularity, reusability, and extensibility. Components are designed as libraries rather than monolithic tools, allowing developers to embed LLVM in custom applications. The IR is human-readable (text format) and machine-readable (bitcode format), facilitating debugging and serialization.
Architecture & Core Components
Relevant Files
- `llvm/include/llvm/IR/Module.h`
- `llvm/include/llvm/IR/Function.h`
- `llvm/include/llvm/IR/BasicBlock.h`
- `llvm/include/llvm/IR/Instruction.h`
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp`
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp`
- `llvm/lib/CodeGen/RegAllocGreedy.cpp`
- `llvm/include/llvm/CodeGen/TargetPassConfig.h`
LLVM IR Hierarchy
LLVM's Intermediate Representation is organized in a strict hierarchy. A Module is the top-level container holding all IR objects for a compilation unit. Each Module contains:
- Functions – executable code units with a signature and body
- Global Variables – module-level data declarations
- Aliases and IFuncs – symbol references and indirect functions
- Metadata – debug info, profiling data, and annotations
Functions contain BasicBlocks, which are sequences of instructions that execute linearly. Each BasicBlock must end with a terminator instruction (branch, return, etc.). Instructions are the atomic operations: arithmetic, memory access, control flow, and intrinsics.
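The hierarchy is visible directly in the textual IR. In this hand-written sketch, one module holds a global variable and a function whose basic blocks each end in a terminator:

```llvm
; Module-level global variable
@counter = global i32 0

define i32 @abs(i32 %x) {
entry:                                         ; basic block
  %isneg = icmp slt i32 %x, 0
  br i1 %isneg, label %neg, label %done        ; terminator
neg:
  %negx = sub i32 0, %x
  br label %done                               ; terminator
done:
  %r = phi i32 [ %negx, %neg ], [ %x, %entry ]
  ret i32 %r                                   ; terminator
}
```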
IR Representation
The IR uses a graph-based representation where:
- Values are the fundamental unit – instructions, constants, and arguments all produce values
- Uses track data dependencies between values
- Types constrain operations and enable optimization
- Metadata attaches auxiliary information without affecting semantics
This design enables efficient analysis and transformation passes to reason about program behavior.
CodeGen Pipeline
LLVM transforms IR to machine code through multiple stages:
- IR Preparation – Lowering high-level constructs (exceptions, intrinsics)
- Instruction Selection – Converting IR to target-specific instructions via SelectionDAG or GlobalISel
- Register Allocation – Assigning virtual registers to physical registers
- Machine Code Emission – Generating assembly or object code
SelectionDAG vs GlobalISel
SelectionDAG is the traditional approach: IR is converted to a directed acyclic graph of operations, then pattern-matched to machine instructions. It’s mature and handles complex lowering well.
GlobalISel is the modern alternative: IR translates to generic machine IR (gMIR), then passes through legalization, register bank selection, and instruction selection. It’s more modular and target-independent.
Register Allocation
The greedy allocator assigns virtual registers to physical registers by:
- Building live intervals for each virtual register
- Iterating through intervals in priority order
- Attempting direct assignment, splitting, or spilling as needed
- Recoloring when conflicts arise
This balances compile-time performance with code quality.
Pass Infrastructure
LLVM uses a hierarchical pass manager supporting:
- Function Passes – Operate on individual functions
- Loop Passes – Iterate over loops with nesting awareness
- Module Passes – Analyze or transform entire modules
- Machine Function Passes – Work on machine-level IR
Passes declare analysis dependencies and preservation guarantees, enabling the manager to cache results and invalidate only affected analyses.
Target Abstraction
The TargetMachine abstracts target-specific behavior:
- TargetLowering – Defines legal operations and calling conventions
- TargetInstrInfo – Describes instruction properties and patterns
- TargetRegisterInfo – Specifies register classes and constraints
- TargetFrameLowering – Handles stack frame layout
This separation allows backends to customize lowering while reusing common infrastructure.
Clang C/C++ Frontend
Relevant Files
- `clang/lib/Frontend` – Frontend orchestration and compilation pipeline
- `clang/lib/Parse` – Syntax parsing and AST construction
- `clang/lib/Sema` – Semantic analysis and type checking
- `clang/include/clang/Frontend/CompilerInstance.h` – Main compiler coordinator
- `clang/include/clang/Parse/Parser.h` – Parser interface
- `clang/include/clang/Sema/Sema.h` – Semantic analyzer interface
The Clang C/C++ frontend is the core compilation pipeline that transforms source code into an Abstract Syntax Tree (AST). It orchestrates lexical analysis, parsing, and semantic analysis to produce a fully-typed, semantically-valid AST ready for code generation.
Architecture Overview
The frontend follows a classic three-phase compiler design:
Key Components
CompilerInstance (clang/lib/Frontend/CompilerInstance.cpp) is the central coordinator that manages all compilation objects: the preprocessor, target information, diagnostics engine, and AST context. It owns the lifecycle of these components and provides factory methods for creating them.
Parser (clang/lib/Parse/Parser.cpp) performs recursive descent parsing of tokens produced by the lexer. It builds the initial AST structure by recognizing language grammar rules. The parser handles declarations, expressions, statements, and templates, delegating semantic validation to Sema.
Sema (clang/lib/Sema/Sema.cpp) performs semantic analysis on the parsed AST. It validates type correctness, resolves names, checks access control, instantiates templates, and performs overload resolution. Sema is the largest component, split across 100+ files for different language features (declarations, expressions, templates, etc.).
Compilation Flow
- Initialization: `CompilerInstance` creates the `Preprocessor`, which handles includes and macros
- Parsing: `Parser` consumes preprocessed tokens and builds an initial AST
- Semantic Analysis: `Sema` validates and enriches the AST with type information
- Code Generation: `CodeGen` converts the typed AST to LLVM IR
- Backend: LLVM optimizes and generates machine code
Frontend Actions
FrontendAction is the extensibility point for custom compilation behaviors. Subclasses like ParseSyntaxOnlyAction, EmitLLVMAction, and EmitAssemblyAction define what happens after parsing completes. This enables tools like clang-format, clang-tidy, and the static analyzer to reuse the frontend infrastructure.
Diagnostic System
The DiagnosticsEngine collects and reports errors, warnings, and notes throughout compilation. It integrates with SourceManager to provide precise source locations and context for each diagnostic, enabling helpful error messages with code snippets and fix-it suggestions.
Code Generation & Optimization
Relevant Files
- `llvm/lib/CodeGen` – Machine code generation and optimization
- `llvm/lib/Transforms` – IR-level transformation passes
- `llvm/lib/Passes` – Pass pipeline infrastructure and builders
- `llvm/lib/Analysis` – Analysis passes providing optimization data
- `llvm/lib/Target` – Target-specific code generation
LLVM's code generation and optimization pipeline transforms high-level IR into efficient machine code through multiple stages. The process combines IR-level optimizations, target-independent transformations, and target-specific lowering.
Optimization Pipeline Architecture
The PassBuilder in llvm/lib/Passes orchestrates the entire optimization pipeline. It constructs different pass sequences based on optimization levels (O0, O1, O2, O3):
- Module Simplification Pipeline - Early IR canonicalization and cleanup
- Module Optimization Pipeline - Aggressive optimizations including vectorization
- Function Simplification Pipeline - Per-function IR improvements
- CGSCC Pipeline - Call-graph-driven interprocedural optimizations
Each pipeline is composed of individual passes that run in sequence, with analysis results cached and invalidated as needed.
Key Transformation Categories
IR-Level Transforms (llvm/lib/Transforms) include:
- Scalar Optimizations - Dead code elimination, constant propagation, loop unrolling
- Vectorization - Loop and SLP vectorization using cost models
- Interprocedural Optimization (IPO) - Inlining, function specialization, attribute inference
- Instrumentation - Sanitizers, profiling, coverage instrumentation
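As a small illustration of the scalar category, constant propagation folds `%x` to a constant and dead code elimination then removes the unused `%dead`. This is a hand-written before/after sketch, not literal pass output:

```llvm
; Before: %x folds to a constant and %dead has no uses.
define i32 @f() {
  %x = add i32 2, 3
  %dead = mul i32 %x, 7
  ret i32 %x
}

; After constant propagation and dead code elimination:
define i32 @f() {
  ret i32 5
}
```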
Code Generation (llvm/lib/CodeGen) handles:
- SelectionDAG - IR to DAG lowering with pattern matching and optimization
- Instruction Selection - DAG to machine instructions via target patterns
- Register Allocation - Live range analysis and register assignment
- Machine Scheduling - Instruction ordering for pipeline efficiency
- Assembly Emission - Final machine code or assembly output
Analysis Infrastructure
Analysis passes provide critical information for optimization decisions:
- Alias Analysis - Memory dependency tracking
- Dominance Analysis - Control flow structure
- Loop Analysis - Loop structure and properties
- Scalar Evolution - Induction variable analysis
- Target Transform Info - Target-specific cost modeling
Target-Specific Lowering
Each target in llvm/lib/Target implements:
- Calling Conventions - ABI-compliant parameter passing
- Instruction Lowering - Complex operations to target instructions
- Frame Lowering - Stack frame setup and management
- Register Info - Register constraints and allocation strategies
The pipeline preserves analysis results where possible to minimize recomputation. Passes declare which analyses they preserve, enabling the pass manager to avoid redundant computation and maintain correctness across the optimization sequence.
Linker & Binary Tools
Relevant Files
- `lld/README.md`
- `lld/ELF/Driver.cpp`, `Writer.cpp`, `SymbolTable.h`
- `lld/COFF/`, `lld/MachO/`, `lld/wasm/`
- `llvm/lib/Object/` (Binary.cpp, ObjectFile.cpp, ELFObjectFile.cpp, COFFObjectFile.cpp, MachOObjectFile.cpp, WasmObjectFile.cpp)
- `llvm/lib/MC/` (MCStreamer.cpp, MCAssembler.cpp, MCObjectWriter.cpp, ELFObjectWriter.cpp, WinCOFFObjectWriter.cpp, MachObjectWriter.cpp)
LLVM's linker and binary tools infrastructure consists of two main layers: the Object library for reading/writing binary formats, and the MC (Machine Code) layer for assembly and code emission.
Object Library (llvm/lib/Object)
The Object library provides a unified interface for handling multiple binary formats. The core abstraction is the Binary class hierarchy:
- Binary – Base class for all binary file types
- ObjectFile – Represents relocatable object files (ELF, COFF, MachO, Wasm, XCOFF, GOFF)
- Archive – Handles static libraries (.a, .lib files)
- SymbolicFile – Base for files containing symbols
Each format has specialized classes: ELFObjectFile, COFFObjectFile, MachOObjectFile, WasmObjectFile. These provide format-specific APIs while maintaining a common interface for symbol iteration, section access, and relocation handling.
MC Layer (llvm/lib/MC)
The MC layer handles assembly and object file generation:
- MCStreamer – Abstract interface for emitting assembly directives and machine code
- MCObjectStreamer – Concrete implementation that builds an in-memory representation
- MCAssembler – Manages sections, symbols, and fragments; performs layout and relaxation
- MCObjectWriter – Format-specific writer (ELFObjectWriter, WinCOFFObjectWriter, MachObjectWriter)
- MCAsmBackend – Target-specific assembly backend handling fixups and relaxation
- MCCodeEmitter – Encodes machine instructions to bytes
The pipeline: MCStreamer receives directives → MCAssembler builds fragments → layout phase computes offsets → MCObjectWriter emits final binary.
LLD Linker Architecture
LLD is a modular linker with format-specific drivers:
- lld/Common/ – Shared utilities (error handling, DWARF support, command-line parsing)
- lld/ELF/ – ELF linker with architecture-specific support (Arch/ subdirectory)
- lld/COFF/ – Windows PE/COFF linker
- lld/MachO/ – macOS Mach-O linker
- lld/wasm/ – WebAssembly linker
Each driver follows a similar pattern: Driver parses options → InputFiles reads objects → SymbolTable resolves symbols → Writer emits output. The ELF linker includes specialized passes: ICF (Identical Code Folding), MarkLive (garbage collection), Relocations (fixup application), and LTO integration.
Key Data Flow
Tools Built on These Libraries
- llvm-readobj – Inspects binary files using Object library
- llvm-objdump – Disassembles and displays object file contents
- llvm-link – Links LLVM bitcode modules
- llvm-jitlink – JIT linker for runtime code linking
- lld – Main linker (supports ELF, COFF, MachO, Wasm via format detection)
Runtime Libraries & Standard Library
Relevant Files
- `libcxx/include` – C++ Standard Library headers
- `libcxx/src` – C++ Standard Library implementations
- `libcxxabi/src` – C++ ABI runtime support
- `compiler-rt/lib` – Compiler runtime support (builtins, sanitizers)
- `libunwind/src` – Stack unwinding and exception handling
- `libc/src` – LLVM C Standard Library implementation
Overview
The LLVM project provides a comprehensive suite of runtime libraries that form the foundation for compiled C and C++ programs. These libraries handle everything from basic arithmetic operations to exception handling, memory management, and standards compliance.
Core Components
libc++ (C++ Standard Library)
The C++ Standard Library provides containers, algorithms, iterators, and utilities required by the C++ standard. Located in libcxx/, it includes:
- Containers: `vector`, `map`, `set`, `deque`, `list`, `unordered_map`, `flat_map`
- Algorithms: Sorting, searching, transforming, and numeric operations
- Utilities: Smart pointers, memory management, type traits, function objects
- I/O Streams: File and string stream implementations
- Threading: Mutexes, condition variables, threads, atomic operations
- Ranges & Views: Modern C++20 range-based abstractions
libcxxabi (C++ ABI Runtime)
Provides low-level C++ runtime support in libcxxabi/src/:
- Exception handling (`cxa_exception.cpp`, `cxa_personality.cpp`)
- Type information and RTTI (`private_typeinfo.cpp`, `stdlib_typeinfo.cpp`)
- Guard variables for static initialization (`cxa_guard.cpp`)
- Name demangling (`cxa_demangle.cpp`)
- Memory management for exceptions (`fallback_malloc.cpp`)
compiler-rt (Compiler Runtime)
Located in compiler-rt/lib/, provides essential compiler support:
- Builtins: Low-level arithmetic, bit manipulation, floating-point operations
- Sanitizers: AddressSanitizer (ASan), MemorySanitizer (MSan), ThreadSanitizer (TSan), UndefinedBehaviorSanitizer (UBSan)
- Profiling: Code coverage and instrumentation profiling
- CFI: Control Flow Integrity checking
libunwind (Stack Unwinding)
Implements stack unwinding for exception handling and debugging:
- DWARF-based unwinding (`DwarfParser.hpp`, `DwarfInstructions.hpp`)
- Compact unwinding support (`CompactUnwinder.hpp`)
- Platform-specific register handling (`Registers.hpp`)
- Exception handling personality routines
libc (C Standard Library)
LLVM's implementation of the C Standard Library in libc/src/:
- Math: Comprehensive math functions with GPU support
- String/Memory: String manipulation and memory operations
- I/O: File and stream operations
- Threading: POSIX threads and synchronization primitives
- System: File system, process, and signal handling
Architecture & Dependencies
Key Design Patterns
Header-Only Components: Many libc++ utilities are header-only for optimization and template instantiation.
Platform Abstraction: Runtime libraries use conditional compilation to support multiple platforms (Linux, macOS, Windows, Fuchsia).
Modular Sanitizers: Each sanitizer (ASan, MSan, TSan) can be independently enabled during compilation.
ABI Stability: libcxxabi maintains stable ABI for C++ exception handling across compiler versions.
Integration Points
These libraries integrate seamlessly with the LLVM toolchain:
- Clang Frontend: Generates calls to runtime functions
- Code Generation: Emits intrinsics for builtins
- Linker (lld): Links runtime libraries with user code
- Optimization Passes: Leverage runtime knowledge for optimizations
MLIR: Multi-Level Intermediate Representation
Relevant Files
- `mlir/include/mlir` – Core IR headers
- `mlir/lib/IR` – IR implementation
- `mlir/lib/Dialect` – All dialect implementations
- `mlir/docs/LangRef.md` – Language reference
- `mlir/lib/RegisterAllDialects.cpp` – Dialect registration
MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure designed to represent, analyze, and transform computations at multiple abstraction levels. It bridges high-level dataflow graphs (like TensorFlow or PyTorch) down to target-specific machine code, making it ideal for deep learning and high-performance computing.
Core IR Structure
MLIR's fundamental building blocks form a hierarchical graph:
- Operations - The basic computational units. Every operation has operands (inputs), results (outputs), attributes (metadata), and optional regions (nested code).
- Values - Edges in the computation graph. Each value is produced by exactly one operation or block argument.
- Blocks - Ordered lists of operations. Blocks can have arguments (like function parameters) and successors for control flow.
- Regions - Ordered lists of blocks. Operations can contain multiple regions to represent nested structures (e.g., loop bodies, function definitions).
- Attributes - Compile-time metadata attached to operations (constants, names, configuration).
- Types - Value types in the type system (integers, floats, tensors, custom types).
Dialects: Extensibility Framework
Dialects are the mechanism for extending MLIR with domain-specific operations, types, and attributes. Each dialect represents a particular abstraction level or domain.
Key Dialects:
- Builtin - Core types and operations (implicitly loaded in every context)
- Func - Function definitions and calls
- Arith - Arithmetic operations (add, mul, div)
- MemRef - Memory reference operations (allocate, load, store)
- Affine - Affine loop nests and polyhedral optimizations
- Linalg - Linear algebra operations (matmul, conv)
- Vector - SIMD vector operations
- GPU - GPU-specific operations (kernels, synchronization)
- LLVM - LLVM IR operations for lowering to machine code
- SPIRV - SPIR-V operations for GPU portability
- SCF - Structured control flow (if, for, while)
- Tensor - Immutable tensor operations
- Transform - Rewrite and transformation operations
Dialect Initialization
Dialects are registered via the DialectRegistry. The RegisterAllDialects.cpp file registers 40+ dialects including AMDGPU, ArmNeon, ArmSME, Async, Bufferization, Complex, ControlFlow, DLTI, EmitC, Index, IRDL, MLProgram, MPI, Math, NVGPU, OpenACC, OpenMP, PDL, Ptr, Quant, Shape, Shard, SparseTensor, Tosa, UB, WasmSSA, X86Vector, and XeGPU.
Multi-Level Representation
MLIR's power lies in representing code at multiple abstraction levels simultaneously:
- High-level - Domain-specific operations (e.g., `linalg.matmul`)
- Mid-level - Affine loops and memory operations
- Low-level - LLVM IR and target-specific instructions
Passes and transformations progressively lower code from one level to another, enabling domain-specific optimizations at each stage.
Passes and Transformations
The pass infrastructure enables composable transformations:
- Analysis passes - Compute properties without modifying IR
- Transform passes - Rewrite and optimize IR
- Conversion passes - Lower between dialects
- Canonicalization - Simplify operations to canonical forms
Interfaces and Traits
Operations can implement interfaces (like BufferizableOpInterface) and traits (like Commutative, IsTerminator) to enable generic transformations and analyses.
For example, operations from the Arith and Linalg dialects in textual form:

```mlir
%result = arith.addi %lhs, %rhs : i32
%tensor = linalg.matmul ins(%A, %B : tensor<4x4xf32>, tensor<4x4xf32>)
                        outs(%C : tensor<4x4xf32>) -> tensor<4x4xf32>
```
MLIR's extensibility and multi-level design make it a powerful foundation for building compilers across diverse domains.
Specialized Frontends & Languages
Relevant Files
- `flang/lib/Parser` – Fortran parser implementation
- `flang/lib/Semantics` – Semantic analysis for Fortran
- `flang/lib/Lower` – Lowering to FIR intermediate representation
- `flang/lib/Optimizer` – FIR dialect and optimization passes
- `clang-tools-extra/clangd` – C++ language server implementation
- `clang-tools-extra/clangd/index` – Symbol indexing and lookup
Flang: Fortran Compiler Frontend
Flang is a modern Fortran compiler frontend written in C++, designed to replace the legacy flang project. It provides a complete implementation of Fortran 2018 with support for OpenMP and OpenACC directives.
Compilation Pipeline:
1. Prescan & Parsing (`flang/lib/Parser`) – Converts source code to a parse tree using parser combinators. Handles fixed- and free-form Fortran, preprocessor directives, and language extensions.
2. Semantic Analysis (`flang/lib/Semantics`) – Validates the parse tree, resolves names, performs type checking, and builds symbol tables. Produces a semantically-correct program representation.
3. Lowering to FIR (`flang/lib/Lower`) – Transforms the parse tree into Fortran IR (FIR), an MLIR-based intermediate representation. Uses the PFT (Pre-FIR Tree) builder to structure the program.
4. Optimization (`flang/lib/Optimizer`) – Applies MLIR passes on FIR to optimize code before final code generation.
FIR: Fortran Intermediate Representation
FIR is a dialect of MLIR specifically designed for Fortran semantics. It provides operations for memory management, array operations, and runtime descriptors.
Key Components:
- FIR Dialect – Core operations like `fir.load`, `fir.store`, `fir.call`, `fir.do_loop` for low-level Fortran constructs
- HLFIR Dialect – High-level operations for expressions and assignments without materializing temporaries, enabling better optimization
- Type System – Supports Fortran types: `!fir.ref<T>` (reference), `!fir.box<T>` (descriptor), `!fir.array<...>` (arrays)
Clangd: C++ Language Server
Clangd provides IDE features for C++ via the Language Server Protocol (LSP), including code completion, hover information, diagnostics, and refactoring.
Architecture:
- ClangdLSPServer – Implements LSP protocol, translates JSON-RPC messages to internal operations
- ClangdServer – Core engine managing AST parsing, indexing, and feature computation
- TUScheduler – Schedules AST parsing and analysis tasks asynchronously per translation unit
- Symbol Index – Maintains searchable index of symbols using trigram-based Dex or in-memory MemIndex
Key Features:
- Code Completion – Fuzzy matching with ranking based on context and usage frequency
- Go-to Definition/Declaration – Cross-reference lookup via symbol index
- Diagnostics – Real-time error checking with clang-tidy integration
- Refactoring – Rename, extract function, and other code transformations
- Background Indexing – Maintains project-wide symbol index for workspace queries
Indexing Strategy:
Clangd uses a multi-level indexing approach: dynamic index for open files, background index for the entire project, and optional static index for system libraries. Symbol references are tracked with location, kind, and container information for precise cross-reference queries.
Debugging, Profiling & Analysis Tools
Relevant Files
- `lldb/source` – LLDB debugger implementation
- `llvm/lib/DebugInfo` – Debug information handling (DWARF, CodeView, GSYM)
- `llvm/lib/ProfileData` – Instrumentation and profiling data
- `llvm/lib/XRay` – XRay function tracing system
- `bolt/lib` – BOLT binary optimization framework
LLDB: The Debugger
LLDB is a next-generation, high-performance debugger built as reusable components leveraging LLVM libraries. It provides:
- Multi-platform support: macOS, iOS, Linux, FreeBSD, Windows
- Language support: C, C++, Objective-C, Objective-C++
- Expression evaluation: Uses Clang compiler infrastructure for accurate expression parsing and JIT compilation
- Data formatters: Custom pretty-printing for complex types
- Remote debugging: Client-server architecture using gdb-remote protocol
Key components include breakpoints, watchpoints, stack frame inspection, variable tracking, and symbol resolution. LLDB exposes functionality through both CLI and C++ API.
Debug Information Tools
LLVM provides comprehensive debug information support:
- DWARF: Primary debug format for ELF binaries, handled by `llvm/lib/DebugInfo/DWARF`
- CodeView: Microsoft debug format for Windows binaries
- GSYM: Compact debug format for symbolication and lookup
- llvm-debuginfo-analyzer: Parses and displays debug info in human-readable logical view format
These tools support parsing, verification, and transformation of debug metadata across compilation units.
Profiling & Performance Analysis
XRay is a function call tracing system combining compiler-inserted instrumentation with runtime control:
- Nop-sled instrumentation points in binaries
- Flight Data Recorder (FDR) mode for fixed-memory circular buffering
- Profiling mode for latency analysis
- Tools: `llvm-xray` for trace analysis and graph generation
ProfileData library handles multiple profiling formats:
- Instrumentation-based PGO: Compiler-inserted counters for profile-guided optimization
- Sample-based profiling: Integration with the Linux `perf` tool
- MemProf: Memory allocation profiling
- Coverage: Code coverage tracking
BOLT: Binary Optimization
BOLT is a post-link optimizer that improves application performance through code layout optimization:
- Reads execution profiles from the `perf` tool
- Reorders basic blocks and functions for better cache utilization
- Supports branch history (LBR/BRBE) for high-quality profiles
- Key optimizations: function reordering, block splitting, shrink-wrapping, indirect call promotion
Workflow: collect a profile with `perf` → convert it with `perf2bolt` → optimize with `llvm-bolt`
BOLT Address Translation (BAT) enables profile collection from optimized binaries for iterative optimization.
Integration & Workflow
These tools work together in a complete debugging and optimization pipeline:
Debug information enables source-level debugging, while profiling tools identify optimization opportunities. BOLT applies those optimizations at the binary level without recompilation.
Parallel Execution & Offloading
Relevant Files
- `openmp/runtime/src/kmp_runtime.cpp` – Thread team management and fork/join
- `openmp/runtime/src/kmp_tasking.cpp` – Task scheduling and execution
- `openmp/runtime/src/kmp_taskdeps.cpp` – Task dependency tracking
- `openmp/device/src/Parallelism.cpp` – Device-side parallel regions
- `offload/libomptarget/omptarget.cpp` – Target offloading orchestration
- `offload/libomptarget/PluginManager.cpp` – Plugin initialization and device management
- `offload/plugins-nextgen/common/include/PluginInterface.h` – Device abstraction layer
Thread-Level Parallelism
OpenMP achieves thread-level parallelism through a fork-join model managed by the runtime. When encountering a #pragma omp parallel region, the master thread forks a team of worker threads. The runtime maintains thread pools to avoid expensive thread creation/destruction. Each thread has a kmp_info_t structure tracking its state, team membership, and task queue.
Thread synchronization uses barriers at fork and join points. The runtime implements multiple barrier algorithms (distributed, centralized) selectable via environment variables. Threads coordinate through atomic operations and condition variables to minimize busy-waiting.
Task Scheduling & Dependencies
Tasks are queued in per-thread deques managed by __kmp_push_task(). When a thread's deque fills, tasks are either executed immediately or the deque is expanded. The runtime supports task stealing—idle threads can steal work from other threads' deques.
Task dependencies are tracked via a dependency graph (kmp_depnode_t). When a task with dependencies is created, the runtime links it to predecessor tasks. Tasks remain blocked until all dependencies complete. Taskgroups provide synchronization points where the master task waits for all child tasks to finish.
```cpp
// Task dependency linking
__kmp_depnode_link_successor(gtid, thread, task, node, plist);
// Blocks until dependencies satisfied
__kmpc_omp_task_with_deps(loc, gtid, task, ndeps, dep_list, ...);
```
Device Offloading Architecture
The offloading system uses a plugin-based architecture. PluginManager loads device-specific plugins (CUDA, Level Zero, AMDGPU, Host) at runtime. Each plugin implements the GenericPluginTy interface, providing device allocation, kernel launch, and data transfer operations.
Data mapping is managed by MappingInfo in libomptarget. Host pointers are mapped to device pointers, with reference counting to track active mappings. The AsyncInfoTy structure wraps asynchronous operations, allowing non-blocking data transfers and kernel launches.
Asynchronous Operations
Offloading operations are queued asynchronously via AsyncInfoTy. Each device maintains a queue (e.g., CUDA stream) for pending operations. The runtime can synchronize explicitly or query completion non-blockingly. Post-processing functions registered on async objects execute after synchronization completes, enabling cleanup and dependent operations.
Device-Side Parallelism
On GPU devices, the OpenMP device runtime (openmp/device/src/) implements parallel regions using SPMD (Single Program Multiple Data) execution. All threads in a block execute the same code with different thread IDs. The runtime manages team state, synchronization, and task execution within the device kernel.