Overview
Relevant Files
- `README.md`
- `llvm/README.txt`
- `llvm/include/llvm/IR`
- `llvm/lib/IR`
- `clang/README.md`
- `llvm/tools/llc/llc.cpp`
LLVM is a modular compiler infrastructure toolkit designed for constructing highly optimized compilers, optimizers, and runtime environments. The project is organized as a monorepo containing multiple interconnected components that work together to transform source code into efficient machine code.
Core Architecture
Key Components
LLVM Core (llvm/) is the foundation, containing:
- IR (Intermediate Representation): A language-independent, low-level representation that serves as the central abstraction. Located in `llvm/include/llvm/IR` and `llvm/lib/IR`, it defines fundamental concepts like modules, functions, basic blocks, and instructions.
- Optimization Passes: Transformations that improve code efficiency (scalar optimizations, vectorization, loop transformations).
- Code Generation: Converts IR to target-specific machine code through instruction selection, register allocation, and scheduling.
- Tools: Assembler, disassembler, bitcode analyzer, and optimizer utilities.
Clang (clang/) is the C-family frontend that compiles C, C++, Objective-C, and Objective-C++ into LLVM IR, enabling language-specific parsing and semantic analysis.
Supporting Libraries:
- libc++: Modern C++ standard library implementation
- LLD: Fast linker supporting ELF, Mach-O, COFF, and WebAssembly formats
- Compiler-RT: Runtime support for sanitizers, profiling, and low-level operations
- MLIR: Multi-Level IR for high-level compiler abstractions and domain-specific optimizations
Compilation Pipeline
The typical flow is: source code → frontend parsing → LLVM IR generation → optimization passes → target-specific code generation → assembly → linking → executable.
Each stage is modular, allowing custom frontends, optimization strategies, and backends. The IR serves as the universal intermediate format, enabling language-agnostic optimizations and cross-platform code generation.
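To make the pipeline concrete, here is (approximately) what a trivial C function `int add(int a, int b) { return a + b; }` looks like once a frontend has emitted LLVM IR — the exact output varies with compiler version and flags:

```llvm
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add nsw i32 %a, %b
  ret i32 %sum
}
```

The same textual module can be serialized to bitcode, optimized, and handed to any registered backend without the frontend's involvement.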
Design Philosophy
LLVM emphasizes modularity, reusability, and extensibility. Components are designed as libraries rather than monolithic tools, allowing developers to embed LLVM in custom applications. The IR is human-readable (text format) and machine-readable (bitcode format), facilitating debugging and serialization.
Architecture & Core Components
Relevant Files
- `llvm/include/llvm/IR/Module.h`
- `llvm/include/llvm/IR/Function.h`
- `llvm/include/llvm/IR/BasicBlock.h`
- `llvm/include/llvm/IR/Instruction.h`
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp`
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp`
- `llvm/lib/CodeGen/RegAllocGreedy.cpp`
- `llvm/include/llvm/CodeGen/TargetPassConfig.h`
LLVM IR Hierarchy
LLVM's Intermediate Representation is organized in a strict hierarchy. A Module is the top-level container holding all IR objects for a compilation unit. Each Module contains:
- Functions – executable code units with a signature and body
- Global Variables – module-level data declarations
- Aliases and IFuncs – symbol references and indirect functions
- Metadata – debug info, profiling data, and annotations
Functions contain BasicBlocks, which are sequences of instructions that execute linearly. Each BasicBlock must end with a terminator instruction (branch, return, etc.). Instructions are the atomic operations: arithmetic, memory access, control flow, and intrinsics.
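The hierarchy is visible directly in the textual IR. In this hand-written sketch, one module holds a global variable and a function whose basic blocks each end in a terminator:

```llvm
; Module-level global variable
@counter = global i32 0

define i32 @abs(i32 %x) {
entry:                                         ; basic block
  %isneg = icmp slt i32 %x, 0
  br i1 %isneg, label %neg, label %done        ; terminator
neg:
  %negx = sub i32 0, %x
  br label %done                               ; terminator
done:
  %r = phi i32 [ %negx, %neg ], [ %x, %entry ]
  ret i32 %r                                   ; terminator
}
```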
IR Representation
The IR uses a graph-based representation where:
- Values are the fundamental unit – instructions, constants, and arguments all produce values
- Uses track data dependencies between values
- Types constrain operations and enable optimization
- Metadata attaches auxiliary information without affecting semantics
This design enables efficient analysis and transformation passes to reason about program behavior.
CodeGen Pipeline
LLVM transforms IR to machine code through multiple stages:
- IR Preparation – Lowering high-level constructs (exceptions, intrinsics)
- Instruction Selection – Converting IR to target-specific instructions via SelectionDAG or GlobalISel
- Register Allocation – Assigning virtual registers to physical registers
- Machine Code Emission – Generating assembly or object code
SelectionDAG vs GlobalISel
SelectionDAG is the traditional approach: IR is converted to a directed acyclic graph of operations, then pattern-matched to machine instructions. It’s mature and handles complex lowering well.
GlobalISel is the modern alternative: IR translates to generic machine IR (gMIR), then passes through legalization, register bank selection, and instruction selection. It’s more modular and target-independent.
Register Allocation
The greedy allocator assigns virtual registers to physical registers by:
- Building live intervals for each virtual register
- Iterating through intervals in priority order
- Attempting direct assignment, splitting, or spilling as needed
- Recoloring when conflicts arise
This balances compile-time performance with code quality.
Pass Infrastructure
LLVM uses a hierarchical pass manager supporting:
- Function Passes – Operate on individual functions
- Loop Passes – Iterate over loops with nesting awareness
- Module Passes – Analyze or transform entire modules
- Machine Function Passes – Work on machine-level IR
Passes declare analysis dependencies and preservation guarantees, enabling the manager to cache results and invalidate only affected analyses.
Target Abstraction
The TargetMachine abstracts target-specific behavior:
- TargetLowering – Defines legal operations and calling conventions
- TargetInstrInfo – Describes instruction properties and patterns
- TargetRegisterInfo – Specifies register classes and constraints
- TargetFrameLowering – Handles stack frame layout
This separation allows backends to customize lowering while reusing common infrastructure.
Clang C/C++ Frontend
Relevant Files
- `clang/lib/Frontend` – Frontend orchestration and compilation pipeline
- `clang/lib/Parse` – Syntax parsing and AST construction
- `clang/lib/Sema` – Semantic analysis and type checking
- `clang/include/clang/Frontend/CompilerInstance.h` – Main compiler coordinator
- `clang/include/clang/Parse/Parser.h` – Parser interface
- `clang/include/clang/Sema/Sema.h` – Semantic analyzer interface
The Clang C/C++ frontend is the core compilation pipeline that transforms source code into an Abstract Syntax Tree (AST). It orchestrates lexical analysis, parsing, and semantic analysis to produce a fully-typed, semantically-valid AST ready for code generation.
Architecture Overview
The frontend follows a classic three-phase compiler design:
Key Components
CompilerInstance (clang/lib/Frontend/CompilerInstance.cpp) is the central coordinator that manages all compilation objects: the preprocessor, target information, diagnostics engine, and AST context. It owns the lifecycle of these components and provides factory methods for creating them.
Parser (clang/lib/Parse/Parser.cpp) performs recursive descent parsing of tokens produced by the lexer. It builds the initial AST structure by recognizing language grammar rules. The parser handles declarations, expressions, statements, and templates, delegating semantic validation to Sema.
Sema (clang/lib/Sema/Sema.cpp) performs semantic analysis on the parsed AST. It validates type correctness, resolves names, checks access control, instantiates templates, and performs overload resolution. Sema is the largest component, split across 100+ files for different language features (declarations, expressions, templates, etc.).
Compilation Flow
- Initialization: `CompilerInstance` creates the `Preprocessor`, which handles includes and macros
- Parsing: `Parser` consumes preprocessed tokens and builds an initial AST
- Semantic Analysis: `Sema` validates and enriches the AST with type information
- Code Generation: `CodeGen` converts the typed AST to LLVM IR
- Backend: LLVM optimizes and generates machine code
Frontend Actions
FrontendAction is the extensibility point for custom compilation behaviors. Subclasses like ParseSyntaxOnlyAction, EmitLLVMAction, and EmitAssemblyAction define what happens after parsing completes. This enables tools like clang-format, clang-tidy, and the static analyzer to reuse the frontend infrastructure.
Diagnostic System
The DiagnosticsEngine collects and reports errors, warnings, and notes throughout compilation. It integrates with SourceManager to provide precise source locations and context for each diagnostic, enabling helpful error messages with code snippets and fix-it suggestions.
Code Generation & Optimization
Relevant Files
- `llvm/lib/CodeGen` – Machine code generation and optimization
- `llvm/lib/Transforms` – IR-level transformation passes
- `llvm/lib/Passes` – Pass pipeline infrastructure and builders
- `llvm/lib/Analysis` – Analysis passes providing optimization data
- `llvm/lib/Target` – Target-specific code generation
LLVM's code generation and optimization pipeline transforms high-level IR into efficient machine code through multiple stages. The process combines IR-level optimizations, target-independent transformations, and target-specific lowering.
Optimization Pipeline Architecture
The PassBuilder in llvm/lib/Passes orchestrates the entire optimization pipeline. It constructs different pass sequences based on optimization levels (O0, O1, O2, O3):
- Module Simplification Pipeline - Early IR canonicalization and cleanup
- Module Optimization Pipeline - Aggressive optimizations including vectorization
- Function Simplification Pipeline - Per-function IR improvements
- CGSCC Pipeline - Call-graph-driven interprocedural optimizations
Each pipeline is composed of individual passes that run in sequence, with analysis results cached and invalidated as needed.
Key Transformation Categories
IR-Level Transforms (llvm/lib/Transforms) include:
- Scalar Optimizations - Dead code elimination, constant propagation, loop unrolling
- Vectorization - Loop and SLP vectorization using cost models
- Interprocedural Optimization (IPO) - Inlining, function specialization, attribute inference
- Instrumentation - Sanitizers, profiling, coverage instrumentation
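As a small illustration of the scalar category, constant propagation folds `%x` to a constant and dead code elimination then removes the unused `%dead`. This is a hand-written before/after sketch, not literal pass output:

```llvm
; Before: %x folds to a constant and %dead has no uses.
define i32 @f() {
  %x = add i32 2, 3
  %dead = mul i32 %x, 7
  ret i32 %x
}

; After constant propagation and dead code elimination:
define i32 @f() {
  ret i32 5
}
```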
Code Generation (llvm/lib/CodeGen) handles:
- SelectionDAG - IR to DAG lowering with pattern matching and optimization
- Instruction Selection - DAG to machine instructions via target patterns
- Register Allocation - Live range analysis and register assignment
- Machine Scheduling - Instruction ordering for pipeline efficiency
- Assembly Emission - Final machine code or assembly output
Analysis Infrastructure
Analysis passes provide critical information for optimization decisions:
- Alias Analysis - Memory dependency tracking
- Dominance Analysis - Control flow structure
- Loop Analysis - Loop structure and properties
- Scalar Evolution - Induction variable analysis
- Target Transform Info - Target-specific cost modeling
Target-Specific Lowering
Each target in llvm/lib/Target implements:
- Calling Conventions - ABI-compliant parameter passing
- Instruction Lowering - Complex operations to target instructions
- Frame Lowering - Stack frame setup and management
- Register Info - Register constraints and allocation strategies
The pipeline preserves analysis results where possible to minimize recomputation. Passes declare which analyses they preserve, enabling the pass manager to avoid redundant computation and maintain correctness across the optimization sequence.
Linker & Binary Tools
Relevant Files
- `lld/README.md`
- `lld/ELF/Driver.cpp`, `Writer.cpp`, `SymbolTable.h`
- `lld/COFF/`, `lld/MachO/`, `lld/wasm/`
- `llvm/lib/Object/` (Binary.cpp, ObjectFile.cpp, ELFObjectFile.cpp, COFFObjectFile.cpp, MachOObjectFile.cpp, WasmObjectFile.cpp)
- `llvm/lib/MC/` (MCStreamer.cpp, MCAssembler.cpp, MCObjectWriter.cpp, ELFObjectWriter.cpp, WinCOFFObjectWriter.cpp, MachObjectWriter.cpp)
LLVM's linker and binary tools infrastructure consists of two main layers: the Object library for reading/writing binary formats, and the MC (Machine Code) layer for assembly and code emission.
Object Library (llvm/lib/Object)
The Object library provides a unified interface for handling multiple binary formats. The core abstraction is the Binary class hierarchy:
- Binary – Base class for all binary file types
- ObjectFile – Represents relocatable object files (ELF, COFF, MachO, Wasm, XCOFF, GOFF)
- Archive – Handles static libraries (.a, .lib files)
- SymbolicFile – Base for files containing symbols
Each format has specialized classes: ELFObjectFile, COFFObjectFile, MachOObjectFile, WasmObjectFile. These provide format-specific APIs while maintaining a common interface for symbol iteration, section access, and relocation handling.
MC Layer (llvm/lib/MC)
The MC layer handles assembly and object file generation:
- MCStreamer – Abstract interface for emitting assembly directives and machine code
- MCObjectStreamer – Concrete implementation that builds an in-memory representation
- MCAssembler – Manages sections, symbols, and fragments; performs layout and relaxation
- MCObjectWriter – Format-specific writer (ELFObjectWriter, WinCOFFObjectWriter, MachObjectWriter)
- MCAsmBackend – Target-specific assembly backend handling fixups and relaxation
- MCCodeEmitter – Encodes machine instructions to bytes
The pipeline: MCStreamer receives directives → MCAssembler builds fragments → layout phase computes offsets → MCObjectWriter emits final binary.
LLD Linker Architecture
LLD is a modular linker with format-specific drivers:
- lld/Common/ – Shared utilities (error handling, DWARF support, command-line parsing)
- lld/ELF/ – ELF linker with architecture-specific support (Arch/ subdirectory)
- lld/COFF/ – Windows PE/COFF linker
- lld/MachO/ – macOS Mach-O linker
- lld/wasm/ – WebAssembly linker
Each driver follows a similar pattern: Driver parses options → InputFiles reads objects → SymbolTable resolves symbols → Writer emits output. The ELF linker includes specialized passes: ICF (Identical Code Folding), MarkLive (garbage collection), Relocations (fixup application), and LTO integration.
Key Data Flow
Tools Built on These Libraries
- llvm-readobj – Inspects binary files using Object library
- llvm-objdump – Disassembles and displays object file contents
- llvm-link – Links LLVM bitcode modules
- llvm-jitlink – JIT linker for runtime code linking
- lld – Main linker (supports ELF, COFF, MachO, Wasm via format detection)
Runtime Libraries & Standard Library
Relevant Files
- `libcxx/include` – C++ Standard Library headers
- `libcxx/src` – C++ Standard Library implementations
- `libcxxabi/src` – C++ ABI runtime support
- `compiler-rt/lib` – Compiler runtime support (builtins, sanitizers)
- `libunwind/src` – Stack unwinding and exception handling
- `libc/src` – LLVM C Standard Library implementation
Overview
The LLVM project provides a comprehensive suite of runtime libraries that form the foundation for compiled C and C++ programs. These libraries handle everything from basic arithmetic operations to exception handling, memory management, and standards compliance.
Core Components
libc++ (C++ Standard Library)
The C++ Standard Library provides containers, algorithms, iterators, and utilities required by the C++ standard. Located in libcxx/, it includes:
- Containers: `vector`, `map`, `set`, `deque`, `list`, `unordered_map`, `flat_map`
- Algorithms: Sorting, searching, transforming, and numeric operations
- Utilities: Smart pointers, memory management, type traits, function objects
- I/O Streams: File and string stream implementations
- Threading: Mutexes, condition variables, threads, atomic operations
- Ranges & Views: Modern C++20 range-based abstractions
libcxxabi (C++ ABI Runtime)
Provides low-level C++ runtime support in libcxxabi/src/:
- Exception handling (`cxa_exception.cpp`, `cxa_personality.cpp`)
- Type information and RTTI (`private_typeinfo.cpp`, `stdlib_typeinfo.cpp`)
- Guard variables for static initialization (`cxa_guard.cpp`)
- Name demangling (`cxa_demangle.cpp`)
- Memory management for exceptions (`fallback_malloc.cpp`)
compiler-rt (Compiler Runtime)
Located in compiler-rt/lib/, provides essential compiler support:
- Builtins: Low-level arithmetic, bit manipulation, floating-point operations
- Sanitizers: AddressSanitizer (ASan), MemorySanitizer (MSan), ThreadSanitizer (TSan), UndefinedBehaviorSanitizer (UBSan)
- Profiling: Code coverage and instrumentation profiling
- CFI: Control Flow Integrity checking
libunwind (Stack Unwinding)
Implements stack unwinding for exception handling and debugging:
- DWARF-based unwinding (`DwarfParser.hpp`, `DwarfInstructions.hpp`)
- Compact unwinding support (`CompactUnwinder.hpp`)
- Platform-specific register handling (`Registers.hpp`)
- Exception handling personality routines
libc (C Standard Library)
LLVM's implementation of the C Standard Library in libc/src/:
- Math: Comprehensive math functions with GPU support
- String/Memory: String manipulation and memory operations
- I/O: File and stream operations
- Threading: POSIX threads and synchronization primitives
- System: File system, process, and signal handling
Architecture & Dependencies
Key Design Patterns
Header-Only Components: Many libc++ utilities are header-only for optimization and template instantiation.
Platform Abstraction: Runtime libraries use conditional compilation to support multiple platforms (Linux, macOS, Windows, Fuchsia).
Modular Sanitizers: Each sanitizer (ASan, MSan, TSan) can be independently enabled during compilation.
ABI Stability: libcxxabi maintains stable ABI for C++ exception handling across compiler versions.
Integration Points
These libraries integrate seamlessly with the LLVM toolchain:
- Clang Frontend: Generates calls to runtime functions
- Code Generation: Emits intrinsics for builtins
- Linker (lld): Links runtime libraries with user code
- Optimization Passes: Leverage runtime knowledge for optimizations
MLIR: Multi-Level Intermediate Representation
Relevant Files
- `mlir/include/mlir` – Core IR headers
- `mlir/lib/IR` – IR implementation
- `mlir/lib/Dialect` – All dialect implementations
- `mlir/docs/LangRef.md` – Language reference
- `mlir/lib/RegisterAllDialects.cpp` – Dialect registration
MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure designed to represent, analyze, and transform computations at multiple abstraction levels. It bridges high-level dataflow graphs (like TensorFlow or PyTorch) down to target-specific machine code, making it ideal for deep learning and high-performance computing.
Core IR Structure
MLIR's fundamental building blocks form a hierarchical graph:
- Operations - The basic computational units. Every operation has operands (inputs), results (outputs), attributes (metadata), and optional regions (nested code).
- Values - Edges in the computation graph. Each value is produced by exactly one operation or block argument.
- Blocks - Ordered lists of operations. Blocks can have arguments (like function parameters) and successors for control flow.
- Regions - Ordered lists of blocks. Operations can contain multiple regions to represent nested structures (e.g., loop bodies, function definitions).
- Attributes - Compile-time metadata attached to operations (constants, names, configuration).
- Types - Value types in the type system (integers, floats, tensors, custom types).
Dialects: Extensibility Framework
Dialects are the mechanism for extending MLIR with domain-specific operations, types, and attributes. Each dialect represents a particular abstraction level or domain.
Key Dialects:
- Builtin - Core types and operations (implicitly loaded in every context)
- Func - Function definitions and calls
- Arith - Arithmetic operations (add, mul, div)
- MemRef - Memory reference operations (allocate, load, store)
- Affine - Affine loop nests and polyhedral optimizations
- Linalg - Linear algebra operations (matmul, conv)
- Vector - SIMD vector operations
- GPU - GPU-specific operations (kernels, synchronization)
- LLVM - LLVM IR operations for lowering to machine code
- SPIRV - SPIR-V operations for GPU portability
- SCF - Structured control flow (if, for, while)
- Tensor - Immutable tensor operations
- Transform - Rewrite and transformation operations
Dialect Initialization
Dialects are registered via the DialectRegistry. The RegisterAllDialects.cpp file registers 40+ dialects including AMDGPU, ArmNeon, ArmSME, Async, Bufferization, Complex, ControlFlow, DLTI, EmitC, Index, IRDL, MLProgram, MPI, Math, NVGPU, OpenACC, OpenMP, PDL, Ptr, Quant, Shape, Shard, SparseTensor, Tosa, UB, WasmSSA, X86Vector, and XeGPU.
Multi-Level Representation
MLIR's power lies in representing code at multiple abstraction levels simultaneously:
- High-level - Domain-specific operations (e.g., `linalg.matmul`)
- Mid-level - Affine loops and memory operations
- Low-level - LLVM IR and target-specific instructions
Passes and transformations progressively lower code from one level to another, enabling domain-specific optimizations at each stage.
Passes and Transformations
The pass infrastructure enables composable transformations:
- Analysis passes - Compute properties without modifying IR
- Transform passes - Rewrite and optimize IR
- Conversion passes - Lower between dialects
- Canonicalization - Simplify operations to canonical forms
Interfaces and Traits
Operations can implement interfaces (like BufferizableOpInterface) and traits (like Commutative, IsTerminator) to enable generic transformations and analyses.
For example, operations from the Arith and Linalg dialects in textual form:

```mlir
%result = arith.addi %lhs, %rhs : i32
%tensor = linalg.matmul ins(%A, %B : tensor<4x4xf32>, tensor<4x4xf32>)
                        outs(%C : tensor<4x4xf32>) -> tensor<4x4xf32>
```
MLIR's extensibility and multi-level design make it a powerful foundation for building compilers across diverse domains.
Specialized Frontends & Languages
Relevant Files
- `flang/lib/Parser` – Fortran parser implementation
- `flang/lib/Semantics` – Semantic analysis for Fortran
- `flang/lib/Lower` – Lowering to FIR intermediate representation
- `flang/lib/Optimizer` – FIR dialect and optimization passes
- `clang-tools-extra/clangd` – C++ language server implementation
- `clang-tools-extra/clangd/index` – Symbol indexing and lookup
Flang: Fortran Compiler Frontend
Flang is a modern Fortran compiler frontend written in C++, designed to replace the legacy flang project. It provides a complete implementation of Fortran 2018 with support for OpenMP and OpenACC directives.
Compilation Pipeline:
1. Prescan & Parsing (`flang/lib/Parser`) – Converts source code to a parse tree using parser combinators. Handles fixed- and free-form Fortran, preprocessor directives, and language extensions.
2. Semantic Analysis (`flang/lib/Semantics`) – Validates the parse tree, resolves names, performs type checking, and builds symbol tables. Produces a semantically-correct program representation.
3. Lowering to FIR (`flang/lib/Lower`) – Transforms the parse tree into Fortran IR (FIR), an MLIR-based intermediate representation. Uses the PFT (Pre-FIR Tree) builder to structure the program.
4. Optimization (`flang/lib/Optimizer`) – Applies MLIR passes on FIR to optimize code before final code generation.
FIR: Fortran Intermediate Representation
FIR is a dialect of MLIR specifically designed for Fortran semantics. It provides operations for memory management, array operations, and runtime descriptors.
Key Components:
- FIR Dialect – Core operations like `fir.load`, `fir.store`, `fir.call`, `fir.do_loop` for low-level Fortran constructs
- HLFIR Dialect – High-level operations for expressions and assignments without materializing temporaries, enabling better optimization
- Type System – Supports Fortran types: `!fir.ref<T>` (reference), `!fir.box<T>` (descriptor), `!fir.array<...>` (arrays)
Clangd: C++ Language Server
Clangd provides IDE features for C++ via the Language Server Protocol (LSP), including code completion, hover information, diagnostics, and refactoring.
Architecture:
- ClangdLSPServer – Implements LSP protocol, translates JSON-RPC messages to internal operations
- ClangdServer – Core engine managing AST parsing, indexing, and feature computation
- TUScheduler – Schedules AST parsing and analysis tasks asynchronously per translation unit
- Symbol Index – Maintains searchable index of symbols using trigram-based Dex or in-memory MemIndex
Key Features:
- Code Completion – Fuzzy matching with ranking based on context and usage frequency
- Go-to Definition/Declaration – Cross-reference lookup via symbol index
- Diagnostics – Real-time error checking with clang-tidy integration
- Refactoring – Rename, extract function, and other code transformations
- Background Indexing – Maintains project-wide symbol index for workspace queries
Indexing Strategy:
Clangd uses a multi-level indexing approach: dynamic index for open files, background index for the entire project, and optional static index for system libraries. Symbol references are tracked with location, kind, and container information for precise cross-reference queries.
Debugging, Profiling & Analysis Tools
Relevant Files
- `lldb/source` – LLDB debugger implementation
- `llvm/lib/DebugInfo` – Debug information handling (DWARF, CodeView, GSYM)
- `llvm/lib/ProfileData` – Instrumentation and profiling data
- `llvm/lib/XRay` – XRay function tracing system
- `bolt/lib` – BOLT binary optimization framework
LLDB: The Debugger
LLDB is a next-generation, high-performance debugger built as reusable components leveraging LLVM libraries. It provides:
- Multi-platform support: macOS, iOS, Linux, FreeBSD, Windows
- Language support: C, C++, Objective-C, Objective-C++
- Expression evaluation: Uses Clang compiler infrastructure for accurate expression parsing and JIT compilation
- Data formatters: Custom pretty-printing for complex types
- Remote debugging: Client-server architecture using gdb-remote protocol
Key components include breakpoints, watchpoints, stack frame inspection, variable tracking, and symbol resolution. LLDB exposes functionality through both CLI and C++ API.
Debug Information Tools
LLVM provides comprehensive debug information support:
- DWARF: Primary debug format for ELF binaries, handled by `llvm/lib/DebugInfo/DWARF`
- CodeView: Microsoft debug format for Windows binaries
- GSYM: Compact debug format for symbolication and lookup
- llvm-debuginfo-analyzer: Parses and displays debug info in human-readable logical view format
These tools support parsing, verification, and transformation of debug metadata across compilation units.
Profiling & Performance Analysis
XRay is a function call tracing system combining compiler-inserted instrumentation with runtime control:
- Nop-sled instrumentation points in binaries
- Flight Data Recorder (FDR) mode for fixed-memory circular buffering
- Profiling mode for latency analysis
- Tools: `llvm-xray` for trace analysis and graph generation
ProfileData library handles multiple profiling formats:
- Instrumentation-based PGO: Compiler-inserted counters for profile-guided optimization
- Sample-based profiling: Integration with the Linux `perf` tool
- MemProf: Memory allocation profiling
- Coverage: Code coverage tracking
BOLT: Binary Optimization
BOLT is a post-link optimizer that improves application performance through code layout optimization:
- Reads execution profiles from the `perf` tool
- Reorders basic blocks and functions for better cache utilization
- Supports branch history (LBR/BRBE) for high-quality profiles
- Key optimizations: function reordering, block splitting, shrink-wrapping, indirect call promotion
Workflow: collect a profile with `perf` → convert it with `perf2bolt` → optimize with `llvm-bolt`
BOLT Address Translation (BAT) enables profile collection from optimized binaries for iterative optimization.
Integration & Workflow
These tools work together in a complete debugging and optimization pipeline:
Debug information enables source-level debugging, while profiling tools identify optimization opportunities. BOLT applies those optimizations at the binary level without recompilation.
Parallel Execution & Offloading
Relevant Files
- `openmp/runtime/src/kmp_runtime.cpp` – Thread team management and fork/join
- `openmp/runtime/src/kmp_tasking.cpp` – Task scheduling and execution
- `openmp/runtime/src/kmp_taskdeps.cpp` – Task dependency tracking
- `openmp/device/src/Parallelism.cpp` – Device-side parallel regions
- `offload/libomptarget/omptarget.cpp` – Target offloading orchestration
- `offload/libomptarget/PluginManager.cpp` – Plugin initialization and device management
- `offload/plugins-nextgen/common/include/PluginInterface.h` – Device abstraction layer
Thread-Level Parallelism
OpenMP achieves thread-level parallelism through a fork-join model managed by the runtime. When encountering a #pragma omp parallel region, the master thread forks a team of worker threads. The runtime maintains thread pools to avoid expensive thread creation/destruction. Each thread has a kmp_info_t structure tracking its state, team membership, and task queue.
Thread synchronization uses barriers at fork and join points. The runtime implements multiple barrier algorithms (distributed, centralized) selectable via environment variables. Threads coordinate through atomic operations and condition variables to minimize busy-waiting.
Task Scheduling & Dependencies
Tasks are queued in per-thread deques managed by __kmp_push_task(). When a thread's deque fills, tasks are either executed immediately or the deque is expanded. The runtime supports task stealing—idle threads can steal work from other threads' deques.
Task dependencies are tracked via a dependency graph (kmp_depnode_t). When a task with dependencies is created, the runtime links it to predecessor tasks. Tasks remain blocked until all dependencies complete. Taskgroups provide synchronization points where the master task waits for all child tasks to finish.
```cpp
// Task dependency linking
__kmp_depnode_link_successor(gtid, thread, task, node, plist);
// Blocks until dependencies satisfied
__kmpc_omp_task_with_deps(loc, gtid, task, ndeps, dep_list, ...);
```
Device Offloading Architecture
The offloading system uses a plugin-based architecture. PluginManager loads device-specific plugins (CUDA, Level Zero, AMDGPU, Host) at runtime. Each plugin implements the GenericPluginTy interface, providing device allocation, kernel launch, and data transfer operations.
Data mapping is managed by MappingInfo in libomptarget. Host pointers are mapped to device pointers, with reference counting to track active mappings. The AsyncInfoTy structure wraps asynchronous operations, allowing non-blocking data transfers and kernel launches.
Asynchronous Operations
Offloading operations are queued asynchronously via AsyncInfoTy. Each device maintains a queue (e.g., CUDA stream) for pending operations. The runtime can synchronize explicitly or query completion non-blockingly. Post-processing functions registered on async objects execute after synchronization completes, enabling cleanup and dependent operations.
Device-Side Parallelism
On GPU devices, the OpenMP device runtime (openmp/device/src/) implements parallel regions using SPMD (Single Program Multiple Data) execution. All threads in a block execute the same code with different thread IDs. The runtime manages team state, synchronization, and task execution within the device kernel.