HHVM: HipHop Virtual Machine

Overview

Relevant Files

README.md
hphp/hack/README.md
hphp/hhvm/main.cpp
hphp/doc/hackers-guide/README.md

HHVM (HipHop Virtual Machine) is an open-source virtual machine designed to execute programs written in Hack, a statically-typed language that interoperates seamlessly with PHP. It uses just-in-time (JIT) compilation to reach high runtime performance while keeping the development flexibility of dynamic languages.

What is Hack?

Hack is a modern programming language that combines the fast development cycle of PHP with static typing and features found in contemporary languages. The Hack typechecker provides near-instantaneous type checking via a local server that watches the filesystem; a check typically completes in under 200 milliseconds. This lets developers integrate static analysis into their workflow without noticeable delays.

System Architecture

HHVM operates as a multi-stage compilation pipeline consisting of:

Frontend (HackC, Rust): Lexer, parser, and bytecode emitter convert source code to HHBC bytecode
HHBBC Optimizer: Performs whole-program analysis and optimization on bytecode
JIT Compiler (C++): Translates HHBC to machine code through HHIR and VASM intermediate representations
Runtime (C++): Executes bytecode or compiled machine code with sophisticated memory management

Key Components

Hack Typechecker: Static analysis engine providing fast, incremental type checking
Bytecode Compiler (hphpc): Translates source code to HHBC format
HHBBC: Whole-program bytecode optimizer using type inference and data flow analysis
JIT Compiler: Translates hot code paths to optimized machine code with runtime type specialization
Runtime VM: Executes bytecode with garbage collection, exception handling, and built-in functions
Hack Standard Library (HSL): Modern standard library with type-safe APIs
Server Infrastructure: Built-in Proxygen web server or FastCGI support for web hosting

Execution Modes

HHVM supports multiple execution modes:

Standalone scripts: Run Hack/PHP files directly with hhvm script.hack
Web server: Host applications via built-in Proxygen or FastCGI with nginx/Apache
Ahead-of-time compilation: Pre-compile to bytecode repo for deployment
JIT-enabled execution: Dynamically compile hot code paths to machine code

Development Workflow

The codebase is organized into distinct layers enabling clear separation of concerns. The compiler frontend handles parsing and type checking, the bytecode layer provides a portable intermediate format, the optimizer performs whole-program analysis, and the JIT compiler applies sophisticated runtime optimizations. This architecture enables HHVM to deliver both the flexibility of dynamic languages and the performance of statically-compiled systems.

Architecture & Compilation Pipeline

Relevant Files

hphp/compiler/compiler.h
hphp/hhbbc/README
hphp/doc/hackers-guide/jit-core.md
hphp/runtime/vm/jit/translate-region.h
hphp/runtime/vm/jit/irlower.cpp
hphp/doc/bytecode.specification

HHVM's architecture is organized into distinct compilation and execution layers, each optimized for specific tasks. The system transforms source code through multiple intermediate representations before reaching machine code execution.

Compilation Pipeline Overview

The compilation pipeline consists of four major stages:

Frontend (Hack Compiler): Parses Hack/PHP source code and emits HHBC bytecode
HHBBC Optimizer: Performs whole-program bytecode analysis and optimization
JIT Compiler: Translates hot code paths to machine code via HHIR and VASM
Runtime Execution: Executes bytecode or compiled machine code

HHBC: Bytecode Intermediate Format

HHBC (HipHop Bytecode) is a stack-based intermediate representation designed for both interpretation and JIT compilation. Each source file compiles to a separate unit containing bytecode instructions and metadata. The bytecode specification defines over 200 opcodes organized into categories: basic operations, literals, arithmetic, control flow, member operations, and function calls.

Key characteristics:

Stack-based execution model with explicit stack operations
Metadata tables for functions, classes, and exception handlers
Bytecode offsets for precise source location tracking
Support for dynamic typing with runtime type information
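
To make the stack-based execution model concrete, here is a minimal Python sketch of an interpreter loop. The opcode names (Int, CGetL, SetL, PopC, Add, Mul, RetC) are borrowed loosely from HHBC for flavor, but the tuple encoding, the `run` function, and the exact semantics shown are illustrative assumptions, not HHVM's actual instruction format or interpreter.

```python
def run(bytecode, local_vars=None):
    """Interpret a toy stack bytecode and return the value passed to RetC."""
    local_vars = dict(local_vars or {})   # named local variables
    stack = []                            # evaluation stack
    for op, *args in bytecode:
        if op == "Int":                   # push an integer literal
            stack.append(args[0])
        elif op == "CGetL":               # push the value of a local
            stack.append(local_vars[args[0]])
        elif op == "SetL":                # store top of stack into a local; value stays on the stack
            local_vars[args[0]] = stack[-1]
        elif op == "PopC":                # discard the top of the stack
            stack.pop()
        elif op in ("Add", "Mul"):        # pop two operands, push the result
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "Add" else lhs * rhs)
        elif op == "RetC":                # return the top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("bytecode ended without RetC")


# (2 + $x) * 4 with $x = 3
program = [("Int", 2), ("CGetL", "x"), ("Add",), ("Int", 4), ("Mul",), ("RetC",)]
result = run(program, {"x": 3})
```

Every operand is passed implicitly through the stack, which is why the instruction stream needs no register names.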

HHBBC: Whole-Program Optimizer

HHBBC performs sophisticated analysis on complete programs before JIT compilation. It uses a fixed-point iteration algorithm that refines type information across multiple passes until convergence.

The optimizer:

Builds an Index structure containing resolved function and class information
Performs type inference via abstract interpretation on control flow graphs
Tracks only-growing and only-shrinking type sets to enable safe optimizations
Runs analysis passes in parallel on work units (functions/classes)
Applies optimizations like constant propagation, dead code elimination, and function inlining

JIT Compilation: Region Selection to Machine Code

The JIT compiler translates selected bytecode regions to machine code through multiple lowering stages:

Region Selection: Identifies which bytecode to compile. The tracelet selector chooses regions based on current VM state and type information. Profile-guided optimization (PGO) enables selecting larger regions after profiling runs.

HHIR Generation: Bytecode regions are lowered to HHIR, an SSA-form intermediate representation with strong typing. Each HHIR value has a precise type (e.g., Int, Obj<=Class, Str). The irgen module contains emitters for each bytecode instruction, producing HHIR instructions that the IRBuilder optimizes during construction.

VASM Lowering: HHIR is lowered to VASM (virtual assembly), an architecture-neutral instruction set. This stage performs register allocation, instruction selection, and architecture-specific optimizations. Separate lowering passes handle x86-64 and ARM64 specifics.

Code Emission: VASM is finally emitted to native machine code, with relocation and metadata generation for debugging and profiling.

Type System

HHIR's type system represents sets of values with precise distinctions:

Primitive types: Int, Bool, Dbl, Str, Arr, Obj, Cls, Func
Specialized types: Obj<=Class (object of specific class or subclass), Arr(kind) (array of specific kind)
Constant types: Int<5>, Str("hello")
Reference-counting and indirection variants: CountedStr, PersistentStr, BoxedT, PtrToT

Type comparisons use set semantics: S <= T means S is a subtype of T. This enables precise type-based optimizations while maintaining soundness.
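
The set semantics can be modelled with bitsets, where each primitive type is one bit and a type is the union of the values it may hold. The following Python sketch is an assumption-laden illustration of that idea only: the constants and helper names are hypothetical and do not mirror HHIR's real type encoding.

```python
# Each primitive type is one bit; a type is the set (bitwise OR) of the kinds of value it may hold.
BOTTOM = 0                      # the empty set: no value has this type
INT, BOOL, DBL, STR = 1, 2, 4, 8
TOP = INT | BOOL | DBL | STR    # the set of all values in this toy model


def is_subtype(s, t):
    """S <= T holds when every value in S is also in T."""
    return (s & ~t) == 0


def union(s, t):
    """Smallest type containing both S and T."""
    return s | t


def intersect(s, t):
    """Values that are in both S and T."""
    return s & t


num = union(INT, DBL)

# A type guard is redundant when the known type is already a subtype of the required one.
guard_needed = not is_subtype(INT, num)
```

With this representation, subtype checks, unions, and intersections are each a single bitwise operation.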

Execution Modes

HHVM supports multiple execution strategies:

Bytecode Interpretation: Direct HHBC execution for rarely-used code
JIT Compilation: Translates hot code to optimized machine code
Repo Authoritative Mode: Pre-compiles entire programs for deployment

Hack Typechecker & Static Analysis

Relevant Files

hphp/hack/src/hh_server.ml - Main typechecker daemon
hphp/hack/src/hh_client.ml - Client interface to typechecker
hphp/hack/src/hh_single_type_check.ml - Single-file type checking
hphp/hack/src/parser/ - Full-fidelity parser (Rust & OCaml)
hphp/hack/src/naming/ - Naming phase and NAST
hphp/hack/src/decl/ - Declaration phase
hphp/hack/src/typing/ - Type checking and TAST
hphp/hack/hhi/ - Hack header interface files

The Hack typechecker is a static analysis system that enforces Hack's type system, typically with sub-200ms latency. It operates as a daemon (hh_server) that watches the filesystem and performs incremental type checking, making it practical for real-time IDE integration.

Architecture Overview

The typechecker pipeline consists of four main phases:

Parsing: Full-fidelity parser (written in Rust) converts source code to an Abstract Syntax Tree (AST)
Naming: Elaboration phase resolves all names and produces a Named AST (NAST)
Declaration: Extracts type signatures for classes, functions, and constants
Typing: Performs type inference and checking, producing a Typed AST (TAST)

Key Components

hh_server is the daemon process that maintains an in-memory representation of the codebase. It uses Watchman to monitor file changes and performs incremental rechecking of affected files. The server maintains a dependency graph to determine which files need rechecking when a file changes.

hh_client is the user-facing interface. It communicates with hh_server via socket, sending commands like check, hover, find-refs, and autocomplete. For batch operations, hh_single_type_check performs standalone type checking without a daemon.

Parser is a full-fidelity recursive descent parser that preserves all source information including whitespace and comments. It's implemented in Rust for performance and can operate in declaration mode (fast, extracts only signatures) or full mode (complete AST).

Naming resolves all identifiers to their definitions. It handles namespace resolution, validates that all referenced symbols exist, and produces the NAST which is the input to type checking.

Declaration Phase extracts type information from class and function definitions without analyzing their bodies. This allows the typechecker to understand the public API of all files before type checking any function bodies.

Typing is the core type checking engine. It performs bidirectional type inference, constraint solving, and error reporting. The TAST (Typed AST) annotates every expression with its inferred type.

Data Structures

AST: Raw syntax tree from parser, preserves all source structure
NAST: Named AST with resolved identifiers, input to type checking
TAST: Typed AST with inferred types on every expression, enables IDE features
Shallow Decls: Fast type signatures extracted during declaration phase
Folded Decls: Complete class information including inherited members

Incremental Checking

The typechecker uses a dependency graph to track which files depend on which declarations. When a file changes, only files that transitively depend on its declarations are rechecked. This enables fast incremental updates even in large codebases.
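
A rough way to picture the recheck set is a breadth-first walk over reverse dependencies. The Python sketch below works at file granularity with a hypothetical `dependents` map and made-up file names; the real typechecker's dependency tracking is considerably finer grained, so treat this as an illustration of the idea only.

```python
from collections import deque


def files_to_recheck(changed, dependents):
    """Return `changed` plus every file that transitively depends on it.

    `dependents` maps a file to the files that directly depend on its declarations.
    """
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen:      # visit each file once, even with shared dependencies
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical project: user.hack and order.hack use base.hack; page.hack uses user.hack and util.hack.
dependents = {
    "base.hack": ["user.hack", "order.hack"],
    "user.hack": ["page.hack"],
    "util.hack": ["page.hack"],
}
recheck = files_to_recheck("base.hack", dependents)
```

Files outside the returned set, such as util.hack in this example, keep their previous results.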

Configuration

Type checking behavior is controlled via .hhconfig files in the project root. Options include strict mode enforcement, experimental features, and various type system strictness levels. The typechecker supports multiple modes: strict (most restrictive), partial (mixed typed/untyped), and decl (declaration-only).

Bytecode Compiler (hphpc)

Relevant Files

hphp/compiler/compiler.cpp
hphp/compiler/package.h
hphp/compiler/option.h
hphp/hhbbc/hhbbc.h
hphp/runtime/vm/unit-emitter.h
hphp/runtime/vm/hackc-translator.cpp

The bytecode compiler (hphpc) is the core component that transforms PHP/Hack source code into HHBC (HipHop Bytecode), an intermediate representation that HHVM executes. It orchestrates parsing, code generation, optimization, and serialization.

Compilation Pipeline

The compilation process follows these stages:

File Discovery & Packaging - The Package class scans directories and collects source files based on include patterns, static files, and exclusion rules. It manages file metadata and symbol references for on-demand parsing.
Parsing & Bytecode Generation - HackC (the Rust-based Hack compiler) parses source code and generates HHBC. The hackc_compile function invokes HackC, which produces a hhbc::Unit containing functions, classes, constants, and type definitions.
Translation to UnitEmitter - The hackc-translator.cpp converts HackC's internal representation into UnitEmitter objects. UnitEmitter is the pre-runtime representation that holds bytecode, metadata, and symbol information before runtime instantiation.
Whole-Program Optimization (HHBBC) - If enabled, the HHBBC optimizer performs whole-program analysis and optimization on all UnitEmitter objects. It analyzes type information, eliminates dead code, and optimizes bytecode across unit boundaries.
Emission & Serialization - Optimized UnitEmitter objects are either serialized to a bytecode repository or used to create runtime Unit objects.

Key Components

UnitEmitter - Represents a single compilation unit (typically one PHP file). It contains:

Bytecode for functions and methods
Class and type definitions
Constants and literals
Symbol references and dependencies
SHA1 hashes for source and bytecode

Package - Manages file discovery and compilation configuration:

Scans directories for source files
Applies include/exclude patterns
Tracks symbol references for parse-on-demand
Coordinates with extern_worker for distributed parsing

HHBBC (HipHop Bytecode to Bytecode Compiler) - Performs whole-program optimization:

Analyzes type information across all units
Eliminates unreachable code
Optimizes function calls and property access
Refines type inference based on global analysis

Compilation Modes

The compiler supports different output modes controlled by Option flags:

GenerateBinaryHHBC - Produces a serialized bytecode repository
GenerateTextHHBC - Outputs human-readable bytecode dumps
GenerateHhasHHBC - Generates HHAS (HipHop Assembly) text format
NoOutputHHBC - Performs compilation without output (for validation)

Distributed Compilation

For large codebases, hphpc uses extern_worker for distributed parsing and indexing:

Files are grouped and sent to worker processes
Each worker parses independently and returns UnitEmitter objects
Results are aggregated and passed to HHBBC for optimization
Configurable thread count via ParserThreadCount option
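
The group, parse, and aggregate flow can be sketched in Python as follows. This is only a structural illustration: a thread pool stands in for extern_worker, `parse_group` returns placeholder dictionaries rather than real UnitEmitter objects, and all function names and defaults are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def group_files(files, group_size):
    """Split the file list into fixed-size groups (the last group may be smaller)."""
    return [files[i:i + group_size] for i in range(0, len(files), group_size)]


def parse_group(group):
    """Stand-in for real parsing: produce one placeholder unit per file."""
    return [{"file": path, "unit": f"unit({path})"} for path in group]


def parse_all(files, group_size=2, thread_count=4):
    """Parse groups independently, then aggregate the results in input order."""
    groups = group_files(files, group_size)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        per_group = list(pool.map(parse_group, groups))   # map preserves group order
    return [unit for units in per_group for unit in units]


units = parse_all(["a.php", "b.php", "c.php"])
```

Because each group is parsed independently, the aggregated list is the only point where results from different workers meet.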

HHBBC: Bytecode Optimizer

Relevant Files

hphp/hhbbc/main.cpp
hphp/hhbbc/analyze.cpp
hphp/hhbbc/optimize.cpp
hphp/hhbbc/dce.cpp
hphp/hhbbc/index.h
hphp/hhbbc/whole-program.cpp

HHBBC (HipHop Bytecode to Bytecode Compiler) is a whole-program bytecode optimizer that runs after the Hack compiler emits HHBC. It performs sophisticated type inference and optimization passes to improve runtime performance.

Architecture Overview

HHBBC operates in three main phases:

Parse Phase: Converts UnitEmitters into an internal representation (php::Func, php::Class, php::Block structures)
Analysis Phase: Performs iterative type inference and dependency tracking
Optimization Phase: Applies bytecode transformations based on inferred types

Whole-Program Analysis

The core algorithm uses fixed-point iteration to refine type information:

Initial Pass: Analyze all functions and classes in parallel, recording dependencies on Index queries
Update Step: Single-threaded update of Index with newly inferred types
Dependency Scheduling: Re-analyze any function that queried information that changed
Repeat: Continue until Index reaches a fixed point (no new information discovered)

This approach ensures that type information is never incorrect, only progressively refined. Types that can only grow during analysis (such as private property types, which are inferred in the context of their class) are analyzed in context, while types that can only shrink (such as function return types) are stored in the Index.
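
The four steps can be sketched as a small worklist loop in Python. This toy model makes several assumptions that are not taken from HHBBC: types are plain strings, every function either returns a constant of a known type or forwards the result of one callee, and return types start at the most conservative value and are refined from there.

```python
TOP = "Top"  # most conservative type: "could be anything"


def analyze(name, body, index, dependents):
    """Infer a return type for `name`, recording which Index entries it queried."""
    kind, arg = body
    if kind == "const":            # the function returns a literal of a known type
        return arg
    # kind == "call": the function returns whatever its callee returns
    dependents.setdefault(arg, set()).add(name)
    return index[arg]


def whole_program(funcs):
    index = {name: TOP for name in funcs}    # start conservative, refine over time
    dependents = {}                           # callee -> functions that queried it
    worklist = sorted(funcs)
    rounds = 0
    while worklist:
        rounds += 1
        # Analysis step: every function reads the same Index snapshot,
        # so this part could run in parallel.
        results = {n: analyze(n, funcs[n], index, dependents) for n in worklist}
        # Update step: apply the new facts to the Index in one place.
        changed = [n for n, t in results.items() if index[n] != t]
        for n in changed:
            index[n] = results[n]
        # Dependency scheduling: re-analyze only functions whose inputs changed.
        worklist = sorted({d for n in changed for d in dependents.get(n, ())})
    return index, rounds


# f returns g(), g returns h(), h returns an Int literal.
funcs = {"f": ("call", "g"), "g": ("call", "h"), "h": ("const", "Int")}
index, rounds = whole_program(funcs)
```

The Int fact discovered for h needs one extra round per level of the call chain to reach f, and the loop stops as soon as a round changes nothing.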

Optimization Passes

After reaching a fixed point, the optimizer applies per-function transformations:

Iterator Optimization: Converts normal iterators to local iterators (liters) when the iterator base is stored in a local that isn't modified across the iteration loop.

Local DCE (Dead Code Elimination): Within each block, removes instructions whose results aren't used, assuming all variables are live at block exit.

Global DCE: Across all blocks, removes dead code using liveness analysis. Can change local types, triggering re-analysis.

Control Flow Optimization: Simplifies the CFG by merging blocks, removing unreachable code, and converting conditional jumps to unconditional ones when branches are identical.

Constant Propagation: Replaces instructions with constant values when types are narrowed to specific constants.
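
As an illustration of the local DCE rule described above, the Python sketch below walks one block backwards and drops instructions whose results are never read. It assumes a simplified three-address form in which names starting with `$` are locals (always treated as live at block exit) and every operation is free of side effects; neither assumption reflects HHBBC's real bytecode representation.

```python
def is_local(name):
    """In this toy encoding, locals start with '$' and temporaries do not."""
    return name.startswith("$")


def local_dce(block):
    """Remove side-effect-free instructions whose temporary result is never used.

    Each instruction is (dest, op, args). Locals are assumed live at block exit.
    """
    live = set()    # temporaries read by an instruction we have already kept
    kept = []
    for dest, op, args in reversed(block):
        if not is_local(dest) and dest not in live:
            continue                           # dead temporary: drop the instruction
        live.discard(dest)                     # this definition satisfies later uses
        live.update(a for a in args if isinstance(a, str))
        kept.append((dest, op, args))
    kept.reverse()
    return kept


block = [
    ("t1", "const", (1,)),
    ("t2", "const", (2,)),
    ("t3", "add", ("t1", "t2")),   # t3 is never read
    ("$x", "copy", ("t1",)),
]
optimized = local_dce(block)
```

Removing the unused add also makes t2 dead, and the single backward pass catches that as well.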

Type System

The type inference engine uses a forward dataflow analysis on an abstract interpreter:

Tracks types of locals and eval stack values through each block
Handles exceptional control flow by propagating pre-instruction state to throw edges
Supports constant values in types, enabling constant propagation
Distinguishes reference-counted vs. non-reference-counted values (important for copy-on-write optimization)

Key Data Structures

Index: Central repository of whole-program information. Stores inferred return types, class constants, property types, and dependency metadata. Thread-safe for concurrent reads during analysis.

FuncAnalysis: Per-function analysis result containing inferred types for each block's entry state, bytecode updates, and dependency information.

BlockData: Per-block state tracking, including reverse-post-order numbering and input/output type states.

Parallelization

Analysis passes run in parallel across all functions and classes. No locking is needed during this phase, because analysis only reads the immutable php representation and queries the Index, which is safe for concurrent reads. The update step is single-threaded so that it can modify the Index without locks.

JIT Compiler & Code Generation

Relevant Files

hphp/runtime/vm/jit/mcgen-translate.cpp
hphp/runtime/vm/jit/vasm-internal-inl.h
hphp/runtime/vm/jit/region-tracelet.cpp
hphp/runtime/vm/jit/vasm-x64.cpp
hphp/runtime/vm/jit/vasm-xls.cpp

The JIT compiler transforms selected bytecode regions into optimized machine code through a multi-stage pipeline. This process balances compilation speed with runtime performance through careful region selection, intermediate representation optimization, and architecture-specific code generation.

Region Selection & Tracelet Formation

The tracelet selector identifies which bytecode sequences to compile. It uses selectTracelet() to form regions based on current VM state, live type information, and execution frequency. Regions are bounded by configuration limits (MaxRegionInstrs, MaxLiveRegionInstrs) to control compilation time. The selector performs eager or lazy type guarding depending on the number of live locations, always eagerly guarding MBase (the member base used by member operations) to catch type mismatches early.

HHIR Generation Pipeline

Bytecode regions are lowered to HHIR (HipHop Intermediate Representation), a strongly-typed SSA-form IR. The irgen module contains emitters for each bytecode instruction, producing HHIR instructions that the IRBuilder optimizes during construction. Key structures include:

IRUnit: Container for all IR blocks and instructions for a compilation unit
IRGS: IR generation state tracking the current block, stack state, and type information
IRBuilder: Constructs IR incrementally, performing parse-time optimizations and type analysis

The IR represents all operations with precise types (e.g., Int, Obj<=Class, Str), enabling type-safe optimizations and eliminating runtime type checks where possible.
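
The step from stack bytecode to an SSA-form IR can be sketched by simulating the stack symbolically: instead of values, the stack holds the names of typed temporaries. The Python below is a toy illustration; the instruction spellings it prints and the `lower_to_ssa` helper are assumptions made for readability, not the output or API of the real irgen module.

```python
import itertools


def lower_to_ssa(bytecode):
    """Translate toy stack bytecode into SSA-style instructions with typed temporaries."""
    ids = itertools.count(1)
    stack = []   # symbolic stack: holds names of SSA values, not runtime values
    ir = []
    for op, *args in bytecode:
        if op == "Int":
            tmp = f"t{next(ids)}"
            ir.append(f"{tmp}:Int = DefConst {args[0]}")
            stack.append(tmp)
        elif op in ("Add", "Mul"):
            rhs, lhs = stack.pop(), stack.pop()
            tmp = f"t{next(ids)}"           # each result gets a fresh name (single assignment)
            ir.append(f"{tmp}:Int = {op}Int {lhs}, {rhs}")
            stack.append(tmp)
        elif op == "RetC":
            ir.append(f"Ret {stack.pop()}")
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
    return ir


# (2 + 3) * 4
ir = lower_to_ssa([("Int", 2), ("Int", 3), ("Add",), ("Int", 4), ("Mul",), ("RetC",)])
```

Because both operands are known to be Int, the sketch can pick an integer-specific operation directly, which is the kind of specialization that precise IR types make possible.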

Vasm: Virtual Assembly

HHIR is lowered to Vasm, a virtual assembly layer abstracting architecture differences. Vasm uses virtual registers (Vreg) and architecture-neutral instructions (Vinstr). A Vunit contains blocks of Vasm instructions organized into three code areas: main, cold, and frozen. This separation enables profile-guided code layout optimization.

Register Allocation with XLS

The Extended Linear Scan (XLS) algorithm allocates physical registers to virtual registers. The process:

Liveness Analysis: Computes which values are live at each instruction
Interval Building: Creates lifetime intervals for each Vreg with use positions
Register Assignment: Greedily assigns physical registers, splitting intervals when necessary
Spill Resolution: Inserts spill/reload instructions for values exceeding available registers

XLS handles register constraints, hints from copy instructions, and SIMD values. Spill slots are allocated in multiples of 16 bytes to maintain alignment.
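
For intuition, here is a Python sketch of basic linear scan allocation over lifetime intervals. It is deliberately simpler than XLS: it spills whole intervals instead of splitting them, ignores register constraints and hints, and uses a made-up interval format, so it should be read as background on the technique rather than a description of vasm-xls.cpp.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to intervals given as {vreg: (start, end)}.

    Returns (assignment, spilled): a vreg -> register map and the list of spilled vregs.
    """
    free = list(range(num_regs))
    active = []                     # (end, vreg) pairs currently holding a register
    assignment, spilled = {}, []
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts and reclaim their registers.
        for item in [a for a in active if a[0] < start]:
            active.remove(item)
            free.append(assignment[item[1]])
        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
            continue
        # No free register: spill whichever interval ends last.
        last = max(active)
        if last[0] > end:
            active.remove(last)
            assignment[vreg] = assignment.pop(last[1])   # take over its register
            spilled.append(last[1])
            active.append((end, vreg))
        else:
            spilled.append(vreg)
    return assignment, spilled


intervals = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
assignment, spilled = linear_scan(intervals, num_regs=2)
```

With two registers, the long-lived interval `a` is spilled when `c` starts, and `d` later reuses the register freed by `b`.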

Architecture-Specific Lowering & Emission

For x64, lowerForX64() transforms abstract Vasm instructions into concrete x64 operations. The Vgen template emits actual machine code, handling calling conventions, memory addressing modes, and instruction selection. Block layout is optimized using profile-guided ordering (pgoLayout) for Optimize translations, or RPO ordering otherwise.

Optimization Phases

The pipeline includes multiple optimization passes: IR-level optimizations during generation, Vasm-level copy optimization and dead code elimination, and post-register-allocation peephole optimization. Profile-guided retranslation (retranslateAll) recompiles hot functions with larger regions and aggressive optimizations after profiling data is collected.

Runtime & Virtual Machine

Relevant Files

hphp/runtime/vm/bytecode.h
hphp/runtime/vm/class.h
hphp/runtime/vm/func.h
hphp/runtime/base/execution-context.h
hphp/runtime/vm/act-rec.h
hphp/runtime/vm/unit.h

The HHVM runtime executes Hack/PHP code through a sophisticated virtual machine that combines bytecode interpretation with JIT compilation. The system is organized around three core concepts: Units (compilation units), Functions (Func), and Classes, all coordinated by the ExecutionContext.

Bytecode Execution Model

HHVM uses HHBC (HipHop Bytecode), a stack-based instruction set. The interpreter processes bytecode sequentially, with each opcode manipulating a value stack. The VmStack class manages this stack, providing methods like push(), pop(), and allocTV() for TypedValue manipulation. Bytecode execution can be triggered via enterVMAtCurPC(), which transfers control to JIT-compiled code when available and falls back to the interpreter otherwise.

Activation Records (ActRec)

Function calls are managed through ActRec structures, which represent call frames on the VM stack. Each ActRec contains:

m_sfp: Previous frame pointer (for RBP chaining)
m_savedRip: Return address (native code pointer)
m_funcId: Identifier of the executing function
m_callOffAndFlags: Bytecode offset and flags (LocalsDecRefd, IsInlined, AsyncEagerRet)
m_thisUnsafe / m_clsUnsafe: Instance or late-bound class context

The ActRec lifecycle has three states: pre-live (during FCall setup), live (executing), and post-live (after cleanup).
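
Because each frame records a pointer to the previous frame, the call stack can be walked like a linked list. The Python sketch below models only that chaining; `FrameSketch` and its fields are hypothetical stand-ins for the ActRec members listed above, and the real structure is a packed C++ layout, not a Python object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameSketch:
    func_name: str                        # stands in for m_funcId
    call_offset: int                      # stands in for the offset part of m_callOffAndFlags
    sfp: Optional["FrameSketch"] = None   # stands in for m_sfp (the caller's frame)


def backtrace(frame):
    """Walk the saved-frame-pointer chain from the innermost frame outwards."""
    names = []
    while frame is not None:
        names.append(frame.func_name)
        frame = frame.sfp
    return names


# main() called handler(), which called helper().
main_frame = FrameSketch("main", 0)
handler_frame = FrameSketch("handler", 12, sfp=main_frame)
helper_frame = FrameSketch("helper", 40, sfp=handler_frame)
trace = backtrace(helper_frame)
```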

Units, Functions, and Classes

A Unit represents a compiled PHP file or eval block, containing:

m_funcs: Vector of global functions
m_preClasses: Vector of class definitions
m_litstrs, m_arrays: Literal string and array pools
Metadata for source locations and type information

Func objects describe function signatures, including parameter types, return types, exception handlers, and native function pointers. Class objects define class structure, methods, properties, and inheritance relationships.

ExecutionContext and VMState

The ExecutionContext (per-request) maintains the current VMState:

pc: Program counter (bytecode offset)
fp: Frame pointer (current ActRec)
sp: Stack pointer
jitReturnAddr: Return address from JIT code

This state is thread-local and is kept in sync between the interpreter and JIT-compiled code.

JIT Integration

The JIT compiler translates hot bytecode paths to native machine code, which the VM enters through jit::enterTC(). When the interpreter encounters a translation request, it calls handleTranslate() to compile the bytecode at the current offset. The JIT maintains a translation cache (TC) and uses service requests to handle runtime events like cache misses or type guard failures.

Function Calls

When executing an FCall opcode, doFCall() performs:

Argument arity and type checking
Generics validation
Coeffect verification
ActRec initialization with function metadata
Local variable initialization

The function prologue then executes, either as JIT-compiled code or interpreted bytecode.

Server Infrastructure & Web Hosting

Relevant Files

hphp/runtime/server/http-server.h
hphp/runtime/server/server.h
hphp/runtime/server/http-request-handler.h
hphp/runtime/server/proxygen/proxygen-server.h
hphp/runtime/server/fastcgi/fastcgi-server.h
hphp/runtime/server/transport.h

HHVM's server infrastructure provides a pluggable, high-performance HTTP hosting layer that supports multiple server backends and protocols. The architecture separates concerns between protocol handling, request processing, and application execution.

Core Architecture

The server infrastructure is built on a factory pattern that allows different server implementations to be registered and instantiated dynamically. The ServerFactoryRegistry maintains a mapping of server type names to factory objects, enabling new server types to be plugged in without modifying core code.
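
The registry idea can be sketched in a few lines of Python. The class and method names below are hypothetical and only loosely echo ServerFactoryRegistry; the two server classes are empty stand-ins so that the example is self-contained.

```python
class ServerFactoryRegistrySketch:
    """Maps server type names to callables that build a server."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        if name in self._factories:
            raise ValueError(f"server type already registered: {name}")
        self._factories[name] = factory

    def create(self, name, **options):
        if name not in self._factories:
            raise LookupError(f"no server type registered as: {name}")
        return self._factories[name](**options)


class ProxygenServerSketch:
    kind = "proxygen"

    def __init__(self, port):
        self.port = port


class FastCGIServerSketch:
    kind = "fastcgi"

    def __init__(self, port):
        self.port = port


registry = ServerFactoryRegistrySketch()
registry.register("proxygen", ProxygenServerSketch)
registry.register("fastcgi", FastCGIServerSketch)

# The caller picks a backend by name, typically from configuration.
server = registry.create("fastcgi", port=9000)
```

Adding a third backend means one more `register` call; the code that calls `create` does not change.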

Request Handling Pipeline

The request handling flow follows a consistent pattern across all server implementations:

Connection Acceptance: A server backend (Proxygen or FastCGI) accepts incoming connections
Transport Creation: Each connection gets a Transport object that abstracts protocol details
Handler Instantiation: A RequestHandler is created via the registered factory
Request Execution: The handler processes the request through setup, execution, and teardown phases
Response Delivery: The transport sends the response back to the client

The RequestHandler base class defines the minimal interface: setupRequest(), handleRequest(), abortRequest(), and teardownRequest(). The HttpRequestHandler implementation handles PHP request execution, including URI resolution, static file serving, and proxy routing.
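
A minimal Python sketch of how the four handler methods might fit together is shown below. The method names mirror the interface above in snake_case, but the `serve` driver, the dictionary used as a transport, and the choice to call the abort hook when the handler raises are all assumptions made for illustration; HHVM's actual control flow may differ.

```python
class RequestHandlerSketch:
    """Toy base class with the four lifecycle hooks."""

    def setup_request(self, transport):
        transport["log"].append("setup")

    def handle_request(self, transport):
        raise NotImplementedError

    def abort_request(self, transport):
        transport["log"].append("abort")

    def teardown_request(self, transport):
        transport["log"].append("teardown")


class EchoHandler(RequestHandlerSketch):
    def handle_request(self, transport):
        if transport["path"] == "/fail":
            raise RuntimeError("handler failed")
        transport["response"] = f"200 OK {transport['path']}"
        transport["log"].append("handle")


def serve(handler, transport):
    """Run one request: setup, then handle (or abort on error), then always teardown."""
    handler.setup_request(transport)
    try:
        handler.handle_request(transport)
    except Exception:
        handler.abort_request(transport)
    finally:
        handler.teardown_request(transport)
    return transport


ok = serve(EchoHandler(), {"path": "/index", "log": []})
failed = serve(EchoHandler(), {"path": "/fail", "log": []})
```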

Server Backends

Proxygen Server is the modern, high-performance HTTP server built on Facebook's Proxygen library. It uses:

Multiple worker threads with event-driven I/O (libevent)
HPHPSessionAcceptor to handle HTTP/1.1 and HTTP/2 connections
ProxygenTransport for protocol abstraction
Connection pooling and efficient memory management

FastCGI Server enables HHVM to run as a FastCGI application server behind a web server (nginx, Apache):

FastCGIAcceptor listens for FastCGI protocol connections
FastCGISession manages individual FastCGI connections
FastCGITransport implements the FastCGI protocol
Supports multiplexing multiple requests over a single connection

Configuration & Lifecycle

The HttpServer class orchestrates the overall server lifecycle:

Initializes primary and optional secondary page servers
Manages graceful shutdown with PrepareToStop() and StopOldServer()
Tracks server statistics and shutdown events
Handles SSL/TLS configuration and certificate reloading

Server options include thread counts, queue sizes, socket configuration, SSL settings, and request timeouts. The warmup phase can throttle requests during startup to allow JIT compilation before full load.

Virtual Hosts & Routing

The VirtualHost system allows configuration of multiple virtual hosts with different document roots, path translations, and URL routing rules. Request URI resolution determines which virtual host handles a request and translates logical paths to filesystem paths.

Static content can be served directly from disk when enabled, bypassing PHP execution. Proxy routing allows unmatched URLs to be forwarded to upstream servers, enabling hybrid deployments.

Extensions & Standard Library

Relevant Files

hphp/runtime/ext/extension.h
hphp/runtime/ext/extension-registry.h
hphp/runtime/ext/extension-registry.cpp
hphp/hsl/src
hphp/runtime/ext/hsl
hphp/runtime/ext/std

Extension System Architecture

HHVM's extension system provides a modular way to add native functionality to the runtime. Extensions are C++ modules that register native functions, classes, and constants with the VM. The base Extension class in extension.h defines the lifecycle and interface for all extensions.

Each extension implements virtual methods for different initialization phases: moduleLoad() (configuration), moduleInit() (startup), moduleRegisterNative() (function registration), and moduleShutdown() (cleanup). Extensions can also define thread-local and request-local initialization via threadInit() and requestInit().

The ExtensionRegistry manages all loaded extensions, handling dependency ordering, initialization sequencing, and providing lookup functions like extension_loaded() and get_loaded_extensions().

Hack Standard Library (HSL)

The Hack Standard Library is a comprehensive, well-typed standard library built into HHVM since version 4.108. It provides consistent APIs organized into namespaces for common operations.

Core HSL Namespaces:

HH\Lib\C – Container queries (count, contains, any, every, first, reduce)
HH\Lib\Vec, HH\Lib\Dict, HH\Lib\Keyset – Hack array transformations
HH\Lib\Str – String manipulation with locale support
HH\Lib\Math – Mathematical functions and constants
HH\Lib\Async – Async utilities (Poll, Semaphore, LowPri)
HH\Lib\IO – File and stream I/O abstractions
HH\Lib\OS – Low-level POSIX operations (sockets, file descriptors)
HH\Lib\Network – TCP and Unix socket abstractions
HH\Lib\Regex – Regular expression utilities
HH\Lib\Random – Cryptographically secure random number generation
HH\Lib\Locale – Locale-aware string operations

HSL Implementation

HSL modules are implemented as extensions in hphp/runtime/ext/hsl/. Each namespace typically has a corresponding extension (e.g., hsl_io, hsl_os, hsl_random) that provides native implementations for performance-critical operations.

HSL code is embedded in the binary via systemlib files (.php and .hack files compiled into the executable). The hsl_systemlib extension loads the pure Hack implementations, while specialized extensions like hsl_os provide C++ bindings to system calls.

// Example: Using HSL for array operations
use namespace HH\Lib\Vec;

$numbers = vec[1, 2, 3, 4, 5];
$squared = Vec\map($numbers, $x ==> $x * $x);        // vec[1, 4, 9, 16, 25]
$evens = Vec\filter($squared, $x ==> $x % 2 === 0);  // vec[4, 16]

Standard Extension

The standard extension provides core PHP-compatible functions for strings, arrays, files, math, and process control. It registers hundreds of native functions through registerNativeStandard() and related methods, maintaining backward compatibility with PHP while leveraging Hack's type system.

Extension Registration & Lifecycle

Extensions are registered globally via ExtensionRegistry::registerExtension(). The registry respects dependency ordering through getDeps(), ensuring extensions initialize in the correct sequence. Native functions are registered in a function table and resolved at runtime through the native function dispatch mechanism.
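
Dependency-ordered initialization is a topological sort. The Python sketch below uses the standard library's `graphlib` for that; the extension names and the `init_order` helper are hypothetical, and nothing here reflects how ExtensionRegistry is actually implemented.

```python
from graphlib import TopologicalSorter


def init_order(deps):
    """Return an initialization order in which every extension follows its dependencies.

    `deps` maps an extension name to the set of extensions it depends on.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical extensions: "http" needs "json" and "core"; "json" needs "core".
deps = {
    "http": {"json", "core"},
    "json": {"core"},
    "core": set(),
}
order = init_order(deps)
```

`TopologicalSorter` raises `graphlib.CycleError` if two extensions depend on each other, which is the case a registry would have to reject.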

Testing, Debugging & Tools

Relevant Files

hphp/test/run.php - Main test runner orchestrating test execution
hphp/runtime/debugger/debugger.h - Debugger core infrastructure
hphp/tools/hhvm_wrapper.php - Development wrapper with debugging flags
hphp/test/README.md - Test suite organization and conventions
hphp/runtime/vm/debugger-hook.cpp - VM integration for debugging

Test Infrastructure

HHVM uses a comprehensive test suite organized into multiple suites: quick (high-quality, high-signal tests), slow (full-featured tests), and zend (PHP compatibility tests). Tests follow a file-based format where each test consists of a source file (.php, .hack, or .hhas) paired with expected output files (.expect or .expectf).

The test runner (hphp/test/run.php) orchestrates execution across multiple configurations including JIT mode, interpreter mode, and RepoAuthoritative mode. Tests can be executed with various options:

test/run test/quick                    # Quick tests with JIT
test/run -m interp -r test/slow        # Slow tests in interpreter + repo mode
test/run --list-tests test/quick       # List tests without running

Debugging Capabilities

HHVM provides multiple debugging interfaces. The hphpd debugger enables local and remote debugging with breakpoint support:

hhvm -m debug myscript.php             # Local debugging
hhvm -m debug -h mymachine.com         # Remote debugging

Within the debugger, you can set breakpoints, step through code, inspect variables, and evaluate expressions. The VSDebug extension provides IDE integration via the Debug Adapter Protocol, enabling debugging in VS Code and other compatible editors.

Diagnostic & Profiling Tools

The hhvm_wrapper.php script provides convenient shortcuts for common development tasks:

hhvm_wrapper.php -i test.php           # Run with interpreter (JIT disabled)
hhvm_wrapper.php -v test.php           # Dump bytecode (HHBC)
hhvm_wrapper.php --dump-hhas test.php  # Dump HHAS (human-readable assembly)
hhvm_wrapper.php -g --compile test.php # Run under GDB with repo compilation
TRACE=hhir:2 hhvm_wrapper.php test.php # Trace IR generation

Tracing & IR Inspection

The TRACE environment variable enables detailed logging of internal operations. Key modules include printir (JIT IR output), hhir (high-level IR), and bcinterp (bytecode interpreter):

TRACE=printir:1,hhir:2 hhvm script.php # Trace multiple modules
HPHP_TRACE_FILE=/tmp/trace.log hhvm script.php  # Write to file

Bytecode and IR can be dumped using runtime options:

hhvm -vEval.DumpBytecode=1 script.php  # Dump HHBC
hhvm -vEval.DumpHhas=1 script.php      # Dump HHAS
hhvm -vEval.DumpIR=2 script.php        # Dump JIT IR

Performance Analysis

HHVM integrates with jemalloc profiling for memory analysis and perf for CPU profiling. The admin server provides profiling endpoints when enabled:

hhvm -m server -vHHProf.Enabled=true -vHHProf.Active=true
jeprof 'localhost:8088/pprof/heap' > profile.raw

Test execution supports JIT serialization for profile-guided optimization, and retranslation testing to validate optimization correctness.