Overview
Relevant Files
- README.md
- src/jq.h
- src/main.c
- src/execute.c
- src/compile.c
- src/jv.h
jq is a lightweight, portable command-line JSON processor written in C with zero runtime dependencies. It enables users to slice, filter, map, and transform JSON data using a powerful query language, similar to how sed, awk, and grep work for text.
Core Architecture
jq operates through a three-stage pipeline:
- Parsing: The jq filter expression is parsed into an abstract syntax tree (AST)
- Compilation: The AST is compiled into bytecode for efficient execution
- Execution: The bytecode is executed against JSON input values
Key Components
jq_state (src/execute.c)
The central execution context that maintains:
- Compiled bytecode (bc)
- Execution stack for managing frames and variables
- Error handling callbacks
- Input/output callbacks for reading JSON and writing results
- Debug and halt state
jv (JSON Value) (src/jv.h)
A reference-counted value type representing JSON data:
- Supports all JSON types: null, boolean, number, string, array, object
- Implements copy-on-write semantics for efficiency
- Most jv_* functions consume their inputs and produce new outputs (jv_copy is the main exception)
Processing Pipeline (src/main.c)
The main entry point orchestrates:
- Command-line argument parsing (options like -r, -s, -c, etc.)
- Filter compilation via jq_compile_args()
- Input reading and JSON parsing
- Iterative execution via jq_start() and jq_next()
- Output formatting and error handling
Command-Line Interface
jq supports extensive options for controlling input/output behavior:
- Input modes: -n (null input), -R (raw input), -s (slurp array)
- Output modes: -r (raw output), -c (compact), -C (color), -S (sort keys)
- Arguments: --arg, --argjson, --slurpfile for passing data to filters
- Advanced: --stream (streaming parser), --seq (JSON sequence format)
Error Handling
Errors are managed through callbacks registered on jq_state:
- Compilation errors are reported during jq_compile_args()
- Runtime errors are captured as invalid jv values with messages
- The halt and halt_error builtins allow programs to control exit codes
Architecture & Data Flow
Relevant Files
- src/execute.c – Runtime execution engine and VM interpreter
- src/bytecode.h – Bytecode structure and opcode definitions
- src/compile.h – Compilation IR and code generation
- src/jv.h – JSON value type system and operations
- src/parser.y – Bison grammar for jq filter syntax
- src/main.c – CLI entry point and I/O handling
jq follows a classic three-stage pipeline: parse → compile → execute. Each stage transforms the input into a more machine-friendly representation.
Parse Stage
The lexer and Bison parser (defined in parser.y) tokenize and parse jq filter expressions. The parser's output is a block structure: a doubly-linked list of intermediate instructions (struct inst) that plays the role of the abstract syntax tree. These instructions may contain free variables that are resolved later during compilation.
Compile Stage
The compiler (compile.c) transforms the instruction blocks into bytecode. Key steps include:
- Binding resolution: Free variables are matched with their definitions (functions, parameters, closures).
- Bytecode generation: Instructions are converted to a compact 16-bit opcode sequence stored in struct bytecode.
- Symbol table creation: Built-in C functions and user-defined functions are registered in a symbol table.
The bytecode structure contains:
- code – Array of 16-bit opcodes
- constants – JSON array of literal values
- subfunctions – Nested function definitions
- globals – Symbol table with C function pointers
Execute Stage
The runtime (execute.c) interprets bytecode using a stack-based virtual machine. The jq_state structure maintains:
- Data stack (stk) – Holds JSON values during computation
- Call frames – Track function calls with closures and local variables
- Fork points – Save state for backtracking (multiple outputs)
- Path tracking – Records the path to the current value for getpath() and setpath()
JSON Value System
All values are represented as jv structs (defined in jv.h). The type system includes:
- Primitive types: null, boolean, number, string
- Composite types: array, object
- Special types: invalid (for errors)
Values use reference counting for memory management. Most jv_* functions consume their inputs and produce new outputs, requiring careful lifetime management.
Bytecode Execution Model
The VM executes opcodes sequentially, manipulating the data stack. Key opcodes include:
- Stack ops: LOADK (push constant), DUP (duplicate), POP (discard)
- Data access: INDEX (array/object access), EACH (iterate)
- Control flow: JUMP, FORK (for multiple outputs), TRY_BEGIN/TRY_END
- Functions: CALL_JQ (user function), CALL_BUILTIN (C function), RET (return)
- Closures: CLOSURE_CREATE, CLOSURE_REF (capture variables)
The FORK opcode is critical for jq's multiple-output semantics. It saves the current stack state and allows backtracking to explore alternative execution paths.
Parsing Pipeline
Relevant Files
- src/lexer.l - Flex lexer definition
- src/parser.y - Bison parser grammar
- src/parser.h - Generated parser header
- src/locfile.h - Location tracking for error reporting
- src/compile.h - Bytecode generation interface
The parsing pipeline transforms jq filter expressions into executable bytecode through three main stages: lexical analysis, syntax analysis, and code generation.
Lexical Analysis (Lexer)
The lexer (src/lexer.l) is built with Flex and tokenizes the input stream. It recognizes:
- Keywords: def, if, then, else, and, or, try, catch, reduce, foreach, etc.
- Operators: Arithmetic (+, -, *, /, %), comparison (==, !=, <, >, <=, >=), and assignment operators (|=, +=, -=, etc.)
- Literals: Numbers, strings (with interpolation support via \(...))
- Identifiers: Function names, field accessors (.field), and variable bindings ($var)
- Structural tokens: Parentheses, brackets, and braces with state tracking
The lexer maintains a state machine to handle nested structures and string interpolation. Location tracking is performed via YY_USER_ACTION to record token positions for error reporting.
Syntax Analysis (Parser)
The parser (src/parser.y) uses Bison to build an abstract syntax tree (AST) from tokens. Key grammar rules include:
- Query: Pipes (|), comma operators (,), and function definitions
- Expr: Binary operations, assignments, conditionals, and try-catch blocks
- Term: Literals, identifiers, array/object construction, and function calls
- Patterns: Destructuring patterns for "as" bindings and function parameters
Operator precedence is carefully defined to resolve shift-reduce conflicts. The parser generates block structures (intermediate representation) during reduction, enabling early optimization like constant folding.
Code Generation
As the parser reduces rules, it calls code generation functions from src/compile.h:
- gen_const() - Emit constant values
- gen_call() - Generate function calls
- gen_binop() - Handle binary operations with constant folding
- gen_cond() - Conditional branching
- gen_reduce(), gen_foreach() - Loop constructs
These functions build linked lists of instructions (blocks) that the compiler later turns into bytecode.
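The idea of folding constants while the IR is being built can be sketched with a toy expression node. This is only an illustration under stated assumptions: the `fold_*` names and the node layout are hypothetical and unrelated to jq's `struct inst` or its real `gen_binop()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy IR node; every fold_* name here is hypothetical, not jq's API. */
typedef enum { FOLD_CONST, FOLD_INPUT, FOLD_ADD } fold_kind;

typedef struct fold_node {
  fold_kind kind;
  double value;                    /* used when kind == FOLD_CONST */
  struct fold_node *lhs, *rhs;     /* used when kind == FOLD_ADD */
} fold_node;

static fold_node *fold_alloc(fold_kind kind) {
  fold_node *n = calloc(1, sizeof *n);
  assert(n != NULL);
  n->kind = kind;
  return n;
}

static fold_node *fold_gen_const(double v) {
  fold_node *n = fold_alloc(FOLD_CONST);
  n->value = v;
  return n;
}

/* Stands for a value only known at run time (the filter input). */
static fold_node *fold_gen_input(void) { return fold_alloc(FOLD_INPUT); }

/* Build lhs + rhs, folding to one constant when both sides are constants. */
static fold_node *fold_gen_add(fold_node *lhs, fold_node *rhs) {
  if (lhs->kind == FOLD_CONST && rhs->kind == FOLD_CONST) {
    fold_node *folded = fold_gen_const(lhs->value + rhs->value);
    free(lhs);
    free(rhs);
    return folded;
  }
  fold_node *n = fold_alloc(FOLD_ADD);
  n->lhs = lhs;
  n->rhs = rhs;
  return n;
}

static void fold_free_tree(fold_node *n) {
  if (n == NULL) return;
  fold_free_tree(n->lhs);
  fold_free_tree(n->rhs);
  free(n);
}
```

In this sketch, `fold_gen_add(fold_gen_const(1), fold_gen_const(2))` returns a single constant node, while an addition involving `fold_gen_input()` stays as an add node to be evaluated at run time.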
Error Handling & Location Tracking
The locfile structure tracks source file metadata and line mappings. When parsing errors occur, locfile_locate() reports precise error locations using the location struct (start/end byte offsets), enabling helpful error messages.
JSON Value System
Relevant Files
- src/jv.h
- src/jv.c
- src/jv_parse.c
- src/jv_print.c
- src/jv_alloc.h
The JSON Value System is the core data representation in jq. Every JSON value is represented as a jv struct, a compact value type (16 bytes on typical 64-bit platforms, given the field layout shown in this section) that can hold any JSON data type with automatic memory management through reference counting.
Core Data Structure
The jv struct is defined as:
typedef struct {
  unsigned char kind_flags;
  unsigned char pad_;
  unsigned short offset;
  int size;
  union {
    struct jv_refcnt* ptr;
    double number;
  } u;
} jv;
The kind_flags field encodes both the JSON type (lower 4 bits) and payload metadata (upper 4 bits). The struct is small enough to be passed and returned by value cheaply (two machine words on typical 64-bit systems).
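As a rough check of the layout described above, the following sketch mirrors the struct under a hypothetical name and packs a kind into the low 4 bits of kind_flags. The `toy_jv` names, the mask constant, and the flag value 0x80 are assumptions made for illustration, not definitions taken from jv.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the jv layout shown above; not jq's header. */
struct toy_jv_refcnt;              /* opaque payload header */

typedef struct {
  unsigned char kind_flags;        /* low 4 bits: kind, high 4 bits: flags */
  unsigned char pad_;
  unsigned short offset;
  int size;
  union {
    struct toy_jv_refcnt *ptr;
    double number;
  } u;
} toy_jv;

/* Assumed values, chosen only to demonstrate the bit split. */
enum { TOY_JV_KIND_MASK = 0x0F, TOY_JV_FLAG_ALLOCATED = 0x80 };

/* Combine a kind (0..15) with payload flags stored in the upper nibble. */
static unsigned char toy_jv_make_flags(unsigned kind, unsigned flags) {
  assert(kind <= TOY_JV_KIND_MASK);
  assert((flags & ~0xF0u) == 0);
  return (unsigned char)(kind | flags);
}

static unsigned toy_jv_kind(toy_jv v)  { return v.kind_flags & 0x0Fu; }
static unsigned toy_jv_flags(toy_jv v) { return v.kind_flags & 0xF0u; }
```

On common 64-bit ABIs the header fields take 8 bytes and the union another 8, so `sizeof(toy_jv)` comes out at 16.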
Supported JSON Types
The system supports eight JSON value kinds:
- Primitives: null, true, false (stored inline, no allocation)
- Numbers: Double-precision floats, with optional literal preservation for arbitrary precision
- Strings: UTF-8 encoded, reference-counted, with length tracking
- Arrays: Ordered collections with dynamic sizing
- Objects: Key-value maps with string keys
- Invalid: Error values with optional error messages
Memory Management
Most jv functions follow a consistent ownership model: they consume (decref) their inputs and produce (incref) outputs. The main exception is jv_copy(), which returns a new reference without consuming its argument. Making ownership explicit in this way helps avoid leaks and double-frees.
Reference counting is managed through jv_refcnt structures embedded in allocated payloads. When a reference count reaches zero, the memory is automatically freed.
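The consume/copy convention can be illustrated with a toy reference-counted string. This is a minimal sketch, not jq's implementation: the `rc_*` names are hypothetical, and the `rc_live` counter exists only so the example can check for leaks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy reference-counted string; all rc_* names are hypothetical. */
typedef struct {
  int refcount;
  size_t len;
  char data[];                     /* flexible array member */
} rc_str;

static int rc_live = 0;            /* live allocations, for leak checking */

static rc_str *rc_alloc(size_t len) {
  rc_str *p = malloc(sizeof *p + len + 1);
  assert(p != NULL);
  p->refcount = 1;
  p->len = len;
  rc_live++;
  return p;
}

static rc_str *rc_new(const char *s) {
  rc_str *p = rc_alloc(strlen(s));
  memcpy(p->data, s, p->len + 1);
  return p;
}

/* Like jv_copy(): returns a new reference without consuming the input. */
static rc_str *rc_copy(rc_str *p) { p->refcount++; return p; }

/* Like jv_free(): drops one reference and frees the payload at zero. */
static void rc_free(rc_str *p) {
  if (--p->refcount == 0) { free(p); rc_live--; }
}

/* Consumes both inputs and returns a freshly owned result. */
static rc_str *rc_concat(rc_str *a, rc_str *b) {
  rc_str *r = rc_alloc(a->len + b->len);
  memcpy(r->data, a->data, a->len);
  memcpy(r->data + a->len, b->data, b->len + 1);
  rc_free(a);
  rc_free(b);
  return r;
}
```

A caller that wants to keep using a value after handing it to a consuming function passes `rc_copy(value)` instead, which mirrors the jv_copy() idiom described above.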
Key Operations
jv jv_copy(jv); // Duplicate without consuming
void jv_free(jv); // Decrement refcount, free if zero
jv jv_array_append(jv, jv); // Add element to array
jv jv_object_set(jv, jv, jv); // Set object key-value pair
jv jv_string_concat(jv, jv); // Concatenate strings
Parsing and Printing
The jv_parse() function converts JSON text into jv values, and the incremental jv_parser interface supports a streaming mode for large inputs. The printing functions in jv_print.c (such as jv_dump() and jv_dump_string()) output values with configurable formatting (pretty-print, colors, ASCII escaping, sorted keys).
Number Precision
When USE_DECNUM is enabled, number literals preserve their original representation using the decNumber library, so high-precision literals can pass through unchanged, while arithmetic remains compatible with IEEE 754 doubles.
Compilation & Bytecode Generation
Relevant Files
- src/compile.c - Core compilation logic and IR generation
- src/compile.h - Compilation API and block operations
- src/bytecode.c - Bytecode serialization and disassembly
- src/bytecode.h - Bytecode structures and opcode definitions
- src/opcode_list.h - Complete opcode enumeration
Overview
The jq compiler transforms parsed filter expressions into executable bytecode through a two-stage process: Intermediate Representation (IR) generation and bytecode serialization. The IR is a linked list of instructions (struct inst) that is progressively refined through binding and optimization passes before final code generation.
Intermediate Representation (IR)
The IR is built as a doubly-linked list of struct inst nodes, where each instruction represents an operation in the filter pipeline. Key IR properties:
- Blocks: Sequences of instructions grouped as struct block with first and last pointers
- Binding: Instructions track whether they reference free variables (unbound), define variables (self-bound), or use bound variables
- Subfunctions: Closures and function definitions are stored as nested blocks within CLOSURE_CREATE instructions
- Constants: Literal values are embedded directly in instructions with the OP_HAS_CONSTANT flag
The parser generates IR bottom-up, creating blocks that may contain unbound references. These are resolved through binding passes that match variable uses to their definitions.
Key Compilation Stages
1. IR Generation - Parser calls gen_* functions to create instruction blocks:
- gen_const() - Load constant values
- gen_op_simple() - Simple operations (DUP, POP, etc.)
- gen_function() - Function definitions with parameters and body
- gen_call() - Function calls with argument lists
- gen_reduce(), gen_foreach() - Complex control flow
2. Binding Resolution - block_bind() and related functions match unbound references to definitions:
- Scans for unbound instructions with matching symbol names
- Links them to their binding instruction via the bound_by pointer
- Handles nested scopes (closures, subfunctions)
- Drops unreferenced definitions via block_drop_unreferenced()
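The core of the binding step, matching unbound references to a definition by symbol name, can be sketched on a simplified instruction list. The `bind_inst` type and `bind_refs()` function below are hypothetical stand-ins; jq's real `struct inst` and `block_bind()` carry considerably more state, such as scopes and arities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an IR instruction; not jq's struct inst. */
typedef struct bind_inst {
  const char *symbol;            /* name referenced or defined, NULL if none */
  int is_definition;             /* nonzero for the defining instruction */
  struct bind_inst *bound_by;    /* set once a reference is resolved */
  struct bind_inst *next;
} bind_inst;

/* Link every still-unbound reference in body whose symbol matches def.
   Returns the number of references that were bound. */
static int bind_refs(bind_inst *def, bind_inst *body) {
  assert(def->is_definition && def->symbol != NULL);
  int nrefs = 0;
  for (bind_inst *i = body; i != NULL; i = i->next) {
    if (!i->is_definition && i->bound_by == NULL && i->symbol != NULL &&
        strcmp(i->symbol, def->symbol) == 0) {
      i->bound_by = def;
      nrefs++;
    }
  }
  return nrefs;
}
```

A definition for which `bind_refs()` returns 0 has no users in the body, which is the kind of definition a later pass could drop.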
3. Call Expansion - expand_call_arglist() transforms high-level calls into bytecode sequences:
- Converts CALL_JQ instructions into argument setup code
- Handles builtin C functions vs. user-defined jq functions
- Resolves special variables like $ENV
4. Code Generation - compile() produces final bytecode:
- Assigns bytecode positions to each instruction
- Allocates variable frame indices for local variables
- Builds constant pool for all embedded values
- Generates 16-bit instruction sequences with operands
Bytecode Structure
The bytecode is a flat array of uint16_t values where each instruction occupies a variable number of slots:
- Opcode (1 slot) - The operation to execute
- Operands (0-3 slots) - Depends on opcode flags:
  - OP_HAS_CONSTANT - Index into constant pool
  - OP_HAS_VARIABLE - Nesting level & variable index
  - OP_HAS_BRANCH - Relative jump offset
  - OP_HAS_CFUNC - Builtin function index & argument count
The struct bytecode container holds:
- code - Instruction array
- constants - JSON array of literal values
- subfunctions - Nested bytecode for closures
- debuginfo - Function names, parameters, local variables
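Because instructions occupy a variable number of 16-bit slots, a reader of the code array has to hop from opcode to opcode using each opcode's operand count. The sketch below shows that walk with a made-up opcode table; the `bc_*` names and operand counts are assumptions and do not reproduce the table in opcode_list.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; jq's real table differs from this one. */
enum bc_op { BC_DUP, BC_LOADK, BC_JUMP, BC_LOADV, BC_RET, BC_NOPS };

/* Operand slots following each opcode (assumed for illustration). */
static const int bc_operands[BC_NOPS] = {
  [BC_DUP]   = 0,   /* no operands */
  [BC_LOADK] = 1,   /* constant-pool index */
  [BC_JUMP]  = 1,   /* relative branch offset */
  [BC_LOADV] = 2,   /* nesting level, variable index */
  [BC_RET]   = 0,
};

/* Number of 16-bit slots used by the instruction starting at pc. */
static size_t bc_length(const uint16_t *pc) {
  assert(*pc < BC_NOPS);
  return 1 + (size_t)bc_operands[*pc];
}

/* Count instructions by stepping over each opcode and its operands. */
static size_t bc_count(const uint16_t *code, size_t codelen) {
  size_t n = 0;
  for (size_t pc = 0; pc < codelen; pc += bc_length(code + pc))
    n++;
  return n;
}
```

For example, the nine-slot sequence `LOADK 0, DUP, LOADV 0 1, JUMP 2, RET` decodes as five instructions in this toy encoding.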
Opcode Categories
Stack Operations: LOADK, DUP, POP, PUSHK_UNDER - Manipulate the execution stack
Variables: LOADV, STOREV, LOADVN - Access local and closure variables
Control Flow: FORK, JUMP, JUMP_F, BACKTRACK - Branching and backtracking
Functions: CALL_JQ, CALL_BUILTIN, CLOSURE_CREATE - Function calls and definitions
Data: INDEX, EACH, INSERT, APPEND - JSON data manipulation
Error Handling: TRY_BEGIN, TRY_END, ERRORK - Exception handling
Execution Engine & VM
Relevant Files
- src/execute.c
- src/exec_stack.h
- src/bytecode.h
- src/opcode_list.h
- src/jq.h
The jq execution engine is a bytecode interpreter that processes compiled jq programs. It manages execution state, function calls, backtracking, and error handling through a sophisticated stack-based virtual machine.
Core Architecture
The execution engine centers on the jq_state structure, which maintains:
- Bytecode: The compiled program to execute
- Data Stack: Holds JSON values being processed
- Frame Stack: Manages function call frames with closures and local variables
- Fork Stack: Saves execution state for backtracking (supporting jq's multiple-output semantics)
- Error State: Tracks exceptions and error messages
The main execution loop is jq_next(), which interprets bytecode instructions one at a time, returning values as they are produced.
Stack Management
The execution engine uses a custom memory-efficient stack implementation (exec_stack.h) that manages variable-sized blocks:
- Data Stack (stk_top): Holds JSON values (jv) being processed
- Frame Stack (curr_frame): Tracks function call frames containing closures and local variables
- Fork Stack (fork_top): Saves execution checkpoints for backtracking
Each stack operation is O(1), and the stack grows dynamically with alignment-aware memory management.
Bytecode Interpretation
The interpreter executes opcodes in a large switch statement. Key instruction categories:
- Stack Operations: LOADK, DUP, POP manipulate the data stack
- Variables: LOADV, STOREV access local variables and closures
- Indexing: INDEX, EACH access JSON structure elements
- Control Flow: JUMP, JUMP_F, BACKTRACK manage execution paths
- Functions: CALL_JQ, TAIL_CALL_JQ, RET handle function calls
- Error Handling: TRY_BEGIN, TRY_END implement try-catch semantics
Backtracking & Multiple Outputs
jq's ability to produce multiple outputs is implemented via backtracking:
stack_save(jq, pc, stack_get_pos(jq)); // Save checkpoint
// ... execute forward ...
pc = stack_restore(jq); // Restore to checkpoint
When an operation can produce multiple values (like .[] on an array), the engine saves the current state, executes one path, then backtracks to explore the alternatives.
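The save-then-backtrack pattern can be modelled with a tiny generator that iterates an array through a fork stack. This is a deliberately simplified, single-level sketch: the `bt_*` names are hypothetical, and a saved checkpoint here is just a resume index rather than the program counter and stack position that the real engine records.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of fork points; every bt_* name is hypothetical. */
#define BT_MAX_FORKS 8

typedef struct {
  const int *items;              /* the input "array" */
  size_t nitems;
  size_t forks[BT_MAX_FORKS];    /* saved resume indices (the fork stack) */
  size_t nforks;
  int started;
} bt_vm;

static void bt_start(bt_vm *vm, const int *items, size_t n) {
  vm->items = items;
  vm->nitems = n;
  vm->nforks = 0;
  vm->started = 0;
}

/* Emit items[i], first saving a checkpoint that resumes at i + 1.
   Returns 0 when there is nothing left, which forces a backtrack. */
static int bt_each(bt_vm *vm, size_t i, int *out) {
  if (i >= vm->nitems) return 0;
  assert(vm->nforks < BT_MAX_FORKS);
  vm->forks[vm->nforks++] = i + 1;           /* save checkpoint */
  *out = vm->items[i];
  return 1;
}

/* Returns 1 and stores the next output, or 0 once all paths are exhausted. */
static int bt_next(bt_vm *vm, int *out) {
  if (!vm->started) {
    vm->started = 1;
    if (bt_each(vm, 0, out)) return 1;
  }
  while (vm->nforks > 0) {
    size_t resume = vm->forks[--vm->nforks]; /* restore checkpoint */
    if (bt_each(vm, resume, out)) return 1;
  }
  return 0;
}
```

Each call to `bt_next()` plays the role of one call to jq_next(): it resumes from the most recent checkpoint and runs forward until the next output or until no checkpoints remain.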
Function Calls & Closures
Functions are represented as closures capturing their defining environment:
struct closure {
  struct bytecode* bc;  // Function bytecode
  stack_ptr env;        // Captured frame reference
};
Calls push new frames with closure parameters and local variables. Tail calls optimize recursion by reusing the current frame.
Error Handling
Errors are propagated via the error field in jq_state. When an error occurs:
- set_error() records the error
- Execution backtracks to the nearest TRY_BEGIN or exits
- jq_next() returns the error value to the caller
Initialization & Lifecycle
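jq_state *jq = jq_init();                           // Create VM (NULL on failure)
if (jq != NULL && jq_compile(jq, ".foo | .bar")) {  // Nonzero return means success
  jq_start(jq, input_value, flags);                 // Initialize execution; consumes input_value
  jv result;
  while (jv_is_valid(result = jq_next(jq)))         // One output per call
    jv_dump(result, 0);                             // Print the output; consumes result
  jv_free(result);                                  // Free the final invalid value
}
jq_teardown(&jq);                                   // Clean up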
The VM processes one input value at a time, yielding results via repeated jq_next() calls until exhausted.
Builtin Functions & Operations
Relevant Files
- src/builtin.c
- src/builtin.h
- src/builtin.jq
jq provides a comprehensive set of builtin functions and operations that form the core of the language. These are implemented across three layers: C-based primitives for performance-critical operations, jq-coded functions for higher-level abstractions, and bytecode-compiled builtins for control flow.
Arithmetic & Comparison Operations
Binary operators handle arithmetic and comparisons across multiple types:
+ (addition/concatenation)
- (subtraction/array difference)
* (multiplication/string repetition/object merge)
/ (division/string split)
% (modulo)
== (equality)
!= (inequality)
< > <= >= (ordering comparisons)
These operators are polymorphic: + adds numbers, concatenates strings and arrays, and merges objects. The * operator performs recursive object merging when both operands are objects.
Type Conversion & Inspection
Core type functions enable runtime type checking and conversion:
type # Returns type name: "null", "boolean", "number", "string", "array", "object"
length # Array/object/string length; absolute value for numbers
tonumber # Parse string to number
tostring # Convert to string representation
toboolean # Parse "true"/"false" strings to booleans
tojson/fromjson # JSON serialization/deserialization
String Operations
String manipulation functions support common text processing:
startswith(s) # Check if string starts with s
endswith(s) # Check if string ends with s
split(sep) # Split string by separator
join(sep) # Join array elements with separator
ltrimstr(s) # Remove prefix
rtrimstr(s) # Remove suffix
ascii_upcase # Convert ASCII a-z to A-Z
ascii_downcase # Convert ASCII A-Z to a-z
explode/implode # Convert between strings and codepoint arrays
Array & Object Operations
Collection manipulation is central to jq:
keys/keys_unsorted # Get object keys or array indices
has(key) # Check key existence
contains(x) # Deep containment check
sort/sort_by(f) # Sort arrays
group_by(f) # Group by expression result
unique/unique_by(f) # Remove duplicates
min/max/min_by/max_by # Find extrema
reverse # Reverse arrays
flatten(depth) # Flatten nested arrays
add # Sum array elements or concatenate
Path Operations
Path functions enable dynamic data access and modification:
path(expr) # Get path(s) to expression results
getpath(p) # Retrieve value at path
setpath(p; v) # Set value at path
delpaths(paths) # Delete multiple paths
Advanced Features
Regex & Pattern Matching:
test(regex; flags) # Test if string matches pattern
match(regex; flags) # Extract match objects with captures
capture(regex; flags) # Extract named capture groups
scan(regex; flags) # Find all matches
split/sub/gsub # Pattern-based string operations
Date/Time Functions:
now # Current Unix timestamp
gmtime/localtime # Convert timestamp to broken-down time
mktime # Convert broken-down time to timestamp
strftime/strflocaltime # Format time with format string
strptime # Parse time string
Math Functions:
The system includes standard libm functions: floor, ceil, round, sqrt, log, exp, sin, cos, tan, asin, acos, atan, pow, fma, and many others. These operate on numeric inputs and return numeric results.
Control Flow & Generators
Bytecode-compiled builtins provide efficient control flow:
empty # Produce no output
not # Logical negation
range(n) # Generate integers 0 to n-1
limit(n; expr) # Limit output count
first/last/nth # Extract specific outputs
recurse(f) # Recursively apply function
while/until # Conditional iteration
Implementation Architecture
The builtin system uses three mechanisms:
- C Functions (builtin.c): Performance-critical operations like arithmetic, type conversion, and string manipulation
- jq-Coded Functions (builtin.jq): Higher-level abstractions like map, select, sort_by defined in jq itself
- Bytecode Builtins (builtin.c): Control flow primitives compiled directly to bytecode for efficiency
The builtins_bind() function integrates all three layers, making them available to the jq runtime.
Module System & Linking
Relevant Files
- src/linker.c
- src/linker.h
- src/jv_file.c
- src/compile.c
- src/compile.h
jq's module system enables code reuse through import and include directives. The linker resolves module paths, loads files, and binds definitions into the program's namespace.
Core Concepts
Modules are .jq files containing function definitions and optional metadata. They can be imported with a namespace prefix (import "foo" as bar) or included directly (include "foo"). Data files (.json) can also be imported as constants.
Module Metadata is an optional object at the start of a module file that describes the module's properties. It's accessible via the modulemeta builtin.
Module Resolution
The linker searches for modules using a configurable search path. Resolution follows this order:
1. ${search_dir}/${rel_path}.jq – Direct file match
2. ${search_dir}/${rel_path}/jq/main.jq – Directory with main entry
3. ${search_dir}/${rel_path}/${basename}.jq – Directory with same-named file
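The three candidates for a single search directory can be spelled out with a small helper. `mod_candidates()` is a hypothetical function written for this illustration, not part of linker.c; it only builds the path strings in the order listed above and does not check whether any of the files exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MOD_PATH_MAX 256

/* Fill out[0..2] with the candidate paths for one search directory.
   Returns 0 on success, -1 if a path would not fit in MOD_PATH_MAX bytes. */
static int mod_candidates(const char *search_dir, const char *rel_path,
                          char out[3][MOD_PATH_MAX]) {
  assert(search_dir != NULL && rel_path != NULL);
  /* basename: the part of rel_path after the last '/', if any */
  const char *slash = strrchr(rel_path, '/');
  const char *base = slash ? slash + 1 : rel_path;

  int n0 = snprintf(out[0], MOD_PATH_MAX, "%s/%s.jq", search_dir, rel_path);
  int n1 = snprintf(out[1], MOD_PATH_MAX, "%s/%s/jq/main.jq", search_dir, rel_path);
  int n2 = snprintf(out[2], MOD_PATH_MAX, "%s/%s/%s.jq", search_dir, rel_path, base);

  if (n0 < 0 || n1 < 0 || n2 < 0) return -1;
  return (n0 < MOD_PATH_MAX && n1 < MOD_PATH_MAX && n2 < MOD_PATH_MAX) ? 0 : -1;
}
```

With a search directory of /usr/lib/jq (an arbitrary example) and a relative path of foo/bar, the candidates are /usr/lib/jq/foo/bar.jq, /usr/lib/jq/foo/bar/jq/main.jq, and /usr/lib/jq/foo/bar/bar.jq.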
Search paths support substitutions:
- ~/ → User's home directory
- $ORIGIN/ → jq executable directory
- ./ → Importing file's directory (or current directory for top-level)
Key Functions
load_program() is the entry point. It parses the main program, validates it has a top-level expression, and processes all dependencies recursively.
process_dependencies() extracts import statements from a block and resolves each one. It handles both code modules and data imports, managing a lib_loading_state to cache already-loaded libraries and prevent duplicate loading.
load_library() loads a single module file. For code modules, it parses the jq source; for data modules, it loads JSON. It recursively processes the module's own dependencies.
find_lib() locates a module file by searching the path chain. It validates relative paths (no .. or backslashes) and returns the resolved absolute path or an error.
Binding and Namespacing
After loading, modules are bound into the program using block_bind_library(). This function prefixes function names with the module's namespace (e.g., foo::bar), allowing multiple modules to coexist without name collisions.
Data imports create global constants. For example, import "data" as $d binds the JSON data to $d.
Include directives merge the module's definitions directly into the current namespace without prefixing.
Dependency Caching
The lib_loading_state structure tracks loaded libraries by filename. When a module is imported multiple times, the linker reuses the cached definition instead of reloading and reparsing, improving performance and ensuring consistent behavior.
Error Handling
Module resolution errors are reported with context. Optional imports (marked with optional: true in metadata) fail silently. Syntax errors in modules are caught during parsing and reported with the module's filename and line number.
CLI Interface & Main Entry Point
Relevant Files
- src/main.c
- src/util.c
- src/util.h
The CLI entry point in main.c orchestrates the entire jq workflow: parsing command-line arguments, initializing the jq state machine, compiling the filter program, and processing input data through the execution engine.
Argument Parsing
The CLI supports three invocation modes:
- Standard: jq [options] <filter> [files...] – Process files with the filter
- String args: jq [options] --args <filter> [strings...] – Pass remaining args as $ARGS.positional[]
- JSON args: jq [options] --jsonargs <filter> [JSON_TEXTS...] – Parse remaining args as JSON values
Key options include:
- Input modes: -n (null input), -R (raw input), -s (slurp)
- Output formatting: -c (compact), -r (raw), -j (join), -S (sort keys), -C/-M (color)
- Program source: -f (from file), --arg name value (set variables)
- Advanced: --stream (streaming parser), --seq (JSON sequence mode)
Core Processing Loop
After compilation, the main loop processes each input:
while (jq_util_input_errors(input_state) == 0 &&
       (jv_is_valid((value = jq_util_input_next_input(input_state))))) {
  ret = process(jq, value, jq_flags, dumpopts, options);
  if (jq_halted(jq)) break;
}
The process() function calls jq_start() to initialize execution, then repeatedly calls jq_next() to retrieve results until the filter completes.
Input Management
The jq_util_input_state (defined in util.c) manages file I/O and buffering:
- Maintains a file list and current file pointer
- Handles stdin (-) and regular files transparently
- Buffers input with UTF-8 boundary awareness
- Supports both JSON parsing and raw line reading modes
- Tracks current filename and line number for error reporting
Output Handling
Output formatting respects multiple options:
- Raw strings (-r): Output string values without JSON escaping
- Compact (-c): Single-line JSON without pretty-printing
- Sorted keys (-S): Alphabetically sort object keys
- Color (-C/-M): Enable/disable ANSI color codes
- JSON sequences (--seq): Prefix each output with an ASCII record separator (RS)
Windows console output bypasses stdio to ensure proper UTF-8 handling via WriteFile().
Error Handling & Exit Codes
Exit codes reflect execution status:
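- 0 – Success (also when the last output is false or null and -e is not set)
- 1 – Last output was false or null (with -e)
- 2 – System error (file I/O, memory)
- 3 – Compilation error
- 4 – No output produced (with -e)
- 5 – Runtime error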
The -e flag enables exit-status mode, where the final result determines the exit code.
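How the final result maps to an exit code under -e can be sketched as a small decision function. This follows the jq manual's description of -e (0 if the last output was neither false nor null, 1 if it was false or null, 4 if no output was produced); the `es_*` names are hypothetical and do not come from main.c.

```c
#include <assert.h>

/* Hypothetical summary of a finished run; not a type from main.c. */
typedef enum {
  ES_NO_OUTPUT,            /* the filter produced no results */
  ES_LAST_FALSE_OR_NULL,   /* the last result was false or null */
  ES_LAST_OTHER            /* the last result was any other value */
} es_last;

/* Exit code for a run with no system, compile, or runtime error. */
static int es_exit_status(int exit_status_mode, es_last last) {
  if (!exit_status_mode) return 0;   /* without -e, results do not matter */
  switch (last) {
    case ES_NO_OUTPUT:          return 4;
    case ES_LAST_FALSE_OR_NULL: return 1;
    case ES_LAST_OTHER:         return 0;
  }
  assert(!"unreachable");
  return 0;
}
```

Error exits (2, 3, and 5 in the list above) are decided before this point and are not modelled here.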
Platform-Specific Handling
Windows: The entry point uses wmain() to receive wide-character arguments, converting them to UTF-8 before passing to umain(). Console mode is set to binary UTF-8 text to prevent encoding issues.
OpenBSD: Uses pledge() to restrict system calls to stdio and rpath (file reading).
Locale: Initializes locale support via setlocale() when available.
Testing & Quality Assurance
Relevant Files
- tests/jq.test – Main test suite with 2500+ test cases
- src/jq_test.c – C implementation of test runner
- tests/setup – Shared test environment configuration
- tests/jqtest – Shell script entry point for main tests
- tests/shtest – Shell integration and regression tests
- tests/jq_fuzz_*.c – Fuzzing harnesses for security testing
- Makefile.am – Test targets and CI configuration
Test Suite Architecture
jq uses a multi-layered testing approach combining declarative test cases, shell scripts, and fuzzing. The primary test suite is defined in tests/jq.test, a simple text format with groups of three lines: program, input, and expected output. Blank lines and comments (starting with #) are ignored.
# Example test case
.foo
{"foo": 42}
42
The test runner (src/jq_test.c) parses this format and validates each program against its input, comparing actual output to expected results. Tests can be marked with %%FAIL to verify error handling.
Running Tests
Execute tests via make check, which runs multiple test suites:
- jqtest – Core functionality (2500+ cases from jq.test)
- mantest – Examples from the manual documentation
- shtest – Shell integration, regression tests, and edge cases
- utf8test, base64test, uritest – Format-specific tests
- onigtest – Regex functionality (when Oniguruma is enabled)
Use ENABLE_VALGRIND=1 make check to run with memory leak detection.
Test Infrastructure
The tests/setup script provides shared configuration: it sets locale to C, configures Valgrind if enabled, and defines helper variables like $JQ (path to binary) and $mods (module test directory).
Individual test scripts (e.g., jqtest, shtest) source this setup and invoke the jq binary with appropriate flags. The shtest script is particularly comprehensive, testing constant folding, JSON sequences, streaming, error injection, module loading, and color output.
Fuzzing
jq includes LLVM libFuzzer harnesses in tests/jq_fuzz_*.c for continuous fuzzing:
- jq_fuzz_compile.c – Fuzzes the parser and compiler
- jq_fuzz_execute.cpp – Fuzzes program execution
- jq_fuzz_parse.c – Fuzzes JSON parsing
- jq_fuzz_parse_stream.c – Fuzzes streaming JSON parser
These harnesses help discover edge cases and security vulnerabilities in parsing and execution paths.
Quality Assurance Features
- Error message validation – Tests verify exact error output for invalid programs
- Regression tests – Specific tests for fixed bugs (e.g., #951, #2146)
- Locale testing – Validates behavior across different locales
- Memory safety – Valgrind integration detects leaks and invalid memory access
- Sanitizers – Build with --enable-asan or --enable-ubsan for AddressSanitizer and UndefinedBehaviorSanitizer