jqlang/jq

jq - Lightweight JSON Query Language

Last updated on Dec 17, 2025 (Commit: ea9e41f)

Overview

Relevant Files
  • README.md
  • src/jq.h
  • src/main.c
  • src/execute.c
  • src/compile.c
  • src/jv.h

jq is a lightweight, portable command-line JSON processor written in C with zero runtime dependencies. It enables users to slice, filter, map, and transform JSON data using a powerful query language, similar to how sed, awk, and grep work for text.

Core Architecture

jq operates through a three-stage pipeline:

  1. Parsing: The jq filter expression is parsed into an abstract syntax tree (AST)
  2. Compilation: The AST is compiled into bytecode for efficient execution
  3. Execution: The bytecode is executed against JSON input values

Key Components

jq_state (src/execute.c) The central execution context that maintains:

  • Compiled bytecode (bc)
  • Execution stack for managing frames and variables
  • Error handling callbacks
  • Input/output callbacks for reading JSON and writing results
  • Debug and halt state

jv (JSON Value) (src/jv.h) A reference-counted value type representing JSON data:

  • Supports all JSON types: null, boolean, number, string, array, object
  • Implements copy-on-write semantics for efficiency
  • Most jv_* functions consume their inputs and produce new outputs (jv_copy is the main exception)

Processing Pipeline (src/main.c) The main entry point orchestrates:

  • Command-line argument parsing (options like -r, -s, -c, etc.)
  • Filter compilation via jq_compile_args()
  • Input reading and JSON parsing
  • Iterative execution via jq_start() and jq_next()
  • Output formatting and error handling

Command-Line Interface

jq supports extensive options for controlling input/output behavior:

  • Input modes: -n (null input), -R (raw input), -s (slurp array)
  • Output modes: -r (raw output), -c (compact), -C (color), -S (sort keys)
  • Arguments: --arg, --argjson, --slurpfile for passing data to filters
  • Advanced: --stream (streaming parser), --seq (JSON sequence format)

Error Handling

Errors are managed through callbacks registered on jq_state:

  • Compilation errors are reported during jq_compile_args()
  • Runtime errors are captured as invalid jv values with messages
  • The halt and halt_error builtins allow programs to control exit codes

Architecture & Data Flow

Relevant Files
  • src/execute.c – Runtime execution engine and VM interpreter
  • src/bytecode.h – Bytecode structure and opcode definitions
  • src/compile.h – Compilation IR and code generation
  • src/jv.h – JSON value type system and operations
  • src/parser.y – Bison grammar for jq filter syntax
  • src/main.c – CLI entry point and I/O handling

jq follows a classic three-stage pipeline: parse → compile → execute. Each stage transforms the input into a more machine-friendly representation.

Parse Stage

The lexer and Bison parser (defined in parser.y) tokenize and parse jq filter expressions into an abstract syntax tree. The parser generates a block structure—a doubly-linked list of intermediate instructions (struct inst). These instructions may contain free variables that are resolved later during compilation.

Compile Stage

The compiler (compile.c) transforms the instruction blocks into bytecode. Key steps include:

  • Binding resolution: Free variables are matched with their definitions (functions, parameters, closures).
  • Bytecode generation: Instructions are converted to a compact 16-bit opcode sequence stored in struct bytecode.
  • Symbol table creation: Built-in C functions and user-defined functions are registered in a symbol table.

The bytecode structure contains:

  • code – Array of 16-bit opcodes
  • constants – JSON array of literal values
  • subfunctions – Nested function definitions
  • globals – Symbol table with C function pointers

Execute Stage

The runtime (execute.c) interprets bytecode using a stack-based virtual machine. The jq_state structure maintains:

  • Data stack (stk) – Holds JSON values during computation
  • Call frames – Track function calls with closures and local variables
  • Fork points – Save state for backtracking (multiple outputs)
  • Path tracking – Records the path to the current value for getpath() and setpath()

JSON Value System

All values are represented as jv structs (defined in jv.h). The type system includes:

  • Primitive types: null, boolean, number, string
  • Composite types: array, object
  • Special types: invalid (for errors)

Values use reference counting for memory management. Most jv_* functions consume their inputs and produce new outputs, requiring careful lifetime management.

Bytecode Execution Model

The VM executes opcodes sequentially, manipulating the data stack. Key opcodes include:

  • Stack ops: LOADK (push constant), DUP (duplicate), POP (discard)
  • Data access: INDEX (array/object access), EACH (iterate)
  • Control flow: JUMP, FORK (for multiple outputs), TRY_BEGIN/TRY_END
  • Functions: CALL_JQ (user function), CALL_BUILTIN (C function), RET (return)
  • Closures: CLOSURE_CREATE, CLOSURE_REF (capture variables)

The FORK opcode is critical for jq's multiple-output semantics. It saves the current stack state and allows backtracking to explore alternative execution paths.

Parsing Pipeline

Relevant Files
  • src/lexer.l - Flex lexer definition
  • src/parser.y - Bison parser grammar
  • src/parser.h - Generated parser header
  • src/locfile.h - Location tracking for error reporting
  • src/compile.h - Bytecode generation interface

The parsing pipeline transforms jq filter expressions into executable bytecode through three main stages: lexical analysis, syntax analysis, and code generation.

Lexical Analysis (Lexer)

The lexer (src/lexer.l) is built with Flex and tokenizes the input stream. It recognizes:

  • Keywords: def, if, then, else, and, or, try, catch, reduce, foreach, etc.
  • Operators: Arithmetic (+, -, *, /, %), comparison (==, !=, <, >, <=, >=), and assignment operators (|=, +=, -=, etc.)
  • Literals: Numbers, strings (with interpolation support via \(...))
  • Identifiers: Function names, field accessors (.field), and variable bindings ($var)
  • Structural tokens: Parentheses, brackets, and braces with state tracking

The lexer maintains a state machine to handle nested structures and string interpolation. Location tracking is performed via YY_USER_ACTION to record token positions for error reporting.

Syntax Analysis (Parser)

The parser (src/parser.y) uses Bison to build an abstract syntax tree (AST) from tokens. Key grammar rules include:

  • Query: Pipes (|), comma operators (,), and function definitions
  • Expr: Binary operations, assignments, conditionals, and try-catch blocks
  • Term: Literals, identifiers, array/object construction, and function calls
  • Patterns: Destructuring patterns for as bindings and function parameters

Operator precedence is carefully defined to resolve shift-reduce conflicts. The parser generates block structures (intermediate representation) during reduction, enabling early optimization like constant folding.

Code Generation

As the parser reduces rules, it calls code generation functions from src/compile.h:

  • gen_const() - Emit constant values
  • gen_call() - Generate function calls
  • gen_binop() - Handle binary operations with constant folding
  • gen_cond() - Conditional branching
  • gen_reduce(), gen_foreach() - Loop constructs

These functions build a linked list of instructions (the intermediate representation), which the compiler later turns into bytecode.
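Constant folding during IR construction can be illustrated with a toy node type. This is a minimal sketch, not jq's gen_binop(): the toy_node type, the toy_const() and toy_binop() helpers, and the three supported operators are all hypothetical. It only shows the idea of replacing an operation on two constants with a single constant while the tree is being built.

```c
#include <assert.h>

/* Hypothetical two-kind IR node: a constant or a binary operation. */
typedef enum { TOY_CONST, TOY_BINOP } toy_tag;

typedef struct toy_node {
  toy_tag tag;
  double value;                       /* valid when tag == TOY_CONST */
  char op;                            /* '+', '-' or '*' when tag == TOY_BINOP */
  const struct toy_node *lhs, *rhs;   /* operands when tag == TOY_BINOP */
} toy_node;

toy_node toy_const(double v) {
  toy_node n = { TOY_CONST, v, 0, 0, 0 };
  return n;
}

/* Build a binary node, folding it when both operands are constants. */
toy_node toy_binop(char op, const toy_node *a, const toy_node *b) {
  if (a->tag == TOY_CONST && b->tag == TOY_CONST) {
    switch (op) {
      case '+': return toy_const(a->value + b->value);
      case '-': return toy_const(a->value - b->value);
      case '*': return toy_const(a->value * b->value);
      default: break;                 /* unknown operator: leave unfolded */
    }
  }
  toy_node n = { TOY_BINOP, 0.0, op, a, b };
  return n;
}
```

With this sketch, toy_binop('+', &two, &three) yields a constant node, while any operand that is not a constant leaves the operation in place for later stages.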

Error Handling & Location Tracking

The locfile structure tracks source file metadata and line mappings. When parsing errors occur, locfile_locate() reports precise error locations using the location struct (start/end byte offsets), enabling helpful error messages.
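Mapping a byte offset back to a line number can be sketched with a table of line-start offsets and a binary search. The function names toy_build_linemap() and toy_line_of() are hypothetical, and this is an illustration of the general technique rather than the locfile implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Record the byte offset at which each line starts; returns the line count. */
size_t toy_build_linemap(const char *src, size_t len,
                         size_t *starts, size_t cap) {
  size_t n = 0;
  if (cap > 0) starts[n++] = 0;
  for (size_t i = 0; i < len && n < cap; i++)
    if (src[i] == '\n') starts[n++] = i + 1;
  return n;
}

/* Binary search for the zero-based line that contains byte offset pos. */
size_t toy_line_of(const size_t *starts, size_t nlines, size_t pos) {
  assert(nlines > 0);
  size_t lo = 0, hi = nlines;         /* invariant: starts[lo] <= pos */
  while (hi - lo > 1) {
    size_t mid = lo + (hi - lo) / 2;
    if (starts[mid] <= pos) lo = mid; else hi = mid;
  }
  return lo;
}
```

Building the table once per source file keeps each later lookup logarithmic in the number of lines, which matters little for short filters but is cheap to get right.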

JSON Value System

Relevant Files
  • src/jv.h
  • src/jv.c
  • src/jv_parse.c
  • src/jv_print.c
  • src/jv_alloc.h

The JSON Value System is the core data representation in jq. Every JSON value is represented as a jv struct, a compact value type (16 bytes on typical 64-bit platforms, given the layout below) that can hold any JSON data type. Heap-backed values are managed automatically through reference counting.

Core Data Structure

The jv struct is defined as:

typedef struct {
  unsigned char kind_flags;
  unsigned char pad_;
  unsigned short offset;
  int size;
  union {
    struct jv_refcnt* ptr;
    double number;
  } u;
} jv;

The kind_flags field encodes both the JSON type (lower 4 bits) and payload metadata (upper 4 bits). With this layout the whole value occupies two machine words on typical 64-bit systems, which keeps passing it by value cheap.
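The 4/4 bit split can be sketched as follows. Only the split itself comes from the description above; the mask names and the toy_pack(), toy_kind() and toy_flags() helpers are hypothetical and are not taken from jv.h.

```c
#include <assert.h>

/* Hypothetical masks mirroring the 4/4 split described above. */
enum { TOY_KIND_MASK = 0x0F, TOY_FLAGS_MASK = 0xF0 };

/* Pack a kind (0..15) and payload flags (0..15) into one byte. */
unsigned char toy_pack(unsigned kind, unsigned flags) {
  assert(kind <= 0x0F && flags <= 0x0F);
  return (unsigned char)((flags << 4) | kind);
}

/* Extract the kind from the low four bits. */
unsigned toy_kind(unsigned char kind_flags) {
  return kind_flags & TOY_KIND_MASK;
}

/* Extract the payload flags from the high four bits. */
unsigned toy_flags(unsigned char kind_flags) {
  return (kind_flags & TOY_FLAGS_MASK) >> 4;
}
```

Keeping both pieces in one byte means a type check is a single mask operation and needs no pointer dereference.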

Supported JSON Types

The system supports eight value kinds (null, true, false, number, string, array, object, and invalid), grouped below:

  • Primitives: null, true, false (stored inline, no allocation)
  • Numbers: Double-precision floats, with optional literal preservation for arbitrary precision
  • Strings: UTF-8 encoded, reference-counted, with length tracking
  • Arrays: Ordered collections with dynamic sizing
  • Objects: Key-value maps with string keys
  • Invalid: Error values with optional error messages

Memory Management

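Most jv functions follow a consistent ownership model: they consume (decref) their inputs and produce (incref) their outputs. The main exception is jv_copy(), which adds a reference without consuming its argument; simple accessors such as jv_get_kind() also leave their argument untouched. Following this convention consistently helps avoid double-frees and leaks.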

Reference counting is managed through jv_refcnt structures embedded in allocated payloads. When a reference count reaches zero, the memory is automatically freed.
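The consume/produce convention can be sketched with a toy reference-counted value. Everything here (toy_val, toy_new, toy_copy, toy_free, toy_add, and the toy_live counter) is hypothetical and far simpler than jv; it is only meant to show why a caller copies a value before handing it to a consuming function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted payload; not jq's actual jv_refcnt layout. */
typedef struct toy_val {
  int refcnt;
  int payload;
} toy_val;

/* Number of payloads currently allocated, to make frees observable. */
int toy_live = 0;

toy_val *toy_new(int payload) {
  toy_val *v = malloc(sizeof *v);
  assert(v != NULL);
  v->refcnt = 1;
  v->payload = payload;
  toy_live++;
  return v;
}

/* Like jv_copy in spirit: adds a reference, does not consume its argument. */
toy_val *toy_copy(toy_val *v) {
  v->refcnt++;
  return v;
}

/* Like jv_free in spirit: drops one reference and frees at zero. */
void toy_free(toy_val *v) {
  if (--v->refcnt == 0) {
    free(v);
    toy_live--;
  }
}

/* A consuming operation: takes ownership of both inputs, returns a new value. */
toy_val *toy_add(toy_val *a, toy_val *b) {
  toy_val *r = toy_new(a->payload + b->payload);
  toy_free(a);
  toy_free(b);
  return r;
}
```

A caller that still needs x after the call writes toy_add(toy_copy(x), toy_copy(x)) and later releases x itself; passing x directly would hand its only reference away.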

Key Operations

jv jv_copy(jv);           // Duplicate without consuming
void jv_free(jv);         // Decrement refcount, free if zero
jv jv_array_append(jv, jv);    // Add element to array
jv jv_object_set(jv, jv, jv);  // Set object key-value pair
jv jv_string_concat(jv, jv);   // Concatenate strings

Parsing and Printing

The jv_parse() function converts a complete JSON text into a jv value; incremental and streaming parsing of large inputs goes through the jv_parser interface. The jv_print() family outputs values with configurable formatting (pretty-print, colors, ASCII escaping, sorted keys).

Number Precision

When USE_DECNUM is enabled, numbers preserve their original literal representation using the decNumber library, enabling arbitrary-precision arithmetic while maintaining compatibility with IEEE 754 doubles for standard operations.

Compilation & Bytecode Generation

Relevant Files
  • src/compile.c - Core compilation logic and IR generation
  • src/compile.h - Compilation API and block operations
  • src/bytecode.c - Bytecode serialization and disassembly
  • src/bytecode.h - Bytecode structures and opcode definitions
  • src/opcode_list.h - Complete opcode enumeration

Overview

The jq compiler transforms parsed filter expressions into executable bytecode through a two-stage process: Intermediate Representation (IR) generation and bytecode serialization. The IR uses a linked-list of instructions (struct inst) that are progressively refined through binding and optimization passes before final code generation.

Intermediate Representation (IR)

The IR is built as a doubly-linked list of struct inst nodes, where each instruction represents an operation in the filter pipeline. Key IR properties:

  • Blocks: Sequences of instructions grouped as struct block with first and last pointers
  • Binding: Instructions track whether they reference free variables (unbound), define variables (self-bound), or use bound variables
  • Subfunctions: Closures and function definitions are stored as nested blocks within CLOSURE_CREATE instructions
  • Constants: Literal values are embedded directly in instructions with the OP_HAS_CONSTANT flag

The parser generates IR bottom-up, creating blocks that may contain unbound references. These are resolved through binding passes that match variable uses to their definitions.

Key Compilation Stages

1. IR Generation - Parser calls gen_* functions to create instruction blocks:

  • gen_const() - Load constant values
  • gen_op_simple() - Simple operations (DUP, POP, etc.)
  • gen_function() - Function definitions with parameters and body
  • gen_call() - Function calls with argument lists
  • gen_reduce(), gen_foreach() - Complex control flow

2. Binding Resolution - block_bind() and related functions match unbound references to definitions:

  • Scans for unbound instructions with matching symbol names
  • Links them to their binding instruction via bound_by pointer
  • Handles nested scopes (closures, subfunctions)
  • Drops unreferenced definitions via block_drop_unreferenced()

3. Call Expansion - expand_call_arglist() transforms high-level calls into bytecode sequences:

  • Converts CALL_JQ instructions into argument setup code
  • Handles builtin C functions vs. user-defined jq functions
  • Resolves special variables like $ENV

4. Code Generation - compile() produces final bytecode:

  • Assigns bytecode positions to each instruction
  • Allocates variable frame indices for local variables
  • Builds constant pool for all embedded values
  • Generates 16-bit instruction sequences with operands
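The binding step (stage 2 above) can be sketched as a walk over a linked list that fills in a bound_by pointer wherever the symbol name matches. The toy_inst type and toy_bind() function are hypothetical simplifications; jq's block_bind() also handles nested scopes and arities, which this sketch ignores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical IR instruction: a reference to a symbol, possibly unbound. */
typedef struct toy_inst {
  const char *symbol;
  const struct toy_inst *bound_by;  /* NULL while the reference is unbound */
  struct toy_inst *next;
} toy_inst;

/* Bind every unbound instruction in body whose symbol matches the binder.
   Returns the number of references that were bound. */
int toy_bind(const toy_inst *binder, toy_inst *body) {
  int nrefs = 0;
  for (toy_inst *i = body; i != NULL; i = i->next) {
    if (i->bound_by == NULL && strcmp(i->symbol, binder->symbol) == 0) {
      i->bound_by = binder;
      nrefs++;
    }
  }
  return nrefs;
}
```

A return value of zero means nothing in the body refers to the definition, which is the situation in which an unreferenced definition could be dropped.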

Bytecode Structure

The bytecode is a flat array of uint16_t values where each instruction occupies a variable number of slots:

  • Opcode (1 slot) - The operation to execute
  • Operands (0-3 slots) - Depends on opcode flags:
    • OP_HAS_CONSTANT - Index into constant pool
    • OP_HAS_VARIABLE - Nesting level & variable index
    • OP_HAS_BRANCH - Relative jump offset
    • OP_HAS_CFUNC - Builtin function index & argument count
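Walking a flat uint16_t instruction array means knowing how many operand slots follow each opcode. The sketch below uses a hypothetical four-opcode table and hypothetical flag names; the real opcode list and flags live in src/opcode_list.h and src/bytecode.h and are not reproduced here. The operand counts per flag follow the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes and operand flags, for illustration only. */
enum { TOY_DUP, TOY_LOADK, TOY_LOADV, TOY_JUMP, TOY_NUM_OPS };
enum { TOY_HAS_CONSTANT = 1, TOY_HAS_VARIABLE = 2, TOY_HAS_BRANCH = 4 };

static const int toy_op_flags[TOY_NUM_OPS] = {
  [TOY_DUP]   = 0,
  [TOY_LOADK] = TOY_HAS_CONSTANT,
  [TOY_LOADV] = TOY_HAS_VARIABLE,
  [TOY_JUMP]  = TOY_HAS_BRANCH,
};

/* Slots used by one instruction: the opcode plus its operands. */
size_t toy_inst_length(uint16_t opcode) {
  assert(opcode < TOY_NUM_OPS);
  size_t len = 1;
  if (toy_op_flags[opcode] & TOY_HAS_CONSTANT) len += 1;  /* pool index */
  if (toy_op_flags[opcode] & TOY_HAS_VARIABLE) len += 2;  /* level, index */
  if (toy_op_flags[opcode] & TOY_HAS_BRANCH)   len += 1;  /* jump offset */
  return len;
}

/* Count instructions by stepping over each one's operand slots. */
size_t toy_count_insts(const uint16_t *code, size_t codelen) {
  size_t n = 0;
  for (size_t pc = 0; pc < codelen; pc += toy_inst_length(code[pc])) n++;
  return n;
}
```

Because operands share the array with opcodes, a decoder can only find instruction boundaries by starting at a known boundary and stepping forward like this.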

The struct bytecode container holds:

  • code - Instruction array
  • constants - JSON array of literal values
  • subfunctions - Nested bytecode for closures
  • debuginfo - Function names, parameters, local variables

Opcode Categories

Stack Operations: LOADK, DUP, POP, PUSHK_UNDER - Manipulate the execution stack

Variables: LOADV, STOREV, LOADVN - Access local and closure variables

Control Flow: FORK, JUMP, JUMP_F, BACKTRACK - Branching and backtracking

Functions: CALL_JQ, CALL_BUILTIN, CLOSURE_CREATE - Function calls and definitions

Data: INDEX, EACH, INSERT, APPEND - JSON data manipulation

Error Handling: TRY_BEGIN, TRY_END, ERRORK - Exception handling

Execution Engine & VM

Relevant Files
  • src/execute.c
  • src/exec_stack.h
  • src/bytecode.h
  • src/opcode_list.h
  • src/jq.h

The jq execution engine is a bytecode interpreter that processes compiled jq programs. It manages execution state, function calls, backtracking, and error handling through a sophisticated stack-based virtual machine.

Core Architecture

The execution engine centers on the jq_state structure, which maintains:

  • Bytecode: The compiled program to execute
  • Data Stack: Holds JSON values being processed
  • Frame Stack: Manages function call frames with closures and local variables
  • Fork Stack: Saves execution state for backtracking (supporting jq's multiple-output semantics)
  • Error State: Tracks exceptions and error messages

The main execution loop is jq_next(), which interprets bytecode instructions one at a time, returning values as they are produced.

Stack Management

The execution engine uses a custom memory-efficient stack implementation (exec_stack.h) that manages variable-sized blocks:

  • Data Stack (stk_top): Holds JSON values (jv) being processed
  • Frame Stack (curr_frame): Tracks function call frames containing closures and local variables
  • Fork Stack (fork_top): Saves execution checkpoints for backtracking

Each stack operation is O(1), and the stack grows dynamically with alignment-aware memory management.

Bytecode Interpretation

The interpreter executes opcodes in a large switch statement. Key instruction categories:

  • Stack Operations: LOADK, DUP, POP manipulate the data stack
  • Variables: LOADV, STOREV access local variables and closures
  • Indexing: INDEX, EACH access JSON structure elements
  • Control Flow: JUMP, JUMP_F, BACKTRACK manage execution paths
  • Functions: CALL_JQ, TAIL_CALL_JQ, RET handle function calls
  • Error Handling: TRY_BEGIN, TRY_END implement try-catch semantics

Backtracking & Multiple Outputs

jq's ability to produce multiple outputs is implemented via backtracking:

stack_save(jq, pc, stack_get_pos(jq));  // Save checkpoint
// ... execute forward ...
pc = stack_restore(jq);  // Restore to checkpoint

When an operation can produce multiple values (like .[] on an array), the engine saves the current state, executes one path, then backtracks to explore alternatives.
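A toy interpreter makes the save/execute/backtrack cycle concrete. The four opcodes, the toy_fork_point type, and toy_run() below are hypothetical; the sketch uses absolute jump targets and fixed-size stacks for brevity, and is not modeled on the details of execute.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; PUSH, FORK and JUMP each take one operand slot. */
enum { TOY_PUSH, TOY_FORK, TOY_JUMP, TOY_EMIT };

/* A saved checkpoint: where to resume and how deep the data stack was. */
typedef struct { size_t pc; size_t depth; } toy_fork_point;

/* Run code, writing each emitted value to out; returns the output count. */
size_t toy_run(const int *code, int *out, size_t outcap) {
  int stack[16];
  toy_fork_point forks[16];
  size_t depth = 0, nforks = 0, nout = 0, pc = 0;
  for (;;) {
    switch (code[pc]) {
      case TOY_PUSH:
        assert(depth < 16);
        stack[depth++] = code[pc + 1];
        pc += 2;
        break;
      case TOY_FORK:  /* save a checkpoint that resumes at the operand */
        assert(nforks < 16);
        forks[nforks].pc = (size_t)code[pc + 1];
        forks[nforks].depth = depth;
        nforks++;
        pc += 2;
        break;
      case TOY_JUMP:
        pc = (size_t)code[pc + 1];
        break;
      case TOY_EMIT:  /* produce one output, then backtrack */
        assert(depth > 0 && nout < outcap);
        out[nout++] = stack[--depth];
        if (nforks == 0) return nout;   /* no alternatives left */
        nforks--;
        pc = forks[nforks].pc;
        depth = forks[nforks].depth;
        break;
      default:
        assert(!"unknown opcode");
        return nout;
    }
  }
}
```

The program { FORK 6, PUSH 1, JUMP 8, PUSH 2, EMIT } behaves like the jq filter 1, 2: the first branch emits 1, backtracking resumes at the saved checkpoint, and the second branch emits 2.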

Function Calls & Closures

Functions are represented as closures capturing their defining environment:

struct closure {
  struct bytecode* bc;  // Function bytecode
  stack_ptr env;        // Captured frame reference
};

Calls push new frames with closure parameters and local variables. Tail calls optimize recursion by reusing the current frame.

Error Handling

Errors are propagated via the error field in jq_state. When an error occurs:

  1. set_error() records the error
  2. Execution backtracks to the nearest TRY_BEGIN or exits
  3. jq_next() returns the error value to the caller

Initialization & Lifecycle

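jq_state *jq = jq_init();           // Create VM
jq_compile(jq, ".foo | .bar");      // Compile program (returns 0 on failure)
jq_start(jq, input_value, flags);   // Initialize execution; consumes input_value
jv result;
while (jv_is_valid(result = jq_next(jq)))  // Get each output in turn
  jv_free(result);                  // Use, then release, each result
jv_free(result);                    // Release the final invalid value
jq_teardown(&jq);                   // Clean up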

The VM processes one input value at a time, yielding results via repeated jq_next() calls until exhausted.

Builtin Functions & Operations

Relevant Files
  • src/builtin.c
  • src/builtin.h
  • src/builtin.jq

jq provides a comprehensive set of builtin functions and operations that form the core of the language. These are implemented across three layers: C-based primitives for performance-critical operations, jq-coded functions for higher-level abstractions, and bytecode-compiled builtins for control flow.

Arithmetic & Comparison Operations

Binary operators handle arithmetic and comparisons across multiple types:

+ (addition/concatenation)
- (subtraction/array difference)
* (multiplication/string repetition/object merge)
/ (division/string split)
% (modulo)
== (equality)
!= (inequality)
< > <= >= (ordering comparisons)

These operators are polymorphic—+ concatenates strings, merges objects, and combines arrays. The * operator performs recursive object merging when both operands are objects.

Type Conversion & Inspection

Core type functions enable runtime type checking and conversion:

type              # Returns type name: "null", "boolean", "number", "string", "array", "object"
length            # Array/object/string length; absolute value for numbers
tonumber          # Parse string to number
tostring          # Convert to string representation
toboolean         # Parse "true"/"false" strings to booleans
tojson/fromjson   # JSON serialization/deserialization

String Operations

String manipulation functions support common text processing:

startswith(s)     # Check if string starts with s
endswith(s)       # Check if string ends with s
split(sep)        # Split string by separator
join(sep)         # Join array elements with separator
ltrimstr(s)       # Remove prefix
rtrimstr(s)       # Remove suffix
ascii_upcase      # Convert ASCII a-z to A-Z
ascii_downcase    # Convert ASCII A-Z to a-z
explode/implode   # Convert between strings and codepoint arrays

Array & Object Operations

Collection manipulation is central to jq:

keys/keys_unsorted    # Get object keys or array indices
has(key)              # Check key existence
contains(x)           # Deep containment check
sort/sort_by(f)       # Sort arrays
group_by(f)           # Group by expression result
unique/unique_by(f)   # Remove duplicates
min/max/min_by/max_by # Find extrema
reverse               # Reverse arrays
flatten(depth)        # Flatten nested arrays
add                   # Sum array elements or concatenate

Path Operations

Path functions enable dynamic data access and modification:

path(expr)        # Get path(s) to expression results
getpath(p)        # Retrieve value at path
setpath(p; v)     # Set value at path
delpaths(paths)   # Delete multiple paths

Advanced Features

Regex & Pattern Matching:

test(regex; flags)      # Test if string matches pattern
match(regex; flags)     # Extract match objects with captures
capture(regex; flags)   # Extract named capture groups
scan(regex; flags)      # Find all matches
split/sub/gsub          # Pattern-based string operations

Date/Time Functions:

now                     # Current Unix timestamp
gmtime/localtime        # Convert timestamp to broken-down time
mktime                  # Convert broken-down time to timestamp
strftime/strflocaltime  # Format time with format string
strptime                # Parse time string

Math Functions: The system includes standard libm functions: floor, ceil, round, sqrt, log, exp, sin, cos, tan, asin, acos, atan, pow, fma, and many others. These operate on numeric inputs and return numeric results.

Control Flow & Generators

Bytecode-compiled builtins provide efficient control flow:

empty               # Produce no output
not                 # Logical negation
range(n)            # Generate integers 0 to n-1
limit(n; expr)      # Limit output count
first/last/nth      # Extract specific outputs
recurse(f)          # Recursively apply function
while/until         # Conditional iteration

Implementation Architecture

The builtin system uses three mechanisms:

  1. C Functions (builtin.c): Performance-critical operations like arithmetic, type conversion, and string manipulation
  2. jq-Coded Functions (builtin.jq): Higher-level abstractions like map, select, sort_by defined in jq itself
  3. Bytecode Builtins (builtin.c): Control flow primitives compiled directly to bytecode for efficiency

The builtins_bind() function integrates all three layers, making them available to the jq runtime.

Module System & Linking

Relevant Files
  • src/linker.c
  • src/linker.h
  • src/jv_file.c
  • src/compile.c
  • src/compile.h

jq's module system enables code reuse through import and include directives. The linker resolves module paths, loads files, and binds definitions into the program's namespace.

Core Concepts

Modules are .jq files containing function definitions and optional metadata. They can be imported with a namespace prefix (import "foo" as bar) or included directly (include "foo"). Data files (.json) can also be imported as constants.

Module Metadata is an optional object at the start of a module file that describes the module's properties. It's accessible via the modulemeta builtin.

Module Resolution

The linker searches for modules using a configurable search path. Resolution follows this order:

  1. ${search_dir}/${rel_path}.jq – Direct file match
  2. ${search_dir}/${rel_path}/jq/main.jq – Directory with main entry
  3. ${search_dir}/${rel_path}/${basename}.jq – Directory with same-named file
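The candidate order above can be sketched as a small path builder. The function name toy_candidates() and the fixed 256-byte buffers are assumptions made for the example; the sketch mirrors the list as written and does not touch the file system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the three candidate paths for rel_path under search_dir into out,
   in the order listed above. Returns 0 on success, -1 on truncation. */
int toy_candidates(const char *search_dir, const char *rel_path,
                   char out[3][256]) {
  /* basename: the part of rel_path after the last '/', or all of it */
  const char *slash = strrchr(rel_path, '/');
  const char *base = slash ? slash + 1 : rel_path;
  int n0 = snprintf(out[0], 256, "%s/%s.jq", search_dir, rel_path);
  int n1 = snprintf(out[1], 256, "%s/%s/jq/main.jq", search_dir, rel_path);
  int n2 = snprintf(out[2], 256, "%s/%s/%s.jq", search_dir, rel_path, base);
  if (n0 < 0 || n1 < 0 || n2 < 0) return -1;
  return (n0 < 256 && n1 < 256 && n2 < 256) ? 0 : -1;
}
```

A resolver built on this would try each candidate in turn for every directory in the search path and stop at the first file that exists.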

Search paths support substitutions:

  • ~/ → User's home directory
  • $ORIGIN/ → jq executable directory
  • ./ → Importing file's directory (or current directory for top-level)

Key Functions

load_program() is the entry point. It parses the main program, validates it has a top-level expression, and processes all dependencies recursively.

process_dependencies() extracts import statements from a block and resolves each one. It handles both code modules and data imports, managing a lib_loading_state to cache already-loaded libraries and prevent duplicate loading.

load_library() loads a single module file. For code modules, it parses the jq source; for data modules, it loads JSON. It recursively processes the module's own dependencies.

find_lib() locates a module file by searching the path chain. It validates relative paths (no .. or backslashes) and returns the resolved absolute path or an error.
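The relative-path check mentioned for find_lib() can be sketched as below. The function toy_valid_rel_path() is hypothetical and covers only the two rules stated above (no backslashes, no ".." components); any further rules the linker applies are not modeled.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if rel_path has no backslash and no ".." component, else 0. */
int toy_valid_rel_path(const char *rel_path) {
  if (strchr(rel_path, '\\') != NULL) return 0;
  const char *p = rel_path;
  while (*p != '\0') {
    size_t len = strcspn(p, "/");          /* length of this component */
    if (len == 2 && p[0] == '.' && p[1] == '.') return 0;
    p += len;
    if (*p == '/') p++;                    /* skip the separator */
  }
  return 1;
}
```

Checking whole components rather than searching for the substring ".." keeps names such as "a/..b" valid while still rejecting "a/../b".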

Binding and Namespacing

After loading, modules are bound into the program using block_bind_library(). This function prefixes function names with the module's namespace (e.g., foo::bar), allowing multiple modules to coexist without name collisions.

Data imports create global constants. For example, import "data" as $d binds the JSON data to $d.

Include directives merge the module's definitions directly into the current namespace without prefixing.

Dependency Caching

The lib_loading_state structure tracks loaded libraries by filename. When a module is imported multiple times, the linker reuses the cached definition instead of reloading and reparsing, improving performance and ensuring consistent behavior.

Error Handling

Module resolution errors are reported with context. Optional imports (marked with optional: true in metadata) fail silently. Syntax errors in modules are caught during parsing and reported with the module's filename and line number.

CLI Interface & Main Entry Point

Relevant Files
  • src/main.c
  • src/util.c
  • src/util.h

The CLI entry point in main.c orchestrates the entire jq workflow: parsing command-line arguments, initializing the jq state machine, compiling the filter program, and processing input data through the execution engine.

Argument Parsing

The CLI supports three invocation modes:

  1. Standard: jq [options] <filter> [files...] – Process files with the filter
  2. String args: jq [options] --args <filter> [strings...] – Pass remaining args as $ARGS.positional[]
  3. JSON args: jq [options] --jsonargs <filter> [JSON_TEXTS...] – Parse remaining args as JSON values

Key options include:

  • Input modes: -n (null input), -R (raw input), -s (slurp)
  • Output formatting: -c (compact), -r (raw), -j (join), -S (sort keys), -C/-M (color)
  • Program and variables: -f (read the filter from a file), --arg name value (bind $name to a string value)
  • Advanced: --stream (streaming parser), --seq (JSON sequence mode)

Core Processing Loop

After compilation, the main loop processes each input:

while (jq_util_input_errors(input_state) == 0 &&
       (jv_is_valid((value = jq_util_input_next_input(input_state))))) {
  ret = process(jq, value, jq_flags, dumpopts, options);
  if (jq_halted(jq)) break;
}

The process() function calls jq_start() to initialize execution, then repeatedly calls jq_next() to retrieve results until the filter completes.

Input Management

The jq_util_input_state (defined in util.c) manages file I/O and buffering:

  • Maintains a file list and current file pointer
  • Handles stdin (-) and regular files transparently
  • Buffers input with UTF-8 boundary awareness
  • Supports both JSON parsing and raw line reading modes
  • Tracks current filename and line number for error reporting

Output Handling

Output formatting respects multiple options:

  • Raw strings (-r): Output string values without JSON escaping
  • Compact (-c): Single-line JSON without pretty-printing
  • Sorted keys (-S): Alphabetically sort object keys
  • Color (-C/-M): Enable/disable ANSI color codes
  • JSON sequences (--seq): Write an ASCII record separator (RS, 0x1E) before each output and a line feed after it
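The --seq framing can be sketched as a single formatting call. Per the jq manual, --seq prints an ASCII RS character before each output value and a line feed after it; the helper toy_seq_frame() is hypothetical and writes into a caller-supplied buffer rather than to stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Frame one JSON text: RS (0x1E, octal 036) before it, LF after it.
   Returns the number of bytes written, or -1 if buf is too small. */
int toy_seq_frame(const char *json_text, char *buf, size_t cap) {
  int n = snprintf(buf, cap, "\036%s\n", json_text);
  return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The leading separator lets a reader resynchronize after a truncated record, since every complete text starts immediately after an RS byte.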

Windows console output bypasses stdio to ensure proper UTF-8 handling via WriteFile().

Error Handling & Exit Codes

Exit codes reflect execution status:

  • 0 – Success (without -e, this holds even if the last output is false or null)
  • 1 – Last output was false or null (with -e)
  • 2 – System error (file I/O, memory)
  • 3 – Compilation error
  • 4 – No output produced (with -e)
  • 5 – Runtime error

The -e flag enables exit-status mode, where the final result determines the exit code.
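The exit-code rules can be summarized in one function. This is a sketch under stated assumptions: the toy_run_summary struct and its field names are hypothetical, the precedence between error kinds is a guess made for the example, and exit code 1 (last output false or null under -e) comes from the jq manual.

```c
#include <assert.h>

/* Hypothetical summary of one jq run; field names are illustrative only. */
typedef struct {
  int system_error;     /* usage problem, file I/O or memory failure */
  int compile_error;    /* the filter failed to compile */
  int runtime_error;    /* an uncaught error was raised during execution */
  int exit_status_mode; /* -e was given */
  int produced_output;  /* at least one valid result was produced */
  int last_was_falsy;   /* the last output was false or null */
} toy_run_summary;

/* Map a run summary to a process exit code (assumed precedence). */
int toy_exit_code(const toy_run_summary *r) {
  if (r->system_error) return 2;
  if (r->compile_error) return 3;
  if (r->runtime_error) return 5;
  if (r->exit_status_mode) {
    if (!r->produced_output) return 4;
    if (r->last_was_falsy) return 1;
  }
  return 0;
}
```

Note that the output-dependent codes only apply when -e is set; without it, a run that ends on false or null still exits with 0.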

Platform-Specific Handling

Windows: The entry point uses wmain() to receive wide-character arguments, converting them to UTF-8 before passing to umain(). Console mode is set to binary UTF-8 text to prevent encoding issues.

OpenBSD: Uses pledge() to restrict system calls to stdio and rpath (file reading).

Locale: Initializes locale support via setlocale() when available.

Testing & Quality Assurance

Relevant Files
  • tests/jq.test – Main test suite with 2500+ test cases
  • src/jq_test.c – C implementation of test runner
  • tests/setup – Shared test environment configuration
  • tests/jqtest – Shell script entry point for main tests
  • tests/shtest – Shell integration and regression tests
  • tests/jq_fuzz_*.c – Fuzzing harnesses for security testing
  • Makefile.am – Test targets and CI configuration

Test Suite Architecture

jq uses a multi-layered testing approach combining declarative test cases, shell scripts, and fuzzing. The primary test suite is defined in tests/jq.test, a simple text format with groups of three lines: program, input, and expected output. Blank lines and comments (starting with #) are ignored.

# Example test case
.foo
{"foo": 42}
42

The test runner (src/jq_test.c) parses this format and validates each program against its input, comparing actual output to expected results. Tests can be marked with %%FAIL to verify error handling.

Running Tests

Execute tests via make check, which runs multiple test suites:

  • jqtest – Core functionality (2500+ cases from jq.test)
  • mantest – Examples from the manual documentation
  • shtest – Shell integration, regression tests, and edge cases
  • utf8test, base64test, uritest – Format-specific tests
  • onigtest – Regex functionality (when Oniguruma is enabled)

Use ENABLE_VALGRIND=1 make check to run with memory leak detection.

Test Infrastructure

The tests/setup script provides shared configuration: it sets locale to C, configures Valgrind if enabled, and defines helper variables like $JQ (path to binary) and $mods (module test directory).

Individual test scripts (e.g., jqtest, shtest) source this setup and invoke the jq binary with appropriate flags. The shtest script is particularly comprehensive, testing constant folding, JSON sequences, streaming, error injection, module loading, and color output.

Fuzzing

jq includes LLVM libFuzzer harnesses in tests/jq_fuzz_*.c for continuous fuzzing:

  • jq_fuzz_compile.c – Fuzzes the parser and compiler
  • jq_fuzz_execute.cpp – Fuzzes program execution
  • jq_fuzz_parse.c – Fuzzes JSON parsing
  • jq_fuzz_parse_stream.c – Fuzzes streaming JSON parser

These harnesses help discover edge cases and security vulnerabilities in parsing and execution paths.

Quality Assurance Features

  • Error message validation – Tests verify exact error output for invalid programs
  • Regression tests – Specific tests for fixed bugs (e.g., #951, #2146)
  • Locale testing – Validates behavior across different locales
  • Memory safety – Valgrind integration detects leaks and invalid memory access
  • Sanitizers – Build with --enable-asan or --enable-ubsan for AddressSanitizer and UndefinedBehaviorSanitizer