ripgrep Wiki | Augment Code

Overview

Relevant Files

README.md
Cargo.toml
crates/core/README.md
crates/grep/README.md
crates/searcher/README.md
crates/matcher/src/lib.rs
crates/ignore/src/lib.rs

ripgrep (rg) is a blazingly fast line-oriented search tool that recursively searches directories for regex patterns. It combines intelligent filtering, high-performance regex matching, and a modular architecture to deliver exceptional speed while respecting .gitignore rules and automatically skipping hidden files and binary data.

What is ripgrep?

ripgrep is a command-line utility that searches files for patterns you specify. By default, it respects .gitignore, .ignore, and .rgignore files, automatically filters out hidden files and directories, and skips binary files. These defaults can be disabled with rg -uuu for unrestricted searching. The tool is available on Windows, macOS, and Linux with precompiled binaries for every release.

Why ripgrep is fast

ripgrep achieves exceptional performance through several key optimizations:

Rust's regex engine – Uses finite automata, SIMD instructions, and aggressive literal optimizations
UTF-8 aware – Decoding is built directly into the deterministic finite automaton, maintaining Unicode support without sacrificing speed
Smart I/O strategy – Automatically chooses between memory mapping (for single files) and incremental buffering (for large directories)
Parallel directory traversal – Lock-free recursive directory iterator via crossbeam and the ignore crate
Efficient ignore matching – Uses RegexSet to match file paths against multiple glob patterns simultaneously

Architecture overview

ripgrep is organized as a Rust workspace with specialized crates, each handling a distinct responsibility:

Core crates:

crates/core – Entry point with CLI interface definition and glue code orchestrating the search pipeline
crates/grep – High-level facade combining all grep-related crates into a unified library API
crates/matcher – Abstract trait-based interface for regex implementations (supports Rust regex and PCRE2)
crates/searcher – High-level search execution handling context lines, binary detection, UTF-16 transcoding, and I/O strategy selection
crates/printer – Formats and outputs results in human-readable, aggregate, or JSON Lines format
crates/ignore – Fast recursive directory iterator with .gitignore and file type filtering
crates/regex – Matcher implementation using Rust's regex engine
crates/pcre2 – Optional matcher implementation using PCRE2 for advanced features like lookaround and backreferences
crates/globset – Efficient glob pattern matching for file filtering
crates/cli – Shared CLI utilities and helpers

Key design principles

Modularity – Each crate is independently reusable. The grep crate provides a library interface for embedding ripgrep's search capabilities in other projects.

Pluggable regex engines – The Matcher trait allows swapping between Rust's regex engine (default) and PCRE2 (optional) without changing search logic.

Automatic optimization – ripgrep intelligently selects I/O strategies, binary detection modes, and encoding handling based on file characteristics.

Filtering by default – Respects version control ignore files and skips hidden/binary files automatically, reducing noise in search results.

Architecture & Crate Organization

Relevant Files

crates/grep/src/lib.rs
crates/core/main.rs
crates/core/search.rs
crates/searcher/src/lib.rs
crates/matcher/README.md
Cargo.toml

ripgrep is organized as a modular workspace of specialized crates, each handling a distinct responsibility in the search pipeline. This architecture enables code reuse, testability, and pluggable regex engines.

Core Crate Organization

The workspace consists of 10 member crates:

grep - High-level facade that re-exports all grep-related crates as a unified library interface
grep-matcher - Low-level trait for pluggable regex engines (Rust regex or PCRE2)
grep-regex - Matcher implementation using Rust's regex crate
grep-pcre2 - Optional matcher implementation using PCRE2 (feature-gated)
grep-searcher - Core search engine: reads bytes, applies matchers, reports results via sinks
grep-printer - Output formatters (standard grep-like, summary, JSON Lines)
grep-cli - CLI utilities (command execution, decompression, binary detection)
globset - Glob pattern matching for file filtering
ignore - Gitignore-aware directory traversal
core - ripgrep's CLI binary (not a published crate)

Data Flow Architecture

Search Pipeline

The search execution follows this sequence:

Argument Parsing (crates/core/flags) - Convert CLI flags into high-level HiArgs configuration
Matcher Selection - Choose between Rust regex or PCRE2 based on features and flags
Directory Traversal - Use ignore crate to walk directories respecting .gitignore rules
File Processing - For each file, determine if preprocessing or decompression is needed
Search Execution - Searcher reads bytes and applies Matcher using memory maps when possible
Result Formatting - Printer formats matches (standard, summary, or JSON) and writes to output

Single vs. Multi-threaded Search

Single-threaded (search()) - Sequential file processing, simpler control flow
Multi-threaded (search_parallel()) - Parallelism via ignore crate's build_parallel(), with buffered output to prevent interleaving

Both paths use the same SearchWorker abstraction, ensuring consistent behavior.
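
The buffering idea behind the multi-threaded path can be sketched with the standard library alone. This is a toy model, not ripgrep's implementation: `search_parallel` here is a hypothetical helper that spawns one thread per in-memory "file", lets each worker fill a private buffer, and hands over the whole buffer at once so that lines from different files cannot interleave.

```rust
use std::sync::mpsc;
use std::thread;

/// Searches each (name, contents) pair on its own thread and returns one
/// fully formatted output block per file that had matches.
fn search_parallel(files: Vec<(String, String)>, needle: &str) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();
    let mut handles = Vec::new();
    for (name, contents) in files {
        let tx = tx.clone();
        let needle = needle.to_string();
        handles.push(thread::spawn(move || {
            // Each worker writes into its own private buffer...
            let mut buf = String::new();
            for (i, line) in contents.lines().enumerate() {
                if line.contains(&needle) {
                    buf.push_str(&format!("{}:{}:{}\n", name, i + 1, line));
                }
            }
            // ...and sends the whole buffer in one piece.
            if !buf.is_empty() {
                tx.send(buf).expect("receiver is alive");
            }
        }));
    }
    drop(tx); // only the workers' clones keep the channel open
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    rx.into_iter().collect()
}
```

The order of the returned blocks depends on thread scheduling, but each block is always contiguous.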

Key Abstractions

Matcher Trait - Enables pluggable regex engines. Implementations must support pattern matching, finding, and iteration over matches.

Sink Trait - Abstracts result reporting. Callers receive callbacks for match lines, context lines, and search completion. Enables flexible output handling.

SearchWorker - Orchestrates matcher, searcher, and printer. Handles preprocessor execution, decompression, and binary detection logic.

Feature Flags

pcre2 - Enables PCRE2 regex engine as an alternative to Rust regex (default: disabled)

Pattern Matching & Regex Engines

Relevant Files

crates/matcher/src/lib.rs - Core Matcher trait interface
crates/regex/src/matcher.rs - Rust regex engine implementation
crates/pcre2/src/matcher.rs - PCRE2 engine implementation
crates/regex/src/literal.rs - Literal extraction optimization

ripgrep supports multiple regex engines through a pluggable architecture. The core abstraction is the Matcher trait, which defines a low-level interface for pattern matching. This design allows different regex engines to be swapped without changing the search logic.

The Matcher Trait

The Matcher trait in grep-matcher provides an abstract interface for regex implementations. Key methods include:

find_at(haystack, at) - Find the first match starting from position at
find_iter(haystack, callback) - Iterate over all non-overlapping matches
captures_at(haystack, at, caps) - Extract capturing groups from a match
is_match(haystack) - Quick boolean check for match existence
shortest_match(haystack) - Return earliest possible match end position

The trait uses internal iteration (push model) rather than external iteration. This design choice enables support for regex engines that require internal iteration and avoids Rust's type system limitations with generic external iterators.
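
To illustrate the push model, here is a minimal std-only sketch. `LiteralMatcher` is a hypothetical toy type, not part of grep-matcher, and its signatures are simplified: the matcher owns the loop and calls back for every non-overlapping match, and the callback returns `false` to stop early.

```rust
/// A toy stand-in for a matcher: finds a fixed byte string.
struct LiteralMatcher {
    needle: Vec<u8>,
}

impl LiteralMatcher {
    /// Returns the (start, end) of the first match at or after `at`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<(usize, usize)> {
        if self.needle.is_empty() || at > haystack.len() {
            return None;
        }
        haystack[at..]
            .windows(self.needle.len())
            .position(|w| w == self.needle.as_slice())
            .map(|i| (at + i, at + i + self.needle.len()))
    }

    /// Internal iteration: the matcher drives the loop and pushes each
    /// non-overlapping match to `on_match`. Returning `false` stops early.
    fn find_iter<F>(&self, haystack: &[u8], mut on_match: F)
    where
        F: FnMut(usize, usize) -> bool,
    {
        let mut at = 0;
        while let Some((start, end)) = self.find_at(haystack, at) {
            if !on_match(start, end) {
                break;
            }
            at = end;
        }
    }
}
```

Because the caller never holds an iterator object, the matcher is free to keep whatever internal state its engine needs for the duration of the loop.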

Regex Engine Implementations

Rust Regex Engine (grep-regex): Uses regex-automata for fast, memory-safe pattern matching. Supports Unicode by default and includes optimizations like literal extraction. The builder applies transformations like word boundaries (\b) and whole-line anchors (^...$) before compilation.

PCRE2 Engine (grep-pcre2): Wraps the PCRE2 C library for Perl-compatible regex syntax. Useful when patterns require features not available in Rust's regex engine. Includes JIT compilation support for performance.

Optimization Strategies

// Literal extraction: for patterns like \w+foo\w+,
// extract "foo" and search for it first
let fast_line_regex = InnerLiterals::new(&chir, &regex).one_regex()?;

ripgrep extracts literals from patterns to create fast candidate matchers. When a line terminator is set, the engine can search for literal substrings first, then verify full pattern matches only on candidate lines. This dramatically speeds up searches like \w+foo\w+.

Capturing Groups & Replacements

Both engines support capturing groups through the Captures trait. The interpolate method handles replacement string expansion with $1, $name syntax. Matchers can report non-matching bytes and line terminators to enable further optimizations.

Line-Oriented Matching

The find_candidate_line method enables fast line-level filtering. Matchers can return confirmed matches or candidate lines that need verification. This is crucial for ripgrep's line-oriented search model, where most operations work on complete lines rather than arbitrary byte ranges.

File Filtering & Ignore Rules

Relevant Files

crates/ignore/src/lib.rs
crates/ignore/src/gitignore.rs
crates/ignore/src/overrides.rs
crates/ignore/src/walk.rs
crates/globset/src/lib.rs
crates/ignore/src/types.rs

The ripgrep project uses a sophisticated multi-layered filtering system to determine which files and directories to include or exclude during recursive directory traversal. This system combines glob patterns, gitignore semantics, file type matching, and explicit overrides.

Core Components

Globset Crate provides cross-platform glob pattern matching. It converts glob patterns to regular expressions for efficient matching of multiple patterns simultaneously. Key features include support for * (any characters), ? (single character), ** (recursive directory matching), character classes [abc], and alternation {a,b}.

Ignore Crate implements the main filtering logic through three primary modules:

Gitignore Module — Parses and matches patterns from .gitignore, .ignore, and .git/info/exclude files. It respects gitignore semantics including negation patterns (prefixed with !) and directory-only patterns (suffixed with /).
Overrides Module — Manages explicit include/exclude glob patterns provided on the command line via -g/--glob (a leading ! turns a glob into an exclusion). These take precedence over gitignore rules.
Types Module — Matches files by type using glob patterns. Default types include rust (*.rs), c (*.{c,h}), and many others. Users can select or negate specific types.
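
The negation semantics can be shown with a small std-only sketch. It assumes bare file names and a tiny wildcard dialect (`*` and `?` only), so it is far simpler than real gitignore matching; `wildcard` and `is_ignored` are hypothetical helpers. The point is the rule that the last matching pattern decides, and that a leading `!` whitelists.

```rust
/// Minimal wildcard match supporting only `*` (any run of characters)
/// and `?` (exactly one character).
fn wildcard(pattern: &[u8], text: &[u8]) -> bool {
    match (pattern.first().copied(), text.first().copied()) {
        (None, None) => true,
        (Some(b'*'), _) => {
            wildcard(&pattern[1..], text)
                || (!text.is_empty() && wildcard(pattern, &text[1..]))
        }
        (Some(b'?'), Some(_)) => wildcard(&pattern[1..], &text[1..]),
        (Some(p), Some(t)) if p == t => wildcard(&pattern[1..], &text[1..]),
        _ => false,
    }
}

/// Returns true if `name` ends up ignored. Patterns are checked in order
/// and the last one that matches decides; `!` marks a whitelist pattern.
fn is_ignored(patterns: &[&str], name: &str) -> bool {
    let mut ignored = false;
    for pat in patterns {
        let (negated, glob) = match pat.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *pat),
        };
        if wildcard(glob.as_bytes(), name.as_bytes()) {
            ignored = !negated;
        }
    }
    ignored
}
```

With the patterns `*.log` followed by `!keep.log`, every log file except `keep.log` is ignored; reversing the two lines would ignore `keep.log` as well.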

Matching Precedence

The WalkBuilder applies filtering rules in this strict order:

Glob Overrides — Explicit patterns from -g/--glob flags
Ignore Files — .ignore files override .gitignore files; more nested files override less nested ones
File Type Matching — Applied only to files, not directories
Hidden Files — Skipped by default unless explicitly whitelisted
File Size Limits — Compared against max_filesize setting
Yield — If all checks pass, the entry is yielded
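
As a rough illustration of that ordering, the sketch below models each stage's outcome as a precomputed field and walks through the checks in sequence. All types and names are hypothetical, and the model is simplified: the real crate's interplay between whitelists from different stages has more cases than shown here.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Yield,
    Skip(&'static str),
}

/// What a single matching stage concluded about a path.
#[derive(Clone, Copy, PartialEq)]
enum Match {
    None,
    Ignore,
    Whitelist,
}

struct Entry {
    is_dir: bool,
    is_hidden: bool,
    size: u64,
    override_match: Match,
    ignore_file_match: Match,
    type_match: Match,
}

fn decide(e: &Entry, max_filesize: Option<u64>) -> Decision {
    // Overrides first, then ignore files, then file types: the first
    // stage with an opinion settles ignore vs. whitelist.
    let mut verdict = e.override_match;
    if verdict == Match::None {
        verdict = e.ignore_file_match;
    }
    if verdict == Match::None && !e.is_dir {
        verdict = e.type_match; // file types never apply to directories
    }
    if verdict == Match::Ignore {
        return Decision::Skip("ignored");
    }
    // Hidden entries are skipped unless something whitelisted them.
    if e.is_hidden && verdict != Match::Whitelist {
        return Decision::Skip("hidden");
    }
    // The size limit applies to files only.
    if let (false, Some(max)) = (e.is_dir, max_filesize) {
        if e.size > max {
            return Decision::Skip("too large");
        }
    }
    Decision::Yield
}
```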

Usage Example

use ignore::WalkBuilder;

let walker = WalkBuilder::new("./")
    .hidden(false)           // Include hidden files
    .max_depth(Some(3))      // Limit recursion depth
    .build();

for result in walker {
    match result {
        Ok(entry) => println!("{}", entry.path().display()),
        Err(err) => eprintln!("Error: {}", err),
    }
}

Performance Optimization

The globset crate achieves high performance by matching multiple glob patterns simultaneously using compiled regex automata, rather than testing each pattern sequentially. This is particularly beneficial when many patterns must be evaluated against each path.

Search Execution & Performance

Relevant Files

crates/searcher/src/lib.rs
crates/searcher/src/searcher/mod.rs
crates/searcher/src/searcher/core.rs
crates/searcher/src/searcher/mmap.rs
crates/searcher/src/sink.rs
crates/printer/src/lib.rs
crates/printer/src/standard.rs

Core Search Pipeline

The search execution follows a push-based model where the Searcher drives execution and pushes results to a Sink implementation. This architecture separates concerns: the searcher handles byte consumption and pattern matching, while sinks handle result reporting.

The principal components are:

Searcher - Reads bytes from a source (file, stdin, memory slice) and applies a Matcher (regex pattern) over those bytes
Matcher - A trait for pattern matching (implemented by grep-regex using Rust's regex crate and, optionally, by grep-pcre2)
Sink - A trait that receives callbacks for matches, context lines, and search completion

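// Example: basic search execution over an in-memory byte slice
use grep_regex::RegexMatcher;
use grep_searcher::{sinks::UTF8, Searcher};

let data: &[u8] = b"no match here\na pattern here\n";
let matcher = RegexMatcher::new(r"pattern")?;
let sink = UTF8(|line_num, line| {
    print!("{}: {}", line_num, line); // `line` keeps its line terminator
    Ok(true) // Continue searching
});
Searcher::new().search_slice(&matcher, data, sink)?;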

Performance Optimizations

Memory Mapping

The searcher can optionally use memory maps for faster file access when enabled via MmapChoice::auto(). This is disabled by default due to safety concerns (potential SIGBUS on file mutations). When enabled, the searcher uses heuristics to determine if mmap is advantageous based on file size and platform.

// Memory mapping is NOT used on macOS by default
// Falls back to standard OS read calls if mmap fails

Buffering Strategy

The searcher uses a 64 KB default buffer for incremental searching. This prevents excessive memory usage when processing large files or files with very long lines. The BufferAllocation strategy controls how the buffer expands:

Eager (default) - Expand buffer until the next line fits or memory is exhausted
Error(limit) - Fail if additional allocation exceeds the specified limit

Binary Detection

Three strategies for handling binary data:

none() - No detection (default)
quit(byte) - Stop searching when the byte is found
convert(byte) - Replace the byte with the line terminator

Binary detection is applied differently depending on the search method: continuously for buffered searches, and selectively for memory-mapped searches.
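
The difference between the three strategies can be seen in a small std-only model. `BinaryMode` and `searchable_lines` are hypothetical names, and the function is a simplification: it returns the non-empty lines that would still be searched, rather than reproducing how the real searcher reports where it stopped.

```rust
#[derive(Clone, Copy)]
enum BinaryMode {
    None,
    Quit(u8),
    Convert(u8),
}

/// Returns the non-empty lines of `buf` that remain searchable under
/// the given binary detection mode.
fn searchable_lines(buf: &[u8], mode: BinaryMode) -> Vec<Vec<u8>> {
    let data: Vec<u8> = match mode {
        // No detection: everything is searched as-is.
        BinaryMode::None => buf.to_vec(),
        // Quit: nothing at or after the first binary byte is searched.
        BinaryMode::Quit(b) => {
            let end = buf.iter().position(|&x| x == b).unwrap_or(buf.len());
            buf[..end].to_vec()
        }
        // Convert: each binary byte becomes a line terminator.
        BinaryMode::Convert(b) => buf
            .iter()
            .map(|&x| if x == b { b'\n' } else { x })
            .collect(),
    };
    data.split(|&x| x == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| line.to_vec())
        .collect()
}
```

With a NUL byte as the marker, converting keeps the text on both sides searchable as separate lines, while quitting drops everything after the marker.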

Output Handling

The grep-printer crate provides multiple output formats:

Standard - Human-readable grep-like output with colors, line numbers, and context
JSON - Machine-readable JSON Lines format for programmatic consumption
Summary - Aggregate statistics (match counts, file paths)

Each printer implements the Sink trait, receiving callbacks as matches are found. The Standard printer supports advanced features like search-and-replace, column limiting, and hyperlink generation.

Sink Lifecycle

A Sink receives callbacks in this order:

begin() - Search starts
matched() - Match found (required to implement)
context() - Context line found (optional)
context_break() - Gap between context groups (optional)
binary_data() - Binary data detected (optional)
finish() - Search completes successfully

If an error occurs or the sink returns false, the search stops immediately. The finish() callback is not called if an error occurs.
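
The callback order can be modelled with a toy trait. `LineSink`, `run_search`, and `Recorder` are hypothetical and much smaller than grep-searcher's real `Sink` (no context or binary callbacks, plain `String` errors); the sketch assumes, as described above, that an early stop still reaches `finish()` while an error does not.

```rust
/// Toy callback interface; only `matched` must be implemented.
trait LineSink {
    fn begin(&mut self) -> Result<bool, String> {
        Ok(true)
    }
    fn matched(&mut self, line_number: usize, line: &str) -> Result<bool, String>;
    fn finish(&mut self) -> Result<(), String> {
        Ok(())
    }
}

/// Drives the search and pushes events into `sink`.
fn run_search<S: LineSink>(haystack: &str, needle: &str, sink: &mut S) -> Result<(), String> {
    if sink.begin()? {
        for (i, line) in haystack.lines().enumerate() {
            // A `false` return stops the search; `?` aborts on error.
            if line.contains(needle) && !sink.matched(i + 1, line)? {
                break;
            }
        }
    }
    // Reached after a normal stop, skipped when `?` returned an error.
    sink.finish()
}

/// Records every callback and asks to stop after `limit` matches.
struct Recorder {
    limit: usize,
    events: Vec<String>,
}

impl LineSink for Recorder {
    fn begin(&mut self) -> Result<bool, String> {
        self.events.push("begin".to_string());
        Ok(true)
    }
    fn matched(&mut self, line_number: usize, line: &str) -> Result<bool, String> {
        self.events.push(format!("matched {}: {}", line_number, line));
        self.limit = self.limit.saturating_sub(1);
        Ok(self.limit > 0)
    }
    fn finish(&mut self) -> Result<(), String> {
        self.events.push("finish".to_string());
        Ok(())
    }
}
```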

Line-by-Line vs. Multi-Line Search

The searcher automatically selects between two execution paths:

Line-by-line (fast path) - Optimized for single-line patterns, processes one line at a time
Multi-line (slow path) - Handles patterns spanning multiple lines, uses more complex buffering

The choice is made automatically based on the matcher's capabilities and configuration.

CLI Interface & Configuration

Relevant Files

crates/core/flags/defs.rs
crates/core/flags/parse.rs
crates/core/flags/lowargs.rs
crates/core/flags/hiargs.rs
GUIDE.md

Ripgrep's CLI is built on a sophisticated flag system that transforms raw command-line arguments into structured, type-safe representations. The architecture uses a two-level parsing strategy: low-level arguments capture raw flag values, while high-level arguments construct complex objects needed for search execution.

Flag Categories

Flags are organized into seven logical categories that structure the help output and documentation:

Input: Pattern and file source flags (-e, -f)
Search: Pattern matching behavior (-i, -S, -F, --engine)
Filter: File selection and ignore rules (--type, --glob, --no-ignore)
Output: Result formatting (-n, -c, --color, --context)
Output Modes: Mutually exclusive output formats (--count, --json, --vimgrep)
Logging: Debug and trace output (--debug, --trace)
Other Behaviors: Miscellaneous options (--version, --help, --type-list)

Flag Traits and Variants

Each flag implements the Flag trait, which defines its behavior. A single logical flag can have multiple manifestations:

Long form: --encoding
Short form: -E
Negation: --no-encoding
Aliases: Alternative names for compatibility

For example, the encoding flag supports -E, --encoding, and --no-encoding, all manipulating the same internal state.

Two-Level Parsing Architecture

Raw CLI Arguments
        ↓
    Parser (lexopt)
        ↓
    LowArgs (raw values)
        ↓
    HiArgs (constructed objects)
        ↓
    Search Execution

LowArgs stores raw flag values directly from parsing. It contains simple types like bool, String, and Vec<T>.

HiArgs constructs complex objects from LowArgs. For instance, glob patterns are combined into a single matcher, and color specifications are parsed into a ColorSpecs object.
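
A reduced sketch of the two levels follows. The field names and the `from_low_args` constructor are assumptions chosen for illustration, not the real definitions in crates/core/flags, and the smart-case rule is simplified to "no uppercase character in any pattern".

```rust
/// Raw values as they come out of the parser.
#[derive(Default)]
struct LowArgs {
    patterns: Vec<String>,
    globs: Vec<String>,
    smart_case: bool,
}

/// Derived, ready-to-use configuration.
#[derive(Debug, PartialEq)]
struct HiArgs {
    patterns: Vec<String>,
    include_globs: Vec<String>,
    exclude_globs: Vec<String>,
    case_insensitive: bool,
}

impl HiArgs {
    fn from_low_args(low: LowArgs) -> Result<HiArgs, String> {
        if low.patterns.is_empty() {
            return Err("no pattern given".to_string());
        }
        // Split globs into includes and excludes (leading `!`).
        let (mut include_globs, mut exclude_globs) = (Vec::new(), Vec::new());
        for glob in low.globs {
            if glob.starts_with('!') {
                exclude_globs.push(glob[1..].to_string());
            } else {
                include_globs.push(glob);
            }
        }
        // Simplified smart case: insensitive only if no pattern
        // contains an uppercase character.
        let has_upper = low
            .patterns
            .iter()
            .any(|p| p.chars().any(char::is_uppercase));
        Ok(HiArgs {
            patterns: low.patterns,
            include_globs,
            exclude_globs,
            case_insensitive: low.smart_case && !has_upper,
        })
    }
}
```

Keeping validation and derivation in the second step means the parser itself only has to record what was typed.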

Configuration File Support

Ripgrep reads configuration from RIPGREP_CONFIG_PATH. The config file format is simple: each line is a single argument, with # comments supported. Arguments are prepended to CLI arguments, allowing command-line flags to override config settings.

# Example config file
--max-columns=150
--smart-case
--type-add=web:*.{html,css,js}
--hidden
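
The format is simple enough to model in a few lines. The sketch below assumes that lines are trimmed and that blank lines and lines starting with `#` are skipped; `parse_config` and `effective_args` are hypothetical helper names, not ripgrep functions.

```rust
/// Turns config-file text into arguments: one per line, whitespace
/// trimmed, with blank lines and `#` comment lines skipped.
fn parse_config(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Config arguments go first so that later command-line flags win.
fn effective_args(config: &str, cli: &[&str]) -> Vec<String> {
    let mut args = parse_config(config);
    args.extend(cli.iter().map(|s| s.to_string()));
    args
}
```

Because the config arguments come first, a flag given on the command line is seen later by the parser and takes effect over the configured one.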

Special Modes

Beyond normal search, ripgrep supports special modes triggered by specific flags:

--help / -h: Display help documentation
--version: Show version information
--type-list: List all supported file types
--generate: Generate a man page or shell completions (bash, zsh, fish, powershell)

These modes bypass normal search execution and instead generate output or exit early.

Flag Validation and Interaction

The parser enforces flag interactions and constraints:

Mutually exclusive flags (e.g., --count vs --count-matches) use a Mode enum
Negation flags explicitly disable features (e.g., --no-ignore disables ignore file processing)
Later flags override earlier ones, enabling config file overrides
Some flags partially override others (e.g., -A and -B partially override -C)
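
The partial override of -C by -A and -B can be illustrated as follows. `ContextFlag` and `resolve_context` are hypothetical names, and the sketch assumes that the override holds regardless of flag order while repeated uses of the same flag follow the usual "last one wins" rule.

```rust
/// Context flags as seen on the command line, in order.
enum ContextFlag {
    After(usize),  // -A
    Before(usize), // -B
    Both(usize),   // -C
}

/// Resolves the flags to (before, after) line counts. -C supplies a
/// default for both sides; -A and -B replace their own side only.
fn resolve_context(flags: &[ContextFlag]) -> (usize, usize) {
    let (mut before, mut after, mut both) = (None, None, None);
    for flag in flags {
        match *flag {
            ContextFlag::After(n) => after = Some(n),
            ContextFlag::Before(n) => before = Some(n),
            ContextFlag::Both(n) => both = Some(n),
        }
    }
    let fallback = both.unwrap_or(0);
    (before.unwrap_or(fallback), after.unwrap_or(fallback))
}
```

Under these assumptions, -C1 -A2 and -A2 -C1 both resolve to one line before and two lines after each match.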

Advanced Features

Relevant Files

GUIDE.md - Comprehensive user guide covering advanced features
FAQ.md - Frequently asked questions with detailed explanations
crates/searcher/src/searcher/mod.rs - Encoding and transcoding implementation
crates/cli/src/decompress.rs - Compression support implementation

PCRE2 Regex Engine

ripgrep supports switching to the PCRE2 regex engine via the -P/--pcre2 flag. While the default regex engine uses finite automata for guaranteed linear-time complexity, PCRE2 enables advanced features like lookaround assertions and backreferences.

Key trade-offs:

PCRE2 enables (?=...) lookahead, (?<=...) lookbehind, and backreferences like (\w{10})\1
Performance may degrade due to line-by-line searching and UTF-8 validation overhead
Use --no-pcre2-unicode to disable Unicode mode for better performance when searching ASCII-compatible data
Combine with -U/--multiline to avoid line-by-line searching penalties

# Find a run of 10 word characters that is immediately repeated (backreference)
rg -P '(\w{10})\1'

# Optimize PCRE2 for ASCII-only searches
rg -P -U '^\w{42}$' --no-pcre2-unicode file.txt

Multiline Search

The -U/--multiline flag allows patterns to match across line boundaries. This is useful for finding patterns that span multiple lines, such as function definitions or code blocks.

# Match text spanning multiple lines; (?s) lets . match newlines
rg -U '(?s)function.*?\{.*?\}'

# Combine with context to see surrounding lines
rg -U -C2 '(?s)pattern.*?spanning.*?lines'

Important: By default, . does not match newlines even in multiline mode. Enable the s flag inline with (?s), pass --multiline-dotall, or use \p{Any}+? for lazy matching across any characters including newlines.

File Encoding Support

ripgrep automatically detects UTF-16 via BOM sniffing and supports explicit encoding specification with -E/--encoding. This enables searching files in UTF-8, UTF-16, latin-1, GBK, EUC-JP, Shift_JIS, and many other encodings.

# Search UTF-16 files automatically
rg 'pattern' file.utf16

# Explicitly specify encoding
rg -E shift_jis 'パターン' file.sjis

# Disable all encoding logic to search raw bytes
rg -E none 'pattern' binary-file
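
BOM sniffing itself is a short prefix check. The following std-only sketch is illustrative only (ripgrep delegates transcoding to a dedicated encoding library, and `sniff_bom`, `decode`, and `utf16` are hypothetical helpers): it recognizes the UTF-8, UTF-16LE, and UTF-16BE byte-order marks and decodes accordingly.

```rust
#[derive(Debug, PartialEq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding indicated by a leading byte-order mark, if any.
fn sniff_bom(bytes: &[u8]) -> Option<Bom> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else {
        None
    }
}

/// Decodes 16-bit code units with the given byte order, lossily.
fn utf16(bytes: &[u8], to_u16: fn([u8; 2]) -> u16) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| to_u16([pair[0], pair[1]]))
        .collect();
    String::from_utf16_lossy(&units)
}

/// Strips the BOM and transcodes to a Rust string; input without a BOM
/// is treated as (possibly invalid) UTF-8.
fn decode(bytes: &[u8]) -> String {
    match sniff_bom(bytes) {
        Some(Bom::Utf16Le) => utf16(&bytes[2..], u16::from_le_bytes),
        Some(Bom::Utf16Be) => utf16(&bytes[2..], u16::from_be_bytes),
        Some(Bom::Utf8) => String::from_utf8_lossy(&bytes[3..]).into_owned(),
        None => String::from_utf8_lossy(bytes).into_owned(),
    }
}
```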

Compressed File Search

The -z/--search-zip flag enables searching compressed files without manual decompression. Supported formats include gzip, bzip2, xz, lzma, lz4, Brotli, and Zstandard.

# Search inside a compressed file (the stream is decompressed, not unpacked)
rg -z 'pattern' access.log.gz

# Requires corresponding decompression binaries (gzip, bzip2, etc.)

Preprocessor Support

The --pre flag runs a custom command to transform file contents before searching. This enables searching non-text formats like PDFs, compressed archives, or binary files.

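# Search PDFs using pdftotext. --pre passes the file path as the command's only
# argument, so use a wrapper script (here ./preprocess, running `exec pdftotext - -`)
# that reads the file from stdin and writes plain text to stdout
rg --pre ./preprocess 'text' document.pdf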

# Optimize with glob patterns to reduce overhead
rg --pre ./preprocess --pre-glob '*.pdf' 'pattern'

Output Replacements

The --replace flag rewrites matched text in output (without modifying files). Supports capturing groups and named captures for flexible transformations.

# Replace matched text in output
rg 'fast' README.md --replace FAST

# Use capturing groups
rg 'fast\s+(\w+)' README.md -r 'fast-$1'

# Named captures
rg 'fast\s+(?P<word>\w+)' README.md -r 'fast-$word'

Performance Optimization

ripgrep uses intelligent search strategies: memory mapping for single files, incremental buffering for large directories, and SIMD optimizations for UTF-8 handling. The --trace flag reveals which strategy is active.

# View search strategy details (log messages are written to stderr)
rg --trace 'pattern' 2>&1 | grep -i strategy

# Disable memory mapping if needed
rg --no-mmap 'pattern'

Testing & Benchmarking

Relevant Files

tests/tests.rs
tests/feature.rs
tests/util.rs
tests/macros.rs
benchsuite/benchsuite

Test Infrastructure

Ripgrep uses a comprehensive integration test suite organized into logical modules. The main test entry point is tests/tests.rs, which imports specialized test modules:

feature.rs - Tests for core ripgrep features (encoding, patterns, output formats)
binary.rs - Binary file handling tests
json.rs - JSON output format validation
multiline.rs - Multiline search support
regression.rs - Regression tests for fixed issues
misc.rs - Miscellaneous tests

Tests are written using the rgtest! macro, which automatically sets up a temporary directory and creates a ripgrep command instance. The macro also runs each test twice when the pcre2 feature is enabled, testing both the default regex engine and PCRE2.

Test Utilities

The util.rs module provides the testing infrastructure:

Dir - Manages temporary test directories with unique IDs to avoid conflicts
TestCommand - Wraps the ripgrep binary with methods for setting arguments, pipes, and assertions
Macros - eqnice! and eqnice_repr! provide readable output diffs on test failures

Tests create files using dir.create() or dir.create_bytes(), then execute ripgrep with various flags and patterns. The sort_lines() utility helps test unordered output.

Benchmark Suite

The benchsuite/benchsuite Python script compares ripgrep's performance against other search tools (grep, ag, git grep, ugrep). It defines 20+ benchmarks across two corpora:

Linux kernel - Tests on large source tree with many files
Subtitles - Tests on large single files (English and Russian UTF-8)

Each benchmark runs multiple iterations with warmup rounds, collecting timing statistics and line counts. Results show mean +/- standard deviation, with asterisks marking fastest commands.
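
The reported statistic can be reproduced in a few lines. This sketch assumes the sample standard deviation (dividing by n - 1); whether the benchmark script uses the sample or the population form is not stated here, and `mean_stddev` and `fastest` are hypothetical helpers.

```rust
/// Mean and sample standard deviation of timing samples, in seconds.
/// Returns `None` when there are fewer than two samples.
fn mean_stddev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    Some((mean, variance.sqrt()))
}

/// Index of the command with the lowest mean time, i.e. the one that
/// would be marked as fastest.
fn fastest(means: &[f64]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &m) in means.iter().enumerate() {
        if best.map_or(true, |b| m < means[b]) {
            best = Some(i);
        }
    }
    best
}
```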

Running Tests

cargo test --test integration
cargo test --test integration -- --nocapture
cargo test --test integration feature::

For benchmarks:

./benchsuite/benchsuite --download all
./benchsuite/benchsuite --list
./benchsuite/benchsuite linux_literal