Overview
Relevant Files
- README.md
- Cargo.toml
- crates/core/README.md
- crates/grep/README.md
- crates/searcher/README.md
- crates/matcher/src/lib.rs
- crates/ignore/src/lib.rs
ripgrep (rg) is a blazingly fast line-oriented search tool that recursively searches directories for regex patterns. It combines intelligent filtering, high-performance regex matching, and a modular architecture to deliver exceptional speed while respecting .gitignore rules and automatically skipping hidden files and binary data.
What is ripgrep?
ripgrep is a command-line utility that searches files for patterns you specify. By default, it respects .gitignore, .ignore, and .rgignore files, automatically filters out hidden files and directories, and skips binary files. These defaults can be disabled with rg -uuu for unrestricted searching. The tool is available on Windows, macOS, and Linux with precompiled binaries for every release.
Why ripgrep is fast
ripgrep achieves exceptional performance through several key optimizations:
- Rust's regex engine – Uses finite automata, SIMD instructions, and aggressive literal optimizations
- UTF-8 aware – Decoding is built directly into the deterministic finite automaton, maintaining Unicode support without sacrificing speed
- Smart I/O strategy – Automatically chooses between memory mapping (for single files) and incremental buffering (for large directories)
- Parallel directory traversal – Lock-free recursive directory iterator via crossbeam and the ignore crate
- Efficient ignore matching – Uses RegexSet to match file paths against multiple glob patterns simultaneously
Architecture overview
ripgrep is organized as a Rust workspace with specialized crates, each handling a distinct responsibility:
Core crates:
- crates/core – Entry point with CLI interface definition and glue code orchestrating the search pipeline
- crates/grep – High-level facade combining all grep-related crates into a unified library API
- crates/matcher – Abstract trait-based interface for regex implementations (supports Rust regex and PCRE2)
- crates/searcher – High-level search execution handling context lines, binary detection, UTF-16 transcoding, and I/O strategy selection
- crates/printer – Formats and outputs results in human-readable, aggregate, or JSON Lines format
- crates/ignore – Fast recursive directory iterator with .gitignore and file type filtering
- crates/regex – Matcher implementation using Rust's regex engine
- crates/pcre2 – Optional matcher implementation using PCRE2 for advanced features like lookaround and backreferences
- crates/globset – Efficient glob pattern matching for file filtering
- crates/cli – Shared CLI utilities and helpers
Key design principles
Modularity – Each crate is independently reusable. The grep crate provides a library interface for embedding ripgrep's search capabilities in other projects.
Pluggable regex engines – The Matcher trait allows swapping between Rust's regex engine (default) and PCRE2 (optional) without changing search logic.
Automatic optimization – ripgrep intelligently selects I/O strategies, binary detection modes, and encoding handling based on file characteristics.
Filtering by default – Respects version control ignore files and skips hidden/binary files automatically, reducing noise in search results.
Architecture & Crate Organization
Relevant Files
- crates/grep/src/lib.rs
- crates/core/main.rs
- crates/core/search.rs
- crates/searcher/src/lib.rs
- crates/matcher/README.md
- Cargo.toml
ripgrep is organized as a modular workspace of specialized crates, each handling a distinct responsibility in the search pipeline. This architecture enables code reuse, testability, and pluggable regex engines.
Core Crate Organization
The workspace consists of 10 member crates:
- grep - High-level facade that re-exports all grep-related crates as a unified library interface
- grep-matcher - Low-level trait for pluggable regex engines (Rust regex or PCRE2)
- grep-regex - Matcher implementation using Rust's regex crate
- grep-pcre2 - Optional matcher implementation using PCRE2 (feature-gated)
- grep-searcher - Core search engine: reads bytes, applies matchers, reports results via sinks
- grep-printer - Output formatters (standard grep-like, summary, JSON Lines)
- grep-cli - CLI utilities (command execution, decompression, binary detection)
- globset - Glob pattern matching for file filtering
- ignore - Gitignore-aware directory traversal
- core - ripgrep's CLI binary (not a published crate)
Data Flow Architecture
Search Pipeline
The search execution follows this sequence:
- Argument Parsing (crates/core/flags) - Convert CLI flags into high-level HiArgs configuration
- Matcher Selection - Choose between Rust regex or PCRE2 based on features and flags
- Directory Traversal - Use the ignore crate to walk directories respecting .gitignore rules
- File Processing - For each file, determine if preprocessing or decompression is needed
- Search Execution - Searcher reads bytes and applies Matcher using memory maps when possible
- Result Formatting - Printer formats matches (standard, summary, or JSON) and writes to output
Single vs. Multi-threaded Search
- Single-threaded (search()) - Sequential file processing, simpler control flow
- Multi-threaded (search_parallel()) - Parallelism via the ignore crate's build_parallel(), with buffered output to prevent interleaving
Both paths use the same SearchWorker abstraction, ensuring consistent behavior.
Key Abstractions
Matcher Trait - Enables pluggable regex engines. Implementations must support pattern matching, finding, and iteration over matches.
Sink Trait - Abstracts result reporting. Callers receive callbacks for match lines, context lines, and search completion. Enables flexible output handling.
SearchWorker - Orchestrates matcher, searcher, and printer. Handles preprocessor execution, decompression, and binary detection logic.
Feature Flags
- pcre2 - Enables PCRE2 regex engine as an alternative to Rust regex (default: disabled)
Pattern Matching & Regex Engines
Relevant Files
- crates/matcher/src/lib.rs - Core Matcher trait interface
- crates/regex/src/matcher.rs - Rust regex engine implementation
- crates/pcre2/src/matcher.rs - PCRE2 engine implementation
- crates/regex/src/literal.rs - Literal extraction optimization
ripgrep supports multiple regex engines through a pluggable architecture. The core abstraction is the Matcher trait, which defines a low-level interface for pattern matching. This design allows different regex engines to be swapped without changing the search logic.
The Matcher Trait
The Matcher trait in grep-matcher provides an abstract interface for regex implementations. Key methods include:
- find_at(haystack, at) - Find the first match starting from position at
- find_iter(haystack, callback) - Iterate over all non-overlapping matches
- captures_at(haystack, at, caps) - Extract capturing groups from a match
- is_match(haystack) - Quick boolean check for match existence
- shortest_match(haystack) - Return earliest possible match end position
The trait uses internal iteration (push model) rather than external iteration. This design choice enables support for regex engines that require internal iteration and avoids Rust's type system limitations with generic external iterators.
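To make the push model concrete, here is a minimal std-only sketch. It is not the grep-matcher API: the names SimpleMatcher, Span, and LiteralMatcher are hypothetical, and error handling is omitted. It only illustrates how a provided find_iter can drive the loop on top of a required find_at and hand each match to a callback.

```rust
// Hypothetical, simplified stand-in for the trait described above.
// All names and signatures are illustrative, not the grep-matcher API.

/// A half-open byte range [start, end) describing one match.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

pub trait SimpleMatcher {
    /// Find the first match at or after byte offset `at`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<Span>;

    /// Internal iteration: the matcher owns the loop and pushes each
    /// non-overlapping match to `on_match`. Returning `false` stops early.
    fn find_iter<F>(&self, haystack: &[u8], mut on_match: F)
    where
        F: FnMut(Span) -> bool,
    {
        let mut at = 0;
        while at <= haystack.len() {
            let Some(m) = self.find_at(haystack, at) else { break };
            if !on_match(m) {
                break;
            }
            // Always make progress, even after an empty match.
            at = if m.end > m.start { m.end } else { m.end + 1 };
        }
    }
}

/// A toy engine that matches a fixed, non-empty byte string.
pub struct LiteralMatcher {
    needle: Vec<u8>,
}

impl LiteralMatcher {
    pub fn new(needle: &str) -> Self {
        assert!(!needle.is_empty(), "needle must be non-empty");
        LiteralMatcher { needle: needle.as_bytes().to_vec() }
    }
}

impl SimpleMatcher for LiteralMatcher {
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<Span> {
        if at > haystack.len() {
            return None;
        }
        haystack[at..]
            .windows(self.needle.len())
            .position(|w| w == self.needle.as_slice())
            .map(|i| Span { start: at + i, end: at + i + self.needle.len() })
    }
}
```

A caller collects matches by passing a closure, for example `m.find_iter(b"ab--ab", |s| { spans.push(s); true })`. An engine only has to implement find_at to get the iteration for free.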
Regex Engine Implementations
Rust Regex Engine (grep-regex): Uses regex-automata for fast, memory-safe pattern matching. Supports Unicode by default and includes optimizations like literal extraction. The builder applies transformations like word boundaries (\b) and whole-line anchors (^...$) before compilation.
PCRE2 Engine (grep-pcre2): Wraps the PCRE2 C library for Perl-compatible regex syntax. Useful when patterns require features not available in Rust's regex engine. Includes JIT compilation support for performance.
Optimization Strategies
// Literal extraction: for patterns like \w+foo\w+,
// extract "foo" and search for it first
let fast_line_regex = InnerLiterals::new(&chir, &regex).one_regex()?;
ripgrep extracts literals from patterns to create fast candidate matchers. When a line terminator is set, the engine can search for literal substrings first, then verify full pattern matches only on candidate lines. This dramatically speeds up searches like \w+foo\w+.
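The candidate-then-verify idea can be sketched without any regex engine. The code below is an assumption-laden illustration, not ripgrep's implementation: candidate_lines, full_match, and search are hypothetical helpers, the literal "foo" is hard-coded, and full_match is a hand-written ASCII-only stand-in for the pattern \w+foo\w+.

```rust
/// Find `needle` in `haystack` starting at byte offset `from`.
fn find(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if needle.is_empty() || from > haystack.len() {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| from + i)
}

/// Cheap pass: return every line that contains `literal`.
/// Assumes `literal` does not contain the line terminator b'\n'.
pub fn candidate_lines<'a>(text: &'a [u8], literal: &[u8]) -> Vec<&'a [u8]> {
    let mut lines = Vec::new();
    let mut pos = 0;
    while let Some(hit) = find(text, literal, pos) {
        // Expand the hit to the boundaries of the line that contains it.
        let start = text[..hit].iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
        let end = text[hit..].iter().position(|&b| b == b'\n').map_or(text.len(), |i| hit + i);
        lines.push(&text[start..end]);
        pos = end + 1; // skip the rest of this line
    }
    lines
}

/// Expensive pass (stand-in for the full regex): "foo" with at least one
/// ASCII word byte directly before and after it.
fn full_match(line: &[u8]) -> bool {
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    line.windows(5)
        .any(|w| is_word(w[0]) && w[1..4] == *b"foo" && is_word(w[4]))
}

/// Run the cheap literal scan first, then verify only the candidates.
pub fn search(text: &[u8]) -> Vec<&[u8]> {
    candidate_lines(text, b"foo")
        .into_iter()
        .filter(|&l| full_match(l))
        .collect()
}
```

Lines without the literal never reach the verification step, which is where the savings come from when matches are rare.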
Capturing Groups & Replacements
Both engines support capturing groups through the Captures trait. The interpolate method handles replacement string expansion with $1, $name syntax. Matchers can report non-matching bytes and line terminators to enable further optimizations.
Line-Oriented Matching
The find_candidate_line method enables fast line-level filtering. Matchers can return confirmed matches or candidate lines that need verification. This is crucial for ripgrep's line-oriented search model, where most operations work on complete lines rather than arbitrary byte ranges.
File Filtering & Ignore Rules
Relevant Files
- crates/ignore/src/lib.rs
- crates/ignore/src/gitignore.rs
- crates/ignore/src/overrides.rs
- crates/ignore/src/walk.rs
- crates/globset/src/lib.rs
- crates/ignore/src/types.rs
The ripgrep project uses a sophisticated multi-layered filtering system to determine which files and directories to include or exclude during recursive directory traversal. This system combines glob patterns, gitignore semantics, file type matching, and explicit overrides.
Core Components
Globset Crate provides cross-platform glob pattern matching. It converts glob patterns to regular expressions for efficient matching of multiple patterns simultaneously. Key features include support for * (any characters), ? (single character), ** (recursive directory matching), character classes [abc], and alternation {a,b}.
Ignore Crate implements the main filtering logic through three primary modules:
- Gitignore Module: Parses and matches patterns from .gitignore, .ignore, and .git/info/exclude files. It respects gitignore semantics including negation patterns (prefixed with !) and directory-only patterns (suffixed with /).
- Overrides Module: Manages explicit include/exclude globs supplied on the command line; in ripgrep these come from -g/--glob (and --iglob). These take precedence over gitignore rules.
- Types Module: Matches files by type using glob patterns. Default types include rust (*.rs), c (including *.c and *.h), and many others. Users can select or negate specific types.
Matching Precedence
The WalkBuilder applies filtering rules in this strict order:
- Glob Overrides: Explicit patterns from command-line globs (-g/--glob in ripgrep)
- Ignore Files: .ignore files override .gitignore files; more nested files override less nested ones
- File Type Matching: Applied only to files, not directories
- Hidden Files: Skipped by default unless explicitly whitelisted
- File Size Limits: Compared against the max_filesize setting
- Yield: If all checks pass, the entry is yielded
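The ordering above can be read as a short decision function. The sketch below is a std-only model, not the ignore crate's code: Verdict, Entry, and should_yield are hypothetical, and each rule set's result is assumed to be precomputed. It further assumes that any override match settles the decision immediately, and that a whitelist match from an ignore file or file type exempts an entry from the hidden-file check.

```rust
/// Outcome of consulting one rule set for a path (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Whitelist,
    Ignore,
    NoMatch,
}

/// Precomputed facts about one directory entry (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Entry {
    pub is_dir: bool,
    pub is_hidden: bool,
    pub size: u64,
    pub overrides: Verdict,
    pub ignore_files: Verdict,
    pub file_types: Verdict,
}

pub fn should_yield(e: &Entry, max_filesize: Option<u64>) -> bool {
    // 1. Glob overrides come first (assumption: any match is final).
    match e.overrides {
        Verdict::Whitelist => return true,
        Verdict::Ignore => return false,
        Verdict::NoMatch => {}
    }
    // 2. Ignore files: an ignore match skips, a whitelist match is remembered.
    let mut whitelisted = false;
    match e.ignore_files {
        Verdict::Ignore => return false,
        Verdict::Whitelist => whitelisted = true,
        Verdict::NoMatch => {}
    }
    // 3. File type matching applies to files only.
    if !e.is_dir {
        match e.file_types {
            Verdict::Ignore => return false,
            Verdict::Whitelist => whitelisted = true,
            Verdict::NoMatch => {}
        }
    }
    // 4. Hidden entries are skipped unless an earlier step whitelisted them.
    if e.is_hidden && !whitelisted {
        return false;
    }
    // 5. File size limit (assumed to apply to files only).
    if let (false, Some(max)) = (e.is_dir, max_filesize) {
        if e.size > max {
            return false;
        }
    }
    // 6. Every check passed: yield the entry.
    true
}
```

Because the steps short-circuit, an override that whitelists a path wins even when an ignore file would have excluded it.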
Usage Example
use ignore::WalkBuilder;

let walker = WalkBuilder::new("./")
    .hidden(false)       // Include hidden files
    .max_depth(Some(3))  // Limit recursion depth
    .build();
for result in walker {
    match result {
        Ok(entry) => println!("{}", entry.path().display()),
        Err(err) => eprintln!("Error: {}", err),
    }
}
Performance Optimization
The globset crate achieves high performance by matching multiple glob patterns simultaneously using compiled regex automata, rather than testing each pattern sequentially. This is particularly beneficial when many patterns must be evaluated against each path.
Search Execution & Performance
Relevant Files
- crates/searcher/src/lib.rs
- crates/searcher/src/searcher/mod.rs
- crates/searcher/src/searcher/core.rs
- crates/searcher/src/searcher/mmap.rs
- crates/searcher/src/sink.rs
- crates/printer/src/lib.rs
- crates/printer/src/standard.rs
Core Search Pipeline
The search execution follows a push-based model where the Searcher drives execution and pushes results to a Sink implementation. This architecture separates concerns: the searcher handles byte consumption and pattern matching, while sinks handle result reporting.
The principal components are:
- Searcher - Reads bytes from a source (file, stdin, memory slice) and applies a Matcher (regex pattern) over those bytes
- Matcher - A trait for pattern matching (implemented by grep-regex using Rust's regex crate)
- Sink - A trait that receives callbacks for matches, context lines, and search completion
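// Example: basic search over an in-memory byte slice
use grep_regex::RegexMatcher;
use grep_searcher::{sinks::UTF8, Searcher};

let matcher = RegexMatcher::new(r"pattern")?;
// UTF8 wraps a closure that receives each matching line as a &str
let sink = UTF8(|line_num, line| {
    println!("{}: {}", line_num, line);
    Ok(true) // Continue searching
});
// `data` is assumed to be a &[u8] supplied by the caller
Searcher::new().search_slice(&matcher, data, sink)?;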
Performance Optimizations
Memory Mapping
The searcher can optionally use memory maps for faster file access when enabled via MmapChoice::auto(). This is disabled by default due to safety concerns (potential SIGBUS on file mutations). When enabled, the searcher uses heuristics to determine if mmap is advantageous based on file size and platform.
// Even with MmapChoice::auto(), memory maps are not used on macOS
// If creating a memory map fails, the searcher falls back to standard read calls
Buffering Strategy
The searcher uses a 64 KB default buffer for incremental searching. This prevents excessive memory usage when processing large files or files with very long lines. The BufferAllocation strategy controls how the buffer expands:
- Eager (default) - Expand buffer until the next line fits or memory is exhausted
- Error(limit) - Fail if additional allocation exceeds the specified limit
Binary Detection
Three strategies for handling binary data:
- none() - No detection (default)
- quit(byte) - Stop searching when the byte is found
- convert(byte) - Replace the byte with the line terminator
Binary detection is applied differently depending on the search method: buffered searches check every buffer they read, while memory-mapped searches check only a fixed-size region at the start of the contents plus any matching or context lines.
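As a rough illustration of the three strategies, the sketch below applies each one to a single in-memory buffer. BinaryMode and apply_binary_mode are hypothetical names rather than the grep-searcher types, and the function models only one buffer, not the per-strategy differences between buffered and memory-mapped searches.

```rust
/// Hypothetical mirror of the three strategies described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryMode {
    None,
    Quit(u8),
    Convert(u8),
}

/// Apply `mode` to `buf` and return how many leading bytes should still be
/// searched. `Convert` rewrites the buffer in place.
pub fn apply_binary_mode(buf: &mut [u8], mode: BinaryMode, line_term: u8) -> usize {
    match mode {
        // No detection: search everything as-is.
        BinaryMode::None => buf.len(),
        // Quit: stop at the first occurrence of the binary byte.
        BinaryMode::Quit(byte) => buf
            .iter()
            .position(|&b| b == byte)
            .unwrap_or(buf.len()),
        // Convert: replace each binary byte with the line terminator.
        BinaryMode::Convert(byte) => {
            for b in buf.iter_mut() {
                if *b == byte {
                    *b = line_term;
                }
            }
            buf.len()
        }
    }
}
```

With a NUL byte as the marker, Quit truncates the searchable region at the first NUL, while Convert keeps the whole buffer searchable and turns each NUL into a line break so that one huge "line" cannot dominate memory use.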
Output Handling
The grep-printer crate provides multiple output formats:
- Standard - Human-readable grep-like output with colors, line numbers, and context
- JSON - Machine-readable JSON Lines format for programmatic consumption
- Summary - Aggregate statistics (match counts, file paths)
Each printer implements the Sink trait, receiving callbacks as matches are found. The Standard printer supports advanced features like search-and-replace, column limiting, and hyperlink generation.
Sink Lifecycle
A Sink receives callbacks in this order:
- begin() - Search starts
- matched() - Match found (required to implement)
- context() - Context line found (optional)
- context_break() - Gap between context groups (optional)
- binary_data() - Binary data detected (optional)
- finish() - Search completes successfully
If an error occurs or the sink returns false, the search stops immediately. The finish() callback is not called if an error occurs.
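A reduced model of this lifecycle, using only the begin, matched, and finish callbacks, might look like the following. LineSink, run_search, and Collect are hypothetical and much simpler than the real Sink trait: errors are not modeled, matching is a plain substring test, and the sketch assumes finish() still runs after a sink stops the search by returning false.

```rust
/// Hypothetical, reduced version of the callback interface described above.
pub trait LineSink {
    /// Called once before searching; return false to stop right away.
    fn begin(&mut self) -> bool {
        true
    }
    /// Called for every matching line; return false to stop searching.
    fn matched(&mut self, line_number: u64, line: &str) -> bool;
    /// Called once when the search ends without an error.
    fn finish(&mut self) {}
}

/// The driver owns the loop and pushes events into the sink.
pub fn run_search<S: LineSink>(text: &str, needle: &str, sink: &mut S) {
    if sink.begin() {
        for (i, line) in text.lines().enumerate() {
            if line.contains(needle) && !sink.matched(i as u64 + 1, line) {
                break; // the sink asked to stop
            }
        }
    }
    sink.finish();
}

/// A sink that records events and stops after `max_matches` matches.
pub struct Collect {
    pub events: Vec<String>,
    pub max_matches: usize,
    pub seen: usize,
}

impl LineSink for Collect {
    fn begin(&mut self) -> bool {
        self.events.push("begin".to_string());
        true
    }
    fn matched(&mut self, line_number: u64, line: &str) -> bool {
        self.events.push(format!("matched {line_number}: {line}"));
        self.seen += 1;
        self.seen < self.max_matches
    }
    fn finish(&mut self) {
        self.events.push("finish".to_string());
    }
}
```

Returning a boolean from matched() is what lets a sink implement behavior such as a match-count limit without the driver knowing about it.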
Line-by-Line vs. Multi-Line Search
The searcher automatically selects between two execution paths:
- Line-by-line (fast path) - Optimized for single-line patterns, processes one line at a time
- Multi-line (slow path) - Handles patterns spanning multiple lines, uses more complex buffering
The choice is made automatically based on the matcher's capabilities and configuration.
CLI Interface & Configuration
Relevant Files
- crates/core/flags/defs.rs
- crates/core/flags/parse.rs
- crates/core/flags/lowargs.rs
- crates/core/flags/hiargs.rs
- GUIDE.md
Ripgrep's CLI is built on a sophisticated flag system that transforms raw command-line arguments into structured, type-safe representations. The architecture uses a two-level parsing strategy: low-level arguments capture raw flag values, while high-level arguments construct complex objects needed for search execution.
Flag Categories
Flags are organized into seven logical categories that structure the help output and documentation:
- Input: Pattern and file source flags (-e, -f)
- Search: Pattern matching behavior (-i, -S, -F, --engine)
- Filter: File selection and ignore rules (--type, --glob, --no-ignore)
- Output: Result formatting (-n, --color, --context)
- Output Modes: Mutually exclusive output formats (--count, --json, --vimgrep)
- Logging: Debug and trace output (--debug, --trace)
- Other Behaviors: Miscellaneous options (--version, --help, --type-list)
Flag Traits and Variants
Each flag implements the Flag trait, which defines its behavior. A single logical flag can have multiple manifestations:
- Long form: --encoding
- Short form: -E
- Negation: --no-encoding
- Aliases: Alternative names for compatibility
For example, the encoding flag supports -E, --encoding, and --no-encoding, all manipulating the same internal state.
Two-Level Parsing Architecture
Raw CLI Arguments
↓
Parser (lexopt)
↓
LowArgs (raw values)
↓
HiArgs (constructed objects)
↓
Search Execution
LowArgs stores raw flag values directly from parsing. It contains simple types like bool, String, and Vec<T>.
HiArgs constructs complex objects from LowArgs. For instance, glob patterns are combined into a single matcher, and color specifications are parsed into a ColorSpecs object.
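The split between the two levels can be sketched with a toy flag set. The code below borrows the names LowArgs and HiArgs from the text, but the fields, the three flags it accepts (-e, -S, -M), and the simplified smart-case rule are assumptions made for illustration; ripgrep's real structs and parser are far larger.

```rust
/// Level 1: raw values exactly as they appeared on the command line.
#[derive(Debug, Default)]
pub struct LowArgs {
    pub patterns: Vec<String>,
    pub smart_case: bool,
    pub max_columns: Option<String>, // still unparsed text
}

/// Level 2: validated, combined values ready for the search code.
#[derive(Debug, PartialEq)]
pub struct HiArgs {
    pub pattern: String, // all -e patterns joined into one alternation
    pub case_insensitive: bool,
    pub max_columns: Option<u64>,
}

/// Toy parser that only records what it sees.
pub fn parse_low(args: &[&str]) -> Result<LowArgs, String> {
    let mut low = LowArgs::default();
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            "-S" | "--smart-case" => low.smart_case = true,
            "-e" => {
                let value = it.next().ok_or("missing value for -e")?;
                low.patterns.push(value.to_string());
            }
            "-M" => {
                let value = it.next().ok_or("missing value for -M")?;
                low.max_columns = Some(value.to_string());
            }
            other => return Err(format!("unrecognized argument: {other}")),
        }
    }
    Ok(low)
}

impl HiArgs {
    /// Build derived values; all validation lives here.
    pub fn from_low(low: LowArgs) -> Result<HiArgs, String> {
        if low.patterns.is_empty() {
            return Err("no pattern given".to_string());
        }
        // Simplified smart case: insensitive only when no pattern
        // contains an uppercase character.
        let case_insensitive = low.smart_case
            && !low.patterns.iter().any(|p| p.chars().any(char::is_uppercase));
        let pattern = low
            .patterns
            .iter()
            .map(|p| format!("(?:{p})"))
            .collect::<Vec<_>>()
            .join("|");
        let max_columns = match low.max_columns {
            None => None,
            Some(s) => Some(
                s.parse::<u64>()
                    .map_err(|e| format!("invalid -M value {s:?}: {e}"))?,
            ),
        };
        Ok(HiArgs { pattern, case_insensitive, max_columns })
    }
}
```

Keeping the first level free of validation means parsing stays cheap and order-independent, while anything that can fail or needs several flags at once happens in a single conversion step.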
Configuration File Support
Ripgrep reads configuration from the file named by the RIPGREP_CONFIG_PATH environment variable; if the variable is unset, no config file is read. The config file format is simple: each line is a single argument, and lines starting with # are comments. Arguments from the file are prepended to the CLI arguments, so command-line flags override config settings.
# Example config file
--max-columns=150
--smart-case
--type-add=web:*.{html,css,js}
--hidden
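The format is simple enough to model in a few lines. The sketch below is not ripgrep's config reader: parse_config and effective_args are hypothetical helpers, and it assumes that lines are trimmed and that blank lines are skipped along with # comments.

```rust
/// Turn config file contents into a list of arguments, one per line.
/// Assumption: lines are trimmed; blank lines and `#` comments are skipped.
pub fn parse_config(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Config arguments go first, so later CLI flags can override them.
pub fn effective_args(config: &str, cli: &[&str]) -> Vec<String> {
    let mut args = parse_config(config);
    args.extend(cli.iter().map(|s| s.to_string()));
    args
}
```

With the example file above and a command line of `--max-columns=80 foo`, the combined list ends with `--max-columns=80`, which is why the command-line value wins under a later-flag-overrides rule.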
Special Modes
Beyond normal search, ripgrep supports special modes triggered by specific flags:
- --help/-h: Display help documentation
- --version: Show version information
- --type-list: List all supported file types
- --generate: Generate man pages or shell completions (bash, zsh, fish, powershell)
These modes bypass normal search execution and instead generate output or exit early.
Flag Validation and Interaction
The parser enforces flag interactions and constraints:
- Mutually exclusive flags (e.g., --count vs --count-matches) use a Mode enum
- Negation flags explicitly disable features (e.g., --no-ignore disables ignore file processing)
- Later flags override earlier ones, enabling config file overrides
- Some flags partially override others (e.g., -A and -B partially override -C)
Advanced Features
Relevant Files
- GUIDE.md - Comprehensive user guide covering advanced features
- FAQ.md - Frequently asked questions with detailed explanations
- crates/searcher/src/searcher/mod.rs - Encoding and transcoding implementation
- crates/cli/src/decompress.rs - Compression support implementation
PCRE2 Regex Engine
ripgrep supports switching to the PCRE2 regex engine via the -P/--pcre2 flag. While the default regex engine uses finite automata for guaranteed linear-time complexity, PCRE2 enables advanced features like lookaround assertions and backreferences.
Key trade-offs:
- PCRE2 enables (?=...) lookahead, (?<=...) lookbehind, and backreferences like (\w{10})\1
- Performance may degrade due to line-by-line searching and UTF-8 validation overhead
- Use --no-pcre2-unicode to disable Unicode mode for better performance when searching ASCII-compatible data
- Combine with -U/--multiline to avoid line-by-line searching penalties
# Find a run of 10 word characters that is immediately repeated (backreference)
rg -P '(\w{10})\1'
# Optimize PCRE2 for ASCII-only searches
rg -P -U '^\w{42}$' --no-pcre2-unicode file.txt
Multiline Search
The -U/--multiline flag allows patterns to match across line boundaries. This is useful for finding patterns that span multiple lines, such as function definitions or code blocks.
# Match text spanning multiple lines ((?s) lets . match newlines)
rg -U '(?s)function.*?\{.*?\}'
# Combine with context to see surrounding lines
rg -U -C2 '(?s)pattern.*?spanning.*?lines'
Important: By default, . does not match newlines even in multiline mode. Enable dot-all matching with the (?s) inline flag or --multiline-dotall, or use \p{Any}+? for lazy matching across any characters including newlines.
File Encoding Support
ripgrep automatically detects UTF-16 via BOM sniffing and supports explicit encoding specification with -E/--encoding. This enables searching files in UTF-8, UTF-16, latin-1, GBK, EUC-JP, Shift_JIS, and many other encodings.
# Search UTF-16 files automatically
rg 'pattern' file.utf16
# Explicitly specify encoding
rg -E shift_jis 'パターン' file.sjis
# Disable all encoding logic to search raw bytes
rg -E none 'pattern' binary-file
Compressed File Search
The -z/--search-zip flag enables searching compressed files without manual decompression. Supported formats include gzip, bzip2, xz, lzma, lz4, Brotli, and Zstandard.
# Search a gzip-compressed file (-z decompresses the stream; it does not unpack tar archives)
rg -z 'pattern' app.log.gz
# Requires corresponding decompression binaries (gzip, bzip2, etc.)
Preprocessor Support
The --pre flag runs a custom command to transform file contents before searching. This enables searching non-text formats like PDFs, compressed archives, or binary files.
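# Search PDFs through a wrapper script that runs `pdftotext - -`
# (the command receives the file path as its argument and the contents on stdin, and must print text to stdout)
rg --pre ./preprocess 'text' document.pdf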
# Optimize with glob patterns to reduce overhead
rg --pre ./preprocess --pre-glob '*.pdf' 'pattern'
Output Replacements
The --replace flag rewrites matched text in output (without modifying files). Supports capturing groups and named captures for flexible transformations.
# Replace matched text in output
rg 'fast' README.md --replace FAST
# Use capturing groups
rg 'fast\s+(\w+)' README.md -r 'fast-$1'
# Named captures
rg 'fast\s+(?P<word>\w+)' README.md -r 'fast-$word'
Performance Optimization
ripgrep uses intelligent search strategies: memory mapping for single files, incremental buffering for large directories, and SIMD-accelerated literal search. The --trace flag reveals which strategy is active; its log output is written to stderr.
# View search strategy details (redirect stderr so the log lines reach the pipe)
rg --trace 'pattern' 2>&1 | grep -i strategy
# Disable memory mapping if needed
rg --no-mmap 'pattern'
Testing & Benchmarking
Relevant Files
- tests/tests.rs
- tests/feature.rs
- tests/util.rs
- tests/macros.rs
- benchsuite/benchsuite
Test Infrastructure
Ripgrep uses a comprehensive integration test suite organized into logical modules. The main test entry point is tests/tests.rs, which imports specialized test modules:
- feature.rs - Tests for core ripgrep features (encoding, patterns, output formats)
- binary.rs - Binary file handling tests
- json.rs - JSON output format validation
- multiline.rs - Multiline search support
- regression.rs - Regression tests for fixed issues
- misc.rs - Miscellaneous tests
Tests are written using the rgtest! macro, which automatically sets up a temporary directory and creates a ripgrep command instance. The macro also runs each test twice when the pcre2 feature is enabled, testing both the default regex engine and PCRE2.
Test Utilities
The util.rs module provides the testing infrastructure:
- Dir - Manages temporary test directories with unique IDs to avoid conflicts
- TestCommand - Wraps the ripgrep binary with methods for setting arguments, pipes, and assertions
- Macros - eqnice! and eqnice_repr! provide readable output diffs on test failures
Tests create files using dir.create() or dir.create_bytes(), then execute ripgrep with various flags and patterns. The sort_lines() utility helps test unordered output.
Benchmark Suite
The benchsuite/benchsuite Python script compares ripgrep's performance against other search tools (grep, ag, git grep, ugrep). It defines 20+ benchmarks across two corpora:
- Linux kernel - Tests on large source tree with many files
- Subtitles - Tests on large single files (English and Russian UTF-8)
Each benchmark runs multiple iterations with warmup rounds, collecting timing statistics and line counts. Results show mean +/- standard deviation, with asterisks marking fastest commands.
Running Tests
cargo test --test integration
cargo test --test integration -- --nocapture
cargo test --test integration feature::
For benchmarks:
./benchsuite/benchsuite --download all
./benchsuite/benchsuite --list
./benchsuite/benchsuite linux_literal