opencv/opencv

OpenCV: Open Source Computer Vision Library

Last updated on Dec 18, 2025 (Commit: b157553)

Overview

Relevant Files
  • README.md
  • modules/core/doc/intro.markdown
  • include/opencv2/opencv.hpp
  • doc/root.markdown.in

OpenCV (Open Source Computer Vision Library) is a comprehensive, modular C++ library containing hundreds of computer vision algorithms. It provides a modern C++ API (OpenCV 2.x+) with automatic memory management, efficient data structures, and cross-platform support for image processing, video analysis, machine learning, and deep neural networks.

Core Architecture

OpenCV is organized as a collection of specialized modules, each addressing specific computer vision domains:

Foundation & Data Structures

  • core - Dense multi-dimensional arrays (Mat), basic functions, and automatic memory management
  • imgproc - Image filtering, transformations, color space conversion, and histograms

Input/Output & Visualization

  • imgcodecs - Reading and writing image files in multiple formats
  • videoio - Video capture and codec interfaces
  • highgui - Simple UI capabilities and window management

Analysis & Detection

  • video - Motion estimation, background subtraction, object tracking
  • features2d - Feature detection, descriptors, and matching
  • objdetect - Object and face detection using cascades and deep learning
  • calib3d - Camera calibration, 3D reconstruction, stereo vision

Advanced Processing

  • dnn - Deep Neural Network inference (models are trained in other frameworks and imported for inference)
  • ml - Machine learning (classification, regression, clustering)
  • photo - Advanced photo processing (denoising, inpainting)
  • stitching - Image stitching and panorama creation
  • gapi - Graph-based image processing pipeline

Key Design Principles

Automatic Memory Management - All data structures use reference counting. Copying a Mat is O(1); use .clone() for deep copies. Memory is automatically deallocated when reference count reaches zero.

Fixed Pixel Types - Arrays support a small fixed set of primitive types (8-bit unsigned/signed, 16-bit unsigned/signed, 32-bit signed integer, and 16/32/64-bit floating point) plus multi-channel variants. This design balances flexibility with performance and language-binding support.

Saturation Arithmetic - Operations on 8-bit and 16-bit images automatically clamp results to valid ranges, preventing overflow artifacts.

Flexible Input/Output - Functions accept InputArray and OutputArray proxies, allowing seamless use of Mat, std::vector, Matx, or Scalar without API duplication.

Language Support

OpenCV provides bindings for multiple languages:

  • C++ - Primary implementation with full feature access
  • Python - Popular for research and rapid prototyping
  • Java - Android and desktop applications
  • JavaScript - Web-based computer vision
  • Objective-C - iOS development

Build & Deployment

The library uses CMake for cross-platform builds with optional support for GPU acceleration (CUDA, OpenCL), hardware optimization (IPP, TBB), and specialized backends (OpenVX, Vulkan). Modular design allows selective compilation of only needed components.

Architecture & Module System

Relevant Files
  • CMakeLists.txt
  • cmake/OpenCVModule.cmake
  • modules/core/include/opencv2/core.hpp
  • modules/core/include/opencv2/core/hal/interface.h
  • modules/core/src/hal_replacement.hpp

OpenCV is organized as a modular architecture where functionality is divided into independent, reusable components. Each module encapsulates specific computer vision capabilities and manages its own dependencies, sources, and tests.

Module Organization

The repository contains 20+ core modules located in modules/, each with a consistent structure:

  • include/ – Public API headers (installed with the library)
  • src/ – Implementation files (C++, CUDA, OpenCL kernels)
  • test/ – Accuracy tests
  • perf/ – Performance benchmarks
  • CMakeLists.txt – Module build configuration

Key modules include core (fundamental data structures), imgproc (image processing), dnn (deep learning), video (video analysis), and highgui (GUI/display).

CMake Module System

The build system uses a two-pass approach defined in cmake/OpenCVModule.cmake:

First Pass (Information Collection): CMake scans all module directories, collects metadata (dependencies, descriptions, class type), and builds a dependency graph. Modules are classified as PUBLIC (exported), INTERNAL (private), or BINDINGS (language wrappers).

Second Pass (Target Creation): After dependency resolution, CMake creates actual build targets. The system automatically disables modules with unresolved dependencies and propagates transitive dependencies.

ocv_add_module(modulename [INTERNAL|BINDINGS] [REQUIRED deps] [OPTIONAL deps])
ocv_glob_module_sources()
ocv_module_include_directories()
ocv_create_module()
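Put together, a module's own CMakeLists.txt typically calls these in sequence. The sketch below uses a hypothetical module name and dependency list; real modules pass their actual dependencies and wrapper targets:

```cmake
# Hypothetical modules/mymodule/CMakeLists.txt (illustrative only)
set(the_description "Example module (hypothetical)")
ocv_add_module(mymodule opencv_core OPTIONAL opencv_imgproc)
ocv_glob_module_sources()
ocv_module_include_directories()
ocv_create_module()
```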

Dependency Resolution

The system performs sophisticated dependency management:

  • Required dependencies must be available or the module is disabled
  • Optional dependencies are included if available
  • Transitive dependencies are automatically propagated
  • Circular dependencies are detected and reported as errors
  • World build mode combines all modules into a single library for efficiency

Hardware Abstraction Layer (HAL)

The HAL provides a pluggable interface for platform-specific optimizations. It sits between high-level algorithms and low-level implementations:

Algorithm Layer (e.g., cv::add)
        ↓
HAL Interface (cv_hal_add8u)
        ↓
HAL Implementations (IPP, OpenVX, FastCV, KleidiCV, etc.)
        ↓
CPU/GPU Hardware

The HAL interface is defined in modules/core/include/opencv2/core/hal/interface.h with return codes (CV_HAL_ERROR_OK, CV_HAL_ERROR_NOT_IMPLEMENTED). Implementations can be swapped at build time via CMake options like WITH_IPP, WITH_OPENVX, or WITH_FASTCV.


Build Configuration

The root CMakeLists.txt orchestrates the entire build:

  1. Detects compiler, platform, and optional dependencies
  2. Configures HAL backends (IPP, OpenVX, FastCV, etc.)
  3. Calls ocv_register_modules() to process all modules
  4. Generates configuration headers and CMake config files
  5. Optionally builds documentation, tests, samples, and language bindings

Key options include BUILD_LIST (whitelist specific modules), BUILD_SHARED_LIBS (shared vs. static), and BUILD_opencv_world (monolithic library).

Core Module & Data Structures

Relevant Files
  • modules/core/include/opencv2/core/mat.hpp
  • modules/core/include/opencv2/core/types.hpp
  • modules/core/src/precomp.hpp
  • modules/core/src/matrix.cpp

OpenCV's core module provides the fundamental data structures and algorithms for image processing. The module is built around dense n-dimensional arrays and geometric primitives that enable efficient computation across CPU, GPU, and specialized hardware backends.

Mat: The Central Data Structure

The Mat class is the cornerstone of OpenCV, representing an n-dimensional dense numerical array. It can store single or multi-channel data (grayscale images, color images, tensors, etc.) with automatic memory management through reference counting.

Key characteristics:

  • Flexible dimensionality: Supports 2D matrices, 3D volumes, and higher-dimensional tensors
  • Reference-counted memory: Shallow copies share data; use clone() for deep copies
  • Continuous memory layout: Uses step arrays for efficient element access across dimensions
  • Type system: Supports all primitive types (uint8, int32, float32, float64, etc.) with channel counts

Core member variables:

int flags;           // Magic signature, continuity flag, depth, channels
int dims;            // Number of dimensions (>= 2)
int rows, cols;      // 2D dimensions (-1 if >2D)
uchar* data;         // Pointer to pixel data
MatSize size;        // Dimensional sizes
MatStep step;        // Byte offsets between rows/planes
UMatData* u;         // Unified memory data (GPU/CPU)

Element access patterns:

M.at<double>(i, j)           // Type-safe element access
M.ptr<uchar>(row)            // Raw pointer to row
M(Range(0, 10), Range::all()) // ROI slicing

Geometric Primitives

OpenCV provides template classes for common geometric types, all supporting arithmetic operations and type conversions:

Point_<T>: 2D coordinates with dot product, cross product, and norm operations. Aliases: Point2i, Point2f, Point2d.

Point3_<T>: 3D coordinates with 3D-specific operations like the cross product.

Size_<T>: Width and height for image/rectangle dimensions. Aliases: Size2i, Size2f.

Rect_<T>: Axis-aligned rectangles defined by top-left corner and dimensions. Supports intersection, union, and containment checks. Aliases: Rect2i, Rect2f.

Range: Half-open interval [start, end) for slicing matrices. Range::all() represents the entire dimension.

Scalar_<T>: 4-element vector for pixel values. Derived from Vec<T, 4>. Default type is Scalar (double precision).

Vector Types

The Vec<T, n> template represents fixed-size vectors allocated on the stack. Common aliases include Vec2f, Vec3b, Vec4d. Vectors support element-wise operations, norms, and conversions to/from Points and Scalars.

Input/Output Array Abstraction

_InputArray and _OutputArray are proxy classes enabling polymorphic function signatures. They accept Mat, std::vector<T>, Matx, UMat, GpuMat, and other array types without explicit conversion.

void processImage(InputArray src, OutputArray dst);
// Can be called with Mat, vector, UMat, etc.

Unified Memory (UMat)

UMat provides a unified interface for CPU and GPU memory. The UMatData structure manages reference counting, memory flags, and synchronization between host and device copies. This enables transparent GPU acceleration without changing algorithm code.

Memory Management

OpenCV uses custom allocators (MatAllocator) for flexible memory strategies. The standard allocator handles CPU memory; specialized allocators support GPU, OpenCL, and other backends. Reference counting prevents memory leaks while enabling efficient shallow copies.


Type System

OpenCV encodes data types as integers: CV_MAKETYPE(depth, channels). Depths include CV_8U, CV_32F, CV_64F, etc. This enables runtime type checking and automatic format conversions across functions and backends.

Image Processing & Filtering

Relevant Files
  • modules/imgproc/include/opencv2/imgproc.hpp
  • modules/imgproc/CMakeLists.txt
  • modules/imgproc/src/filter.dispatch.cpp
  • modules/imgproc/src/morph.dispatch.cpp
  • modules/imgproc/src/smooth.dispatch.cpp

The imgproc module provides a comprehensive suite of image filtering and processing functions. Each pixel's output is computed from its neighborhood using kernels, with support for multi-channel images processed independently.

Linear Filtering

Linear filters apply weighted sums of pixel values. The module supports several approaches:

  • filter2D() - Applies arbitrary 2D convolution kernels (correlation, not true convolution)
  • sepFilter2D() - Applies separable filters for efficiency (row then column filtering)
  • blur() - Normalized box filter (averaging)
  • boxFilter() - Unnormalized box filter for computing integral characteristics
  • GaussianBlur() - Gaussian smoothing with configurable kernel size and sigma

Smoothing & Noise Reduction

Specialized filters for noise removal while preserving edges:

  • medianBlur() - Median filter (non-linear, effective for salt-and-pepper noise)
  • bilateralFilter() - Edge-preserving filter using color and spatial distance
  • stackBlur() - Fast approximation of Gaussian blur whose running time does not grow with kernel size

Morphological Operations

Morphological operations use structuring elements to process binary or grayscale images:

  • erode() - Applies minimum filter; shrinks white regions
  • dilate() - Applies maximum filter; expands white regions
  • morphologyEx() - Advanced operations combining erosion and dilation:
    • MORPH_OPEN - Erosion followed by dilation (removes small objects)
    • MORPH_CLOSE - Dilation followed by erosion (fills small holes)
    • MORPH_GRADIENT - Difference between dilation and erosion (edge detection)
    • MORPH_TOPHAT - Original minus opening (extracts small objects)
    • MORPH_BLACKHAT - Closing minus original (extracts small holes)

Derivative & Edge Detection

Compute image gradients and detect edges:

  • Sobel() - First/second derivatives with Gaussian smoothing
  • Scharr() - More accurate 3×3 derivative operator
  • Laplacian() - Second-derivative operator (sum of the second derivatives in x and y)
  • Canny() - Multi-stage edge detector with hysteresis thresholding

Key Concepts

Border Handling: Functions support multiple extrapolation methods (BORDER_REPLICATE, BORDER_CONSTANT, BORDER_REFLECT, etc.) for pixels outside image boundaries.

Depth Combinations: Output depth can differ from input. For example, 8-bit input can produce 16-bit or 32-bit output for derivatives to preserve precision.

Structuring Elements: Created via getStructuringElement() with shapes: MORPH_RECT, MORPH_CROSS, MORPH_ELLIPSE, MORPH_DIAMOND.

Performance: The module uses CPU dispatch (SSE2, AVX2, AVX512) and OpenCL acceleration. Separable filters and small kernels are optimized for speed.

// Example: Gaussian blur with edge detection
Mat src = imread("image.jpg", IMREAD_GRAYSCALE);
Mat blurred, edges;
GaussianBlur(src, blurred, Size(5, 5), 1.0);
Canny(blurred, edges, 50, 150);

Deep Neural Networks & Model Inference

Relevant Files
  • modules/dnn/include/opencv2/dnn/dnn.hpp
  • modules/dnn/src/dnn_read.cpp
  • modules/dnn/src/net.cpp
  • modules/dnn/src/layer.cpp
  • samples/dnn/README.md

The DNN module provides a comprehensive framework for loading and executing pre-trained deep neural networks. It supports multiple model formats and backends, enabling efficient inference across different hardware targets.

Core Architecture

The module is built around three main components:

  1. Net Class - Represents a complete neural network graph. It manages layer connections, data flow, and forward pass execution.
  2. Layer Class - Base class for all layer implementations. Each layer encapsulates computation logic and learned parameters (weights/biases).
  3. Blob/Mat - Data containers for network inputs, outputs, and intermediate activations. OpenCV uses cv::Mat as the primary data structure.

Model Loading

OpenCV supports loading models from multiple frameworks through dedicated reader functions:

// Auto-detect framework from file extensions
Net net = cv::dnn::readNet(model, config);

// Framework-specific loaders
Net net = cv::dnn::readNetFromCaffe(prototxt, caffemodel);
Net net = cv::dnn::readNetFromTensorflow(pb, pbtxt);
Net net = cv::dnn::readNetFromONNX(onnxfile);
Net net = cv::dnn::readNetFromDarknet(cfg, weights);
Net net = cv::dnn::readNetFromTFLite(tflitefile);
Net net = cv::dnn::readNetFromTorch(t7file);

All readers support both file paths and in-memory buffers, enabling flexible deployment scenarios.

Inference Pipeline

A typical pipeline: load the model with readNet(), preprocess the input with blobFromImage(), bind it with setInput(), then call forward() to obtain the output blobs.

Backend & Target Configuration

The module supports multiple computation backends and target devices:

Backends: DNN_BACKEND_OPENCV (CPU), DNN_BACKEND_CUDA (NVIDIA GPU), DNN_BACKEND_INFERENCE_ENGINE (Intel OpenVINO), DNN_BACKEND_HALIDE, DNN_BACKEND_VULKAN, DNN_BACKEND_WEBNN, DNN_BACKEND_TIMVX, DNN_BACKEND_CANN

Targets: DNN_TARGET_CPU, DNN_TARGET_OPENCL, DNN_TARGET_CUDA, DNN_TARGET_MYRIAD, DNN_TARGET_FPGA, DNN_TARGET_HDDL, DNN_TARGET_NPU

Input Preprocessing

Before inference, inputs must be preprocessed to match model expectations:

Mat blob = cv::dnn::blobFromImage(image, scalefactor, size, mean, swapRB);
net.setInput(blob);

The module handles normalization, resizing, channel reordering, and mean subtraction automatically.

Quantization & Optimization

The module supports model quantization for reduced memory footprint and faster inference:

Net quantizedNet = net.quantize(calibrationData, CV_8S, CV_8S);

Layer fusion and Winograd optimization can be enabled to further accelerate computation on compatible hardware.

Performance Analysis

Profiling tools help identify bottlenecks:

std::vector<double> timings;
int64 totalTime = net.getPerfProfile(timings);

This returns per-layer execution times, useful for optimization and debugging.

Hardware Abstraction Layer & Optimization

Relevant Files
  • modules/core/include/opencv2/core/hal/interface.h
  • modules/core/src/hal_replacement.hpp
  • hal/openvx/hal/openvx_hal.hpp
  • hal/fastcv/CMakeLists.txt
  • hal/ipp/src/warp_ipp.cpp
  • hal/carotene/CMakeLists.txt
  • hal/kleidicv/CMakeLists.txt

Overview

The Hardware Abstraction Layer (HAL) is OpenCV's pluggable optimization framework that decouples high-level algorithms from platform-specific implementations. It enables OpenCV to leverage specialized hardware capabilities (CPUs, GPUs, DSPs) without modifying core algorithm code.

Architecture

The HAL sits between algorithm layers and hardware:

Algorithm Layer (cv::add, cv::filter2D, etc.)
        ↓
HAL Interface (cv_hal_add8u, cv_hal_filter, etc.)
        ↓
HAL Implementations (IPP, OpenVX, FastCV, KleidiCV, Carotene, RISC-V RVV)
        ↓
CPU/GPU/DSP Hardware

Core Design Principles

Pluggable Implementations: Each HAL backend implements a standardized C interface defined in interface.h. Implementations return CV_HAL_ERROR_OK on success, CV_HAL_ERROR_NOT_IMPLEMENTED to fall back to default code, or CV_HAL_ERROR_UNKNOWN on failure.

Macro-Based Dispatch: The hal_replacement.hpp file defines default no-op implementations using macros like cv_hal_add8u, cv_hal_filter, etc. Backend headers override these macros with optimized versions using #undef and #define.

Type-Safe Generics: Many operations use C++ templates to handle multiple data types (8-bit, 16-bit, 32-bit, float, double) with a single implementation.

Available HAL Backends

| Backend | Purpose | Target Hardware |
|---|---|---|
| IPP | Intel Performance Primitives | x86/x64 CPUs |
| OpenVX | Khronos standard compute API | GPUs, DSPs, heterogeneous systems |
| FastCV | Qualcomm mobile optimization | ARM processors, mobile devices |
| Carotene | ARM NEON optimization | ARM NEON-capable CPUs |
| KleidiCV | Arm compute library integration | Arm processors |
| RISC-V RVV | RISC-V Vector Extension | RISC-V processors |
| NDSRVP | Andes DSP extension | Andes processors with DSP |

Optimization Strategies

Conditional Fallback: HAL functions check image dimensions and data types. For small images or unsupported formats, they return CV_HAL_ERROR_NOT_IMPLEMENTED to use the default OpenCV implementation, avoiding overhead.

SIMD Intrinsics: The intrin.hpp header provides platform-specific SIMD wrappers (SSE, AVX, NEON, RVV, etc.) for vectorized operations on supported architectures.

Lazy Initialization: Complex operations like DFT and filtering use context structures (cvhalDFT, cvhalFilter2D) to cache precomputed data across multiple calls, reducing setup overhead.

Integration Pattern

// In algorithm code (e.g., cv::add)
int status = cv_hal_add8u(src1, step1, src2, step2, dst, dst_step, width, height);
if (status == CV_HAL_ERROR_OK)
    return;  // HAL handled it
// Fall back to default implementation

Build Configuration

HAL backends are conditionally compiled based on CMake flags:

cmake -DWITH_IPP=ON -DWITH_OPENVX=ON -DWITH_FASTCV=ON ..

Multiple backends can coexist; the first available implementation is used at runtime. This allows OpenCV to gracefully degrade on systems lacking specialized libraries while maximizing performance on well-equipped platforms.

Language Bindings & API Generation

Relevant Files
  • modules/python/common.cmake
  • modules/python/src2/cv2.cpp
  • modules/python/src2/gen2.py
  • modules/java/generator/gen_java.py
  • modules/js/src/core_bindings.cpp
  • modules/js/generator/embindgen.py
  • modules/objc/generator/gen_objc.py

OpenCV provides language bindings for Python, Java, JavaScript, and Objective-C through a unified code generation system. This architecture enables developers to use OpenCV's C++ API from multiple programming languages while maintaining consistency and reducing manual maintenance.

Architecture Overview

The binding generation system follows a three-stage pipeline:

  1. Header Parsing - C++ headers are parsed to extract classes, functions, enums, and constants
  2. Code Generation - Language-specific generators create wrapper code for each target language
  3. Compilation & Linking - Generated code is compiled and linked with OpenCV libraries

Python Bindings

Python bindings are generated via gen2.py and compiled into the cv2 module. The system:

  • Parses C++ headers using hdr_parser.CppHeaderParser
  • Generates C++ wrapper code that interfaces with Python's C API
  • Creates type mappings between C++ and Python types (Mat <-> numpy arrays)
  • Generates typing stubs for IDE support and type checking
  • Compiles to a native extension module (.so on Linux, .pyd on Windows)

Key files: cv2.cpp (module initialization), cv2_numpy.cpp (NumPy integration), cv2_convert.cpp (type conversions)

Java Bindings

Java bindings use JNI (Java Native Interface) to bridge Java and C++. The generator:

  • Creates Java wrapper classes that mirror C++ class hierarchies
  • Generates JNI glue code for method calls and data marshaling
  • Handles type conversions (Java primitives <-> C++ types)
  • Supports method overloading through suffix numbering
  • Generates both .java source files and .cpp JNI implementations

The system maintains a type dictionary mapping C++ types to Java equivalents (e.g., cv::Mat <-> long nativeObj reference).

JavaScript Bindings

JavaScript bindings compile to WebAssembly using Emscripten. The embindgen.py generator:

  • Parses headers and extracts public API surface
  • Generates Emscripten binding code using emscripten::bind
  • Creates JavaScript wrapper classes for C++ objects
  • Handles memory management through Emscripten's garbage collection
  • Produces WASM modules loadable in browsers and Node.js

The core_bindings.cpp file demonstrates binding patterns for Mat, Point, Rect, and other core types.

Objective-C Bindings

Objective-C bindings target iOS and macOS platforms. The generator:

  • Creates Objective-C wrapper classes with Swift extensions
  • Generates both .h headers and .mm implementations
  • Handles C++ object lifetime through cv::Ptr<T> smart pointers
  • Supports Swift interoperability with type-safe extensions
  • Generates framework bundles for distribution

Configuration files (gen_dict.json) control which classes and functions are exposed per module.

Configuration & Customization

Each binding generator reads a JSON configuration file specifying:

  • Which modules to bind
  • Header file locations
  • Type mappings and conversions
  • Classes and functions to skip or rename
  • Platform-specific settings

This allows fine-grained control over the API surface exposed to each language while sharing the same C++ implementation.

Build System & Configuration

Relevant Files
  • CMakeLists.txt - Root CMake configuration
  • cmake/OpenCVUtils.cmake - Utility macros and hooks system
  • cmake/OpenCVCompilerOptions.cmake - Compiler flags and optimization settings
  • cmake/templates/OpenCVConfig.cmake.in - Package configuration template
  • cmake/OpenCVModule.cmake - Module definition and registration
  • cmake/OpenCVInstallLayout.cmake - Installation directory structure

OpenCV uses CMake as its primary build system, supporting cross-platform compilation from Windows to embedded ARM systems. The build configuration is highly modular, allowing fine-grained control over features, dependencies, and optimization levels.

Core Build Architecture

The root CMakeLists.txt orchestrates the entire build process through several phases:

  1. Initialization - Enforces out-of-source builds, sets CMake policies for compatibility
  2. Detection - Identifies compiler, platform, and available dependencies
  3. Configuration - Processes 100+ build options (WITH_CUDA, BUILD_SHARED_LIBS, etc.)
  4. Module Registration - Discovers and registers OpenCV modules
  5. Finalization - Generates configuration files and installation rules

Build Options & Features

OpenCV provides extensive customization through CMake options:

Core Options:

  • BUILD_SHARED_LIBS - Build dynamic libraries instead of static
  • CMAKE_BUILD_TYPE - Release (default) or Debug
  • ENABLE_PIC - Position-independent code for shared libraries

Acceleration & Hardware:

  • WITH_CUDA - NVIDIA GPU support (requires CUDA toolkit)
  • WITH_OPENCL - GPU compute via OpenCL
  • WITH_IPP - Intel Performance Primitives
  • WITH_OPENVX - OpenVX acceleration framework
  • WITH_HALIDE - Halide compiler backend

3rd-Party Libraries:

  • BUILD_ZLIB, BUILD_PNG, BUILD_JPEG - Build from source or use system libraries
  • WITH_FFMPEG - Video codec support
  • WITH_PROTOBUF - Protocol buffers for DNN module

Platform-Specific:

  • ANDROID_ABI - ARM, ARM64, x86, x86_64
  • ENABLE_NEON - ARM NEON intrinsics
  • WITH_CAROTENE - ARM acceleration library

Compiler Configuration

The OpenCVCompilerOptions.cmake file manages compiler-specific settings:

# Example: Enable ccache for faster rebuilds
cmake -DENABLE_CCACHE=ON ..

# Example: Custom compiler flags
cmake -DOPENCV_EXTRA_CXX_FLAGS="-march=native" ..

Key features include:

  • ccache Integration - Automatic caching of compilation results
  • Precompiled Headers - Faster builds on MSVC (disabled with Clang)
  • Link-Time Optimization - ENABLE_LTO for smaller, faster binaries
  • Sanitizers - Memory and address sanitizer support via OPENCV_ENABLE_MEMORY_SANITIZER

Module System

Modules are discovered via ocv_register_modules() and can be selectively built:

# Build only specific modules
cmake -DBUILD_LIST="core,imgproc,dnn" ..

Each module has its own CMakeLists.txt defining sources, dependencies, and tests. The module system supports:

  • Conditional compilation based on available dependencies
  • Inter-module dependency resolution
  • Automatic test discovery and registration

Installation & Package Configuration

The build generates OpenCVConfig.cmake for downstream projects:

# In your project
find_package(OpenCV REQUIRED core videoio)
target_link_libraries(my_app ${OpenCV_LIBS})

Installation layout is controlled by OpenCVInstallLayout.cmake, supporting:

  • Standard Unix paths (/usr/local)
  • Windows package layouts
  • Android NDK integration
  • Framework bundles on macOS/iOS

CMake Hooks System

OpenCV provides an extensibility mechanism via CMake hooks. Custom scripts can be registered at specific build phases:

# Register hooks in OPENCV_CMAKE_HOOKS_DIR
# Hooks are called at: CMAKE_INIT, PRE_CMAKE_BOOTSTRAP, POST_COMPILER_OPTIONS, etc.

This allows third-party integrations without modifying core CMake files.

Cross-Compilation

For embedded targets, platform-specific files in cmake/platforms/ configure:

  • System detection (Android, iOS, Linux, Windows)
  • Toolchain settings
  • Architecture-specific optimizations
  • Semihosting support for ARM bare-metal

Example cross-compilation:

cmake -DCMAKE_TOOLCHAIN_FILE=android.toolchain.cmake \
      -DANDROID_ABI=arm64-v8a \
      -DANDROID_NATIVE_API_LEVEL=21 ..

Video I/O & Capture

Relevant Files
  • modules/videoio/include/opencv2/videoio.hpp
  • modules/videoio/src/backend.hpp
  • modules/videoio/src/cap_interface.hpp
  • modules/videoio/doc/videoio_overview.markdown

The Video I/O module provides a unified C++ API for capturing video from cameras and files, and writing video output. It abstracts multiple backend implementations, allowing developers to work with a consistent interface regardless of the underlying platform or hardware.

Core Classes

VideoCapture is the primary class for reading video data. It supports three input modes:

  1. Camera capture by index (e.g., VideoCapture(0) for the default camera)
  2. File/stream reading from video files or image sequences
  3. Stream-based input using custom IStreamReader implementations

VideoWriter handles video output, encoding frames to disk or streaming. Both classes support backend selection via apiPreference parameter and property configuration through VideoCaptureProperties and VideoWriterProperties enums.

Backend Architecture

OpenCV uses a pluggable backend system with two types:

  • Built-in backends: Compiled directly into OpenCV (FFmpeg, GStreamer, MSMF, V4L, etc.)
  • Plugin backends: Dynamically loaded at runtime (GStreamer, FFmpeg on Linux; MediaSDK on Windows/Linux)

The backend selection follows this priority order:

  1. Modern multi-platform libraries (FFmpeg, GStreamer, MediaSDK)
  2. Platform-specific SDKs (WINRT, AVFoundation, MSMF, V4L)
  3. RGB-D sensors (OpenNI2, RealSense, OBSensor)
  4. File-based backends (image sequences, Motion JPEG)
  5. Specialized camera SDKs (DC1394, XIMEA, Aravis, uEye)

Common Usage Patterns

// Capture from default camera
cv::VideoCapture cap(0);
cv::Mat frame;
while (cap.read(frame)) {
    // Process frame
}

// Specify backend explicitly
cv::VideoCapture cap(filename, cv::CAP_FFMPEG);

// Write video with codec
cv::VideoWriter writer("output.mp4", cv::VideoWriter::fourcc('m','p','4','v'),
                       30.0, cv::Size(640, 480));
writer.write(frame);

Property Management

Both capture and writer support querying and setting properties:

  • Capture properties: Frame dimensions, FPS, position, codec, frame count
  • Writer properties: Encoder parameters, quality settings, hardware acceleration

Properties are backend-dependent; not all backends support all properties. Use get() and set() methods to access them.

Backend Configuration

Enable backends at build time using CMake options:

cmake -DWITH_GSTREAMER=ON -DWITH_FFMPEG=ON ..

For plugin backends, add to the plugin list:

cmake -DWITH_GSTREAMER=ON -DVIDEOIO_PLUGIN_LIST=gstreamer ..

Query available backends at runtime using cv::videoio_registry::getBackends(), hasBackend(), and getBackendName().

Hardware Acceleration

Some backends (MSMF, Intel MediaSDK) support hardware-accelerated encoding/decoding. The MSMF backend attempts hardware transforms by default; disable via the OPENCV_VIDEOIO_MSMF_ENABLE_HW_TRANSFORMS=0 environment variable if needed.

Advanced Features

  • Multi-stream capture: Use VideoCapture::waitAny() to efficiently wait for frames from multiple sources
  • Custom streams: Implement IStreamReader for non-file sources
  • Audio support: Some backends (FFmpeg, GStreamer) support audio alongside video
  • Image sequences: Use CAP_IMAGES backend to read/write numbered image files (e.g., img_%02d.jpg)

Graph API & Execution Framework

Relevant Files
  • modules/gapi/doc/00-root.markdown
  • modules/gapi/doc/10-hld-overview.md
  • modules/gapi/include/opencv2/gapi.hpp
  • modules/gapi/src/compiler/gcompiler.hpp
  • modules/gapi/src/executor/gabstractexecutor.hpp

G-API (Graph API) is OpenCV's graph-based execution framework designed for fast, portable image processing pipelines. Unlike traditional function-by-function OpenCV calls, G-API captures entire computation sequences as directed acyclic graphs (DAGs), enabling pipeline-level optimizations and seamless backend portability.

Core Architecture

G-API follows a three-layer architecture:

  1. API Layer — User-facing interface with G-API data types (GMat, GScalar, GArray<T>, GFrame, GOpaque<T>) and operations. Graphs are built implicitly through expressions; no actual computation occurs during graph construction.

  2. Graph Compiler Layer — Built on the ADE Framework, this layer unrolls user expressions into a bipartite graph (Data and Operation nodes), applies optimization passes, and organizes operations into execution "Islands" based on backend affinity.

  3. Backends Layer — Platform-specific implementations (CPU, Fluid, OpenCL, etc.) that execute compiled graphs optimally for their target device.

Graph Compilation Pipeline


Compilation happens in two ways:

  • Implicit compilation: GComputation::apply() compiles and executes in one call (useful when input formats vary)
  • Explicit compilation: GComputation::compile() returns a reusable GCompiled object (recommended for production)

Data Types & Kernels

G-API provides five dynamic data types for graph construction:

  • GMat — Image matrices (maps to cv::Mat, cv::UMat, cv::RMat at runtime)
  • GScalar — Scalar values (maps to cv::Scalar)
  • GArray<T> — Dynamic lists (maps to std::vector<T>)
  • GFrame — Media frames in various formats (NV12, I420, BGR)
  • GOpaque<T> — Arbitrary user types

Kernels define operation interfaces using the G_TYPED_KERNEL() macro. Each kernel specifies a signature, metadata function, and unique identifier. Multiple implementations can exist for the same kernel interface across different backends, enabling the same graph to run on CPU, GPU, or specialized hardware without modification.

Execution Model

Compiled graphs execute as stateless functions — identical inputs always produce identical outputs. The executor manages data dependencies, triggers backend-specific executables when inputs are ready, and handles cross-Island data exchange via host buffers. G-API supports both single-threaded and threaded execution modes.

Key Benefits

  • Optimization — Tiling and data locality improvements applied automatically
  • Portability — Write once, deploy anywhere via backend selection
  • Heterogeneous Processing — Mix multiple backends in a single graph
  • Streaming — Native support for continuous frame processing pipelines