Overview
Relevant Files
`README.md`, `libavutil/avutil.h`, `libavformat/avformat.h`, `libavcodec/avcodec.h`, `libavfilter/avfilter.h`
FFmpeg is a comprehensive multimedia framework consisting of libraries and command-line tools for processing audio, video, subtitles, and related metadata. It provides a modular architecture where each library handles a specific aspect of multimedia processing, from low-level utilities to high-level filtering and format handling.
Core Libraries
The framework is built on seven primary libraries:
libavutil is the foundation layer providing common utilities shared across all FFmpeg libraries. It includes cryptographic functions, hashing algorithms, memory management, data structures, and miscellaneous helper functions. This library is designed to be modular—you typically include only the specific headers you need.
libavcodec handles encoding and decoding of multimedia streams. It implements a wide range of audio and video codecs, including both native implementations and wrappers for external codec libraries. The library provides hardware acceleration bridges for GPU-accelerated encoding and decoding.
libavformat manages I/O operations, container formats, and streaming protocols. It handles demuxing (splitting media files into component streams) and muxing (writing streams into container formats). It supports numerous protocols including file, TCP, HTTP, and others.
libavfilter provides a graph-based framework for processing audio and video frames. Filters are connected in a directed graph, allowing complex transformations like scaling, color conversion, audio mixing, and effects to be chained together.
libavdevice abstracts access to capture and playback devices, enabling applications to interact with cameras, microphones, displays, and other hardware.
libswresample handles audio resampling, format conversion, and mixing operations, allowing audio streams to be converted between different sample rates and channel layouts.
libswscale implements color space conversion and image scaling, supporting various pixel formats and scaling algorithms.
Command-Line Tools
FFmpeg provides three primary tools built on these libraries:
- ffmpeg is a versatile command-line utility for transcoding, converting, and streaming multimedia content
- ffplay is a minimalist multimedia player for playback and testing
- ffprobe is an analysis tool for inspecting multimedia file properties and structure
Additional utilities like aviocat, ismindex, and qt-faststart provide specialized functionality for specific tasks.
Versioning and Compatibility
Each library maintains semantic versioning with major, minor, and micro version numbers. FFmpeg guarantees backward API and ABI compatibility as long as the major version remains unchanged. This ensures that applications built against a specific FFmpeg version will continue to work with later versions that share the same major version number.
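The compatibility rule above can be expressed in a few lines. The sketch below mirrors the arithmetic of FFmpeg's `AV_VERSION_INT` macro (major/minor/micro packed into one integer); the `VERSION_INT` and `abi_compatible` names are illustrative, not FFmpeg API.

```c
#include <assert.h>

/* Mirrors the arithmetic of FFmpeg's AV_VERSION_INT macro:
 * major/minor/micro packed into a single integer. */
#define VERSION_INT(a, b, c) (((a) << 16) | ((b) << 8) | (c))
#define VERSION_MAJOR(v)     ((v) >> 16)

/* API/ABI compatibility is promised iff the major version matches. */
static int abi_compatible(int built_against, int running)
{
    return VERSION_MAJOR(built_against) == VERSION_MAJOR(running);
}
```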
Architecture & Data Flow
Relevant Files
`fftools/ffmpeg_sched.h`, `fftools/ffmpeg_sched.c`, `fftools/ffmpeg.h`, `libavformat/internal.h`, `libavcodec/codec_internal.h`, `libavfilter/avfilter_internal.h`
Overview
FFmpeg's transcoding pipeline is built around a directed acyclic graph (DAG) of interconnected components, all coordinated by a central Scheduler. The scheduler manages thread synchronization, packet/frame routing, and ensures all output streams remain synchronized during transcoding.
Core Components
The transcoding process involves five main component types:
- Demuxers – Read encoded packets from input files and distribute them to decoders or muxers (for stream copy)
- Decoders – Decode packets into frames, sending them to filters or encoders
- Filtergraphs – Process frames through audio/video filters, with zero or more inputs and one or more outputs
- Encoders – Encode frames into packets, sending them to muxers or back to decoders (for loopback decoding)
- Muxers – Interleave and write packets to output files
Data Flow Architecture
Packet Flow: Demuxer → Decoder/Muxer
Frame Flow: Decoder → Filtergraph → Encoder
Packet Flow: Encoder → Muxer
The Scheduler
The Scheduler object is the master coordinator. It:
- Manages all component instances and their thread tasks
- Routes packets and frames between components via thread-safe queues
- Maintains synchronization across all output streams (keeping them at the same DTS)
- Handles backpressure and buffering limits
- Detects and prevents cycles in the processing graph
- Coordinates muxer initialization and SDP writing
Key scheduler functions:
- `sch_add_demux()`, `sch_add_dec()`, `sch_add_filtergraph()`, `sch_add_enc()`, `sch_add_mux()` – Register components
- `sch_connect()` – Establish connections between components
- `sch_start()` – Launch all worker threads
- `sch_demux_send()`, `sch_dec_send()`, `sch_filter_send()`, `sch_enc_send()` – Send data downstream
- `sch_dec_receive()`, `sch_filter_receive()`, `sch_enc_receive()`, `sch_mux_receive()` – Receive data from upstream
Thread Model
Each component runs in its own thread:
- Demuxer thread calls `sch_demux_send()` to push packets downstream
- Decoder thread calls `sch_dec_receive()` to pull packets, then `sch_dec_send()` to push frames
- Filter thread calls `sch_filter_receive()` to pull frames, then `sch_filter_send()` to push filtered frames
- Encoder thread calls `sch_enc_receive()` to pull frames, then `sch_enc_send()` to push packets
- Muxer thread calls `sch_mux_receive()` to pull packets and writes them to the output file
All inter-thread communication uses thread-safe queues managed by the scheduler.
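The queue behavior described above can be modeled with a toy bounded queue: `q_send()` reports backpressure when the queue is full, which is the signal the scheduler uses to block upstream components. This is an illustrative model, not the scheduler's actual code (which is multi-threaded and blocks on condition variables rather than returning `EAGAIN` to the caller).

```c
#include <assert.h>
#include <errno.h>

/* Toy bounded queue modeling the scheduler's backpressure: when the
 * downstream queue is full, the upstream sender is told to wait. */
#define QCAP 4

typedef struct {
    int items[QCAP];
    int head, count;
} Queue;

static int q_send(Queue *q, int item)       /* upstream -> queue */
{
    if (q->count == QCAP)
        return -EAGAIN;                     /* backpressure: caller must wait */
    q->items[(q->head + q->count++) % QCAP] = item;
    return 0;
}

static int q_receive(Queue *q, int *item)   /* queue -> downstream */
{
    if (q->count == 0)
        return -EAGAIN;                     /* nothing buffered yet */
    *item = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
```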
Synchronization Strategy
The scheduler keeps output streams synchronized by:
- Tracking the DTS (Decoding Time Stamp) of packets across all streams
- Controlling the rate at which demuxers read packets based on downstream buffer states
- Using sync queues for audio encoders with fixed frame sizes
- Implementing backpressure: if a downstream queue fills, upstream components are blocked
This ensures that even with multiple inputs and complex filter graphs, the output remains properly interleaved.
Container & Protocol Handling (libavformat)
Relevant Files
`libavformat/avformat.h`, `libavformat/avio.h`, `libavformat/demux.c`, `libavformat/mux.c`, `libavformat/format.c`, `libavformat/protocols.c`, `libavformat/allformats.c`
libavformat handles multimedia container formats and network protocols through a modular architecture. The library separates concerns into three layers: format detection and registration, I/O abstraction, and protocol handlers.
Core Structures
AVFormatContext is the central structure for both demuxing and muxing. It contains:
- `iformat` or `oformat`: The detected or selected container format
- `pb`: An AVIOContext for buffered I/O operations
- `streams`: Array of AVStream objects describing elementary streams
- `priv_data`: Format-specific private data
AVIOContext provides buffered I/O abstraction with callback functions for reading, writing, and seeking. This allows custom I/O implementations (memory buffers, network sockets, etc.) without modifying demuxer/muxer code.
Format Registration & Discovery
Formats are registered statically at compile time through FFInputFormat and FFOutputFormat structures. The library maintains two lists:
const AVOutputFormat *av_muxer_iterate(void **opaque);
const AVInputFormat *av_demuxer_iterate(void **opaque);
These functions iterate over all registered muxers and demuxers. Format lookup uses:
- Extension matching: Fast but unreliable
- MIME type matching: More reliable for network streams
- Content probing: Reads file header bytes to identify format
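Content probing is the most robust of the three. A demuxer's `read_probe()` callback inspects the first bytes of input and returns a confidence score. The sketch below shows the shape of such a callback for a RIFF/WAVE header; `toy_probe` and `PROBE_SCORE_MAX` are illustrative names (FFmpeg's real constant is `AVPROBE_SCORE_MAX`, which is 100).

```c
#include <assert.h>
#include <string.h>

/* Toy probe callback in the spirit of a demuxer's read_probe():
 * inspect the first bytes and return a confidence score (0..100). */
enum { PROBE_SCORE_MAX = 100 };

static int toy_probe(const unsigned char *buf, int size)
{
    /* A RIFF/WAVE file begins "RIFF" <4-byte size> "WAVE". */
    if (size >= 12 && !memcmp(buf, "RIFF", 4) && !memcmp(buf + 8, "WAVE", 4))
        return PROBE_SCORE_MAX;
    return 0;
}
```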
Demuxing Workflow
1. Open: `avformat_open_input()` allocates the AVFormatContext, opens the file via a protocol handler, and reads the container header
2. Probe: If the format is unknown, `av_probe_input_format()` reads initial bytes and tests each demuxer's `read_probe()` callback
3. Read: `av_read_frame()` extracts packets from the container
4. Close: `avformat_close_input()` releases resources
Muxing Workflow
1. Allocate: `avformat_alloc_output_context2()` creates a context with the specified format
2. Configure: The user sets the output format, streams, and options
3. Write: `avformat_write_header()` writes the container header, then `av_write_frame()` writes packets
4. Finalize: `av_write_trailer()` writes the footer and closes the file
Protocol Layer
Protocols are registered as URLProtocol structures with callbacks for:
- `url_open()`: Establish connection
- `url_read()`: Read data
- `url_write()`: Write data
- `url_seek()`: Seek to position
Supported protocols include file, HTTP, HTTPS, FTP, TCP, UDP, RTMP, HLS, and many others. The protocol is selected by parsing the URL scheme (e.g., http:// selects HTTP protocol).
Custom I/O
Applications can implement custom I/O by allocating an AVIOContext with their own callbacks:
AVIOContext *pb = avio_alloc_context(buffer, size, write_flag,
opaque, read_fn, write_fn, seek_fn);
AVFormatContext *s = avformat_alloc_context();
s->pb = pb;
avformat_open_input(&s, NULL, NULL, NULL);
This enables reading from memory, compressed streams, or proprietary sources without protocol support.
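As a sketch of such a read callback, the function below reads from an in-memory buffer; its signature matches the `read_packet` callback that `avio_alloc_context()` expects. `MemCtx` is a hypothetical helper struct, and a real callback would return `AVERROR_EOF` rather than -1 at end of data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* In-memory source for a custom-I/O read callback.  The signature
 * matches the read_packet callback passed to avio_alloc_context(). */
typedef struct {
    const uint8_t *data;
    size_t size, pos;
} MemCtx;

static int mem_read(void *opaque, uint8_t *buf, int buf_size)
{
    MemCtx *m = opaque;
    size_t left = m->size - m->pos;
    size_t n = (size_t)buf_size < left ? (size_t)buf_size : left;
    if (n == 0)
        return -1;              /* real code would return AVERROR_EOF */
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (int)n;              /* number of bytes actually read */
}
```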
Format-Specific Options
Both demuxers and muxers expose options through AVClass for configuration:
- Generic options: Defined in AVFormatContext
- Format-specific options: Defined in the `priv_class` of AVInputFormat/AVOutputFormat
- Protocol options: Defined in URLProtocol's `priv_data_class`
Options are passed via AVDictionary to avformat_open_input() or set directly on the context before avformat_write_header().
Encoding & Decoding (libavcodec)
Relevant Files
`libavcodec/avcodec.h`, `libavcodec/codec.h`, `libavcodec/encode.c`, `libavcodec/decode.c`, `libavcodec/allcodecs.c`
Overview
libavcodec is FFmpeg's core encoding and decoding library. It provides a unified framework for compressing and decompressing audio, video, and subtitle streams across hundreds of codecs. The library abstracts codec-specific details behind a consistent API, enabling applications to work with multiple formats without codec-specific code.
Core Architecture
The encoding/decoding pipeline uses a send/receive model that decouples input and output:
Key Functions:
- Encoding: `avcodec_send_frame()` → `avcodec_receive_packet()`
- Decoding: `avcodec_send_packet()` → `avcodec_receive_frame()`
Data Structures
AVCodecContext (libavcodec/avcodec.h): The central structure holding codec state, parameters, and configuration. Contains:
- Codec type and ID
- Bitrate, sample rate, dimensions
- Private codec-specific data (`priv_data`)
- Internal state (`AVCodecInternal`)
AVCodec (libavcodec/codec.h): Describes a codec implementation with:
- Name and long description
- Media type (audio/video/subtitle)
- Capabilities (threading, hardware acceleration, etc.)
- Profiles and supported formats
AVPacket/AVFrame: Container structures for compressed and uncompressed data respectively, with metadata like timestamps and side data.
Encoding Pipeline
Encoding transforms uncompressed frames into compressed packets. The process:
1. Allocate and configure an `AVCodecContext`
2. Open the codec with `avcodec_open2()`
3. Send frames via `avcodec_send_frame()`
4. Receive packets via `avcodec_receive_packet()` in a loop
5. Flush with `avcodec_send_frame(NULL)` at end of stream
6. Continue receiving until `AVERROR_EOF`
Encoder Capabilities include AV_CODEC_CAP_DELAY (buffering), AV_CODEC_CAP_ENCODER_RECON_FRAME (reconstruction), and AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE (opaque data handling).
Decoding Pipeline
Decoding reverses the process, transforming compressed packets into uncompressed frames:
1. Allocate and configure an `AVCodecContext`
2. Open the codec with `avcodec_open2()`
3. Send packets via `avcodec_send_packet()`
4. Receive frames via `avcodec_receive_frame()` in a loop
5. Flush with `avcodec_send_packet(NULL)` at end of stream
6. Continue receiving until `AVERROR_EOF`
Decoder Capabilities include AV_CODEC_CAP_DR1 (direct rendering), AV_CODEC_CAP_DELAY (frame buffering), and AV_CODEC_CAP_FRAME_THREADS (frame-level parallelism).
Codec Registration
allcodecs.c maintains a registry of all available codecs. Each codec is defined as an FFCodec structure with:
- Callback functions (`encode`, `decode`, `init`, `close`)
- Supported formats and sample rates
- Hardware acceleration wrappers (QSV, VAAPI, NVENC, etc.)
Codecs are registered at runtime and can be iterated via av_codec_iterate().
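The registry pattern can be sketched with a static table and an opaque cursor, the same shape as `av_codec_iterate()`. All names below (`ToyCodec`, `toy_codec_iterate`, the table contents) are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy codec registry modeling allcodecs.c: a static table of
 * descriptors iterated with an opaque cursor, like av_codec_iterate(). */
typedef struct {
    const char *name;
    int is_encoder;
} ToyCodec;

static const ToyCodec codec_list[] = {
    { "h264", 0 }, { "aac", 1 }, { "flac", 1 },
};

static const ToyCodec *toy_codec_iterate(void **opaque)
{
    uintptr_t i = (uintptr_t)*opaque;    /* cursor stored in the pointer */
    if (i >= sizeof(codec_list) / sizeof(codec_list[0]))
        return NULL;                     /* end of registry */
    *opaque = (void *)(i + 1);
    return &codec_list[i];
}
```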
Error Handling
The API uses standard FFmpeg error codes:
- `AVERROR(EAGAIN)`: Need more input or output buffer space
- `AVERROR_EOF`: End of stream reached
- `AVERROR(EINVAL)`: Invalid parameters
- Other negative values: Actual errors
State Machine Guarantees
The send/receive API enforces strict state transitions:
- Both send and receive cannot return `EAGAIN` simultaneously
- Sending `NULL` enters draining mode; no new input is accepted
- The codec must be flushed before reuse
- Timing-independent: state depends only on API calls, not elapsed time
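These guarantees can be exercised against a toy model: a "codec" with one item of internal delay whose send/receive functions follow the same state rules. `TOY_EOF` stands in for `AVERROR_EOF`, and none of this is FFmpeg code; it only demonstrates why send and receive can never both return `EAGAIN`.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the send/receive contract: a "codec" with one item of
 * buffering.  TOY_EOF stands in for AVERROR_EOF. */
enum { TOY_EOF = -2 };

typedef struct {
    int buffered, have;   /* one-slot output buffer */
    int draining;
} ToyDec;

static int dec_send(ToyDec *c, const int *pkt)
{
    if (c->draining)
        return TOY_EOF;            /* no input accepted while draining */
    if (!pkt) { c->draining = 1; return 0; }
    if (c->have)
        return -EAGAIN;            /* output must be consumed first */
    c->buffered = *pkt;
    c->have = 1;
    return 0;
}

static int dec_receive(ToyDec *c, int *frame)
{
    if (c->have) { *frame = c->buffered; c->have = 0; return 0; }
    return c->draining ? TOY_EOF : -EAGAIN;
}
```

Note how the single `have` flag makes the first guarantee structural: send returns `EAGAIN` only when `have` is set, receive only when it is clear.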
Filter Graph Processing (libavfilter)
Relevant Files
`libavfilter/avfilter.h`, `libavfilter/avfiltergraph.c`, `libavfilter/graphparser.c`, `libavfilter/formats.c`, `libavfilter/buffersrc.c`
Overview
libavfilter is FFmpeg's graph-based frame processing library. It enables complex audio and video transformations by connecting filters in a directed acyclic graph (DAG). Each filter performs a specific operation (scaling, color conversion, audio mixing, etc.), and frames flow through the graph from source to sink.
Core Architecture
The filter graph system consists of four main components:
AVFilter - Filter definition containing metadata, input/output pad specifications, and callback functions for initialization, frame processing, and cleanup.
AVFilterContext - An instance of a filter within a graph. Holds the filter's private state, input/output pads, and links to connected filters.
AVFilterLink - Represents a connection between two filters. Stores negotiated format parameters (pixel format, sample rate, channel layout) and manages frame queues for data flow.
AVFilterGraph - Container for all filters and links in a processing chain. Manages graph lifecycle, configuration, and execution.
Graph Construction and Configuration
Creating a filter graph involves three phases:
1. Allocation: `avfilter_graph_alloc()` creates an empty graph with thread configuration options.
2. Filter Creation: `avfilter_graph_create_filter()` instantiates filters and adds them to the graph. Filters are initialized with options before linking.
3. Linking: `avfilter_link()` connects filter outputs to inputs. Both filters must be initialized before linking.
Format Negotiation
Before processing frames, avfilter_graph_config() performs format negotiation:
int avfilter_graph_config(AVFilterGraph *graphctx, void *log_ctx)
{
    /* simplified: the real function checks each call's return value */
    graph_check_validity(graphctx, log_ctx);
    graph_config_formats(graphctx, log_ctx);  // Negotiate formats
    graph_config_links(graphctx, log_ctx);    // Configure link parameters
    graph_check_links(graphctx, log_ctx);     // Validate connections
    graph_config_pointers(graphctx, log_ctx); // Setup sink tracking
}
The format negotiation process queries each filter's supported formats, merges compatible options across links, and selects optimal formats to minimize conversions.
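The merge step can be illustrated as a list intersection. FFmpeg terminates format lists with a sentinel (`AV_PIX_FMT_NONE`, i.e. -1); the helper below follows that convention, and negotiation fails on a link when the intersection is empty. The function names are illustrative, not libavfilter API.

```c
#include <assert.h>

/* Toy format merge: intersect two -1-terminated format lists, as
 * negotiation does for each link (FFmpeg ends lists in AV_PIX_FMT_NONE). */
static int list_has(const int *l, int fmt)
{
    for (; *l != -1; l++)
        if (*l == fmt)
            return 1;
    return 0;
}

static int merge_formats(const int *a, const int *b, int *out, int max)
{
    int n = 0;
    for (; *a != -1 && n < max - 1; a++)
        if (list_has(b, *a))
            out[n++] = *a;
    out[n] = -1;
    return n;   /* number of common formats; 0 means negotiation fails */
}
```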
Data Flow
Frames enter through source filters (buffersrc) and exit through sink filters (buffersink). Intermediate filters process frames via their filter_frame() callback. Each link maintains a frame queue to buffer data between filters with different processing rates.
Graph Parsing
The graphparser.c module parses filter graph descriptions from strings:
[in]scale=640:480[scaled];[scaled]fps=30[out]
This creates a scale filter followed by an fps filter, with labeled pads for external connections.
Threading
Graphs support multi-threaded processing via AVFILTER_THREAD_SLICE mode, where filters process frame slices in parallel. Thread configuration is set per-graph and inherited by all filters.
Cleanup
avfilter_graph_free() releases all filters, links, and associated resources. Individual filters are freed automatically when the graph is destroyed.
Audio Processing (libswresample & Audio Filters)
Relevant Files
`libswresample/swresample.h`, `libswresample/resample.c`, `libswresample/rematrix.c`, `libswresample/audioconvert.c`, `libswresample/dither.c`, `libavfilter/audio.c`, `libavfilter/af_volume.c`, `libavfilter/af_aresample.c`
Overview
Audio processing in FFmpeg is split between libswresample (resampling, format conversion, mixing) and libavfilter (audio effects and transformations). These libraries handle the core operations needed to convert audio between different sample rates, channel layouts, and formats.
libswresample: Core Audio Conversion
libswresample provides the SwrContext API for audio resampling and format conversion. The main workflow is:
1. Allocate a context with `swr_alloc()` or `swr_alloc_set_opts2()`
2. Configure input/output sample rates, formats, and channel layouts via AVOptions
3. Initialize with `swr_init()`
4. Convert audio frames using `swr_convert()`
SwrContext *swr = swr_alloc();
av_opt_set_int(swr, "in_sample_rate", 48000, 0);
av_opt_set_int(swr, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
// in/out channel layouts must also be set before swr_init() succeeds
swr_init(swr);
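One practical detail when driving `swr_convert()` is sizing the output buffer. The rounded-up rescale below performs the same arithmetic as `av_rescale_rnd(in, out_rate, in_rate, AV_ROUND_UP)`, ignoring the extra delay samples that `swr_get_delay()` would add:

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on output samples for a rate conversion:
 * ceil(in * out_rate / in_rate), computed in integers. */
static int64_t out_samples(int64_t in, int64_t out_rate, int64_t in_rate)
{
    return (in * out_rate + in_rate - 1) / in_rate;
}
```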
Resampling Engines
The resampling process converts audio between different sample rates using polyphase filterbanks. Two engines are available:
- SWR_ENGINE_SWR (default): FFmpeg's built-in resampler with configurable filter types
- SWR_ENGINE_SOXR: SoX Resampler for higher quality (if compiled with libsoxr support)
Filter types include Cubic, Blackman-Nuttall windowed sinc, and Kaiser windowed sinc. The resample_init() function builds the polyphase filterbank based on the resampling factor and selected filter type.
Channel Rematrixing
The swri_rematrix() function handles channel layout conversions (e.g., 5.1 surround to stereo). It uses a mixing matrix to combine input channels into output channels. SIMD-optimized paths exist for common conversions like stereo-to-mono and mono-to-stereo.
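A downmix is just a matrix multiply per sample. The sketch below folds 5.1 into stereo with the common -3 dB (0.707) center/surround gains; these coefficients are typical illustrative defaults, not the exact matrix libswresample builds for every layout, and `Sample51`/`downmix_stereo` are hypothetical names.

```c
#include <assert.h>

/* Illustrative 5.1 -> stereo downmix, one sample at a time.  The LFE
 * channel is dropped here, as many default downmix matrices do. */
typedef struct { float fl, fr, fc, lfe, bl, br; } Sample51;

static void downmix_stereo(const Sample51 *in, float *l, float *r)
{
    const float c = 0.707f;            /* -3 dB center/surround gain */
    *l = in->fl + c * in->fc + c * in->bl;
    *r = in->fr + c * in->fc + c * in->br;
}
```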
Sample Format Conversion & Dithering
swri_audio_convert_alloc() creates format converters between different sample formats (S16, S32, FLT, DBL, etc.). The dither.c module implements dithering to reduce quantization noise when converting to lower bit depths, supporting rectangular and triangular high-pass dithering methods.
libavfilter: Audio Effects
libavfilter provides audio filters that operate on AVFrame objects. The ff_get_audio_buffer() function allocates output frames with proper channel layout and sample rate metadata.
Volume Filter (af_volume.c)
The volume filter adjusts audio amplitude using expression evaluation. Key features:
- Precision modes: Fixed-point (8-bit), float (32-bit), or double (64-bit)
- Dynamic evaluation: Per-frame or once-per-stream
- ReplayGain support: Applies metadata-based gain adjustments
- Runtime control: Volume can be changed via filter commands
// Set volume to 0.5 (half amplitude)
av_opt_set(ctx, "volume", "0.5", 0);
Aresample Filter (af_aresample.c)
Wraps libswresample for use in filter graphs. It calculates output frame size accounting for resampling delay and uses swr_get_delay() to determine buffered samples.
Key Concepts
- Polyphase Filterbank: Pre-computed filter coefficients for efficient resampling at multiple phases
- Rematrix Matrix: Defines how input channels mix into output channels
- Dithering: Adds controlled noise to reduce quantization artifacts
- Filter Graphs: Chain multiple audio filters with automatic format negotiation
Video Processing (libswscale & Video Filters)
Relevant Files
`libswscale/swscale.h`, `libswscale/swscale.c`, `libswscale/format.c`, `libavfilter/vf_scale.c`, `libavfilter/video.c`
Overview
Video processing in FFmpeg is split between two core libraries: libswscale handles low-level image scaling and color conversion, while libavfilter provides a high-level filter graph framework that uses libswscale for video transformations. Together, they enable efficient video resizing, format conversion, and filtering operations.
libswscale: Scaling & Color Conversion
libswscale is a dedicated library for image scaling and pixel format conversion. It provides:
- SwsContext: The main opaque context object that holds scaling state, filter coefficients, and function pointers for optimized operations
- Scaling algorithms: Multiple quality levels from fast bilinear to high-quality bicubic and Lanczos filters
- Format support: Handles conversion between 100+ pixel formats (YUV, RGB, packed, planar, etc.)
- Dithering: Optional dithering modes (Bayer, error diffusion) to reduce banding when converting to lower bit depths
- Slice-based processing: Supports streaming input via `sws_send_slice()` and `sws_receive_slice()` for memory efficiency
Key functions:
// Create and initialize a scaling context
SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
int dstW, int dstH, enum AVPixelFormat dstFormat,
int flags, SwsFilter *srcFilter,
SwsFilter *dstFilter, const double *param);
// Scale a complete frame
int sws_scale_frame(SwsContext *sws, AVFrame *dst, const AVFrame *src);
// Stream-based API for slice processing
int sws_send_slice(SwsContext *sws, unsigned int slice_start, unsigned int slice_height);
int sws_receive_slice(SwsContext *sws, unsigned int slice_start, unsigned int slice_height);
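To make the scaling step concrete, here is a nearest-neighbor scaler for a single 8-bit plane. It shows only the source-index mapping; real libswscale applies filtered resampling (bilinear, bicubic, Lanczos) through SIMD-optimized code paths, and `scale_plane` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest-neighbor scale of one 8-bit plane: the simplest form of the
 * index mapping libswscale performs internally. */
static void scale_plane(const uint8_t *src, int sw, int sh,
                        uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;                   /* nearest source row */
        for (int x = 0; x < dw; x++)
            dst[y * dw + x] = src[sy * sw + x * sw / dw];
    }
}
```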
Video Filter Architecture
libavfilter provides a modular filter graph system where filters are connected via pads and links. The scale filter (vf_scale.c) is the primary video scaling filter that wraps libswscale.
The scale filter supports:
- Dynamic expressions: Width and height can be specified as expressions (e.g., `w=iw/2:h=ih/2`)
- Aspect ratio preservation: Options to maintain aspect ratio or force divisibility
- Colorspace metadata: Handles color matrix, primaries, transfer functions, and chroma location
- Scale2ref variant: Scales one input based on another input's dimensions
Processing Pipeline
When a frame flows through the scale filter:
1. `config_props()` negotiates output dimensions and initializes the SwsContext
2. `filter_frame()` receives input frames and calls `sws_scale_frame()`
3. libswscale performs the actual scaling using optimized SIMD code paths
4. The output frame is pushed to the next filter in the graph
The pipeline supports both legacy slice-based API and modern frame-based API for flexibility.
Performance Considerations
- Caching: SwsContext can be reused across multiple frames with identical parameters via `sws_getCachedContext()`
- Threading: libswscale supports multi-threaded scaling via slice parallelization
- SIMD optimization: Architecture-specific implementations (x86, ARM, RISC-V) for fast operations
- Cascaded scaling: Large downscaling factors are split into multiple passes to maintain quality
FFmpeg Command-Line Tool & Transcoding
Relevant Files
- `fftools/ffmpeg.c` – Main entry point and transcoding loop
- `fftools/ffmpeg.h` – Core data structures and enums
- `fftools/ffmpeg_opt.c` – Command-line option parsing
- `fftools/ffmpeg_dec.c` – Decoder implementation
- `fftools/ffmpeg_enc.c` – Encoder implementation
- `fftools/ffmpeg_demux.c` – Demuxer and input handling
- `fftools/ffmpeg_mux.c` – Muxer and output handling
- `fftools/ffmpeg_filter.c` – Filter graph management
- `fftools/ffmpeg_sched.c` – Scheduler for the multi-threaded pipeline
Overview
FFmpeg's command-line tool implements a modular transcoding pipeline that transforms multimedia data through a series of processing stages. The architecture separates concerns into independent components that communicate via queues, enabling efficient multi-threaded processing.
Core Components
Demuxer (ffmpeg_demux.c): Reads input files and extracts elementary streams. Each input file gets one demuxer instance that parses the container format and sends encoded packets to decoders or muxers for stream copying.
Decoder (ffmpeg_dec.c): Decompresses encoded packets into raw frames. Handles hardware acceleration, frame rate conversion, and timestamp management. Decoded frames flow to filter graphs or directly to encoders.
Filter Graph (ffmpeg_filter.c): Applies audio/video transformations using libavfilter. Supports simple graphs (single input/output) and complex graphs (multiple inputs/outputs with arbitrary connections). Handles format negotiation and frame buffering.
Encoder (ffmpeg_enc.c): Compresses frames into packets using specified codecs. Manages bitrate control, quality settings, and multi-pass encoding. Outputs packets to muxers.
Muxer (ffmpeg_mux.c): Writes encoded packets into output containers. Handles stream synchronization, metadata, and format-specific requirements.
Data Structures
InputStream – Represents a single input stream with codec parameters, decoder reference, and connected filters.
OutputStream – Represents output stream configuration including encoder, bitrate, and muxer assignment.
InputFile/OutputFile – Container-level structures managing multiple streams and format context.
FilterGraph – Encapsulates filter chain with input/output pads and format negotiation state.
Scheduler & Threading
The Scheduler (ffmpeg_sched.c) coordinates all components using thread-safe queues. Each demuxer, decoder, filter, encoder, and muxer runs in its own thread. The scheduler manages:
- Task creation and lifecycle
- Queue-based communication between stages
- Backpressure handling when buffers fill
- Synchronization across multiple streams
- Graceful shutdown and error propagation
Command-Line Processing
Option parsing (ffmpeg_opt.c) builds a configuration from command-line arguments. Stream specifiers (:v, :a, :s) allow per-stream options. The tool supports:
- Multiple inputs and outputs
- Stream mapping (`-map`)
- Codec selection (`-c`)
- Filter graphs (`-filter_complex`)
- Hardware acceleration (`-hwaccel`)
- Bitrate/quality control (`-b`, `-q`)
- Metadata and chapters
Execution Flow
- Parse options and build input/output configuration
- Create scheduler and register all components
- Start demuxers, decoders, filters, encoders, and muxers as threads
- Scheduler coordinates packet/frame flow through queues
- Main thread monitors progress and handles user input
- On completion, flush remaining data and close all components
SIMD & Architecture-Specific Optimization
Relevant Files
- `libavcodec/x86/` – x86/x64 SIMD implementations (SSE, AVX, AVX-512)
- `libavcodec/arm/` – ARM NEON and ARMv6 optimizations
- `libavcodec/aarch64/` – AArch64 NEON, SVE, and SVE2 implementations
- `libavfilter/x86/` – Filter SIMD optimizations
- `libswscale/x86/` – Scaler SIMD implementations
- `libavutil/cpu.h` – CPU feature detection flags
- `libavutil/x86/cpu.c`, `libavutil/arm/cpu.c`, `libavutil/aarch64/cpu.c` – Runtime CPU detection
FFmpeg uses handwritten SIMD assembly to accelerate performance-critical operations across multiple architectures. Compilers rarely auto-vectorize this kind of code as efficiently as hand-tuned assembly, so architecture-specific implementations remain essential for video and audio processing.
Architecture Support
x86/x64 (NASM Assembly)
The x86 backend supports a progression of instruction sets: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVX-512. Each codec and filter has multiple implementations targeting different CPU generations. For example, AAC encoding has separate functions for SSE, SSE2, and AVX, with the fastest available version selected at runtime.
ARM/AArch64 (GAS Assembly)
ARM uses NEON (128-bit SIMD) for 32-bit ARMv7 and AArch64. AArch64 additionally supports SVE (Scalable Vector Extension) and SVE2 for variable-width vectors. ARMv6 provides basic optimizations for older devices. Implementations are written in GAS syntax, built on shared assembly macros for readability.
Other Architectures
RISC-V (RVV), MIPS (MSA), PowerPC (VSX/Altivec), and LoongArch (LSX/LASX) have specialized implementations in their respective directories.
Runtime CPU Detection
FFmpeg detects available CPU features at startup using platform-specific methods:
- Linux/Unix: CPUID instruction (x86) or `/proc/cpuinfo` parsing
- macOS: `sysctlbyname()` API
- Windows: `IsProcessorFeaturePresent()` API
- ARM Linux: `AT_HWCAP` auxiliary vector
The av_get_cpu_flags() function returns a bitmask of supported features (e.g., AV_CPU_FLAG_AVX2, AV_CPU_FLAG_NEON). Initialization functions check these flags and assign optimized function pointers.
Implementation Pattern
av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
{
int cpu_flags = av_get_cpu_flags();
if (EXTERNAL_SSE(cpu_flags))
s->abs_pow34 = ff_abs_pow34_sse;
if (EXTERNAL_AVX_FAST(cpu_flags))
s->quant_bands = ff_aac_quantize_bands_avx;
}
Each module (codec, filter, scaler) has an _init_x86(), _init_arm(), or _init_aarch64() function that checks CPU flags and populates a DSP context with function pointers. The C code calls through these pointers, ensuring the best available implementation runs transparently.
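The pattern reduces to a bitmask check plus function-pointer assignment, which the stdlib-only sketch below reproduces. `FLAG_AVX2`, `sum_avx2`, and `DSPContext` here are stand-ins; real init functions test the result of `av_get_cpu_flags()` with macros like `EXTERNAL_AVX2` and assign pointers to assembly routines.

```c
#include <assert.h>

/* The dispatch pattern in miniature: runtime flags decide which
 * implementation a DSP context's function pointer ends up calling. */
#define FLAG_SSE2 (1 << 0)
#define FLAG_AVX2 (1 << 1)

typedef struct { int (*sum)(const int *v, int n); } DSPContext;

static int sum_c(const int *v, int n)       /* portable C fallback */
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

static int sum_avx2(const int *v, int n)    /* stand-in for an asm version */
{
    return sum_c(v, n);
}

static void dsp_init(DSPContext *s, int cpu_flags)
{
    s->sum = sum_c;                /* always start from the fallback */
    if (cpu_flags & FLAG_AVX2)
        s->sum = sum_avx2;         /* fastest supported version wins */
}
```

Callers then invoke `s->sum(...)` without knowing which implementation was selected.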
Key Optimizations
- Vectorized loops: Process multiple samples/pixels per iteration
- Shared assembly macros: Portable SIMD abstractions (e.g., `HADDD`, `VBROADCASTI128`)
- Bit-depth variants: Separate 8-bit, 10-bit, and 16-bit implementations
- Conditional compilation: Features enabled only if compiler supports them
This architecture allows FFmpeg to deliver near-native performance across diverse hardware while maintaining portable C fallbacks for compatibility.
Utilities & Common Infrastructure (libavutil)
Relevant Files
`libavutil/avutil.h`, `libavutil/buffer.c`, `libavutil/frame.c`, `libavutil/dict.c`, `libavutil/log.c`
libavutil is FFmpeg's foundational utility library, providing portable, reusable components shared across all FFmpeg libraries. It abstracts platform-specific functionality and offers essential data structures, memory management, and multimedia utilities.
Core Architecture
libavutil is designed as a modular library where components are independent and can be included selectively. The library is organized into several functional categories:
- Memory Management - Aligned memory allocation for SIMD operations, reference counting
- Data Structures - Buffers, frames, dictionaries, FIFOs, trees
- Cryptography & Hashing - AES, DES, MD5, SHA, HMAC implementations
- Mathematics - Rational numbers, timestamp rescaling, mathematical utilities
- String Utilities - Safe string functions, parsing, formatting
- Logging & Debugging - Hierarchical logging system with categories
- Hardware Acceleration - GPU context management (CUDA, Vulkan, OpenCL, etc.)
Reference-Counted Buffers (AVBuffer)
The buffer system implements thread-safe reference counting for memory management:
AVBufferRef *ref = av_buffer_alloc(size);
AVBufferRef *ref2 = av_buffer_ref(ref); // Increment refcount
av_buffer_unref(&ref); // Decrement, auto-free when count reaches 0
Key features include writable buffer detection (av_buffer_is_writable()) and copy-on-write semantics. Multiple references can point to different regions of the same underlying buffer, enabling efficient data sharing.
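A minimal, single-threaded version of this refcounting scheme looks like the following. The real AVBuffer uses atomic counters and supports custom free callbacks; `Buf`, `buf_ref`, and `buf_unref` are illustrative names, not libavutil API.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal reference counting in the style of AVBuffer.  The real
 * implementation uses atomic counters; this sketch is single-threaded
 * and omits allocation-failure handling for brevity. */
typedef struct {
    unsigned char *data;
    size_t size;
    int refcount;
} Buf;

static Buf *buf_alloc(size_t size)
{
    Buf *b = malloc(sizeof(*b));
    b->data = malloc(size);
    b->size = size;
    b->refcount = 1;
    return b;
}

static Buf *buf_ref(Buf *b)
{
    b->refcount++;
    return b;
}

static void buf_unref(Buf **b)
{
    if (--(*b)->refcount == 0) {     /* last reference frees the data */
        free((*b)->data);
        free(*b);
    }
    *b = NULL;                       /* caller's pointer is invalidated */
}
```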
Frames (AVFrame)
AVFrame is the central abstraction for multimedia data, supporting both audio and video:
AVFrame *frame = av_frame_alloc();
frame->format = AV_PIX_FMT_YUV420P;
frame->width  = 1920;               // width/height must be set before
frame->height = 1080;               // allocating video frame buffers
av_frame_get_buffer(frame, 0);      // Allocate data planes (0 = default alignment)
av_frame_unref(frame);              // Release buffers
av_frame_free(&frame);              // Deallocate frame
Frames carry extensive metadata including timestamps, side data (closed captions, motion vectors, HDR info), and hardware acceleration contexts.
Metadata & Dictionaries (AVDictionary)
Simple key-value storage for metadata:
AVDictionary *dict = NULL;
av_dict_set(&dict, "title", "My Video", 0);
AVDictionaryEntry *entry = av_dict_get(dict, "title", NULL, 0);
av_dict_free(&dict);
Note: AVDictionary has linear lookup time and is intended for small metadata sets; for large mappings, the AVL tree API (tree.h) offers better performance.
Logging System
Hierarchical logging with component categories and configurable levels:
av_log_set_level(AV_LOG_DEBUG);
av_log(ctx, AV_LOG_INFO, "Message: %s\n", text);
Supports colored output, timestamps, and per-component filtering through the AVClass system.
Key Design Principles
- Modularity - Include only needed headers; no monolithic dependencies
- Thread Safety - Reference counting and atomic operations are thread-safe
- Platform Abstraction - Handles endianness, alignment, and OS-specific quirks
- Performance - SIMD-aligned memory, architecture-specific optimizations
- Backward Compatibility - Strict ABI/API stability guarantees within major versions