FFmpeg/FFmpeg

FFmpeg Multimedia Framework

Last updated on Dec 18, 2025 (Commit: 78c75d5)

Overview

Relevant Files
  • README.md
  • libavutil/avutil.h
  • libavformat/avformat.h
  • libavcodec/avcodec.h
  • libavfilter/avfilter.h

FFmpeg is a comprehensive multimedia framework consisting of libraries and command-line tools for processing audio, video, subtitles, and related metadata. It provides a modular architecture where each library handles a specific aspect of multimedia processing, from low-level utilities to high-level filtering and format handling.

Core Libraries

The framework is built on seven primary libraries:

libavutil is the foundation layer providing common utilities shared across all FFmpeg libraries. It includes cryptographic functions, hashing algorithms, memory management, data structures, and miscellaneous helper functions. This library is designed to be modular—you typically include only the specific headers you need.

libavcodec handles encoding and decoding of multimedia streams. It implements a wide range of audio and video codecs, including both native implementations and wrappers for external codec libraries. The library provides hardware acceleration bridges for GPU-accelerated encoding and decoding.

libavformat manages I/O operations, container formats, and streaming protocols. It handles demuxing (splitting media files into component streams) and muxing (writing streams into container formats). It supports numerous protocols including file, TCP, HTTP, and others.

libavfilter provides a graph-based framework for processing audio and video frames. Filters are connected in a directed graph, allowing complex transformations like scaling, color conversion, audio mixing, and effects to be chained together.

libavdevice abstracts access to capture and playback devices, enabling applications to interact with cameras, microphones, displays, and other hardware.

libswresample handles audio resampling, format conversion, and mixing operations, allowing audio streams to be converted between different sample rates and channel layouts.

libswscale implements color space conversion and image scaling, supporting various pixel formats and scaling algorithms.

Command-Line Tools

FFmpeg provides three primary tools built on these libraries:

  • ffmpeg is a versatile command-line utility for transcoding, converting, and streaming multimedia content
  • ffplay is a minimalist multimedia player for playback and testing
  • ffprobe is an analysis tool for inspecting multimedia file properties and structure

Additional utilities like aviocat, ismindex, and qt-faststart provide specialized functionality for specific tasks.


Versioning and Compatibility

Each library maintains semantic versioning with major, minor, and micro version numbers. FFmpeg guarantees backward API and ABI compatibility as long as the major version remains unchanged. This ensures that applications built against a specific FFmpeg version will continue to work with later versions that share the same major version number.

Architecture & Data Flow

Relevant Files
  • fftools/ffmpeg_sched.h
  • fftools/ffmpeg_sched.c
  • fftools/ffmpeg.h
  • libavformat/internal.h
  • libavcodec/codec_internal.h
  • libavfilter/avfilter_internal.h

Overview

FFmpeg's transcoding pipeline is built around a directed acyclic graph (DAG) of interconnected components, all coordinated by a central Scheduler. The scheduler manages thread synchronization, packet/frame routing, and ensures all output streams remain synchronized during transcoding.

Core Components

The transcoding process involves five main component types:

  1. Demuxers – Read encoded packets from input files and distribute them to decoders or muxers (for stream copy)
  2. Decoders – Decode packets into frames, sending them to filters or encoders
  3. Filtergraphs – Process frames through audio/video filters, with zero or more inputs and one or more outputs
  4. Encoders – Encode frames into packets, sending them to muxers or decoders
  5. Muxers – Interleave and write packets to output files

Data Flow Architecture

Packet Flow: Demuxer → Decoder/Muxer
Frame Flow: Decoder → Filtergraph → Encoder
Packet Flow: Encoder → Muxer

The Scheduler

The Scheduler object is the master coordinator. It:

  • Manages all component instances and their thread tasks
  • Routes packets and frames between components via thread-safe queues
  • Maintains synchronization across all output streams (keeping them at the same DTS)
  • Handles backpressure and buffering limits
  • Detects and prevents cycles in the processing graph
  • Coordinates muxer initialization and SDP writing

Key scheduler functions:

  • sch_add_demux(), sch_add_dec(), sch_add_filtergraph(), sch_add_enc(), sch_add_mux() – Register components
  • sch_connect() – Establish connections between components
  • sch_start() – Launch all worker threads
  • sch_demux_send(), sch_dec_send(), sch_filter_send(), sch_enc_send() – Send data downstream
  • sch_dec_receive(), sch_filter_receive(), sch_enc_receive(), sch_mux_receive() – Receive data from upstream

Thread Model

Each component runs in its own thread:

  • Demuxer thread calls sch_demux_send() to push packets downstream
  • Decoder thread calls sch_dec_receive() to pull packets, then sch_dec_send() to push frames
  • Filter thread calls sch_filter_receive() to pull frames, then sch_filter_send() to push filtered frames
  • Encoder thread calls sch_enc_receive() to pull frames, then sch_enc_send() to push packets
  • Muxer thread calls sch_mux_receive() to pull packets and writes them to the output file

All inter-thread communication uses thread-safe queues managed by the scheduler.

Synchronization Strategy

The scheduler keeps output streams synchronized by:

  • Tracking the DTS (Decoding Time Stamp) of packets across all streams
  • Controlling the rate at which demuxers read packets based on downstream buffer states
  • Using sync queues for audio encoders with fixed frame sizes
  • Implementing backpressure: if a downstream queue fills, upstream components are blocked

This ensures that even with multiple inputs and complex filter graphs, the output remains properly interleaved.

Container & Protocol Handling (libavformat)

Relevant Files
  • libavformat/avformat.h
  • libavformat/avio.h
  • libavformat/demux.c
  • libavformat/mux.c
  • libavformat/format.c
  • libavformat/protocols.c
  • libavformat/allformats.c

libavformat handles multimedia container formats and network protocols through a modular architecture. The library separates concerns into three layers: format detection and registration, I/O abstraction, and protocol handlers.

Core Structures

AVFormatContext is the central structure for both demuxing and muxing. It contains:

  • iformat or oformat: The detected or selected container format
  • pb: An AVIOContext for buffered I/O operations
  • streams: Array of AVStream objects describing elementary streams
  • priv_data: Format-specific private data

AVIOContext provides buffered I/O abstraction with callback functions for reading, writing, and seeking. This allows custom I/O implementations (memory buffers, network sockets, etc.) without modifying demuxer/muxer code.

Format Registration & Discovery

Formats are registered statically at compile time through FFInputFormat and FFOutputFormat structures. The library maintains two lists:

const AVOutputFormat *av_muxer_iterate(void **opaque);
const AVInputFormat *av_demuxer_iterate(void **opaque);

These functions iterate over all registered muxers and demuxers. Format lookup uses:

  • Extension matching: Fast but unreliable
  • MIME type matching: More reliable for network streams
  • Content probing: Reads file header bytes to identify format

Demuxing Workflow

  1. Open: avformat_open_input() allocates AVFormatContext, opens the file via protocol handler, and reads the container header
  2. Probe: If format is unknown, av_probe_input_format() reads initial bytes and tests each demuxer's read_probe() callback
  3. Read: av_read_frame() extracts packets from the container
  4. Close: avformat_close_input() releases resources

Muxing Workflow

  1. Allocate: avformat_alloc_output_context2() creates context with specified format
  2. Configure: User sets output format, streams, and options
  3. Write: avformat_write_header() writes container header, then av_write_frame() writes packets
  4. Finalize: av_write_trailer() writes footer and closes the file

Protocol Layer

Protocols are registered as URLProtocol structures with callbacks for:

  • url_open(): Establish connection
  • url_read(): Read data
  • url_write(): Write data
  • url_seek(): Seek to position

Supported protocols include file, HTTP, HTTPS, FTP, TCP, UDP, RTMP, HLS, and many others. The protocol is selected by parsing the URL scheme (e.g., http:// selects HTTP protocol).

Custom I/O

Applications can implement custom I/O by supplying their own AVIOContext:

AVIOContext *pb = avio_alloc_context(buffer, size, write_flag,
                                      opaque, read_fn, write_fn, seek_fn);
AVFormatContext *s = avformat_alloc_context();
s->pb = pb;
avformat_open_input(&s, NULL, NULL, NULL);

This enables reading from memory, compressed streams, or proprietary sources without protocol support.

Format-Specific Options

Both demuxers and muxers expose options through AVClass for configuration:

  • Generic options: Defined in AVFormatContext
  • Format-specific options: Defined in priv_class of AVInputFormat/AVOutputFormat
  • Protocol options: Defined in URLProtocol's priv_data_class

Options are passed via AVDictionary to avformat_open_input() or set directly on the context before avformat_write_header().

Encoding & Decoding (libavcodec)

Relevant Files
  • libavcodec/avcodec.h
  • libavcodec/codec.h
  • libavcodec/encode.c
  • libavcodec/decode.c
  • libavcodec/allcodecs.c

Overview

libavcodec is FFmpeg's core encoding and decoding library. It provides a unified framework for compressing and decompressing audio, video, and subtitle streams across hundreds of codecs. The library abstracts codec-specific details behind a consistent API, enabling applications to work with multiple formats without codec-specific code.

Core Architecture

The encoding/decoding pipeline uses a send/receive model that decouples input and output:

Key Functions:

  • Encoding: avcodec_send_frame() → avcodec_receive_packet()
  • Decoding: avcodec_send_packet() → avcodec_receive_frame()

Data Structures

AVCodecContext (libavcodec/avcodec.h): The central structure holding codec state, parameters, and configuration. Contains:

  • Codec type and ID
  • Bitrate, sample rate, dimensions
  • Private codec-specific data (priv_data)
  • Internal state (AVCodecInternal)

AVCodec (libavcodec/codec.h): Describes a codec implementation with:

  • Name and long description
  • Media type (audio/video/subtitle)
  • Capabilities (threading, hardware acceleration, etc.)
  • Profiles and supported formats

AVPacket/AVFrame: Container structures for compressed and uncompressed data respectively, with metadata like timestamps and side data.

Encoding Pipeline

Encoding transforms uncompressed frames into compressed packets. The process:

  1. Allocate and configure AVCodecContext
  2. Open codec with avcodec_open2()
  3. Send frames via avcodec_send_frame()
  4. Receive packets via avcodec_receive_packet() in a loop
  5. Flush with avcodec_send_frame(NULL) at end of stream
  6. Continue receiving until AVERROR_EOF

Encoder Capabilities include AV_CODEC_CAP_DELAY (buffering), AV_CODEC_CAP_ENCODER_RECON_FRAME (reconstruction), and AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE (opaque data handling).

Decoding Pipeline

Decoding reverses the process, transforming compressed packets into uncompressed frames:

  1. Allocate and configure AVCodecContext
  2. Open codec with avcodec_open2()
  3. Send packets via avcodec_send_packet()
  4. Receive frames via avcodec_receive_frame() in a loop
  5. Flush with avcodec_send_packet(NULL) at end of stream
  6. Continue receiving until AVERROR_EOF

Decoder Capabilities include AV_CODEC_CAP_DR1 (direct rendering), AV_CODEC_CAP_DELAY (frame buffering), and AV_CODEC_CAP_FRAME_THREADS (frame-level parallelism).

Codec Registration

allcodecs.c maintains a registry of all available codecs. Each codec is defined as an FFCodec structure with:

  • Callback functions (encode, decode, init, close)
  • Supported formats and sample rates
  • Hardware acceleration wrappers (QSV, VAAPI, NVENC, etc.)

Codecs are registered in a static list at build time and can be iterated at runtime via av_codec_iterate().

Error Handling

The API uses standard FFmpeg error codes:

  • AVERROR(EAGAIN): The call cannot make progress until output is consumed (or more input is supplied)
  • AVERROR_EOF: End of stream reached
  • AVERROR(EINVAL): Invalid parameters or codec state
  • Other negative values: Genuine decoding/encoding errors

State Machine Guarantees

The send/receive API enforces strict state transitions:

  • Both send and receive cannot return EAGAIN simultaneously
  • Sending NULL enters draining mode; no new input accepted
  • Codec must be flushed before reuse
  • Timing-independent: state depends only on API calls, not elapsed time

Filter Graph Processing (libavfilter)

Relevant Files
  • libavfilter/avfilter.h
  • libavfilter/avfiltergraph.c
  • libavfilter/graphparser.c
  • libavfilter/formats.c
  • libavfilter/buffersrc.c

Overview

libavfilter is FFmpeg's graph-based frame processing library. It enables complex audio and video transformations by connecting filters in a directed acyclic graph (DAG). Each filter performs a specific operation (scaling, color conversion, audio mixing, etc.), and frames flow through the graph from source to sink.

Core Architecture

The filter graph system consists of four main components:

AVFilter - Filter definition containing metadata, input/output pad specifications, and callback functions for initialization, frame processing, and cleanup.

AVFilterContext - An instance of a filter within a graph. Holds the filter's private state, input/output pads, and links to connected filters.

AVFilterLink - Represents a connection between two filters. Stores negotiated format parameters (pixel format, sample rate, channel layout) and manages frame queues for data flow.

AVFilterGraph - Container for all filters and links in a processing chain. Manages graph lifecycle, configuration, and execution.

Graph Construction and Configuration

Creating a filter graph involves three phases:

  1. Allocation: avfilter_graph_alloc() creates an empty graph with thread configuration options.

  2. Filter Creation: avfilter_graph_create_filter() instantiates filters and adds them to the graph. Filters are initialized with options before linking.

  3. Linking: avfilter_link() connects filter outputs to inputs. Both filters must be initialized before linking.

Format Negotiation

Before processing frames, avfilter_graph_config() performs format negotiation:

int avfilter_graph_config(AVFilterGraph *graphctx, void *log_ctx)
{
    /* simplified: each step's return value is checked in the real code */
    graph_check_validity(graphctx, log_ctx);
    graph_config_formats(graphctx, log_ctx);  // Negotiate formats
    graph_config_links(graphctx, log_ctx);    // Configure link parameters
    graph_check_links(graphctx, log_ctx);     // Validate connections
    graph_config_pointers(graphctx, log_ctx); // Setup sink tracking
}

The format negotiation process queries each filter's supported formats, merges compatible options across links, and selects optimal formats to minimize conversions.

Data Flow

Frames enter through source filters (buffersrc) and exit through sink filters (buffersink). Intermediate filters process frames via their filter_frame() callback. Each link maintains a frame queue to buffer data between filters with different processing rates.

Graph Parsing

The graphparser.c module parses filter graph descriptions from strings:

[in]scale=640:480[scaled];[scaled]fps=30[out]

This creates a scale filter followed by an fps filter, with labeled pads for external connections.

Threading

Graphs support multi-threaded processing via AVFILTER_THREAD_SLICE mode, where filters process frame slices in parallel. Thread configuration is set per-graph and inherited by all filters.

Cleanup

avfilter_graph_free() releases all filters, links, and associated resources. Individual filters are freed automatically when the graph is destroyed.

Audio Processing (libswresample & Audio Filters)

Relevant Files
  • libswresample/swresample.h
  • libswresample/resample.c
  • libswresample/rematrix.c
  • libswresample/audioconvert.c
  • libswresample/dither.c
  • libavfilter/audio.c
  • libavfilter/af_volume.c
  • libavfilter/af_aresample.c

Overview

Audio processing in FFmpeg is split between libswresample (resampling, format conversion, mixing) and libavfilter (audio effects and transformations). These libraries handle the core operations needed to convert audio between different sample rates, channel layouts, and formats.

libswresample: Core Audio Conversion

libswresample provides the SwrContext API for audio resampling and format conversion. The main workflow is:

  1. Allocate context with swr_alloc() or swr_alloc_set_opts2()
  2. Configure input/output sample rates, formats, and channel layouts via AVOptions
  3. Initialize with swr_init()
  4. Convert audio frames using swr_convert()

AVChannelLayout layout = AV_CHANNEL_LAYOUT_STEREO;
SwrContext *swr = swr_alloc();
av_opt_set_chlayout(swr, "in_chlayout", &layout, 0);
av_opt_set_chlayout(swr, "out_chlayout", &layout, 0);
av_opt_set_int(swr, "in_sample_rate", 48000, 0);
av_opt_set_int(swr, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
swr_init(swr);

Resampling Engines

The resampling process converts audio between different sample rates using polyphase filterbanks. Two engines are available:

  • SWR_ENGINE_SWR (default): FFmpeg's built-in resampler with configurable filter types
  • SWR_ENGINE_SOXR: SoX Resampler for higher quality (if compiled with libsoxr support)

Filter types include Cubic, Blackman-Nuttall windowed sinc, and Kaiser windowed sinc. The resample_init() function builds the polyphase filterbank based on the resampling factor and selected filter type.

Channel Rematrixing

The swri_rematrix() function handles channel layout conversions (e.g., 5.1 surround to stereo). It uses a mixing matrix to combine input channels into output channels. SIMD-optimized paths exist for common conversions like stereo-to-mono and mono-to-stereo.

Sample Format Conversion & Dithering

swri_audio_convert_alloc() creates format converters between different sample formats (S16, S32, FLT, DBL, etc.). The dither.c module implements dithering to reduce quantization noise when converting to lower bit depths, supporting rectangular and triangular high-pass dithering methods.

libavfilter: Audio Effects

libavfilter provides audio filters that operate on AVFrame objects. The ff_get_audio_buffer() function allocates output frames with proper channel layout and sample rate metadata.

Volume Filter (af_volume.c)

The volume filter adjusts audio amplitude using expression evaluation. Key features:

  • Precision modes: Fixed-point (8-bit), float (32-bit), or double (64-bit)
  • Dynamic evaluation: Per-frame or once-per-stream
  • ReplayGain support: Applies metadata-based gain adjustments
  • Runtime control: Volume can be changed via filter commands

// Set volume to 0.5 (half amplitude)
av_opt_set(ctx, "volume", "0.5", 0);

Aresample Filter (af_aresample.c)

Wraps libswresample for use in filter graphs. It calculates output frame size accounting for resampling delay and uses swr_get_delay() to determine buffered samples.


Key Concepts

  • Polyphase Filterbank: Pre-computed filter coefficients for efficient resampling at multiple phases
  • Rematrix Matrix: Defines how input channels mix into output channels
  • Dithering: Adds controlled noise to reduce quantization artifacts
  • Filter Graphs: Chain multiple audio filters with automatic format negotiation

Video Processing (libswscale & Video Filters)

Relevant Files
  • libswscale/swscale.h
  • libswscale/swscale.c
  • libswscale/format.c
  • libavfilter/vf_scale.c
  • libavfilter/video.c

Overview

Video processing in FFmpeg is split between two core libraries: libswscale handles low-level image scaling and color conversion, while libavfilter provides a high-level filter graph framework that uses libswscale for video transformations. Together, they enable efficient video resizing, format conversion, and filtering operations.

libswscale: Scaling & Color Conversion

libswscale is a dedicated library for image scaling and pixel format conversion. It provides:

  • SwsContext: The main opaque context object that holds scaling state, filter coefficients, and function pointers for optimized operations
  • Scaling algorithms: Multiple quality levels from fast bilinear to high-quality bicubic and Lanczos filters
  • Format support: Handles conversion between 100+ pixel formats (YUV, RGB, packed, planar, etc.)
  • Dithering: Optional dithering modes (Bayer, error diffusion) to mask banding when reducing bit depth
  • Slice-based processing: Supports streaming input via sws_send_slice() and sws_receive_slice() for memory efficiency

Key functions:

// Create and initialize a scaling context
SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                           int dstW, int dstH, enum AVPixelFormat dstFormat,
                           int flags, SwsFilter *srcFilter,
                           SwsFilter *dstFilter, const double *param);

// Scale a complete frame
int sws_scale_frame(SwsContext *sws, AVFrame *dst, const AVFrame *src);

// Stream-based API for slice processing
int sws_send_slice(SwsContext *sws, unsigned int slice_start, unsigned int slice_height);
int sws_receive_slice(SwsContext *sws, unsigned int slice_start, unsigned int slice_height);

Video Filter Architecture

libavfilter provides a modular filter graph system where filters are connected via pads and links. The scale filter (vf_scale.c) is the primary video scaling filter that wraps libswscale.


The scale filter supports:

  • Dynamic expressions: Width and height can be specified as expressions (e.g., w=iw/2:h=ih/2)
  • Aspect ratio preservation: Options to maintain aspect ratio or force divisibility
  • Colorspace metadata: Handles color matrix, primaries, transfer functions, and chroma location
  • Scale2ref variant: Scales one input based on another input's dimensions

Processing Pipeline

When a frame flows through the scale filter:

  1. config_props() negotiates output dimensions and initializes the SwsContext
  2. filter_frame() receives input frames and calls sws_scale_frame()
  3. libswscale performs the actual scaling using optimized SIMD code paths
  4. Output frame is pushed to the next filter in the graph

The pipeline supports both legacy slice-based API and modern frame-based API for flexibility.

Performance Considerations

  • Caching: SwsContext can be reused across multiple frames with identical parameters via sws_getCachedContext()
  • Threading: libswscale supports multi-threaded scaling via slice parallelization
  • SIMD optimization: Architecture-specific implementations (x86, ARM, RISC-V) for fast operations
  • Cascaded scaling: Large downscaling factors are split into multiple passes to maintain quality

FFmpeg Command-Line Tool & Transcoding

Relevant Files
  • fftools/ffmpeg.c – Main entry point and transcoding loop
  • fftools/ffmpeg.h – Core data structures and enums
  • fftools/ffmpeg_opt.c – Command-line option parsing
  • fftools/ffmpeg_dec.c – Decoder implementation
  • fftools/ffmpeg_enc.c – Encoder implementation
  • fftools/ffmpeg_demux.c – Demuxer and input handling
  • fftools/ffmpeg_mux.c – Muxer and output handling
  • fftools/ffmpeg_filter.c – Filter graph management
  • fftools/ffmpeg_sched.c – Scheduler for multi-threaded pipeline

Overview

FFmpeg's command-line tool implements a modular transcoding pipeline that transforms multimedia data through a series of processing stages. The architecture separates concerns into independent components that communicate via queues, enabling efficient multi-threaded processing.


Core Components

Demuxer (ffmpeg_demux.c): Reads input files and extracts elementary streams. Each input file gets one demuxer instance that parses the container format and sends encoded packets to decoders or muxers for stream copying.

Decoder (ffmpeg_dec.c): Decompresses encoded packets into raw frames. Handles hardware acceleration, frame rate conversion, and timestamp management. Decoded frames flow to filter graphs or directly to encoders.

Filter Graph (ffmpeg_filter.c): Applies audio/video transformations using libavfilter. Supports simple graphs (single input/output) and complex graphs (multiple inputs/outputs with arbitrary connections). Handles format negotiation and frame buffering.

Encoder (ffmpeg_enc.c): Compresses frames into packets using specified codecs. Manages bitrate control, quality settings, and multi-pass encoding. Outputs packets to muxers.

Muxer (ffmpeg_mux.c): Writes encoded packets into output containers. Handles stream synchronization, metadata, and format-specific requirements.

Data Structures

InputStream – Represents a single input stream with codec parameters, decoder reference, and connected filters.

OutputStream – Represents output stream configuration including encoder, bitrate, and muxer assignment.

InputFile/OutputFile – Container-level structures managing multiple streams and format context.

FilterGraph – Encapsulates filter chain with input/output pads and format negotiation state.

Scheduler & Threading

The Scheduler (ffmpeg_sched.c) coordinates all components using thread-safe queues. Each demuxer, decoder, filter, encoder, and muxer runs in its own thread. The scheduler manages:

  • Task creation and lifecycle
  • Queue-based communication between stages
  • Backpressure handling when buffers fill
  • Synchronization across multiple streams
  • Graceful shutdown and error propagation

Command-Line Processing

Option parsing (ffmpeg_opt.c) builds a configuration from command-line arguments. Stream specifiers (:v, :a, :s) allow per-stream options. The tool supports:

  • Multiple inputs and outputs
  • Stream mapping (-map)
  • Codec selection (-c)
  • Filter graphs (-filter_complex)
  • Hardware acceleration (-hwaccel)
  • Bitrate/quality control (-b, -q)
  • Metadata and chapters

Execution Flow

  1. Parse options and build input/output configuration
  2. Create scheduler and register all components
  3. Start demuxers, decoders, filters, encoders, and muxers as threads
  4. Scheduler coordinates packet/frame flow through queues
  5. Main thread monitors progress and handles user input
  6. On completion, flush remaining data and close all components

SIMD & Architecture-Specific Optimization

Relevant Files
  • libavcodec/x86/ - x86/x64 SIMD implementations (SSE, AVX, AVX-512)
  • libavcodec/arm/ - ARM NEON and ARMv6 optimizations
  • libavcodec/aarch64/ - AArch64 NEON, SVE, and SVE2 implementations
  • libavfilter/x86/ - Filter SIMD optimizations
  • libswscale/x86/ - Scaler SIMD implementations
  • libavutil/cpu.h - CPU feature detection flags
  • libavutil/x86/cpu.c, libavutil/arm/cpu.c, libavutil/aarch64/cpu.c - Runtime CPU detection

FFmpeg uses handwritten SIMD assembly to accelerate performance-critical operations across multiple architectures. Compiler auto-vectorization rarely matches hand-tuned kernels for this kind of work, so architecture-specific implementations remain essential for fast video and audio processing.

Architecture Support

x86/x64 (NASM Assembly)

The x86 backend supports a progression of instruction sets: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVX-512. Each codec and filter has multiple implementations targeting different CPU generations. For example, AAC encoding has separate functions for SSE, SSE2, and AVX, with the fastest available version selected at runtime.

ARM/AArch64 (GAS Assembly)

ARM uses NEON (128-bit SIMD) for 32-bit ARMv7 and AArch64. AArch64 additionally supports SVE (Scalable Vector Extension) and SVE2 for variable-width vectors. ARMv6 provides basic optimizations for older devices. Implementations use GAS syntax, with shared assembly macros keeping the code maintainable.

Other Architectures

RISC-V (RVV), MIPS (MSA), PowerPC (VSX/Altivec), and LoongArch (LSX/LASX) have specialized implementations in their respective directories.

Runtime CPU Detection

FFmpeg detects available CPU features at startup using platform-specific methods:

  • x86: CPUID instruction
  • macOS: sysctlbyname() API
  • Windows: IsProcessorFeaturePresent() API
  • ARM Linux: AT_HWCAP auxiliary vector or /proc/cpuinfo parsing

The av_get_cpu_flags() function returns a bitmask of supported features (e.g., AV_CPU_FLAG_AVX2, AV_CPU_FLAG_NEON). Initialization functions check these flags and assign optimized function pointers.

Implementation Pattern

av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
{
    int cpu_flags = av_get_cpu_flags();
    
    if (EXTERNAL_SSE(cpu_flags))
        s->abs_pow34 = ff_abs_pow34_sse;
    
    if (EXTERNAL_AVX_FAST(cpu_flags))
        s->quant_bands = ff_aac_quantize_bands_avx;
}

Each module (codec, filter, scaler) has an _init_x86(), _init_arm(), or _init_aarch64() function that checks CPU flags and populates a DSP context with function pointers. The C code calls through these pointers, ensuring the best available implementation runs transparently.

Key Optimizations

  • Vectorized loops: Process multiple samples/pixels per iteration
  • Assembly macros: Portable SIMD abstractions (e.g., HADDD, VBROADCASTI128)
  • Bit-depth variants: Separate 8-bit, 10-bit, and 16-bit implementations
  • Conditional compilation: Features enabled only if compiler supports them

This architecture allows FFmpeg to deliver near-native performance across diverse hardware while maintaining portable C fallbacks for compatibility.

Utilities & Common Infrastructure (libavutil)

Relevant Files
  • libavutil/avutil.h
  • libavutil/buffer.c
  • libavutil/frame.c
  • libavutil/dict.c
  • libavutil/log.c

libavutil is FFmpeg's foundational utility library, providing portable, reusable components shared across all FFmpeg libraries. It abstracts platform-specific functionality and offers essential data structures, memory management, and multimedia utilities.

Core Architecture

libavutil is designed as a modular library where components are independent and can be included selectively. The library is organized into several functional categories:

  • Memory Management - Aligned memory allocation for SIMD operations, reference counting
  • Data Structures - Buffers, frames, dictionaries, FIFOs, trees
  • Cryptography & Hashing - AES, DES, MD5, SHA, HMAC implementations
  • Mathematics - Rational numbers, timestamp rescaling, mathematical utilities
  • String Utilities - Safe string functions, parsing, formatting
  • Logging & Debugging - Hierarchical logging system with categories
  • Hardware Acceleration - GPU context management (CUDA, Vulkan, OpenCL, etc.)

Reference-Counted Buffers (AVBuffer)

The buffer system implements thread-safe reference counting for memory management:

AVBufferRef *ref = av_buffer_alloc(size);
AVBufferRef *ref2 = av_buffer_ref(ref);  // Increment refcount
av_buffer_unref(&ref);                   // Decrement, auto-free when count reaches 0

Key features include writable buffer detection (av_buffer_is_writable()) and copy-on-write semantics. Multiple references can point to different regions of the same underlying buffer, enabling efficient data sharing.

Frames (AVFrame)

AVFrame is the central abstraction for multimedia data, supporting both audio and video:

AVFrame *frame = av_frame_alloc();
frame->format = AV_PIX_FMT_YUV420P;
av_frame_get_buffer(frame, align);  // Allocate data planes
av_frame_unref(frame);              // Release buffers
av_frame_free(&frame);              // Deallocate frame

Frames carry extensive metadata including timestamps, side data (closed captions, motion vectors, HDR info), and hardware acceleration contexts.

Metadata & Dictionaries (AVDictionary)

Simple key-value storage for metadata:

AVDictionary *dict = NULL;
av_dict_set(&dict, "title", "My Video", 0);
AVDictionaryEntry *entry = av_dict_get(dict, "title", NULL, 0);
av_dict_free(&dict);

Note: AVDictionary lookups are linear-time; for large key sets, the AVL trees in tree.h offer better performance.

Logging System

Hierarchical logging with component categories and configurable levels:

av_log_set_level(AV_LOG_DEBUG);
av_log(ctx, AV_LOG_INFO, "Message: %s\n", text);

Supports colored output, timestamps, and per-component filtering through the AVClass system.


Key Design Principles

  1. Modularity - Include only needed headers; no monolithic dependencies
  2. Thread Safety - Reference counting and atomic operations are thread-safe
  3. Platform Abstraction - Handles endianness, alignment, and OS-specific quirks
  4. Performance - SIMD-aligned memory, architecture-specific optimizations
  5. Backward Compatibility - Strict ABI/API stability guarantees within major versions