
milvus-io/milvus

Milvus Vector Database

Last updated on Dec 18, 2025 (Commit: 80fff56)

Overview

Relevant Files
  • README.md
  • cmd/main.go
  • cmd/milvus/milvus.go
  • internal/coordinator/mix_coord.go
  • internal/proxy/proxy.go
  • internal/datacoord/server.go
  • internal/querycoordv2/server.go
  • pkg/util/paramtable

Milvus is a high-performance, open-source vector database designed for AI applications. Written in Go and C++, it efficiently organizes and searches vast amounts of unstructured data like text, images, and multi-modal information at scale.

Key Characteristics

Milvus combines several powerful features that make it ideal for modern AI workloads:

  • Distributed Architecture: Fully-distributed and Kubernetes-native design enables horizontal scaling across multiple nodes
  • Hardware Acceleration: Implements CPU/GPU acceleration for best-in-class vector search performance
  • Multiple Deployment Modes: Supports Cluster mode for production, Standalone mode for single machines, and Milvus Lite for Python quickstart
  • Real-time Updates: Keeps data fresh with streaming updates while handling billions of vectors
  • Flexible Multi-tenancy: Supports isolation at database, collection, partition, or partition key levels

System Architecture


Core Components

Proxy: Entry point for all client requests. Handles request routing, timestamp assignment, and primary key allocation. Manages task scheduling and communicates with coordinators.

MixCoord: Unified coordinator combining RootCoord, DataCoord, QueryCoord, and StreamingCoord. Simplifies deployment and management in distributed setups.

RootCoord: Manages metadata, handles DDL operations (create/drop collections), allocates IDs and timestamps, and maintains collection schemas.

DataCoord: Organizes DataNodes and manages segment allocation. Handles data ingestion, compaction, and garbage collection.

QueryCoord: Manages QueryNodes and segment distribution. Handles query load balancing and replica management.

DataNode: Processes write operations, transforms WAL to binlog format, and persists data to storage.

QueryNode: Executes search queries, maintains indexes in memory, and handles segment loading from storage.

Data Organization

Data is organized hierarchically: Collections contain Partitions, which contain Segment Groups, which in turn contain Segments. Segments are the smallest storage unit; within a segment, data and indexes are stored in a column-based format for optimal SIMD performance.

Vector Index Support

Milvus supports multiple vector index types optimized for different scenarios: HNSW, IVF, FLAT (brute-force), SCANN, and DiskANN, with quantization-based variations and memory-mapped (mmap) support for cost-effective storage.

Getting Started

Install the Python SDK with pip install -U pymilvus. Create a local database with MilvusClient("milvus_demo.db") or connect to a deployed server with credentials. Then create collections, insert data, and perform vector searches with simple API calls.

Architecture & Distributed System Design

Relevant Files
  • internal/coordinator/mix_coord.go
  • internal/rootcoord/root_coord.go
  • internal/datacoord/server.go
  • internal/querycoordv2/server.go
  • internal/streamingcoord/server/server.go
  • internal/proxy/proxy.go
  • internal/datanode/data_node.go
  • internal/querynodev2/server.go
  • internal/streamingnode/server/server.go

Milvus is a distributed vector database with a microservices architecture that separates compute and storage concerns. The system consists of multiple independent components that communicate via gRPC and message queues, enabling horizontal scalability and high availability.

System Components

Coordinators manage cluster state and orchestration:

  • Root Coordinator (rootcoord): Manages metadata, collection schemas, timestamps (TSO), and ID allocation. Acts as the source of truth for cluster state.
  • Data Coordinator (datacoord): Organizes data nodes and segment allocations. Manages flush operations, compaction, and garbage collection.
  • Query Coordinator (querycoordv2): Manages query nodes and segment distribution. Handles load balancing and replica management.
  • Streaming Coordinator (streamingcoord): Manages streaming nodes and message channel assignments for real-time data ingestion.
  • Mix Coordinator (mix_coord): Unified interface aggregating all coordinators for simplified client interaction.

Worker Nodes execute operations:

  • Proxy: Entry point for all client requests. Routes requests to appropriate coordinators, manages metadata caching, and handles task scheduling (DDL, DML, DQL queues).
  • Data Nodes: Write insert/delete messages to persistent storage (MinIO/S3). Subscribe to DML channels and handle data persistence.
  • Query Nodes: Load and search segments. Maintain in-memory indexes and handle query execution with replica support.
  • Streaming Nodes: Manage write-ahead logs (WAL) and message streaming for real-time data ingestion.

Communication Architecture


Data Flow Patterns

Write Path: Client → Proxy → Data Coordinator → Data Nodes → Streaming Nodes → Object Storage. Proxy enqueues DML tasks, which are distributed to data nodes via message channels. Data nodes persist to storage and publish statistics.

Read Path: Client → Proxy → Query Coordinator → Query Nodes. Proxy routes search requests to appropriate query nodes based on shard leaders. Query nodes load segments from storage and execute searches in parallel.

Metadata Management: All coordinators use etcd for persistent metadata storage. Proxy maintains an in-memory metadata cache that is invalidated when changes occur, ensuring consistency across the cluster.

Key Design Patterns

Task Scheduling: Proxy uses three task queues (DDL, DML, DQL) with separate goroutine pools. Tasks are enqueued with timestamps and IDs allocated from Root Coordinator, then executed asynchronously with pre/post-execution hooks.
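
A minimal sketch of this queue-plus-hooks pattern is shown below; the type and method names are illustrative stand-ins, not the actual internal/proxy scheduler types.

package proxytasks

import (
    "context"
    "log"
    "sync"
)

// task mirrors the pre/post-execution hook pattern described above;
// the names are illustrative, not the actual internal/proxy types.
type task interface {
    PreExecute(ctx context.Context) error
    Execute(ctx context.Context) error
    PostExecute(ctx context.Context) error
}

// taskQueue drains enqueued tasks with a small goroutine pool, roughly
// how the proxy's DDL/DML/DQL queues run work asynchronously.
type taskQueue struct {
    tasks chan task
}

func newTaskQueue(buf int) *taskQueue { return &taskQueue{tasks: make(chan task, buf)} }

func (q *taskQueue) Enqueue(t task) { q.tasks <- t }
func (q *taskQueue) Close()         { close(q.tasks) }

// Run processes tasks until the queue is closed.
func (q *taskQueue) Run(ctx context.Context, workers int) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for t := range q.tasks {
                if err := t.PreExecute(ctx); err != nil {
                    log.Println("pre-execute failed:", err)
                    continue
                }
                if err := t.Execute(ctx); err != nil {
                    log.Println("execute failed:", err)
                    continue
                }
                _ = t.PostExecute(ctx)
            }
        }()
    }
    wg.Wait()
}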

Service Discovery: Components register in etcd with session information. Proxy watches etcd for coordinator and node changes, maintaining up-to-date client connections.
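
The sketch below shows the general registration-and-watch pattern against etcd using the standard clientv3 API; the key layout and session fields are assumptions, not Milvus's actual session format.

package discovery

import (
    "context"
    "encoding/json"
    "log"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// session is a simplified stand-in for the session record components
// store in etcd; the field names are assumptions, not Milvus's format.
type session struct {
    ServerID int64  `json:"serverID"`
    Address  string `json:"address"`
}

// register writes a session key bound to a lease, so the entry expires
// automatically if the component stops renewing it.
func register(ctx context.Context, cli *clientv3.Client, key string, s session) error {
    lease, err := cli.Grant(ctx, 10) // 10-second TTL
    if err != nil {
        return err
    }
    payload, _ := json.Marshal(s)
    if _, err := cli.Put(ctx, key, string(payload), clientv3.WithLease(lease.ID)); err != nil {
        return err
    }
    // Keep the lease alive; a real implementation drains the returned channel.
    _, err = cli.KeepAlive(ctx, lease.ID)
    return err
}

// watchNodes reacts to nodes joining or leaving under a prefix, similar
// to how the proxy keeps its coordinator and node clients up to date.
func watchNodes(ctx context.Context, cli *clientv3.Client, prefix string) {
    for resp := range cli.Watch(ctx, prefix, clientv3.WithPrefix()) {
        for _, ev := range resp.Events {
            log.Printf("session event %s on key %s", ev.Type, ev.Kv.Key)
        }
    }
}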

Message Streaming: Milvus supports multiple message queue backends (Pulsar, Kafka, RocksMQ). Components subscribe to channels for asynchronous communication, enabling loose coupling and scalability.

Replica & Load Balancing: Query Coordinator distributes segments across multiple query nodes as replicas. Proxy uses shard client manager to route queries to shard leaders with intelligent load balancing policies.

Vector Search Engine & Indexing

Relevant Files
  • internal/core/src/index/IndexFactory.h – Factory for creating all index types
  • internal/core/src/index/VectorIndex.h – Base class for vector indexes
  • internal/core/src/index/VectorMemIndex.h – In-memory vector index for growing segments
  • internal/core/src/index/VectorDiskIndex.h – Disk-based vector index for sealed segments
  • internal/core/src/segcore/SegmentGrowingImpl.h – Growing segment with incremental indexing
  • internal/core/src/segcore/ChunkedSegmentSealedImpl.h – Sealed segment with loaded indexes
  • internal/core/src/query/SearchOnIndex.cpp – Query execution on indexes
  • pkg/util/indexparams/index_params.go – Index parameter configuration

Architecture Overview

Milvus implements a two-tier vector indexing system optimized for different segment lifecycle stages. Growing segments use in-memory indexes (VectorMemIndex) that support incremental updates, while sealed segments use disk-based indexes (VectorDiskIndex) optimized for search performance and memory efficiency.


Index Factory & Creation

The IndexFactory singleton manages creation of all index types through a unified interface. It determines the appropriate index implementation based on data type, index type, and metric type. For vector fields, it creates either VectorMemIndex (for growing segments) or VectorDiskIndex (for sealed segments), delegating the actual index implementation to the Knowhere library.

IndexBasePtr CreateVectorIndex(const CreateIndexInfo& info,
                               const FileManagerContext& context);

Growing Segment Indexing

Growing segments maintain VectorMemIndex instances that support incremental data addition. When data is inserted, the index is built or updated via BuildWithDataset() and AddWithDataset() methods. The index is built in chunks based on configurable thresholds, enabling efficient search on recently inserted data without waiting for segment sealing.

Sealed Segment Indexing

Sealed segments load pre-built indexes from disk using VectorDiskIndex. The index is serialized to a BinarySet during segment flushing, then loaded back with memory-mapping support for cost-effective storage. Disk indexes support streaming loads to avoid caching entire indexes locally.

Query Execution

Vector search queries are executed through the SearchOnIndex() function, which:

  1. Prepares a dataset from query vectors
  2. Optionally uses vector iterators for streaming results
  3. Calls the index’s Query() method with search parameters and a bitset for filtering deleted rows

The bitset enables efficient filtering of deleted records without modifying the index.

Supported Index Types

Milvus supports multiple vector index algorithms optimized for different scenarios:

  • HNSW – Hierarchical Navigable Small World for balanced speed and accuracy
  • IVF – Inverted File with quantization variants (IVF_FLAT, IVF_PQ, IVF_SQ8)
  • FLAT – Brute-force search for exact results
  • SCANN – Scalable Nearest Neighbors with quantization
  • DiskANN – Disk-based approximate nearest neighbor search

All indexes support metric types: L2, IP (Inner Product), and COSINE for dense vectors, plus MAX_SIM for sparse vectors.
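
To make the metric and bitset machinery concrete, here is a brute-force (FLAT-style) scan in Go that skips deleted rows and ranks by L2, with IP and COSINE shown alongside; it is a conceptual sketch, not the Knowhere implementation.

package flatsearch

import (
    "math"
    "sort"
)

// l2 returns squared Euclidean distance; smaller means closer.
func l2(a, b []float32) float32 {
    var sum float32
    for i := range a {
        d := a[i] - b[i]
        sum += d * d
    }
    return sum
}

// ip returns inner product; larger means closer.
func ip(a, b []float32) float32 {
    var sum float32
    for i := range a {
        sum += a[i] * b[i]
    }
    return sum
}

// cosine normalizes the inner product by the vector norms.
func cosine(a, b []float32) float32 {
    var na, nb float64
    for i := range a {
        na += float64(a[i]) * float64(a[i])
        nb += float64(b[i]) * float64(b[i])
    }
    return ip(a, b) / float32(math.Sqrt(na)*math.Sqrt(nb))
}

type hit struct {
    id   int64
    dist float32
}

// searchFlat scans every row, skipping rows flagged in the deletion
// bitset (modeled here as a bool slice), and returns the topK nearest
// rows under L2.
func searchFlat(query []float32, rows [][]float32, ids []int64, deleted []bool, topK int) []hit {
    hits := make([]hit, 0, len(rows))
    for i, row := range rows {
        if deleted[i] { // bitset filtering: deleted rows never reach the result set
            continue
        }
        hits = append(hits, hit{id: ids[i], dist: l2(query, row)})
    }
    sort.Slice(hits, func(i, j int) bool { return hits[i].dist < hits[j].dist })
    if len(hits) > topK {
        hits = hits[:topK]
    }
    return hits
}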

Index Serialization & Loading

Indexes are serialized to binary format and stored in object storage. The serialization process:

  1. Calls Serialize() to produce a BinarySet containing index data
  2. Encodes metadata (index type, metric type, parameters) separately
  3. Stores files in object storage with versioning

Loading reverses this process, with optional memory-mapping for disk indexes to reduce memory footprint.

Data Flow & Write Pipeline

Relevant Files
  • internal/flushcommon/pipeline - Flow graph pipeline orchestration
  • internal/flushcommon/writebuffer - Write buffer management and data buffering
  • internal/flushcommon/syncmgr - Sync manager and task execution
  • internal/storage/binlog_writer.go - Binlog serialization and writing
  • internal/proxy/msg_pack.go - Message packing and routing

Data flows through Milvus in a carefully orchestrated pipeline from the proxy through the data node to persistent storage. This section explains how insert and delete messages are buffered, flushed, and written to binlogs.

Flow Graph Pipeline Architecture

The DataSyncService manages a TimeTickedFlowGraph that processes DML messages for each virtual channel. The pipeline consists of sequential nodes that transform and buffer data:


Key Pipeline Nodes:

  1. DmInputNode - Receives messages from message streams and packs them between time ticks
  2. DDNode - Filters messages, removes inserts for sealed segments, handles DDL operations
  3. EmbeddingNode (optional) - Processes embedding functions if collection has function definitions
  4. WriteNode - Prepares insert data and buffers it into the write buffer manager
  5. TimeTickNode - Handles checkpoint updates and collection drop signals

Write Buffer Management

The WriteBuffer interface provides abstraction for buffering DML data with segment-level granularity:

  • BufferData() - Accepts insert and delete messages, organizes them by segment
  • SealSegments() - Triggers flush of specified segments to sync manager
  • EvictBuffer() - Evicts buffered data matching sync policies (memory threshold, time-based)
  • GetCheckpoint() - Returns earliest buffer position for recovery

The BufferManager maintains a concurrent map of WriteBuffers, one per channel. It monitors memory usage and triggers eviction when thresholds are exceeded.
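
A simplified sketch of segment-granular buffering with threshold-based eviction follows; all type names are invented for illustration, and the real WriteBuffer/BufferManager also track sync policies, checkpoints, and delete buffers.

package writebuffer

import "sync"

// segmentBuffer accumulates rows for one segment; sizeBytes is a rough
// estimate used for eviction decisions.
type segmentBuffer struct {
    segmentID int64
    rows      [][]byte
    sizeBytes int64
}

// channelBuffer is a per-channel write buffer keyed by segment,
// mirroring the segment-level granularity described above.
type channelBuffer struct {
    mu        sync.Mutex
    segments  map[int64]*segmentBuffer
    total     int64
    threshold int64                       // memory high-water mark
    flush     func(segs []*segmentBuffer) // hands sealed buffers to the sync manager
}

func newChannelBuffer(threshold int64, flush func([]*segmentBuffer)) *channelBuffer {
    return &channelBuffer{
        segments:  make(map[int64]*segmentBuffer),
        threshold: threshold,
        flush:     flush,
    }
}

func (b *channelBuffer) BufferData(segmentID int64, row []byte) {
    b.mu.Lock()
    defer b.mu.Unlock()
    seg, ok := b.segments[segmentID]
    if !ok {
        seg = &segmentBuffer{segmentID: segmentID}
        b.segments[segmentID] = seg
    }
    seg.rows = append(seg.rows, row)
    seg.sizeBytes += int64(len(row))
    b.total += int64(len(row))
    if b.total > b.threshold {
        b.evictLocked()
    }
}

// evictLocked seals every buffered segment and resets counters; a real
// implementation picks victims by policy (size, age) instead.
func (b *channelBuffer) evictLocked() {
    var sealed []*segmentBuffer
    for id, seg := range b.segments {
        sealed = append(sealed, seg)
        delete(b.segments, id)
    }
    b.total = 0
    b.flush(sealed)
}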

Sync Manager & Task Execution

The SyncManager processes SyncTask objects that serialize buffered data to storage:

  • Maintains a thread pool with parallelism based on CPU cores
  • Supports both StorageV1 and StorageV2 formats
  • Executes BulkPackWriter to serialize insert/delete/stats data into binlogs
  • Writes binlogs to object storage (S3, MinIO, etc.)
  • Updates metadata via MetaWriter callbacks

// SyncTask execution flow (simplified)
task.Execute() {
  writer := NewBulkPackWriter(...)
  inserts, deltas, stats := writer.Write(ctx, pack)
  // Write binlogs to storage
  // Update segment metadata
}

Binlog Format & Storage

BinlogWriter serializes data into binlog files with:

  • Magic number header for validation
  • Descriptor event containing collection/partition/segment IDs
  • Event writers for each field, encoding data in columnar format
  • Optional encryption via hooks

Binlogs are organized by type: InsertBinlog, DeleteBinlog, StatsBinlog, BM25Binlog. Each segment produces separate binlog files per field, enabling efficient columnar storage and retrieval.
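
The sketch below illustrates the general shape of such a file (magic number, descriptor, then column data) using encoding/binary; the constant and layout are placeholders, not the actual binlog event format defined in internal/storage.

package binlogsketch

import (
    "bytes"
    "encoding/binary"
)

// descriptor carries the identifiers mentioned above; the real
// descriptor event has its own header and many more fields.
type descriptor struct {
    CollectionID int64
    PartitionID  int64
    SegmentID    int64
    FieldID      int64
}

// magicNumber is a placeholder value, not the real binlog magic number.
const magicNumber uint32 = 0xFFFBFFFB

// writeFieldBinlog serializes one field's int64 column as: magic number,
// descriptor, row count, then the raw column values (little-endian).
func writeFieldBinlog(desc descriptor, column []int64) ([]byte, error) {
    buf := new(bytes.Buffer)
    for _, v := range []interface{}{magicNumber, desc, int64(len(column)), column} {
        if err := binary.Write(buf, binary.LittleEndian, v); err != nil {
            return nil, err
        }
    }
    return buf.Bytes(), nil
}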

Data Flow Example

  1. Client sends insert request to proxy
  2. Proxy packs messages by partition and routes to data node channels
  3. DmInputNode receives MsgPack, DDNode filters for growing segments
  4. WriteNode buffers data into segment-specific buffers
  5. Memory check triggers eviction when threshold exceeded
  6. SyncManager executes BulkPackWriter to serialize data
  7. Binlogs written to object storage with manifest metadata
  8. Segment metadata updated in data coordinator

Metadata Storage & Catalog

Relevant Files
  • internal/metastore/catalog.go
  • internal/metastore/kv/rootcoord/kv_catalog.go
  • internal/metastore/kv/datacoord/kv_catalog.go
  • internal/metastore/kv/querycoord/kv_catalog.go
  • internal/rootcoord/meta_table.go
  • pkg/kv/kv.go
  • internal/kv/etcd/etcd_kv.go
  • internal/kv/tikv/txn_tikv.go
  • pkg/kv/rocksdb/rocksdb_kv.go

Overview

Milvus uses a hierarchical metadata storage architecture to persist collection schemas, partitions, segments, indexes, and system state. The system separates concerns into Catalog interfaces (coordinator-specific contracts) and KV backends (pluggable storage implementations), enabling flexibility and scalability.

Catalog Layer

The catalog layer defines coordinator-specific metadata operations through interfaces:

  • RootCoordCatalog - Manages databases, collections, partitions, aliases, and RBAC metadata
  • DataCoordCatalog - Handles segment info, binlogs, indexes, and compaction state
  • QueryCoordCatalog - Stores collection load info, replicas, and resource groups
  • StreamingCoordCatalog - Manages streaming channel configurations and replication state

Each catalog is implemented as a KV-based adapter (e.g., internal/metastore/kv/rootcoord/kv_catalog.go) that translates high-level operations into key-value store calls.

Key-Value Store Hierarchy


KV Interface Hierarchy:

  • BaseKV - Core operations: Load, Save, Remove, MultiLoad, MultiSave, LoadWithPrefix
  • TxnKV - Extends BaseKV with transactional operations: MultiSaveAndRemove with predicates
  • MetaKv - Extends TxnKV for metadata: CompareVersionAndSwap, WalkWithPrefix, GetPath
  • WatchKV - Extends MetaKv with watch capabilities (etcd-specific)
  • SnapShotKV - Timestamp-aware operations for versioned snapshots
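
Condensed as embedded Go interfaces, the hierarchy looks roughly like the sketch below; method sets are abbreviated and signatures may differ from the real pkg/kv definitions.

package kvsketch

import "context"

// BaseKV: core load/save/remove operations (condensed; the real interface
// also includes MultiLoad, MultiSave, RemoveWithPrefix, and more).
type BaseKV interface {
    Load(ctx context.Context, key string) (string, error)
    Save(ctx context.Context, key, value string) error
    Remove(ctx context.Context, key string) error
    LoadWithPrefix(ctx context.Context, prefix string) (keys []string, values []string, err error)
}

// TxnKV adds multi-key transactional updates.
type TxnKV interface {
    BaseKV
    MultiSaveAndRemove(ctx context.Context, saves map[string]string, removals []string) error
}

// MetaKv adds metadata-oriented helpers such as compare-and-swap.
type MetaKv interface {
    TxnKV
    CompareVersionAndSwap(ctx context.Context, key string, version int64, target string) (bool, error)
    WalkWithPrefix(ctx context.Context, prefix string, paginationSize int, fn func(key, value []byte) error) error
}

// WatchKV adds watch capabilities (etcd-backed implementation only);
// the channel element type here is a placeholder.
type WatchKV interface {
    MetaKv
    WatchWithPrefix(ctx context.Context, prefix string) <-chan struct{}
}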

Storage Backends

etcdKV (internal/kv/etcd/etcd_kv.go)

  • Distributed, highly available metadata store
  • Supports transactions, watches, and leases
  • Default for production deployments
  • Implements WatchKV interface

TiKV (internal/kv/tikv/txn_tikv.go)

  • Distributed transactional KV store
  • Strong consistency guarantees
  • Implements MetaKv interface

RocksDB (pkg/kv/rocksdb/rocksdb_kv.go)

  • Embedded local storage
  • Used for standalone deployments
  • Implements BaseKV interface

MemoryKV (internal/kv/mem/mem_kv.go)

  • In-memory B-tree storage
  • Used for testing only
  • Implements TxnKV interface

Metadata Organization

RootCoord metadata uses hierarchical key prefixes:

rootcoord-meta/
├── database/{dbID}
├── collection/{collectionID}
├── partition/{collectionID}/{partitionID}
├── field/{collectionID}/{fieldID}
├── alias/{aliasName}
└── rbac/{role|user|grant}

DataCoord organizes segment metadata:

datacoord-meta/
├── segment/{collectionID}/{segmentID}
├── binlog/{segmentID}/{fieldID}
├── index/{collectionID}/{indexID}
└── compaction/{taskID}

Transactional Guarantees

The catalog layer ensures consistency through:

  1. Atomic Multi-Operations - MultiSaveAndRemove bundles saves and deletes in single transactions
  2. Conditional Updates - CompareVersionAndSwap prevents concurrent modification conflicts
  3. Predicate Filtering - Optional predicates validate state before committing changes
  4. Timestamp Versioning - SnapShotKV tracks logical timestamps for point-in-time queries
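
For instance, conditional updates and atomic multi-operations map naturally onto etcd transactions; the sketch below uses the standard clientv3 API and is a generic illustration, not the kv_catalog code.

package catalogsketch

import (
    "context"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// saveIfUnchanged writes newValue only if key still holds oldValue,
// the essence of a CompareVersionAndSwap-style conditional update.
func saveIfUnchanged(ctx context.Context, cli *clientv3.Client, key, oldValue, newValue string) (bool, error) {
    resp, err := cli.Txn(ctx).
        If(clientv3.Compare(clientv3.Value(key), "=", oldValue)).
        Then(clientv3.OpPut(key, newValue)).
        Commit()
    if err != nil {
        return false, err
    }
    return resp.Succeeded, nil
}

// multiSaveAndRemove bundles puts and deletes into a single atomic
// transaction, mirroring the catalog's atomic multi-operations.
func multiSaveAndRemove(ctx context.Context, cli *clientv3.Client, saves map[string]string, removals []string) error {
    ops := make([]clientv3.Op, 0, len(saves)+len(removals))
    for k, v := range saves {
        ops = append(ops, clientv3.OpPut(k, v))
    }
    for _, k := range removals {
        ops = append(ops, clientv3.OpDelete(k))
    }
    _, err := cli.Txn(ctx).Then(ops...).Commit()
    return err
}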

Initialization Flow


Coordinators reload metadata on startup via reloadFromKV(), reconstructing in-memory caches from persistent storage to ensure consistency after restarts.
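
A sketch of that reload step, assuming a JSON-encoded value and the LoadWithPrefix method named above (real metadata values are protobuf-encoded):

package reloadsketch

import (
    "context"
    "encoding/json"
)

// prefixLoader is the minimal slice of the KV interface needed here.
type prefixLoader interface {
    LoadWithPrefix(ctx context.Context, prefix string) (keys []string, values []string, err error)
}

// collectionInfo stands in for the decoded metadata value.
type collectionInfo struct {
    ID   int64  `json:"id"`
    Name string `json:"name"`
}

// reloadCollections rebuilds an in-memory cache from persisted metadata,
// the same shape of work reloadFromKV() performs on coordinator startup.
func reloadCollections(ctx context.Context, kv prefixLoader, prefix string) (map[int64]collectionInfo, error) {
    _, values, err := kv.LoadWithPrefix(ctx, prefix)
    if err != nil {
        return nil, err
    }
    cache := make(map[int64]collectionInfo, len(values))
    for _, v := range values {
        var info collectionInfo
        if err := json.Unmarshal([]byte(v), &info); err != nil {
            return nil, err
        }
        cache[info.ID] = info
    }
    return cache, nil
}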

Query Execution & Search Pipeline

Relevant Files
  • internal/querynodev2/delegator/delegator.go
  • internal/querynodev2/tasks/search_task.go
  • internal/querynodev2/tasks/query_task.go
  • internal/querynodev2/segments/search.go
  • internal/querynodev2/segments/retrieve.go
  • internal/querynodev2/pipeline/pipeline.go
  • internal/parser/planparserv2/plan_parser_v2.go
  • internal/proxy/search_reduce_util.go

Query execution in Milvus follows a multi-stage pipeline: request parsing, task scheduling, segment execution, and result reduction. The system separates concerns between the proxy layer (which coordinates), query nodes (which execute), and delegators (which manage shard-level operations).

Request Flow Overview


Delegator & Task Organization

The ShardDelegator is the core orchestrator for a single shard. It receives search/query requests and:

  1. Segment Pruning – Filters segments using clustering key statistics to skip irrelevant data
  2. Parameter Optimization – Adjusts search parameters based on segment count
  3. SubTask Organization – Groups segments by worker node and creates subtasks via organizeSubTask()
  4. Execution – Sends subtasks to workers and collects results

Key methods: Search(), Query(), QueryStream() all follow this pattern.

Search Task Execution

SearchTask and StreamingSearchTask implement the scheduler.Task interface:

  • PreExecute() – Validates collection, parses search plan, prepares placeholder groups
  • Execute() – Calls segments.SearchHistorical() or segments.SearchStreaming() depending on scope
  • PostExecute() – Reduces results from multiple segments using ReduceHelper (handles group-by, ranking, offset/limit)

For streaming search, results are reduced incrementally as segments complete, improving latency.

Query Task Execution

QueryTask follows a similar pattern:

  • Creates a RetrievePlan from the serialized expression plan
  • Calls segments.Retrieve() to fetch matching rows from sealed and growing segments
  • Merges results respecting MVCC timestamp and consistency level

Segment Search & Retrieve

The segments package provides low-level operations:

  • SearchHistorical() – Searches sealed segments in parallel
  • SearchStreaming() – Searches growing segments
  • SearchHistoricalStreamly() – Streaming search with callback-based reduction
  • Retrieve() – Fetches rows matching predicates
  • RetrieveStream() – Streaming retrieve for large result sets

Each segment is pinned during execution to prevent eviction, then unpinned after results are collected.
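
The pin/unpin discipline amounts to reference counting around execution; the sketch below is illustrative and does not mirror the query node's actual segment manager API.

package pinsketch

import (
    "errors"
    "sync"
)

// segment tracks how many in-flight operations reference it; a segment
// with pins > 0 must not be released or evicted.
type segment struct {
    mu       sync.Mutex
    pins     int
    released bool
}

var errReleased = errors.New("segment already released")

// Pin marks the segment in use for the duration of a search or retrieve.
func (s *segment) Pin() error {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.released {
        return errReleased
    }
    s.pins++
    return nil
}

// Unpin drops one reference; eviction is only allowed at zero pins.
func (s *segment) Unpin() {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.pins--
}

// search shows the typical pin / defer-unpin usage around execution.
func search(s *segment, run func() error) error {
    if err := s.Pin(); err != nil {
        return err
    }
    defer s.Unpin()
    return run()
}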

Result Reduction

Results from multiple segments are merged via:

  • ReduceHelper (C++) – Merges search results, applies ranking, handles group-by
  • reduceSearchResult() (Proxy) – Combines results from multiple shards, applies offset/limit
  • Reducer (Query) – Merges retrieve results, deduplicates, applies output field filtering

Reduction respects metric type (L2, IP, COSINE), group-by field, and pagination parameters.
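
At its core, shard-level reduction is a merge of per-shard top-k lists followed by pagination; the sketch below shows that for a smaller-is-better metric such as L2 and ignores group-by and ranking.

package reducesketch

import "sort"

type scoredID struct {
    ID    int64
    Score float32 // distance under L2: smaller is better
}

// reduceSearchResults merges per-shard top-k lists, re-sorts by score,
// then applies offset/limit, similar in spirit to the proxy-side
// reduceSearchResult() step.
func reduceSearchResults(shardResults [][]scoredID, offset, limit int) []scoredID {
    var merged []scoredID
    for _, res := range shardResults {
        merged = append(merged, res...)
    }
    sort.Slice(merged, func(i, j int) bool { return merged[i].Score < merged[j].Score })
    if offset >= len(merged) {
        return nil
    }
    merged = merged[offset:]
    if len(merged) > limit {
        merged = merged[:limit]
    }
    return merged
}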

Data Pipeline (Ingestion Side)

The Pipeline manages DML message flow through nodes:

  • FilterNode – Validates and filters insert/delete messages
  • EmbeddingNode – Generates embeddings for vector fields (optional)
  • InsertNode – Buffers inserts into growing segments
  • DeleteNode – Applies deletes and updates tSafe timestamp

Messages flow sequentially through nodes; each node processes messages from its input channel and forwards to the next.
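
A minimal sketch of that node-chaining idea using Go channels follows; real pipeline nodes carry typed message packs, checkpoints, and lifecycle control.

package pipesketch

// msg stands in for a batch of DML messages flowing through the pipeline.
type msg struct {
    payload string
}

// stage applies one processing step and forwards the result downstream.
func stage(process func(msg) msg, in <-chan msg) <-chan msg {
    out := make(chan msg)
    go func() {
        defer close(out)
        for m := range in {
            out <- process(m)
        }
    }()
    return out
}

// buildPipeline wires a fixed sequence of stages together, echoing the
// FilterNode -> InsertNode -> DeleteNode chain described above.
func buildPipeline(in <-chan msg) <-chan msg {
    filtered := stage(func(m msg) msg { return m }, in)        // filter invalid messages
    inserted := stage(func(m msg) msg { return m }, filtered)  // buffer inserts into growing segments
    return stage(func(m msg) msg { return m }, inserted)       // apply deletes, advance tSafe
}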

Optimization Techniques

  1. Segment Pruning – Uses clustering key statistics to skip segments that cannot match predicates
  2. Parameter Optimization – Adjusts topk/nq based on segment count to balance memory and latency
  3. Lazy Loading – Disk-cached segments are loaded on-demand during search
  4. Stream Reduction – Reduces results incrementally instead of buffering all segment results
  5. Task Merging – Combines multiple search tasks with compatible parameters to reduce overhead

Client SDKs & API Layer

Relevant Files
  • client/milvusclient - Go SDK client implementation
  • client/entity - Schema and collection metadata types
  • client/column - Column-based data representation
  • client/row - Row-based data representation
  • client/bulkwriter - Bulk import functionality
  • internal/distributed/proxy/httpserver - HTTP API handlers
  • internal/http/server.go - HTTP server setup

Overview

Milvus provides multiple client interfaces for interacting with the vector database: a Go SDK for programmatic access and REST APIs (v1 and v2) for HTTP-based operations. Both layers communicate with the Proxy component via gRPC, which orchestrates requests to the distributed system.

Go SDK Architecture

The Go SDK (client/milvusclient) is the primary programmatic interface. The Client struct manages a gRPC connection to the Proxy and provides methods for all database operations:

type Client struct {
    conn    *grpc.ClientConn
    service milvuspb.MilvusServiceClient
    config  *ClientConfig
    collCache *CollectionCache
}

Key initialization:

  • Clients connect via milvusclient.New(ctx, &ClientConfig{Address: "..."})
  • Supports authentication (username/password or API key)
  • Maintains a collection schema cache to avoid repeated metadata lookups
  • Implements automatic retry logic for schema mismatches

Data Representation

The SDK supports two data models:

  1. Column-based (client/column) - Efficient for bulk operations

    • Separate arrays per field: WithInt64Column("id", []int64{...})
    • Supports all types: scalars, vectors, JSON, arrays, sparse vectors
    • Used by NewColumnBasedInsertOption()
  2. Row-based (client/row) - Intuitive for structured data

    • Individual row structs with field tags
    • Used by NewRowBasedInsertOption()

Core Operations

Write Operations:

  • Insert() - Add new records, returns generated IDs
  • Upsert() - Insert or update by primary key
  • Delete() - Remove records by filter expression

Read Operations:

  • Search() - Vector similarity search with optional filters
  • Query() - Retrieve records by filter (no vector search)
  • Get() - Fetch specific records by primary key

Collection Management:

  • CreateCollection() - Define schema and create collection
  • DescribeCollection() - Retrieve collection metadata
  • DropCollection() - Delete collection and data
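
A usage sketch tying these operations together is shown below. It assumes the client/v2 module layout and the option constructors named above; exact signatures may differ between SDK versions, so treat it as illustrative rather than authoritative.

package main

import (
    "context"
    "log"

    "github.com/milvus-io/milvus/client/v2/entity"
    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

func main() {
    ctx := context.Background()

    // Connect to the proxy over gRPC.
    cli, err := milvusclient.New(ctx, &milvusclient.ClientConfig{Address: "localhost:19530"})
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close(ctx)

    // Column-based insert: one array per field (collection "demo" assumed to exist).
    _, err = cli.Insert(ctx, milvusclient.NewColumnBasedInsertOption("demo").
        WithInt64Column("id", []int64{1, 2, 3}).
        WithFloatVectorColumn("vector", 4, [][]float32{
            {0.1, 0.2, 0.3, 0.4}, {0.2, 0.3, 0.4, 0.5}, {0.3, 0.4, 0.5, 0.6},
        }))
    if err != nil {
        log.Fatal(err)
    }

    // Vector similarity search, top 2 results.
    results, err := cli.Search(ctx, milvusclient.NewSearchOption("demo", 2,
        []entity.Vector{entity.FloatVector([]float32{0.1, 0.2, 0.3, 0.4})}))
    if err != nil {
        log.Fatal(err)
    }
    for _, rs := range results {
        log.Println(rs.IDs, rs.Scores)
    }
}

Because the client caches collection schemas, repeated Insert and Search calls avoid extra metadata round trips to the proxy.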

REST API Layer

HTTP endpoints are exposed via the Gin framework in two versions:

v1 API (/v1/...) - Simple REST interface:

  • POST /v1/vector/collections/insert - Insert data
  • POST /v1/vector/search - Search vectors
  • POST /v1/vector/query - Query with filters

v2 API (/v2/vectordb/...) - Structured RESTful design:

  • POST /v2/vectordb/collections/list - List collections
  • POST /v2/vectordb/data/insert - Insert data
  • POST /v2/vectordb/search - Search with ranking

Both REST versions translate HTTP requests to gRPC calls via the Proxy. Request handlers in internal/distributed/proxy/httpserver parse JSON payloads, validate inputs, and marshal responses.
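
As an illustration, the v2 list-collections endpoint can be called with a plain HTTP client; the bearer-token format shown is an assumption and depends on how authentication is configured.

package restsketch

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

// listCollections posts an empty JSON body to the v2 endpoint and
// returns the raw JSON response.
func listCollections(baseURL, token string) (string, error) {
    req, err := http.NewRequest(http.MethodPost,
        baseURL+"/v2/vectordb/collections/list", bytes.NewBufferString(`{}`))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")
    if token != "" {
        // Assumed format: "user:password" or an API key, sent as a bearer token.
        req.Header.Set("Authorization", "Bearer "+token)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("unexpected status %d: %s", resp.StatusCode, body)
    }
    return string(body), nil
}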

Request Flow


Error Handling & Retries

  • Schema cache invalidation triggers automatic retries on schema mismatch errors
  • gRPC middleware implements exponential backoff for transient failures
  • All operations return structured error responses with error codes

Bulk Operations

The bulkwriter module enables efficient large-scale data import:

  • Supports local file writing and remote S3 uploads
  • Handles data serialization to Parquet format
  • Integrates with Milvus bulk import service for batch ingestion

Operations, Deployment & Configuration

Relevant Files
  • Makefile - Build targets and compilation commands
  • build/ - Docker-based build environment and scripts
  • scripts/ - Deployment and startup scripts
  • deployments/ - Deployment configurations for various environments
  • configs/milvus.yaml - Main configuration file
  • pkg/config/ - Configuration management system
  • docker-compose.yml - Development environment setup

Build System

Milvus uses a multi-language build pipeline combining C++ (core) and Go (distributed components). The build process is orchestrated through a Makefile and containerized via Docker for consistency.

Key build targets:

  • make milvus - Build the main binary with C++ core and Go components
  • make milvus-gpu - Build GPU-accelerated version
  • make unittest - Run all unit tests (Go and C++)
  • make verifiers - Run code quality checks

The build system supports PGO (Profile-Guided Optimization) and multiple SIMD instruction sets (SSE4.2, AVX, AVX2, AVX512). Build artifacts are cached in .docker/ volumes to accelerate incremental builds.

Deployment Strategies

Milvus supports multiple deployment topologies:

Standalone Mode - Single-process deployment ideal for development and small-scale deployments. Uses embedded etcd and local storage by default.

Cluster Mode - Distributed architecture with separate coordinator, data, query, and index nodes. Requires external etcd, MinIO/S3, and a message queue (Pulsar, Kafka, or Woodpecker).

Docker Deployments - Pre-configured in deployments/docker/:

  • standalone/ - Single-container setup with dependencies
  • cluster-distributed-deployment/ - Ansible-based multi-node cluster
  • gpu/ - GPU-enabled variant
  • dev/ - Development environment with all services

Configuration Management

Milvus implements a hierarchical configuration system with multiple sources and priority levels:

Priority (highest to lowest):

  1. Runtime overlays (SetConfig API)
  2. Environment variables (MILVUS_* prefix)
  3. etcd (remote configuration center)
  4. YAML files (configs/milvus.yaml)

Configuration sources in pkg/config/:

  • FileSource - Loads YAML configuration files with optional refresh intervals
  • EnvSource - Reads environment variables with MILVUS_ prefix support
  • EtcdSource - Connects to etcd for dynamic configuration updates
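
The priority order amounts to an ordered lookup across sources, sketched below; this is a schematic of the described behaviour, not the pkg/config implementation.

package configsketch

import "errors"

// source is one configuration provider (runtime overlay, environment,
// etcd, YAML file).
type source interface {
    Get(key string) (value string, ok bool)
}

// manager resolves keys against sources ordered from highest to lowest
// priority and returns the first match it finds.
type manager struct {
    sources []source // overlay, env, etcd, file (highest priority first)
}

var errNotFound = errors.New("config key not found")

func (m *manager) Get(key string) (string, error) {
    for _, s := range m.sources {
        if v, ok := s.Get(key); ok {
            return v, nil
        }
    }
    return "", errNotFound
}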

Key configuration sections in milvus.yaml:

  • etcd - Metadata storage and service discovery endpoints
  • minio - Object storage for vector data persistence
  • mq - Message queue selection (rocksmq, Pulsar, Kafka, Woodpecker)
  • rootCoord, dataNode, queryNode - Component-specific tuning
  • common - Thread pools, index settings, entity expiration

Development Environment

The docker-compose.yml provides an integrated dev environment with:

  • Builder containers - CPU and GPU variants with pre-installed dependencies
  • Dependencies - etcd, MinIO, Pulsar, Azurite (Azure), fake-gcs-server
  • Caching - ccache and Go module caching for fast rebuilds
  • Jaeger integration - Distributed tracing support

Start the dev environment with ./scripts/devcontainer.sh up. This creates isolated volumes for build artifacts, Go modules, and compiler cache, enabling efficient incremental development.

Deployment Utilities

Key scripts in deployments/:

  • upgrade/rollingUpdate.sh - Zero-downtime rolling updates for cluster mode
  • migrate-meta/migrate.sh - Metadata migration between storage backends
  • export-log/export-milvus-log.sh - Log collection and export
  • windows/ - Batch scripts for Windows standalone deployment

CI/CD Integration - Jenkins pipelines in ci/jenkins/ support:

  • Helm-based deployments (standalone and cluster)
  • Version upgrades with image repository configuration
  • Multi-mode testing (Pulsar, Kafka, authentication variants)