prestodb/presto

Presto: Distributed SQL Query Engine

Last updated on Dec 17, 2025 (Commit: f86afc9)

Overview

Relevant Files
  • README.md
  • ARCHITECTURE.md
  • FUNCTIONS.md

Presto is a distributed SQL query engine designed for fast analytic queries against data sources ranging from gigabytes to petabytes. Originally developed by Facebook, it is now governed by the Presto Foundation as an open-source project under the Apache License 2.0.

Core Purpose

Presto enables efficient, high-speed data processing for analytics and batch workloads at scale. It provides a unified query system that can access and process data stored in various formats and storage systems through a highly customizable plugin infrastructure.

Key Characteristics

Architecture: Presto follows a distributed system model with a coordinator and multiple worker nodes. The coordinator manages query planning and execution, while workers process data in parallel.

Performance: Designed for low-latency interactive workloads, ad-hoc analytics, and large-scale batch processing. The system emphasizes vertical integration and modular design to achieve optimal performance.

Connectivity: Supports numerous data sources through pluggable connectors including Hive, Cassandra, PostgreSQL, MySQL, Elasticsearch, Kafka, and many others. This flexibility allows Presto to serve as a unified query layer across heterogeneous data infrastructure.

User Experience: Uses familiar ANSI SQL syntax with sensible defaults and an optimizer that produces efficient query plans, making it accessible to users without specialized knowledge of distributed query processing.

Technology Stack

Java 17 (primary implementation)
Maven (build system)
C++ (Presto native execution via Velox)
React/JSX (Presto Console UI)

Project Structure

The repository contains over 100 modules organized by function:

  • Core: presto-main, presto-server, presto-spi (Service Provider Interface)
  • Connectors: presto-hive, presto-postgresql, presto-cassandra, etc.
  • Execution: presto-parser, presto-expressions, presto-bytecode
  • Native: presto-native-execution (C++ implementation using Velox)
  • Tools: presto-cli, presto-ui, presto-verifier

Long-Term Vision

Presto is moving toward a native evaluation engine using Velox for improved performance, state-of-the-art query optimization, and greater modularity through standardized components like Arrow and Substrait for interoperability with other data infrastructure.

Architecture & Query Execution Pipeline

Relevant Files
  • presto-parser/src/main/java/com/facebook/presto/sql/parser/SqlParser.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/QueryPlanner.java
  • presto-main-base/src/main/java/com/facebook/presto/execution/SqlQueryExecution.java
  • presto-main/src/main/java/com/facebook/presto/execution/SqlQueryManager.java

Presto's query execution pipeline transforms SQL text into distributed execution plans through four distinct phases: parsing, semantic analysis, logical planning, and optimization. Understanding this flow is essential for extending the query engine or debugging query behavior.

Phase 1: SQL Parsing

The SqlParser class converts raw SQL strings into an Abstract Syntax Tree (AST) using ANTLR4. The parser:

  • Uses a lexer (SqlBaseLexer) to tokenize input and a parser (SqlBaseParser) to build the parse tree
  • Attempts parsing in fast SLL mode first, then falls back to LL mode if needed
  • Converts the ANTLR parse tree to Presto's AST via AstBuilder visitor pattern
  • Handles three input types: statements, expressions, and routine bodies
public Statement createStatement(String sql, ParsingOptions parsingOptions) {
    return (Statement) invokeParser("statement", sql, 
        SqlBaseParser::singleStatement, parsingOptions);
}
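
As a hedged illustration of how this entry point is used (test code follows the same pattern), a caller constructs a SqlParser and hands it the SQL text plus parsing options; the no-argument ParsingOptions constructor is assumed here to supply defaults:

// Sketch: parsing a statement into Presto's AST
SqlParser parser = new SqlParser();
Statement statement = parser.createStatement(
    "SELECT orderkey FROM orders WHERE totalprice > 100.0",
    new ParsingOptions());
// statement is the AST root (a Query node for a SELECT)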

Phase 2: Semantic Analysis

The Analyzer class validates the AST against catalog metadata and resolves references. It:

  • Validates table and column existence
  • Resolves function signatures and type coercion
  • Builds scope information for name resolution
  • Produces an Analysis object containing semantic metadata

Phase 3: Logical Planning

The LogicalPlanner converts the analyzed AST into a logical plan tree of PlanNode objects. For queries, it delegates to QueryPlanner, which:

  • Builds table scans from FROM clauses
  • Applies filters (WHERE, HAVING)
  • Constructs aggregations and window functions
  • Adds projections, sorting, and limits
public RelationPlan plan(QuerySpecification node) {
    PlanBuilder builder = planFrom(node);
    builder = filter(builder, analysis.getWhere(node), node);
    builder = aggregate(builder, node);
    builder = window(builder, node);
    // ... additional transformations (projections, ORDER BY, LIMIT),
    // after which the builder is wrapped into the returned RelationPlan
}
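
For a concrete sense of the output, the sketch below shows the rough shape of the logical plan these steps produce for a simple aggregate query (the node names are real PlanNode subclasses; the exact tree varies by version and query):

// Illustrative logical plan for:
//   SELECT custkey, count(*) FROM orders WHERE totalprice > 100 GROUP BY custkey
//
// OutputNode
//   └─ ProjectNode
//        └─ AggregationNode (group by custkey, count(*))
//             └─ FilterNode (totalprice > 100)
//                  └─ TableScanNode (orders)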

Phase 4: Optimization & Execution

The Optimizer applies cost-based transformations to the logical plan:

  • Pushes filters and projections down the tree
  • Eliminates redundant operations
  • Chooses optimal join orders
  • Validates the final plan with PlanChecker

The optimized plan is then fragmented into distributed tasks and scheduled across worker nodes.


Query Execution Lifecycle

SqlQueryManager orchestrates the full lifecycle. When a query arrives:

  1. createQuery() registers the SqlQueryExecution instance
  2. start() triggers parsing and planning in doCreateLogicalPlanAndOptimize()
  3. The scheduler creates execution tasks from the optimized plan
  4. Results stream back through output buffers
  5. queryCompletedEvent() fires when done, triggering cleanup

Each phase is instrumented with runtime metrics (e.g., LOGICAL_PLANNER_TIME_NANOS, OPTIMIZER_TIME_NANOS) for performance monitoring.

Connector Framework & Data Sources

Relevant Files
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/Connector.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
  • presto-spi/src/main/java/com/facebook/presto/spi/Plugin.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorFactory.java
  • presto-hive/src/main/java/com/facebook/presto/hive/HivePlugin.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPlugin.java
  • presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaPlugin.java

Overview

Presto's connector framework enables integration with diverse data sources through a standardized plugin architecture. Connectors act as adapters that translate Presto's query execution model into data source-specific operations. The framework uses a plugin-based design where each data source (Hive, Iceberg, Kafka, etc.) implements the connector SPI (Service Provider Interface) to provide metadata, data access, and query optimization capabilities.

Architecture


Plugin Registration

Plugins are the entry point for connector registration. Each plugin implements the Plugin interface and provides connector factories via getConnectorFactories(). The PluginManager discovers and loads plugins, registering their connector factories with the ConnectorManager.

public class HivePlugin implements Plugin {
    @Override
    public Iterable<ConnectorFactory> getConnectorFactories() {
        return ImmutableList.of(
            new HiveConnectorFactory(name, classLoader, metastore)
        );
    }
}

Connector Factory Pattern

ConnectorFactory creates connector instances for specific catalogs. Each factory implements three key methods:

  • getName() - Returns the connector type identifier (e.g., "hive", "iceberg")
  • getHandleResolver() - Provides serialization/deserialization for connector-specific handles
  • create() - Instantiates a Connector with catalog configuration and context

The factory pattern allows connectors to be instantiated lazily and configured per catalog.
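
A minimal factory might look like the following sketch, where the "example" connector and its handle resolver are hypothetical placeholders; the three methods mirror the list above:

// Hypothetical connector factory sketch
public class ExampleConnectorFactory implements ConnectorFactory {
    @Override
    public String getName() {
        return "example"; // catalog properties reference this via connector.name=example
    }

    @Override
    public ConnectorHandleResolver getHandleResolver() {
        return new ExampleHandleResolver(); // hypothetical resolver for handle (de)serialization
    }

    @Override
    public Connector create(String catalogName, Map<String, String> config, ConnectorContext context) {
        return new ExampleConnector(config); // hypothetical connector, configured per catalog
    }
}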

Core Connector Interface

The Connector interface defines the contract for data source integration:

  • Transaction Management: beginTransaction(), commit(), rollback() - Manages transaction lifecycle and isolation levels
  • Metadata Access: getMetadata() - Returns ConnectorMetadata for schema/table operations
  • Data Access: getSplitManager(), getPageSourceProvider() - Handles data retrieval
  • Write Operations: getPageSinkProvider() - Supports INSERT/UPDATE/DELETE
  • Query Optimization: getConnectorPlanOptimizerProvider() - Enables connector-specific optimizations
  • Properties: getSessionProperties(), getTableProperties() - Defines connector-specific configuration

ConnectorMetadata

ConnectorMetadata is the primary interface for metadata operations. Key responsibilities include:

  • Schema Discovery: listSchemaNames(), schemaExists() - Enumerate available schemas
  • Table Resolution: getTableHandle(), getTableMetadata() - Resolve table references
  • Column Information: getColumnMetadata(), listTableColumns() - Retrieve column details
  • Constraint Pushdown: getTableLayoutForConstraint() - Optimize data access with predicates
  • DML Operations: beginInsert(), metadataDelete() - Support data modifications
  • Statistics: getTableStatistics() - Provide cardinality and distribution info
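
The sketch below illustrates two of these responsibilities for a hypothetical connector exposing a single fixed schema; ExampleTableHandle is a placeholder:

// Hypothetical metadata sketch
public class ExampleMetadata implements ConnectorMetadata {
    @Override
    public List<String> listSchemaNames(ConnectorSession session) {
        return ImmutableList.of("default");
    }

    @Override
    public ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName) {
        if (!tableName.getSchemaName().equals("default")) {
            return null; // by convention, null signals "table not found"
        }
        return new ExampleTableHandle(tableName);
    }
}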

Data Source Examples

Hive Connector - Integrates with Hive metastore and HDFS. Supports full ACID transactions, partitioning, and complex data types. Uses HiveConnectorFactory to instantiate HiveConnector with metastore configuration.

Iceberg Connector - Provides ACID table format support with time-travel queries. Implements SERIALIZABLE isolation level and supports schema evolution. Manages transactions through IcebergTransactionManager.

Kafka Connector - Read-only streaming connector for Kafka topics. Implements READ_COMMITTED isolation and provides record-based data access through ConnectorRecordSetProvider.

Transaction Handling

Connectors manage transactions through ConnectorTransactionHandle. Each transaction is isolated and single-threaded for metadata access. Connectors specify supported isolation levels (READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE) and validate requests accordingly.

@Override
public ConnectorTransactionHandle beginTransaction(
    IsolationLevel isolationLevel, boolean readOnly) {
    checkConnectorSupports(READ_COMMITTED, isolationLevel);
    return KafkaTransactionHandle.INSTANCE;
}

Capabilities and Extensions

Connectors declare capabilities via getCapabilities() to indicate support for features like rewindable splits, page sink commits, and constraint types. Additional extensions include system tables, procedures, table functions, and custom session properties for fine-grained control.
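
A hedged sketch of such a declaration follows; the enum constants are assumed names matching the rewindable-split and page-sink-commit features mentioned above and may differ across versions:

// Sketch: declaring optional connector capabilities
// (uses Guava's Sets.immutableEnumSet; constants assumed from the SPI's ConnectorCapabilities enum)
@Override
public Set<ConnectorCapabilities> getCapabilities() {
    return immutableEnumSet(SUPPORTS_REWINDABLE_SPLIT_SOURCE, SUPPORTS_PAGE_SINK_COMMIT);
}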

Plugin System & SPI

Relevant Files
  • presto-spi/src/main/java/com/facebook/presto/spi/Plugin.java
  • presto-spi/src/main/java/com/facebook/presto/spi/CoordinatorPlugin.java
  • presto-main-base/src/main/java/com/facebook/presto/server/PluginManager.java
  • presto-main-base/src/main/java/com/facebook/presto/server/PluginManagerUtil.java

Presto's plugin system enables extensibility through a Service Provider Interface (SPI) that allows third-party developers to add connectors, functions, security controls, and other components without modifying core code. The system uses Java's ServiceLoader mechanism combined with custom classloading to isolate plugin dependencies.

Plugin Interfaces

The Plugin interface is the primary extension point, offering default methods for registering various components:

  • Connectors: getConnectorFactories() registers data source connectors
  • Types & Functions: getTypes(), getParametricTypes(), getFunctions() add custom types and functions
  • Security: getSystemAccessControlFactories(), getPasswordAuthenticatorFactories() provide authentication and authorization
  • Event Listeners: getEventListenerFactories() enable query lifecycle monitoring
  • Resource Management: getResourceGroupConfigurationManagerFactories(), getSessionPropertyConfigurationManagerFactories()
  • Advanced Features: TTL providers, query prerequisites, tracer providers, and more

The CoordinatorPlugin interface is a newer, coordinator-specific extension point for features like function namespace managers, plan checkers, and expression optimizers.
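
A skeletal plugin that contributes both a connector and custom functions might look like this sketch (ExampleConnectorFactory and ExampleFunctions are hypothetical):

// Hypothetical plugin combining two extension points
public class ExamplePlugin implements Plugin {
    @Override
    public Iterable<ConnectorFactory> getConnectorFactories() {
        return ImmutableList.of(new ExampleConnectorFactory());
    }

    @Override
    public Set<Class<?>> getFunctions() {
        // classes containing function implementations, discovered by annotation scanning
        return ImmutableSet.of(ExampleFunctions.class);
    }
}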

Plugin Loading Mechanism


The PluginManager orchestrates plugin loading through PluginManagerUtil.loadPlugins(). Plugins can be specified as:

  1. Directory paths: Scanned for JAR files
  2. POM files: Resolved via Maven artifact resolver
  3. Maven coordinates: Resolved from configured repositories

Each plugin gets its own URLClassLoader with a whitelist of SPI packages (e.g., com.facebook.presto.spi.*, Jackson, Airlift) to prevent classloader conflicts.

Plugin Installation

Once loaded, plugins are installed via installPlugin() and installCoordinatorPlugin() methods. The installation process iterates through each component type and registers them with appropriate managers:

for (ConnectorFactory factory : plugin.getConnectorFactories()) {
    connectorManager.addConnectorFactory(factory);
}
for (Type type : plugin.getTypes()) {
    metadata.getFunctionAndTypeManager().addType(type);
}

This pattern ensures plugins can contribute multiple component types in a single installation pass.

Key Design Patterns

Isolation: Each plugin runs in its own classloader, preventing dependency conflicts between plugins and the core system.

Lazy Registration: Components are registered only when needed, allowing plugins to define factories that create instances on-demand.

Extensibility: New plugin types (like CoordinatorPlugin) can be added without breaking existing plugins, as they use default methods returning empty collections.

Service Discovery: Java's ServiceLoader automatically discovers plugin implementations via META-INF/services/ files, enabling zero-configuration deployment.
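
Concretely, a plugin jar advertises its implementation class through a provider-configuration file; the class name below is a hypothetical placeholder:

// File inside the plugin jar: META-INF/services/com.facebook.presto.spi.Plugin
// Content: one fully qualified implementation class name per line
com.example.ExamplePlugin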

Coordinator & Server Components

Relevant Files
  • presto-main/src/main/java/com/facebook/presto/server/CoordinatorModule.java
  • presto-main/src/main/java/com/facebook/presto/server/protocol/QueuedStatementResource.java
  • presto-main/src/main/java/com/facebook/presto/server/protocol/ExecutingStatementResource.java
  • presto-main/src/main/java/com/facebook/presto/server/QueryResource.java
  • presto-main-base/src/main/java/com/facebook/presto/Session.java
  • presto-main-base/src/main/java/com/facebook/presto/dispatcher/DispatchManager.java

The Coordinator is the central hub of a Presto cluster, responsible for query submission, parsing, planning, and execution orchestration. The server components expose HTTP endpoints that clients use to submit and monitor queries.

Query Submission Flow

Presto uses a lazy execution model where query submission returns immediately with a placeholder, and execution begins only when the client polls for results. This two-stage protocol involves two main REST resources:

QueuedStatementResource (POST /v1/statement) accepts SQL queries and returns a QueryResults object containing a nextUri for the client to poll. The resource creates a Query wrapper object that encapsulates the statement, session context, and dispatch manager reference. The query is not executed immediately; instead, it waits for the client to request results.

ExecutingStatementResource (GET /v1/statement/executing/{queryId}/{token}) handles result polling once a query has been dispatched. It waits asynchronously for query results and returns them in batches, supporting compression and binary serialization options.
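
From the client side this two-stage exchange is hidden by the drivers. As a hedged sketch, the Presto JDBC driver issues the initial POST and then follows nextUri links as the ResultSet is consumed (host, catalog, and user below are placeholders):

// Client-side view of the submit-then-poll protocol via JDBC
Connection connection = DriverManager.getConnection(
    "jdbc:presto://coordinator.example.com:8080/hive/default", "alice", null);
try (Statement stmt = connection.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT count(*) FROM orders")) {
    while (rs.next()) {
        System.out.println(rs.getLong(1)); // each fetch may poll for the next batch
    }
}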

Dispatcher & Query Lifecycle

The DispatchManager orchestrates the transition from queued to executing state. When a client polls for results, the dispatcher:

  1. Parses and validates the SQL statement
  2. Selects a resource group based on user, source, and query type
  3. Creates a DispatchQuery (local execution wrapper) with a QueryStateMachine
  4. Transitions the query through states: QUEUED → WAITING_FOR_RESOURCES → DISPATCHED → RUNNING
  5. Submits the query to the ResourceGroupManager for queuing and execution

The LocalDispatchQuery implementation manages the state machine and coordinates with the QueryManager to begin actual execution.

Session Management

The Session object encapsulates all query context: user identity, catalog/schema, system properties, prepared statements, and transaction state. It is created from HttpRequestSessionContext during query submission and passed through the entire execution pipeline. Session properties can be connector-specific or system-wide, allowing fine-grained control over query behavior.

Coordinator Module Bindings

The CoordinatorModule (Guice configuration) wires together all coordinator components:

  • Statement resources (QueuedStatementResource, ExecutingStatementResource)
  • Query manager and dispatcher (QueryManager, DispatchManager, DispatchExecutor)
  • Resource group management and rate limiting
  • Memory management and failure detection
  • Query monitoring and statistics collection

Key Responsibilities

  • Query Lifecycle: From submission through completion, tracking state transitions and failures
  • Resource Management: Enforcing resource group limits, memory constraints, and concurrency controls
  • Session Context: Maintaining user identity, properties, and transaction state across execution
  • Async Protocol: Non-blocking HTTP endpoints using futures for scalable query handling
  • Monitoring: Recording query events, performance metrics, and operator statistics

Presto Native Execution & Velox

Relevant Files
  • presto-native-execution/README.md
  • presto-native-execution/presto_cpp/main/PrestoServer.cpp
  • presto-native-execution/presto_cpp/main/PrestoMain.cpp
  • presto-native-execution/presto_cpp/main/TaskManager.cpp
  • presto-native-execution/presto_cpp/main/PrestoTask.cpp
  • presto-native-execution/presto_cpp/main/connectors/Registration.cpp

Prestissimo is the C++ native worker implementation of Presto that uses Velox as its execution engine. It implements the Presto Worker REST API to enable high-performance query execution by replacing the Java-based worker with optimized C++ code.

Architecture Overview


Core Components

PrestoServer initializes the native worker, managing HTTP endpoints, memory allocation, and Velox registration. It sets up thread pools for drivers, HTTP operations, and spilling, while registering connectors (Hive, Iceberg, TPC-H, TPC-DS) and functions.

TaskManager handles task lifecycle: creation, updates, and cancellation. It receives task fragments from the coordinator, creates Velox tasks, manages task queuing when the server is overloaded, and tracks task statistics. Each task is wrapped in a PrestoTask structure that bridges Presto protocol with Velox execution.

PrestoTask wraps a Velox exec::Task and maintains Presto-specific state including task status, statistics, and lifecycle information. It translates between Presto and Velox task states (e.g., Presto's "Planned" state maps to Velox's pre-started state).

Query Execution Flow

  1. Task Creation: Coordinator sends a TaskUpdateRequest with the query plan fragment and source metadata
  2. Plan Conversion: Presto plan is converted to Velox plan via PrestoToVeloxQueryPlan
  3. Task Initialization: Velox task is created with the converted plan and memory pool
  4. Execution: Task starts with configurable driver threads and concurrent lifespans
  5. Result Buffering: Output pages are serialized and buffered for coordinator retrieval
  6. Status Reporting: Task statistics are periodically updated and sent back to coordinator

Connector Integration

Connectors are registered through registerConnectors(), which creates adapter instances for each catalog type. PrestoToVeloxConnector implementations translate Presto protocol objects (splits, column handles, table handles) to Velox equivalents. Supported connectors include Hive, Iceberg, Arrow Flight, TPC-H, and TPC-DS.

Memory Management

Velox memory is initialized with a configurable capacity, which defaults to the machine's system memory (specified in GB). The system uses a SharedArbitrator for memory arbitration across queries and an optional SsdCache backing the data cache. Memory pools are hierarchically organized per query and task.

Key Features

  • Parallel Execution: Multiple drivers per task with configurable concurrency
  • Task Queuing: Automatic queuing when server is overloaded
  • Metrics Collection: Optional Prometheus integration for runtime metrics
  • GPU Support: Optional cuDF integration for GPU-accelerated operations
  • Flexible Storage: Support for local, HDFS, S3, GCS, and ABFS file systems

Testing & Benchmarking Infrastructure

Relevant Files
  • presto-product-tests/README.md
  • presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueryFramework.java
  • presto-benchmark/src/main/java/com/facebook/presto/benchmark/BenchmarkSuite.java
  • presto-testng-services/src/main/java/com/facebook/presto/testng/services/LogTestDurationListener.java
  • presto-benchmark-runner/src/main/java/com/facebook/presto/benchmark/framework/BenchmarkRunner.java

Presto employs a multi-layered testing strategy combining unit tests, integration tests, product tests, and performance benchmarks. This infrastructure ensures correctness across all execution modes and validates end-to-end functionality.

Test Layers

Unit Tests run as part of the standard Maven build and test individual components in isolation. Integration Tests use AbstractTestQueryFramework and DistributedQueryRunner to validate query execution against in-memory or distributed clusters. Product Tests exercise user-facing interfaces like presto-cli against real Hadoop and Presto deployments using Docker containers and the Tempto test harness.

Product Testing with Docker

Product tests are containerized for reproducibility and isolation. The presto-product-tests/bin/run_on_docker.sh script orchestrates Docker Compose to spin up Hadoop, Presto coordinators, and workers. Tests run in a separate JVM and support multiple profiles:

  • singlenode - Single Presto instance with pseudo-distributed Hadoop
  • multinode - Distributed Presto with multiple workers
  • multinode-tls - TLS-encrypted coordinator and worker communication
  • singlenode-kerberos-hdfs-impersonation - Kerberized Hadoop with user impersonation
  • singlenode-ldap - LDAP authentication testing

Tests are organized into groups (e.g., string_functions, authorization, hdfs_impersonation) and can be run selectively with -g or -t flags.

Benchmarking Framework

The presto-benchmark module contains hand-written and SQL-based benchmarks for performance regression detection. BenchmarkSuite aggregates benchmarks covering aggregations, joins, filtering, and TPC-H queries. The presto-benchmark-runner provides a CLI interface with pluggable event clients for metrics collection.

presto-benchto-benchmarks integrates with Benchto for large-scale performance testing using TPC-H and TPC-DS datasets. Benchmarks are defined in YAML with configurable runs, prewarm iterations, and session properties.

Test Monitoring

LogTestDurationListener tracks test execution times and detects hangs. It logs individual tests exceeding 30 seconds, test classes exceeding 1 minute, and global idle periods exceeding 8 minutes. Thread dumps are captured when hangs are detected, aiding in debugging stuck tests.

// Example: Creating a distributed query runner for integration tests
DistributedQueryRunner queryRunner = new DistributedQueryRunner.Builder(session)
    .setNodeCount(3)
    .setExtraProperties(extraProperties)
    .build();
MaterializedResult result = queryRunner.execute("SELECT * FROM table");

Running Tests

Execute the test and benchmark suites as follows:

  • Unit tests: ./mvnw test
  • Product tests: presto-product-tests/bin/run_on_docker.sh multinode -x quarantine,big_query
  • Benchmarks: java -jar presto-benchmark-runner-*-executable.jar

Debugging Java-based product tests requires starting the Hadoop containers, configuring /etc/hosts, and running Presto from IntelliJ with the appropriate JVM options.

Query Optimizer & Plan Optimization

Relevant Files
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/Memo.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PickTableLayout.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java

Overview

Presto's query optimizer transforms logical plans into efficient execution plans through an iterative rule-based system. The optimizer applies cost-based and logical transformations to reduce query execution time by pushing filters down, eliminating redundant operations, and choosing optimal join strategies.

Iterative Optimizer Architecture

The IterativeOptimizer is the core engine that repeatedly applies transformation rules until no further improvements are possible. It uses a Memo data structure to efficiently represent and mutate the plan tree without full rewrites.

Key Components:

  • Memo: Stores plan nodes in groups with symbolic references to child groups, enabling efficient local mutations
  • RuleIndex: Indexes rules by pattern for fast candidate matching
  • Rule: Defines a pattern and transformation logic; rules return empty if not applicable
  • Context: Provides rules access to metadata, statistics, costs, and utilities
// Rule interface - all optimizations implement this
public interface Rule<T> {
    Pattern<T> getPattern();
    Result apply(T node, Captures captures, Context context);
}
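
To make the contract concrete, here is a hedged sketch of a rule in the spirit of RemoveTrivialFilters: it matches any FilterNode and, when the predicate is the literal TRUE, replaces the filter with its child. Pattern.typeOf and Result follow the interface above; isTrueLiteral stands in for the real constant check:

// Sketch: remove an always-true filter
public class RemoveTrueFilterSketch implements Rule<FilterNode> {
    @Override
    public Pattern<FilterNode> getPattern() {
        return Pattern.typeOf(FilterNode.class);
    }

    @Override
    public Result apply(FilterNode filter, Captures captures, Context context) {
        if (isTrueLiteral(filter.getPredicate())) { // hypothetical helper
            return Result.ofPlanNode(filter.getSource()); // rewrite to the child
        }
        return Result.empty(); // rule does not apply; try other rules
    }
}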

Major Optimization Rules

The optimizer applies rules in multiple passes, organized by category:

Predicate & Filter Pushdown:

  • PredicatePushDown - Pushes filter conditions down the tree toward table scans
  • PickTableLayout - Selects optimal table layouts based on predicates
  • PushDownDereferences - Pushes field access operations down through operators

Column Pruning:

  • PruneUnreferencedOutputs - Removes unused columns early
  • PruneProjectColumns, PruneFilterColumns, PruneTopNColumns - Prune columns at each operator

Join Optimization:

  • ReorderJoins - Reorders joins using cost-based strategies (AUTOMATIC, ELIMINATE_CROSS_JOINS, NONE)
  • EliminateCrossJoins - Converts cross joins to inner joins when possible
  • TransformDistinctInnerJoinToLeftEarlyOutJoin - Optimizes distinct joins

Aggregation & Limit:

  • PushLimitThroughProject, PushLimitThroughUnion - Pushes limits down
  • SingleDistinctAggregationToGroupBy - Simplifies distinct aggregations
  • MultipleDistinctAggregationToMarkDistinct - Optimizes multiple distinct aggregations

Expression Simplification:

  • RemoveRedundantIdentityProjections - Eliminates unnecessary projections
  • RemoveTrivialFilters - Removes always-true filters
  • InlineProjections - Inlines simple projections

Table Layout Selection

PickTableLayout is critical for connector-specific optimizations. It operates in two modes:

  1. With Predicate: Pushes filter conditions into table scans, allowing connectors to select layouts that support partition pruning or index usage
  2. Without Predicate: Selects default layouts when no filters apply

The rule extracts predicates, translates them to tuple domains, and queries the connector's getLayout() method to determine the best physical layout.
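
The tuple-domain form handed to the connector is worth seeing once. Below is a hedged sketch of the constraint produced for the predicate orderkey = 42; orderkeyHandle is a connector-specific ColumnHandle and a placeholder here:

// Sketch: the constraint a connector receives for "orderkey = 42"
TupleDomain<ColumnHandle> constraint = TupleDomain.withColumnDomains(
    ImmutableMap.of(orderkeyHandle, Domain.singleValue(BIGINT, 42L)));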

Optimization Flow


Cost-Based Decisions

Rules can be cost-based, using StatsProvider and CostProvider to evaluate alternatives. The optimizer tracks rule execution time and statistics for debugging and tuning. Session properties control which rules are enabled and their behavior (e.g., join reordering strategy).

Extensibility

Connectors can implement ConnectorPlanOptimizer to apply connector-specific optimizations. The framework also supports logical properties (constraints, ordering, partitioning) that rules can exploit for advanced transformations.

Data Types, Functions & Expressions

Relevant Files
  • presto-common/src/main/java/com/facebook/presto/common/type
  • presto-spi/src/main/java/com/facebook/presto/spi/relation
  • presto-expressions/src/main/java/com/facebook/presto/expressions

Type System

Presto's type system is built on the Type interface, which defines how data is represented and manipulated. Every value in Presto has an associated type that determines its Java representation, comparability, and ordering semantics.

Core Type Hierarchy:

  • Primitive Types: BIGINT, INTEGER, SMALLINT, TINYINT, BOOLEAN, DOUBLE, REAL, DATE, TIME, TIMESTAMP
  • Variable-Width Types: VARCHAR, VARBINARY, JSON
  • Complex Types: ARRAY, MAP, ROW
  • Parametric Types: Types with parameters like DECIMAL(precision, scale), CHAR(length), VARCHAR(length)
  • User-Defined Types: BIGINT_ENUM, VARCHAR_ENUM, DISTINCT_TYPE
  • Specialized Types: HyperLogLog, QDigest, TDigest, KllSketch for approximate aggregations

Each type implements methods for:

  • Comparability: Whether values can be compared for equality
  • Orderability: Whether values can be sorted
  • Java Representation: The runtime class used (boolean, long, double, Slice, or Block)
  • Block Operations: Creating and manipulating columnar data structures
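
As a hedged sketch of the Type and Block APIs in action, the fragment below builds a small BIGINT column and reads a value back; passing null for the BlockBuilderStatus follows a common test idiom:

// Sketch: building and reading a BIGINT column
BlockBuilder builder = BIGINT.createBlockBuilder(null, 3); // expect 3 entries
BIGINT.writeLong(builder, 10L);
BIGINT.writeLong(builder, 20L);
BIGINT.writeLong(builder, 30L);
Block block = builder.build();
long second = BIGINT.getLong(block, 1); // reads 20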

Row Expressions

Row expressions represent the intermediate form of SQL expressions after parsing and analysis. They form a tree structure where each node is a RowExpression subtype.

Expression Types:

  • CallExpression: Function or operator invocation with a FunctionHandle, return type, and arguments
  • ConstantExpression: Literal values with their associated type
  • InputReferenceExpression: References to input columns by index
  • SpecialFormExpression: Control flow constructs (IF, COALESCE, IN, AND, OR, SWITCH)
  • LambdaDefinitionExpression: Lambda functions for higher-order operations
  • VariableReferenceExpression: References to variables in scope

All expressions are immutable and include optional source location information for error reporting.

Function Resolution

Functions are resolved through the FunctionHandle mechanism, which encapsulates function metadata and implementation details. The system supports:

  • Scalar Functions: Single-row operations
  • Aggregate Functions: Multi-row reductions
  • Window Functions: Partitioned computations
  • Operators: Built-in operations like +, =, <

Function resolution occurs during planning, matching function names and argument types to registered implementations. The StandardFunctionResolution interface provides methods for resolving comparison operators, arithmetic operations, and type coercions.

Type Coercion

Presto automatically coerces types when needed through the TypeManager.canCoerce() method. Common coercions include:

  • Numeric type widening (TINYINT → BIGINT)
  • String to numeric conversions
  • Temporal type conversions

Explicit casting is handled as a special function call with a CAST function handle.

// Example: Creating a function call expression
CallExpression add = call(
    "add",
    functionHandle,
    BIGINT,
    left,  // RowExpression
    right  // RowExpression
);