Overview
Relevant Files
- README.md
- ARCHITECTURE.md
- FUNCTIONS.md
Presto is a distributed SQL query engine designed for fast analytic queries against data sources ranging from gigabytes to petabytes. Originally developed by Facebook, it is now governed by the Presto Foundation as an open-source project under the Apache License 2.0.
Core Purpose
Presto enables efficient, high-speed data processing for analytics and batch workloads at scale. It provides a unified query system that can access and process data stored in various formats and storage systems through a highly customizable plugin infrastructure.
Key Characteristics
Architecture: Presto follows a distributed system model with a coordinator and multiple worker nodes. The coordinator manages query planning and execution, while workers process data in parallel.
Performance: Designed for low-latency interactive workloads, ad-hoc analytics, and large-scale batch processing. The system emphasizes vertical integration and modular design to achieve optimal performance.
Connectivity: Supports numerous data sources through pluggable connectors including Hive, Cassandra, PostgreSQL, MySQL, Elasticsearch, Kafka, and many others. This flexibility allows Presto to serve as a unified query layer across heterogeneous data infrastructure.
User Experience: Uses familiar ANSI SQL syntax with sensible defaults and an optimizer that produces efficient query plans, making it accessible to users without specialized knowledge.
Technology Stack
Java 17 (primary implementation)
Maven (build system)
C++ (Presto native execution via Velox)
React/JSX (Presto Console UI)
Project Structure
The repository contains over 100 modules organized by function:
- Core: presto-main, presto-server, presto-spi (Service Provider Interface)
- Connectors: presto-hive, presto-postgresql, presto-cassandra, etc.
- Execution: presto-parser, presto-expressions, presto-bytecode
- Native: presto-native-execution (C++ implementation using Velox)
- Tools: presto-cli, presto-ui, presto-verifier
Long-Term Vision
Presto is moving toward a native evaluation engine using Velox for improved performance, state-of-the-art query optimization, and greater modularity through standardized components like Arrow and Substrait for interoperability with other data infrastructure.
Architecture & Query Execution Pipeline
Relevant Files
- presto-parser/src/main/java/com/facebook/presto/sql/parser/SqlParser.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/QueryPlanner.java
- presto-main-base/src/main/java/com/facebook/presto/execution/SqlQueryExecution.java
- presto-main/src/main/java/com/facebook/presto/execution/SqlQueryManager.java
Presto's query execution pipeline transforms SQL text into distributed execution plans through four distinct phases: parsing, semantic analysis, logical planning, and optimization. Understanding this flow is essential for extending the query engine or debugging query behavior.
Phase 1: SQL Parsing
The SqlParser class converts raw SQL strings into an Abstract Syntax Tree (AST) using ANTLR4. The parser:
- Uses a lexer (SqlBaseLexer) to tokenize input and a parser (SqlBaseParser) to build the parse tree
- Attempts parsing in fast SLL mode first, then falls back to LL mode if needed
- Converts the ANTLR parse tree to Presto's AST via the AstBuilder visitor
- Handles three input types: statements, expressions, and routine bodies
public Statement createStatement(String sql, ParsingOptions parsingOptions) {
return (Statement) invokeParser("statement", sql,
SqlBaseParser::singleStatement, parsingOptions);
}
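For orientation, this is how a caller might drive the parser end to end — a minimal sketch that assumes ParsingOptions' no-argument constructor and elides error handling:
SqlParser parser = new SqlParser();
// Parses the text into Presto's AST; invalid SQL raises a ParsingException
Statement statement = parser.createStatement(
        "SELECT orderkey FROM orders WHERE totalprice > 100.0",
        new ParsingOptions());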
Phase 2: Semantic Analysis
The Analyzer class validates the AST against catalog metadata and resolves references. It:
- Validates table and column existence
- Resolves function signatures and type coercion
- Builds scope information for name resolution
- Produces an Analysis object containing semantic metadata
Phase 3: Logical Planning
The LogicalPlanner converts the analyzed AST into a logical plan tree of PlanNode objects. For queries, it delegates to QueryPlanner, which:
- Builds table scans from FROM clauses
- Applies filters (WHERE, HAVING)
- Constructs aggregations and window functions
- Adds projections, sorting, and limits
public RelationPlan plan(QuerySpecification node) {
PlanBuilder builder = planFrom(node);
builder = filter(builder, analysis.getWhere(node), node);
builder = aggregate(builder, node);
builder = window(builder, node);
// ... additional transformations
}
Phase 4: Optimization & Execution
The Optimizer applies cost-based transformations to the logical plan:
- Pushes filters and projections down the tree
- Eliminates redundant operations
- Chooses optimal join orders
- Validates the final plan with
PlanChecker
The optimized plan is then fragmented into distributed tasks and scheduled across worker nodes.
Query Execution Lifecycle
SqlQueryManager orchestrates the full lifecycle. When a query arrives:
- createQuery() registers the SqlQueryExecution instance
- start() triggers parsing and planning in doCreateLogicalPlanAndOptimize()
- The scheduler creates execution tasks from the optimized plan
- Results stream back through output buffers
- queryCompletedEvent() fires when done, triggering cleanup
Each phase is instrumented with runtime metrics (e.g., LOGICAL_PLANNER_TIME_NANOS, OPTIMIZER_TIME_NANOS) for performance monitoring.
Connector Framework & Data Sources
Relevant Files
- presto-spi/src/main/java/com/facebook/presto/spi/connector/Connector.java
- presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
- presto-spi/src/main/java/com/facebook/presto/spi/Plugin.java
- presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorFactory.java
- presto-hive/src/main/java/com/facebook/presto/hive/HivePlugin.java
- presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergPlugin.java
- presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaPlugin.java
Overview
Presto's connector framework enables integration with diverse data sources through a standardized plugin architecture. Connectors act as adapters that translate Presto's query execution model into data source-specific operations. The framework uses a plugin-based design where each data source (Hive, Iceberg, Kafka, etc.) implements the connector SPI (Service Provider Interface) to provide metadata, data access, and query optimization capabilities.
Architecture
Plugin Registration
Plugins are the entry point for connector registration. Each plugin implements the Plugin interface and provides connector factories via getConnectorFactories(). The PluginManager discovers and loads plugins, registering their connector factories with the ConnectorManager.
public class HivePlugin implements Plugin {
@Override
public Iterable<ConnectorFactory> getConnectorFactories() {
return ImmutableList.of(
new HiveConnectorFactory(name, classLoader, metastore)
);
}
}
Connector Factory Pattern
ConnectorFactory creates connector instances for specific catalogs. Each factory implements three key methods:
- getName() - Returns the connector type identifier (e.g., "hive", "iceberg")
- getHandleResolver() - Provides serialization/deserialization for connector-specific handles
- create() - Instantiates a Connector with catalog configuration and context
The factory pattern allows connectors to be instantiated lazily and configured per catalog.
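A minimal factory for a hypothetical "example" connector might look like the sketch below; ExampleConnector and ExampleHandleResolver are illustrative stand-ins, not real classes:
public class ExampleConnectorFactory implements ConnectorFactory {
    @Override
    public String getName() {
        // Catalog files select this factory via connector.name=example
        return "example";
    }

    @Override
    public ConnectorHandleResolver getHandleResolver() {
        // Maps handle interfaces to this connector's concrete handle classes
        return new ExampleHandleResolver();
    }

    @Override
    public Connector create(String catalogName, Map<String, String> config, ConnectorContext context) {
        // Called once per catalog; config comes from the catalog properties file
        return new ExampleConnector(config);
    }
}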
Core Connector Interface
The Connector interface defines the contract for data source integration:
- Transaction Management: beginTransaction(), commit(), rollback() - Manages transaction lifecycle and isolation levels
- Metadata Access: getMetadata() - Returns ConnectorMetadata for schema/table operations
- Data Access: getSplitManager(), getPageSourceProvider() - Handles data retrieval
- Write Operations: getPageSinkProvider() - Supports INSERT/UPDATE/DELETE
- Query Optimization: getConnectorPlanOptimizerProvider() - Enables connector-specific optimizations
- Properties: getSessionProperties(), getTableProperties() - Defines connector-specific configuration
ConnectorMetadata
ConnectorMetadata is the primary interface for metadata operations. Key responsibilities include:
- Schema Discovery: listSchemaNames(), schemaExists() - Enumerate available schemas
- Table Resolution: getTableHandle(), getTableMetadata() - Resolve table references
- Column Information: getColumnMetadata(), listTableColumns() - Retrieve column details
- Constraint Pushdown: getTableLayoutForConstraint() - Optimize data access with predicates
- DML Operations: beginInsert(), metadataDelete() - Support data modifications
- Statistics: getTableStatistics() - Provide cardinality and distribution info
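To make the contract concrete, here is a sketch of a single-schema metadata implementation (ExampleTableHandle is hypothetical):
public class ExampleMetadata implements ConnectorMetadata {
    @Override
    public List<String> listSchemaNames(ConnectorSession session) {
        return ImmutableList.of("default");
    }

    @Override
    public ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName) {
        // Returning null signals "table not found" to the engine
        if (!"default".equals(tableName.getSchemaName())) {
            return null;
        }
        return new ExampleTableHandle(tableName);
    }
}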
Data Source Examples
Hive Connector - Integrates with Hive metastore and HDFS. Supports full ACID transactions, partitioning, and complex data types. Uses HiveConnectorFactory to instantiate HiveConnector with metastore configuration.
Iceberg Connector - Provides ACID table format support with time-travel queries. Implements SERIALIZABLE isolation level and supports schema evolution. Manages transactions through IcebergTransactionManager.
Kafka Connector - Read-only streaming connector for Kafka topics. Implements READ_COMMITTED isolation and provides record-based data access through ConnectorRecordSetProvider.
Transaction Handling
Connectors manage transactions through ConnectorTransactionHandle. Each transaction is isolated and single-threaded for metadata access. Connectors specify supported isolation levels (READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE) and validate requests accordingly.
@Override
public ConnectorTransactionHandle beginTransaction(
IsolationLevel isolationLevel, boolean readOnly) {
checkConnectorSupports(READ_COMMITTED, isolationLevel);
return KafkaTransactionHandle.INSTANCE;
}
Capabilities and Extensions
Connectors declare capabilities via getCapabilities() to indicate support for features like rewindable splits, page sink commits, and constraint types. Additional extensions include system tables, procedures, table functions, and custom session properties for fine-grained control.
Plugin System & SPI
Relevant Files
- presto-spi/src/main/java/com/facebook/presto/spi/Plugin.java
- presto-spi/src/main/java/com/facebook/presto/spi/CoordinatorPlugin.java
- presto-main-base/src/main/java/com/facebook/presto/server/PluginManager.java
- presto-main-base/src/main/java/com/facebook/presto/server/PluginManagerUtil.java
Presto's plugin system enables extensibility through a Service Provider Interface (SPI) that allows third-party developers to add connectors, functions, security controls, and other components without modifying core code. The system uses Java's ServiceLoader mechanism combined with custom classloading to isolate plugin dependencies.
Plugin Interfaces
The Plugin interface is the primary extension point, offering default methods for registering various components:
- Connectors: getConnectorFactories() registers data source connectors
- Types & Functions: getTypes(), getParametricTypes(), getFunctions() add custom types and functions
- Security: getSystemAccessControlFactories(), getPasswordAuthenticatorFactories() provide authentication and authorization
- Event Listeners: getEventListenerFactories() enable query lifecycle monitoring
- Resource Management: getResourceGroupConfigurationManagerFactories(), getSessionPropertyConfigurationManagerFactories()
- Advanced Features: TTL providers, query prerequisites, tracer providers, and more
The CoordinatorPlugin interface is a newer, coordinator-specific extension point for features like function namespace managers, plan checkers, and expression optimizers.
Plugin Loading Mechanism
The PluginManager orchestrates plugin loading through PluginManagerUtil.loadPlugins(). Plugins can be specified as:
- Directory paths: Scanned for JAR files
- POM files: Resolved via Maven artifact resolver
- Maven coordinates: Resolved from configured repositories
Each plugin gets its own URLClassLoader with a whitelist of SPI packages (e.g., com.facebook.presto.spi.*, Jackson, Airlift) to prevent classloader conflicts.
Plugin Installation
Once loaded, plugins are installed via installPlugin() and installCoordinatorPlugin() methods. The installation process iterates through each component type and registers them with appropriate managers:
for (ConnectorFactory factory : plugin.getConnectorFactories()) {
connectorManager.addConnectorFactory(factory);
}
for (Type type : plugin.getTypes()) {
metadata.getFunctionAndTypeManager().addType(type);
}
This pattern ensures plugins can contribute multiple component types in a single installation pass.
Key Design Patterns
Isolation: Each plugin runs in its own classloader, preventing dependency conflicts between plugins and the core system.
Lazy Registration: Components are registered only when needed, allowing plugins to define factories that create instances on-demand.
Extensibility: New plugin types (like CoordinatorPlugin) can be added without breaking existing plugins, as they use default methods returning empty collections.
Service Discovery: Java's ServiceLoader automatically discovers plugin implementations via META-INF/services/ files, enabling zero-configuration deployment.
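A simplified view of what this discovery step does under the hood (pluginJarUrls, spiClassLoader, and installPlugin are placeholders for the real plumbing):
// Each plugin JAR declares its implementation class in
// META-INF/services/com.facebook.presto.spi.Plugin
URLClassLoader pluginClassLoader = new URLClassLoader(pluginJarUrls, spiClassLoader);
ServiceLoader<Plugin> plugins = ServiceLoader.load(Plugin.class, pluginClassLoader);
for (Plugin plugin : plugins) {
    installPlugin(plugin); // registers factories, types, and functions with the managers
}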
Coordinator & Server Components
Relevant Files
- presto-main/src/main/java/com/facebook/presto/server/CoordinatorModule.java
- presto-main/src/main/java/com/facebook/presto/server/protocol/QueuedStatementResource.java
- presto-main/src/main/java/com/facebook/presto/server/protocol/ExecutingStatementResource.java
- presto-main/src/main/java/com/facebook/presto/server/QueryResource.java
- presto-main-base/src/main/java/com/facebook/presto/Session.java
- presto-main-base/src/main/java/com/facebook/presto/dispatcher/DispatchManager.java
The Coordinator is the central hub of a Presto cluster, responsible for query submission, parsing, planning, and execution orchestration. The server components expose HTTP endpoints that clients use to submit and monitor queries.
Query Submission Flow
Presto uses a lazy execution model where query submission returns immediately with a placeholder, and execution begins only when the client polls for results. This two-stage protocol involves two main REST resources:
QueuedStatementResource (POST /v1/statement) accepts SQL queries and returns a QueryResults object with a next URI. The resource creates a Query wrapper object that encapsulates the statement, session context, and dispatch manager reference. The query is not immediately executed; instead, it waits for the client to request results.
ExecutingStatementResource (GET /v1/statement/executing/{queryId}/{token}) handles result polling once a query has been dispatched. It waits asynchronously for query results and returns them in batches, supporting compression and binary serialization options.
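From the client's perspective, the protocol is a plain POST-then-poll loop. A sketch using java.net.http; extractNextUri stands in for real JSON parsing of the QueryResults payload:
HttpClient client = HttpClient.newHttpClient();

// Stage 1: submit the query; the response carries a nextUri but no data yet
HttpRequest submit = HttpRequest.newBuilder(URI.create("http://coordinator:8080/v1/statement"))
        .header("X-Presto-User", "alice")
        .POST(HttpRequest.BodyPublishers.ofString("SELECT 1"))
        .build();
HttpResponse<String> response = client.send(submit, HttpResponse.BodyHandlers.ofString());

// Stage 2: follow nextUri until the QueryResults payload no longer contains one
String nextUri = extractNextUri(response.body());
while (nextUri != null) {
    response = client.send(
            HttpRequest.newBuilder(URI.create(nextUri)).GET().build(),
            HttpResponse.BodyHandlers.ofString());
    nextUri = extractNextUri(response.body());
}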
Dispatcher & Query Lifecycle
The DispatchManager orchestrates the transition from queued to executing state. When a client polls for results, the dispatcher:
- Parses and validates the SQL statement
- Selects a resource group based on user, source, and query type
- Creates a DispatchQuery (local execution wrapper) with a QueryStateMachine
- Transitions the query through states: QUEUED → WAITING_FOR_RESOURCES → DISPATCHING → RUNNING
- Submits the query to the ResourceGroupManager for queuing and execution
The LocalDispatchQuery implementation manages the state machine and coordinates with the QueryManager to begin actual execution.
Session Management
The Session object encapsulates all query context: user identity, catalog/schema, system properties, prepared statements, and transaction state. It is created from HttpRequestSessionContext during query submission and passed through the entire execution pipeline. Session properties can be connector-specific or system-wide, allowing fine-grained control over query behavior.
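In test code, sessions are typically assembled with the TestingSession helper from presto-main's testing package; a brief sketch:
Session session = TestingSession.testSessionBuilder()
        .setCatalog("hive")
        .setSchema("default")
        // System properties override server defaults for this query only
        .setSystemProperty("query_max_memory", "1GB")
        .build();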
Coordinator Module Bindings
The CoordinatorModule (Guice configuration) wires together all coordinator components:
- Statement resources (QueuedStatementResource, ExecutingStatementResource)
- Query manager and dispatcher (QueryManager, DispatchManager, DispatchExecutor)
- Resource group management and rate limiting
- Memory management and failure detection
- Query monitoring and statistics collection
Key Responsibilities
- Query Lifecycle: From submission through completion, tracking state transitions and failures
- Resource Management: Enforcing resource group limits, memory constraints, and concurrency controls
- Session Context: Maintaining user identity, properties, and transaction state across execution
- Async Protocol: Non-blocking HTTP endpoints using futures for scalable query handling
- Monitoring: Recording query events, performance metrics, and operator statistics
Presto Native Execution & Velox
Relevant Files
- presto-native-execution/README.md
- presto-native-execution/presto_cpp/main/PrestoServer.cpp
- presto-native-execution/presto_cpp/main/PrestoMain.cpp
- presto-native-execution/presto_cpp/main/TaskManager.cpp
- presto-native-execution/presto_cpp/main/PrestoTask.cpp
- presto-native-execution/presto_cpp/main/connectors/Registration.cpp
Prestissimo is the C++ native worker implementation of Presto that uses Velox as its execution engine. It implements the Presto Worker REST API to enable high-performance query execution by replacing the Java-based worker with optimized C++ code.
Architecture Overview
Core Components
PrestoServer initializes the native worker, managing HTTP endpoints, memory allocation, and Velox registration. It sets up thread pools for drivers, HTTP operations, and spilling, while registering connectors (Hive, Iceberg, TPC-H, TPC-DS) and functions.
TaskManager handles task lifecycle: creation, updates, and cancellation. It receives task fragments from the coordinator, creates Velox tasks, manages task queuing when the server is overloaded, and tracks task statistics. Each task is wrapped in a PrestoTask structure that bridges Presto protocol with Velox execution.
PrestoTask wraps a Velox exec::Task and maintains Presto-specific state including task status, statistics, and lifecycle information. It translates between Presto and Velox task states (e.g., Presto's "Planned" state maps to Velox's pre-started state).
Query Execution Flow
- Task Creation: The coordinator sends a TaskUpdateRequest with the query plan fragment and source metadata
- Plan Conversion: The Presto plan is converted to a Velox plan via PrestoToVeloxQueryPlan
- Task Initialization: A Velox task is created with the converted plan and memory pool
- Execution: Task starts with configurable driver threads and concurrent lifespans
- Result Buffering: Output pages are serialized and buffered for coordinator retrieval
- Status Reporting: Task statistics are periodically updated and sent back to coordinator
Connector Integration
Connectors are registered through registerConnectors(), which creates adapter instances for each catalog type. PrestoToVeloxConnector implementations translate Presto protocol objects (splits, column handles, table handles) to Velox equivalents. Supported connectors include Hive, Iceberg, Arrow Flight, TPC-H, and TPC-DS.
Memory Management
Velox memory is initialized with a configurable capacity (default: system memory in GB). The system uses a SharedArbitrator for memory arbitration across queries and an optional SsdCache for spilling. Memory pools are hierarchically organized per query and task.
Key Features
- Parallel Execution: Multiple drivers per task with configurable concurrency
- Task Queuing: Automatic queuing when server is overloaded
- Metrics Collection: Optional Prometheus integration for runtime metrics
- GPU Support: Optional cuDF integration for GPU-accelerated operations
- Flexible Storage: Support for local, HDFS, S3, GCS, and ABFS file systems
Testing & Benchmarking Infrastructure
Relevant Files
- presto-product-tests/README.md
- presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueryFramework.java
- presto-benchmark/src/main/java/com/facebook/presto/benchmark/BenchmarkSuite.java
- presto-testng-services/src/main/java/com/facebook/presto/testng/services/LogTestDurationListener.java
- presto-benchmark-runner/src/main/java/com/facebook/presto/benchmark/framework/BenchmarkRunner.java
Presto employs a multi-layered testing strategy combining unit tests, integration tests, product tests, and performance benchmarks. This infrastructure ensures correctness across all execution modes and validates end-to-end functionality.
Test Layers
Unit Tests run as part of the standard Maven build and test individual components in isolation. Integration Tests use AbstractTestQueryFramework and DistributedQueryRunner to validate query execution against in-memory or distributed clusters. Product Tests exercise user-facing interfaces like presto-cli against real Hadoop and Presto deployments using Docker containers and the Tempto test harness.
Product Testing with Docker
Product tests are containerized for reproducibility and isolation. The presto-product-tests/bin/run_on_docker.sh script orchestrates Docker Compose to spin up Hadoop, Presto coordinators, and workers. Tests run in a separate JVM and support multiple profiles:
- singlenode - Single Presto instance with pseudo-distributed Hadoop
- multinode - Distributed Presto with multiple workers
- multinode-tls - TLS-encrypted coordinator and worker communication
- singlenode-kerberos-hdfs-impersonation - Kerberized Hadoop with user impersonation
- singlenode-ldap - LDAP authentication testing
Tests are organized into groups (e.g., string_functions, authorization, hdfs_impersonation) and can be run selectively with -g or -t flags.
Benchmarking Framework
The presto-benchmark module contains hand-written and SQL-based benchmarks for performance regression detection. BenchmarkSuite aggregates benchmarks covering aggregations, joins, filtering, and TPC-H queries. The presto-benchmark-runner provides a CLI interface with pluggable event clients for metrics collection.
presto-benchto-benchmarks integrates with Benchto for large-scale performance testing using TPC-H and TPC-DS datasets. Benchmarks are defined in YAML with configurable runs, prewarm iterations, and session properties.
Test Monitoring
LogTestDurationListener tracks test execution times and detects hangs. It logs individual tests exceeding 30 seconds, test classes exceeding 1 minute, and global idle periods exceeding 8 minutes. Thread dumps are captured when hangs are detected, aiding in debugging stuck tests.
// Example: creating a distributed query runner for integration tests
DistributedQueryRunner queryRunner = DistributedQueryRunner.builder(session)
        .setNodeCount(3)
        .setExtraProperties(extraProperties)
        .build();
MaterializedResult result = queryRunner.execute("SELECT * FROM table");
Running Tests
Run the suites as follows:
- Unit tests: ./mvnw test
- Product tests: presto-product-tests/bin/run_on_docker.sh multinode -x quarantine,big_query
- Benchmarks: java -jar presto-benchmark-runner-*-executable.jar
Debugging Java-based product tests requires starting the Hadoop containers, configuring /etc/hosts, and running Presto from IntelliJ with the appropriate JVM options.
Query Optimizer & Plan Optimization
Relevant Files
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/Rule.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/Memo.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PickTableLayout.java
- presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Overview
Presto's query optimizer transforms logical plans into efficient execution plans through an iterative rule-based system. The optimizer applies cost-based and logical transformations to reduce query execution time by pushing filters down, eliminating redundant operations, and choosing optimal join strategies.
Iterative Optimizer Architecture
The IterativeOptimizer is the core engine that repeatedly applies transformation rules until no further improvements are possible. It uses a Memo data structure to efficiently represent and mutate the plan tree without full rewrites.
Key Components:
- Memo: Stores plan nodes in groups with symbolic references to child groups, enabling efficient local mutations
- RuleIndex: Indexes rules by pattern for fast candidate matching
- Rule: Defines a pattern and transformation logic; rules return empty if not applicable
- Context: Provides rules access to metadata, statistics, costs, and utilities
// Rule interface - all optimizations implement this
public interface Rule<T> {
Pattern<T> getPattern();
Result apply(T node, Captures captures, Context context);
}
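For example, a stripped-down rule in the spirit of RemoveTrivialFilters could be written as follows; Patterns.filter() is the engine's pattern factory and TRUE_CONSTANT comes from LogicalRowExpressions, but treat the details as a sketch:
public class RemoveTrueFilter implements Rule<FilterNode> {
    private static final Pattern<FilterNode> PATTERN = Patterns.filter();

    @Override
    public Pattern<FilterNode> getPattern() {
        return PATTERN;
    }

    @Override
    public Result apply(FilterNode node, Captures captures, Context context) {
        if (TRUE_CONSTANT.equals(node.getPredicate())) {
            // Splice out the filter; the IterativeOptimizer rewires the Memo group
            return Result.ofPlanNode(node.getSource());
        }
        return Result.empty(); // pattern matched but the rule does not apply
    }
}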
Major Optimization Rules
The optimizer applies rules in multiple passes, organized by category:
Predicate & Filter Pushdown:
- PredicatePushDown - Pushes filter conditions down the tree toward table scans
- PickTableLayout - Selects optimal table layouts based on predicates
- PushDownDereferences - Pushes field access operations down through operators
Column Pruning:
- PruneUnreferencedOutputs - Removes unused columns early
- PruneProjectColumns, PruneFilterColumns, PruneTopNColumns - Prune columns at each operator
Join Optimization:
- ReorderJoins - Reorders joins using cost-based strategies (AUTOMATIC, ELIMINATE_CROSS_JOINS, NONE)
- EliminateCrossJoins - Converts cross joins to inner joins when possible
- TransformDistinctInnerJoinToLeftEarlyOutJoin - Optimizes distinct inner joins
Aggregation & Limit:
- PushLimitThroughProject, PushLimitThroughUnion - Push limits down
- SingleDistinctAggregationToGroupBy - Simplifies single distinct aggregations
- MultipleDistinctAggregationToMarkDistinct - Optimizes multiple distinct aggregations
Expression Simplification:
- RemoveRedundantIdentityProjections - Eliminates unnecessary identity projections
- RemoveTrivialFilters - Removes always-true filters
- InlineProjections - Inlines simple projections
Table Layout Selection
PickTableLayout is critical for connector-specific optimizations. It operates in two modes:
- With Predicate: Pushes filter conditions into table scans, allowing connectors to select layouts that support partition pruning or index usage
- Without Predicate: Selects default layouts when no filters apply
The rule extracts predicates, translates them to tuple domains, and queries the connector's getLayout() method to determine the best physical layout.
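For instance, a predicate such as orderkey = 42 reaches the connector as a tuple domain; a sketch using the presto-common predicate classes, where orderkeyColumn is assumed to be the connector's ColumnHandle for that column:
// The tuple-domain form of "orderkey = 42"
TupleDomain<ColumnHandle> constraint = TupleDomain.withColumnDomains(
        ImmutableMap.of(orderkeyColumn, Domain.singleValue(BIGINT, 42L)));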
Optimization Flow
Cost-Based Decisions
Rules can be cost-based, using StatsProvider and CostProvider to evaluate alternatives. The optimizer tracks rule execution time and statistics for debugging and tuning. Session properties control which rules are enabled and their behavior (e.g., join reordering strategy).
Extensibility
Connectors can implement ConnectorPlanOptimizer to apply connector-specific optimizations. The framework also supports logical properties (constraints, ordering, partitioning) that rules can exploit for advanced transformations.
Data Types, Functions & Expressions
Relevant Files
- presto-common/src/main/java/com/facebook/presto/common/type
- presto-spi/src/main/java/com/facebook/presto/spi/relation
- presto-expressions/src/main/java/com/facebook/presto/expressions
Type System
Presto's type system is built on the Type interface, which defines how data is represented and manipulated. Every value in Presto has an associated type that determines its Java representation, comparability, and ordering semantics.
Core Type Hierarchy:
- Primitive Types: BIGINT, INTEGER, SMALLINT, TINYINT, BOOLEAN, DOUBLE, REAL, DATE, TIME, TIMESTAMP
- Variable-Width Types: VARCHAR, VARBINARY, JSON
- Complex Types: ARRAY, MAP, ROW
- Parametric Types: Types with parameters like DECIMAL(precision, scale), CHAR(length), VARCHAR(length)
- User-Defined Types: BIGINT_ENUM, VARCHAR_ENUM, DISTINCT_TYPE
- Specialized Types: HyperLogLog, QDigest, TDigest, KllSketch for approximate aggregations
Each type implements methods for:
- Comparability: Whether values can be compared for equality
- Orderability: Whether values can be sorted
- Java Representation: The runtime class used (boolean, long, double, Slice, or Block)
- Block Operations: Creating and manipulating columnar data structures
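For example, building and reading a small BIGINT column through the Type interface looks roughly like this:
// null block-builder status, three expected entries
BlockBuilder builder = BIGINT.createBlockBuilder(null, 3);
BIGINT.writeLong(builder, 1L);
BIGINT.writeLong(builder, 2L);
BIGINT.writeLong(builder, 3L);
Block block = builder.build();
long value = BIGINT.getLong(block, 1); // positional read: returns 2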
Row Expressions
Row expressions represent the intermediate form of SQL expressions after parsing and analysis. They form a tree structure where each node is a RowExpression subtype.
Expression Types:
- CallExpression: Function or operator invocation with a FunctionHandle, return type, and arguments
- ConstantExpression: Literal values with their associated type
- InputReferenceExpression: References to input columns by index
- SpecialFormExpression: Control flow constructs (IF, COALESCE, IN, AND, OR, SWITCH)
- LambdaDefinitionExpression: Lambda functions for higher-order operations
- VariableReferenceExpression: References to variables in scope
All expressions are immutable and include optional source location information for error reporting.
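Constructing these by hand is rare outside tests, but the Expressions helper class illustrates the shapes involved; a sketch of the leaves for an expression like col0 + 1 over a BIGINT input (the CallExpression that combines them is shown in the example at the end of this section):
RowExpression column = Expressions.field(0, BIGINT);  // input reference by channel index
RowExpression one = Expressions.constant(1L, BIGINT); // typed literal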
Function Resolution
Functions are resolved through the FunctionHandle mechanism, which encapsulates function metadata and implementation details. The system supports:
- Scalar Functions: Single-row operations
- Aggregate Functions: Multi-row reductions
- Window Functions: Partitioned computations
- Operators: Built-in operations like +, =, <
Function resolution occurs during planning, matching function names and argument types to registered implementations. The StandardFunctionResolution interface provides methods for resolving comparison operators, arithmetic operations, and type coercions.
Type Coercion
Presto automatically coerces types when needed through the TypeManager.canCoerce() method. Common coercions include:
- Numeric type widening (TINYINT → BIGINT)
- String to numeric conversions
- Temporal type conversions
Explicit casting is handled as a special function call with a CAST function handle.
// Example: Creating a function call expression
CallExpression add = call(
"add",
functionHandle,
BIGINT,
left, // RowExpression
right // RowExpression
);