Selenium WebDriver | Augment Code

Overview

Relevant Files

README.md
CONTRIBUTING.md
java/src/org/openqa/selenium/WebDriver.java
py/selenium/webdriver/__init__.py
javascript/selenium-webdriver/index.js

Selenium is an umbrella project that automates web browsers across multiple platforms and programming languages. It provides a unified interface for browser automation through the W3C WebDriver specification, enabling developers to write tests and automation scripts that work consistently across Chrome, Firefox, Edge, Safari, and other browsers.

Core Architecture

The project is organized into language-specific bindings and shared infrastructure:

Language Bindings: Java, Python, JavaScript, Ruby, and .NET implementations of the WebDriver API
Selenium Grid: Distributed testing infrastructure for running tests across multiple machines and browsers
Browser Drivers: Driver classes that launch and talk to vendor-maintained executables such as ChromeDriver and GeckoDriver, which in turn control the browsers
Common Infrastructure: Shared utilities, atoms, and DevTools support

Key Components

WebDriver Interface — The central abstraction that all language bindings implement. It provides methods for:

Browser navigation (get(), plus back() and forward(), which Java exposes through navigate())
Element finding (findElement()) and interaction through the returned WebElement (click(), sendKeys())
Window and frame management
Cookie and session handling

Language Bindings — Each language (Java, Python, JavaScript, Ruby, .NET) provides a native implementation of the WebDriver interface, allowing developers to write automation code in their preferred language.

Selenium Grid — A server that enables distributed testing by managing multiple browser instances across different machines, allowing parallel test execution and cross-browser testing at scale.

Build System

Selenium uses Bazel as its build system to manage dependencies, compile source code, and execute tests efficiently. The project structure uses BUILD.bazel files to define build targets for each module. Common commands include:

bazel build //<language>/...
bazel test //<language>/...
bazel run //java/src/org/openqa/selenium/grid:executable-grid


Development Workflow

The project follows a trunk-based (HEAD-based) development model. Contributors fork the repository, create feature branches, and submit pull requests. All code must include proper license headers and pass tests before it is merged. Rake tasks, invoked through the ./go command, simplify common operations such as building, testing, and releasing.

Testing Strategy

Tests are organized by size and type:

Small tests: Unit tests without browser interaction
Medium tests: Integration tests with limited browser usage
Large tests: Full browser automation tests

Each language binding has specific test commands and can be filtered by browser type and test tags for targeted testing.

Architecture & Core Components

Relevant Files

java/src/org/openqa/selenium/remote/RemoteWebDriver.java
java/src/org/openqa/selenium/grid/package-info.java
py/selenium/webdriver/remote/webdriver.py
javascript/selenium-webdriver/lib/webdriver.js
rb/lib/selenium/webdriver/remote/bridge.rb

Selenium is built on a client-server architecture: language bindings send commands over HTTP, using the W3C WebDriver protocol, to a WebDriver server, which may be a local driver executable such as chromedriver or a remote Selenium Grid. This design enables cross-platform browser automation and distributed testing through Selenium Grid.

Core Components

RemoteWebDriver is the central client-side class in Java, and the other bindings have equivalents (WebDriver in Python's remote/webdriver.py and JavaScript's lib/webdriver.js, the Bridge in Ruby's remote layer). It orchestrates all browser interactions: it maintains a session ID and delegates commands to a CommandExecutor, which translates high-level operations into HTTP requests conforming to the W3C protocol.

Command Execution Flow:

User calls a method (e.g., driver.get(url))
RemoteWebDriver creates a Command object with the session ID and parameters
CommandExecutor encodes the command using a protocol codec (W3C or legacy)
HTTP request is sent to the remote server
Server processes the request and returns a JSON response
Response is decoded and converted back to language-native types
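The six steps can be sketched in Python with an in-memory stand-in for the HTTP layer. Command, MiniExecutor, fake_transport, and the two-entry command table are hypothetical names that only mirror the shape of the flow, not Selenium's actual classes; the two endpoints are the W3C Navigate To and Get Title routes.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Hypothetical command table: command name -> (HTTP method, URL template).
COMMANDS: Dict[str, Tuple[str, str]] = {
    "get": ("POST", "/session/{sessionId}/url"),
    "getTitle": ("GET", "/session/{sessionId}/title"),
}


@dataclass
class Command:
    """Step 2: a command carries the session ID, a name, and parameters."""
    session_id: str
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


class MiniExecutor:
    """Hypothetical executor that encodes, sends, and decodes one command."""

    def __init__(self, transport: Callable[[str, str, bytes], bytes]):
        # The transport stands in for a real HTTP client (step 4).
        self.transport = transport

    def execute(self, command: Command) -> Any:
        # Step 3: encode the command as an HTTP method, path, and JSON body.
        method, template = COMMANDS[command.name]
        path = template.format(sessionId=command.session_id)
        body = json.dumps(command.params).encode() if method == "POST" else b""
        # Steps 4-5: send the request and receive the server's JSON reply.
        raw = self.transport(method, path, body)
        # Step 6: decode the JSON and unwrap the W3C "value" field.
        return json.loads(raw)["value"]


def fake_transport(method: str, path: str, body: bytes) -> bytes:
    """Pretend server: echoes the request so the round trip is visible."""
    return json.dumps({"value": {"method": method, "path": path,
                                 "body": body.decode()}}).encode()


executor = MiniExecutor(fake_transport)
print(executor.execute(Command("abc123", "get", {"url": "https://example.com"})))
```

Passing the transport in as a callable keeps the encode and decode logic testable without a browser or network.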

Selenium Grid Architecture

Selenium Grid distributes test execution across multiple machines using a hierarchical routing system:


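Router receives all incoming requests and routes them to the appropriate handler. New session requests are placed on the New Session Queue, from which the Distributor picks them up; commands for existing sessions are forwarded to the Node that owns the session.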

Distributor maintains a registry of available Nodes and their capabilities. When a new session is requested, it matches the desired capabilities against available slots and allocates a session on a suitable Node.

Node represents a machine capable of running browser instances. Each Node registers with the Distributor and maintains a pool of browser slots. When a session is created, the Node stores its location in the SessionMap.

SessionMap is a distributed registry that maps session IDs to their Node locations, enabling the Router to proxy subsequent commands to the correct Node.

Protocol & Communication

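Each command is encoded as an HTTP request following the W3C WebDriver specification. A command is defined by:

Command name (e.g., GET, FIND_ELEMENT), which selects the HTTP method and URL template
Session ID (identifies the browser session), which is placed in the URL path
Parameters (command-specific data), which are sent as the JSON request body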

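The server's JSON response always has a value field. On success, value holds the result; on failure, the HTTP status is non-2xx and value is an object with error, message, and stacktrace fields. Language bindings deserialize responses and either convert them to native types (e.g., WebElement objects) or raise the matching exception.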
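A minimal sketch of that decoding step in Python, assuming the W3C error shape described above. The exception classes and the two-entry error table are hypothetical stand-ins for a binding's real exception hierarchy; "no such element" and "stale element reference" are genuine W3C error codes.

```python
import json
from typing import Any


class WebDriverError(Exception):
    """Hypothetical base class for protocol errors."""


class NoSuchElementError(WebDriverError):
    pass


class StaleElementError(WebDriverError):
    pass


# W3C error codes mapped to hypothetical language-native exceptions.
ERROR_CLASSES = {
    "no such element": NoSuchElementError,
    "stale element reference": StaleElementError,
}


def decode_response(status: int, body: str) -> Any:
    """Return the "value" of a successful response, or raise on an error."""
    value = json.loads(body)["value"]
    # W3C errors use a non-2xx status and put the details inside "value".
    if status >= 400 and isinstance(value, dict) and "error" in value:
        exc_class = ERROR_CLASSES.get(value["error"], WebDriverError)
        raise exc_class(f'{value["error"]}: {value.get("message", "")}')
    return value
```

Unknown error codes fall back to the base class, so callers can always catch one common exception type.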

Multi-Language Consistency

All language bindings (Java, Python, JavaScript, Ruby) implement the same architecture:

Bridge/Connection layer handles HTTP communication
Command codec translates between language APIs and W3C protocol
Element representation wraps server-side element references
Error handling maps protocol errors to language-specific exceptions

This consistency means that test code behaves the same way across languages while each binding still leverages its own language's idioms and libraries.

WebDriver Protocol & Remote Communication

Relevant Files

java/src/org/openqa/selenium/remote/HttpCommandExecutor.java
py/selenium/webdriver/remote/remote_connection.py
dotnet/src/webdriver/Remote/W3CWireProtocolCommandInfoRepository.cs
javascript/selenium-webdriver/http/index.js
rb/lib/selenium/webdriver/remote/driver.rb

Selenium uses the standardized W3C WebDriver protocol for communication between client bindings and WebDriver servers. The protocol defines how commands are encoded and transmitted over HTTP and how responses are decoded, and every language binding implements it.

Protocol Architecture

The WebDriver protocol operates on a request-response model using HTTP. Each language binding implements three core components:

Command Codec - Converts high-level commands into HTTP requests following the W3C specification
HTTP Client - Sends requests to the remote server and handles network communication
Response Codec - Parses HTTP responses back into structured command results


Protocol Handshake

When a new session is created, the client performs a protocol handshake to negotiate capabilities with the server. The Java implementation demonstrates this:

ProtocolHandshake handshake = new ProtocolHandshake();
ProtocolHandshake.Result result = handshake.createSession(client, command);
Dialect dialect = result.getDialect();
commandCodec = dialect.getCommandCodec();
responseCodec = dialect.getResponseCodec();

The client determines the dialect from the shape of the server's new-session response (currently W3C) and configures its codecs accordingly. Keeping the codecs behind a Dialect abstraction is what allows the client to support more than one protocol version.
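As an illustration of dialect detection, the sketch below inspects a new-session response body: a W3C server nests sessionId and capabilities inside value, while the legacy JSON Wire Protocol put sessionId and a numeric status at the top level. detect_dialect and its return strings are hypothetical, and Selenium's ProtocolHandshake performs more checks than this.

```python
import json


def detect_dialect(body: str) -> str:
    """Guess the protocol dialect from a new-session response body."""
    payload = json.loads(body)
    value = payload.get("value")
    # W3C shape: {"value": {"sessionId": ..., "capabilities": {...}}}
    if isinstance(value, dict) and "sessionId" in value and "capabilities" in value:
        return "W3C"
    # Legacy JSON Wire shape: {"sessionId": ..., "status": 0, "value": {...}}
    if "sessionId" in payload and "status" in payload:
        return "JSON_WIRE"
    raise ValueError("Unrecognized new-session response")
```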

Command Mapping

Commands are mapped to HTTP endpoints using a standardized pattern. Each language binding maintains a command registry:

Python example:

from selenium.webdriver.remote.command import Command

remote_commands = {
    Command.NEW_SESSION: ("POST", "/session"),
    Command.FIND_ELEMENT: ("POST", "/session/$sessionId/element"),
    Command.CLICK_ELEMENT: ("POST", "/session/$sessionId/element/$id/click"),
}

C# example:

this.TryAddCommand(DriverCommand.NewSession, 
    new HttpCommandInfo(HttpCommandInfo.PostCommand, "/session"));
this.TryAddCommand(DriverCommand.FindElement, 
    new HttpCommandInfo(HttpCommandInfo.PostCommand, "/session/{sessionId}/element"));

Path placeholders ($sessionId and $id in the Python registry, {sessionId} in the C# one) are substituted at execution time with the actual values.
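Substitution of this kind maps directly onto Python's standard string.Template, which uses the same $name placeholder syntax as the Python registry. build_request is a hypothetical helper, and the rule that every parameter not consumed by the URL becomes the JSON body is a simplifying assumption.

```python
import json
from string import Template
from typing import Any, Dict, Tuple

# A small registry in the same shape as the Python example above.
REMOTE_COMMANDS: Dict[str, Tuple[str, str]] = {
    "newSession": ("POST", "/session"),
    "findElement": ("POST", "/session/$sessionId/element"),
    "clickElement": ("POST", "/session/$sessionId/element/$id/click"),
}


def build_request(name: str, params: Dict[str, Any]) -> Tuple[str, str, str]:
    """Hypothetical helper: resolve the URL and serialize the leftover params."""
    method, template = REMOTE_COMMANDS[name]
    # substitute() raises KeyError if a placeholder has no matching parameter.
    path = Template(template).substitute(params)
    # Collect the placeholder names so they are not repeated in the body.
    used = {m.group("named") or m.group("braced")
            for m in Template.pattern.finditer(template)}
    body = {key: value for key, value in params.items() if key not in used}
    return method, path, json.dumps(body)
```

For example, build_request("findElement", {"sessionId": "abc", "using": "css selector", "value": "#q"}) yields a POST to /session/abc/element whose body carries only the locator.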

Request Execution Flow

The execution pipeline is consistent across all bindings:

Encode - Command codec converts the command object to an HTTP request with JSON payload
Send - HTTP client transmits the request to the server
Receive - HTTP client reads the response
Decode - Response codec parses the JSON response into a result object

Java execution:

HttpRequest httpRequest = commandCodec.encode(command);
HttpResponse httpResponse = client.execute(httpRequest);
Response response = responseCodec.decode(httpResponse);

W3C Protocol Specifics

The W3C WebDriver specification defines:

Element Identifiers - Elements are encoded with the key element-6066-11e4-a52e-4f735466cecf
Shadow Root Identifiers - Shadow roots use shadow-6066-11e4-a52e-4f735466cecf
Response Format - All responses wrap values in a {"value": ...} structure
Error Handling - HTTP status codes and error objects follow the W3C error model
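Because element references arrive as plain JSON objects keyed by these identifiers, a binding has to walk each response and wrap them. In the sketch below ElementRef, ShadowRootRef, and unwrap are hypothetical names; only the two identifier strings come from the specification.

```python
from dataclasses import dataclass
from typing import Any

# Web element and shadow root identifiers from the W3C specification.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
SHADOW_KEY = "shadow-6066-11e4-a52e-4f735466cecf"


@dataclass(frozen=True)
class ElementRef:
    """Hypothetical client-side wrapper for a server-side element."""
    id: str


@dataclass(frozen=True)
class ShadowRootRef:
    """Hypothetical client-side wrapper for a server-side shadow root."""
    id: str


def unwrap(value: Any) -> Any:
    """Recursively replace W3C reference objects with wrapper instances."""
    if isinstance(value, list):
        return [unwrap(item) for item in value]
    if isinstance(value, dict):
        if ELEMENT_KEY in value:
            return ElementRef(value[ELEMENT_KEY])
        if SHADOW_KEY in value:
            return ShadowRootRef(value[SHADOW_KEY])
        return {key: unwrap(item) for key, item in value.items()}
    return value
```

The recursion matters because references can be nested, for example in the list returned by Find Elements or inside a script result.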

BiDi Protocol Support

Modern implementations also support the WebDriver BiDi protocol for bidirectional communication. When the server returns a WebSocket URL in the webSocketUrl capability (exposed as :web_socket_url in Ruby), the client can open a persistent connection. The Ruby binding does it like this:

socket_url = @capabilities[:web_socket_url]
@bidi = Selenium::WebDriver::BiDi.new(url: socket_url)

BiDi enables server-initiated events and real-time communication, complementing the traditional request-response model.

Cross-Language Consistency

All language bindings follow the same protocol structure, ensuring compatibility:

Java - HttpCommandExecutor with W3CHttpCommandCodec
Python - RemoteConnection with command registry
JavaScript - HttpClient with request/response handling
Ruby - Bridge with HTTP communication
.NET - W3CWireProtocolCommandInfoRepository with command definitions

This standardization allows any client binding to communicate with any W3C-compliant WebDriver server.

Browser Driver Implementations

Relevant Files

java/src/org/openqa/selenium/chrome/ChromeDriver.java
java/src/org/openqa/selenium/firefox/FirefoxDriver.java
py/selenium/webdriver/chrome/webdriver.py
py/selenium/webdriver/firefox/webdriver.py
rb/lib/selenium/webdriver/chromium/driver.rb

Architecture Overview

Browser drivers in Selenium are language-specific implementations that control local browser instances. Each driver communicates with a browser-specific executable (chromedriver, geckodriver) via the WebDriver protocol. The architecture follows a consistent pattern across Java, Python, and Ruby: a driver class inherits from a base remote driver, manages a driver service process, and accepts browser-specific options.


Driver Hierarchy

Java: FirefoxDriver extends RemoteWebDriver directly. ChromeDriver extends ChromiumDriver, which in turn extends RemoteWebDriver and adds Chromium-specific capabilities such as CDP and casting, along with BiDi support.

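Python: the Chrome WebDriver class (webdriver.Chrome) extends ChromiumDriver, while the Firefox WebDriver class (webdriver.Firefox) extends RemoteWebDriver directly. This reflects that Firefox is not Chromium-based and so does not share the Chromium feature set.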

Ruby: Chrome::Driver and Firefox::Driver extend Chromium::Driver and base Driver respectively, with driver extensions providing additional functionality.

Driver Services

Services manage the lifecycle of browser driver executables. They handle:

Process Management: Starting and stopping the driver executable (chromedriver, geckodriver)
Port Assignment: Allocating available ports for the driver to listen on
Connectivity Checks: Verifying the driver is ready before returning control
Logging: Capturing driver output for debugging

Java implements these as ChromeDriverService and GeckoDriverService; Python and Ruby use a base Service class with browser-specific subclasses.
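The port-assignment and connectivity-check duties can be sketched with the standard socket module. find_free_port and wait_until_connectable are hypothetical names; a real service would also launch the driver executable with the chosen port (for example via subprocess), which is left out so the sketch runs without a driver installed.

```python
import socket
import time


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def wait_until_connectable(port: int, host: str = "127.0.0.1",
                           timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll until something accepts TCP connections on the port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Releasing the probe socket before the driver binds leaves a brief window in which another process could take the port; this approach accepts that race in exchange for simplicity.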

Driver Options

Options objects configure browser behavior before session creation. They support:

Browser Arguments: Command-line flags passed to the browser (--headless, --no-sandbox)
Binary Path: Custom browser executable location
Preferences: Browser-specific settings (Firefox profiles, Chrome prefs)
Extensions: Pre-installed browser extensions
Capabilities: W3C WebDriver capabilities like acceptInsecureCerts, pageLoadStrategy

Options are converted to capabilities dictionaries sent to the driver service during session initialization.
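To make the conversion concrete, here is a trimmed-down Python sketch. MiniChromeOptions is hypothetical; goog:chromeOptions is ChromeDriver's vendor-prefixed capability key and args, binary, and prefs are fields it accepts, while acceptInsecureCerts and pageLoadStrategy are standard W3C capabilities.

```python
from typing import Any, Dict, List, Optional


class MiniChromeOptions:
    """Hypothetical, trimmed-down stand-in for a Chrome options class."""

    def __init__(self) -> None:
        self.arguments: List[str] = []
        self.binary_location: Optional[str] = None
        self.prefs: Dict[str, Any] = {}
        self.accept_insecure_certs = False
        self.page_load_strategy = "normal"

    def add_argument(self, arg: str) -> None:
        self.arguments.append(arg)

    def to_capabilities(self) -> Dict[str, Any]:
        # Browser-specific settings go under the vendor-prefixed key.
        chrome_options: Dict[str, Any] = {"args": list(self.arguments)}
        if self.binary_location:
            chrome_options["binary"] = self.binary_location
        if self.prefs:
            chrome_options["prefs"] = dict(self.prefs)
        # Standard W3C capabilities sit at the top level.
        return {
            "browserName": "chrome",
            "acceptInsecureCerts": self.accept_insecure_certs,
            "pageLoadStrategy": self.page_load_strategy,
            "goog:chromeOptions": chrome_options,
        }
```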

Initialization Flow

Create options object with desired configuration
Create driver service (or use default)
Driver service starts the driver executable (chromedriver, geckodriver) on an available port
Driver instance connects to the service via HTTP
Session is created: the driver executable launches the browser and returns the negotiated capabilities
Driver is ready for commands

Browser-Specific Features

Chrome/Chromium: Drivers implement HasCdp, HasBiDi, HasCasting, HasAuthentication, and HasNetworkConditions for advanced debugging and network control.

Firefox: Drivers implement HasExtensions, HasFullPageScreenshot, HasContext, and HasBiDi for Firefox-specific functionality like addon installation and context switching between chrome and content scopes.

Key Implementation Details

DriverFinder: Automatically locates browser and driver executables using Selenium Manager
Command Executors: Translate high-level driver commands into WebDriver protocol messages
Error Handling: Standardized error responses from the driver service
Cleanup: Drivers properly shut down services and close browser processes on quit()

BiDi (Bidirectional Protocol) Support

Relevant Files

java/src/org/openqa/selenium/bidi/BiDi.java
py/selenium/webdriver/common/bidi/script.py
javascript/selenium-webdriver/bidi/index.js
rb/lib/selenium/webdriver/bidi.rb
dotnet/src/webdriver/BiDi/BiDi.cs

BiDi (WebDriver Bidirectional Protocol) is a modern protocol that enables full-duplex communication between WebDriver clients and browsers. Unlike the HTTP-based WebDriver Classic protocol (and the older JSON Wire Protocol before it), BiDi lets browsers push events to clients in real time, enabling advanced automation scenarios like network interception, script execution, and event listening.

Architecture Overview

BiDi is implemented across all Selenium language bindings with a consistent architecture:

WebSocket Connection: Each binding maintains a WebSocket connection to the browser for bidirectional messaging
Command-Response Pattern: Clients send commands with unique IDs and receive responses asynchronously
Event Subscription: Clients can subscribe to browser events and receive notifications in real-time
Module System: Functionality is organized into modules (Session, BrowsingContext, Script, Network, Log, Storage, Browser, Emulation, WebExtension)

Core Components

BiDi Class serves as the main entry point:

public class BiDi implements Closeable {
  public <X> X send(Command<X> command) { ... }
  public <X> long addListener(Event<X> event, Consumer<X> handler) { ... }
  public BiDiSessionStatus getBidiSessionStatus() { ... }
}

Connection Management handles WebSocket communication:

Maintains message ID sequencing for request-response correlation
Manages callback registries for event handlers
Handles timeouts (default 30 seconds) for command execution
Supports both global and context-specific event subscriptions
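The ID sequencing and callback bookkeeping can be sketched in Python without a real socket. MiniBiDiConnection is hypothetical: send_text stands in for a WebSocket's send method, on_message is what a reader thread would call for each incoming frame, and locking is omitted for brevity. The message shapes (id, method, and params for commands; a type of success, error, or event for incoming messages) follow the BiDi specification.

```python
import itertools
import json
from concurrent.futures import Future
from typing import Callable, Dict, List


class MiniBiDiConnection:
    """Hypothetical sketch of BiDi request/response correlation."""

    def __init__(self, send_text: Callable[[str], None]):
        self._send_text = send_text
        self._ids = itertools.count(1)          # monotonically increasing IDs
        self._pending: Dict[int, Future] = {}   # command ID -> waiting future
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def send(self, method: str, params: dict) -> Future:
        msg_id = next(self._ids)
        future: Future = Future()
        self._pending[msg_id] = future
        self._send_text(json.dumps({"id": msg_id, "method": method, "params": params}))
        return future

    def add_listener(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def on_message(self, text: str) -> None:
        message = json.loads(text)
        if message.get("type") == "event":
            # Events carry no ID; dispatch them by method name.
            for handler in self._handlers.get(message["method"], []):
                handler(message["params"])
            return
        # Responses echo the command ID, which identifies the waiting caller.
        future = self._pending.pop(message["id"])
        if message.get("type") == "error":
            future.set_exception(RuntimeError(message.get("message", message["error"])))
        else:
            future.set_result(message["result"])
```

Because responses may arrive out of order, the pending map rather than arrival order decides which caller is resumed. A caller would block on future.result(timeout=30) to mirror the 30-second default mentioned above.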

Key Modules

Session: Status checks and event subscription management
BrowsingContext: Navigation, window management, and context creation
Script: JavaScript execution, realm management, and console message handling
Network: Request/response interception, data collection, and cookie management
Log: Console and JavaScript error event listening
Storage: Cookie and storage operations
Browser: Window management and download behavior configuration
Emulation: Device emulation, geolocation, timezone, and user agent overrides

Usage Pattern

from selenium import webdriver

# BiDi must be requested at session creation (sets the webSocketUrl capability)
options = webdriver.ChromeOptions()
options.enable_bidi = True
driver = webdriver.Chrome(options=options)

try:
    # Subscribe to console messages
    def handle_console(message):
        print(f"Console: {message}")

    handler_id = driver.script.add_console_message_handler(handle_console)

    # Execute JavaScript; BiDi's callFunction takes a function declaration
    result = driver.script.execute("() => 2 + 2")

    # Clean up
    driver.script.remove_console_message_handler(handler_id)
finally:
    driver.quit()

Event Subscription

BiDi supports subscribing to events at different scopes:

Global: All events across all browsing contexts
Context-Specific: Events from a particular browsing context
Multi-Context: Events from a set of specific browsing contexts

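At the protocol level, a client receives events only after subscribing to them with the session.subscribe command. Selenium's higher-level handler methods, such as add_console_message_handler, send this subscription on the caller's behalf.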
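The three scopes map onto the parameters of that command. In this sketch subscribe_command is a hypothetical helper and the context IDs are placeholders; omitting contexts subscribes globally, while listing one or more browsing context IDs restricts delivery to those contexts.

```python
import json
from typing import Dict, Iterable, Optional


def subscribe_command(msg_id: int, events: Iterable[str],
                      contexts: Optional[Iterable[str]] = None) -> Dict:
    """Build a BiDi session.subscribe command message."""
    params: Dict = {"events": list(events)}
    if contexts is not None:
        # One ID gives a context-specific scope; several give a multi-context scope.
        params["contexts"] = list(contexts)
    return {"id": msg_id, "method": "session.subscribe", "params": params}


# Global subscription followed by one limited to two placeholder contexts.
print(json.dumps(subscribe_command(1, ["log.entryAdded"])))
print(json.dumps(subscribe_command(2, ["browsingContext.load"], ["ctx-A", "ctx-B"])))
```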

Implementation Details

Each language binding implements BiDi with language-specific patterns:

Java: Uses Consumer callbacks and CompletableFuture for async operations
Python: Uses callback functions and async/await patterns
JavaScript: Uses EventEmitter and Promise-based APIs
.NET: Uses async/await with Task and IAsyncDisposable
Ruby: Uses blocks and lazy initialization for modules

Selenium Grid & Distributed Testing

Relevant Files

java/src/org/openqa/selenium/grid/Bootstrap.java
java/src/org/openqa/selenium/grid/router/Router.java
java/src/org/openqa/selenium/grid/distributor/Distributor.java
java/src/org/openqa/selenium/grid/node/Node.java
java/src/org/openqa/selenium/grid/sessionmap/SessionMap.java
java/src/org/openqa/selenium/grid/sessionqueue/NewSessionQueue.java
javascript/grid-ui/src

Architecture Overview

Selenium Grid is a distributed testing infrastructure composed of independent, loosely-coupled components that communicate over HTTP. The system is designed to scale horizontally, allowing you to run tests in parallel across multiple machines and browsers.


Core Components

Router is the entry point for all client requests. It examines incoming WebDriver commands and routes them to the appropriate handler. New session requests are handed to the New Session Queue, where the Distributor picks them up. For existing sessions, the Router queries the Session Map to locate the session and forwards commands to the appropriate Node.

Distributor manages the allocation of browser sessions across available Nodes. It maintains a registry of all connected Nodes and their available slots (browser instances). When a new session request arrives, the Distributor selects the best-matching Node based on requested capabilities and current availability.

Node represents a machine capable of running browser sessions. Each Node exposes HTTP endpoints for session creation, execution, and termination. Nodes register themselves with the Distributor and report their available slots and capabilities. Multiple browser instances can run on a single Node.

Session Map maintains a registry of all active sessions and their locations. When a session is created on a Node, it is registered in the Session Map. This allows the Router to quickly locate any session without querying every Node.

New Session Queue holds incoming session requests until the Distributor can place them. When no suitable Node slot is free, requests wait in the queue until capacity becomes available. The Distributor polls the queue periodically, matching requests to available slots.

Request Lifecycle

Client sends a new session request to the Router
Router forwards the request to the New Session Queue
Distributor polls the queue and matches requests to available Node slots
Selected Node creates the session and registers it in the Session Map
Session ID is returned to the client
Subsequent commands go through the Router, which looks up the session's Node in the Session Map and forwards the commands there
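The lifecycle can be modeled in a single process to show how the queue, the Distributor, and the Session Map cooperate. MiniGrid, Node, and the node URLs are hypothetical, matching is reduced to browserName, and the sketch records each session in the map during the Distributor step, which simplifies which component actually writes it.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical Node offering a number of slots for one browser."""
    url: str
    browser: str
    free_slots: int


class MiniGrid:
    """Hypothetical, single-process model of the Grid request lifecycle."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes
        self.queue: Deque[Tuple[int, dict]] = deque()   # New Session Queue
        self.session_map: Dict[str, str] = {}           # session ID -> Node URL
        self.results: Dict[int, str] = {}               # request ID -> session ID
        self._request_ids = itertools.count(1)
        self._session_ids = itertools.count(1)

    def router_new_session(self, caps: dict) -> int:
        """Steps 1-2: the Router enqueues the request and returns a ticket."""
        request_id = next(self._request_ids)
        self.queue.append((request_id, caps))
        return request_id

    def distributor_poll(self) -> None:
        """Steps 3-5: place queued requests on Nodes with a free slot."""
        waiting: Deque[Tuple[int, dict]] = deque()
        while self.queue:
            request_id, caps = self.queue.popleft()
            node = next((n for n in self.nodes
                         if n.browser == caps.get("browserName") and n.free_slots > 0), None)
            if node is None:
                waiting.append((request_id, caps))  # stays queued until capacity frees up
                continue
            node.free_slots -= 1
            session_id = f"session-{next(self._session_ids)}"
            self.session_map[session_id] = node.url
            self.results[request_id] = session_id
        self.queue = waiting

    def router_forward(self, session_id: str) -> Optional[str]:
        """Step 6: later commands are proxied to the Node found in the map."""
        return self.session_map.get(session_id)
```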

Bootstrap & Deployment

The Bootstrap class handles dynamic class loading and extension support. It allows Grid components to be started as separate processes (Router, Distributor, Node, Session Queue) or combined into a single Hub. The --ext flag enables loading custom extensions from external JAR files.

Grid can be deployed in multiple topologies:

Standalone: Single process running Router, Distributor, Node, Session Map, and New Session Queue
Hub-and-Node: Central Hub (Router, Distributor, Session Map, and New Session Queue in one process) with remote Nodes
Fully Distributed: Each component (Router, Distributor, Session Queue, Session Map, Nodes) runs independently

Grid UI

The React-based Grid UI (javascript/grid-ui/src) provides real-time visualization of Grid status, active sessions, queued requests, and Node availability. It communicates with the Grid via GraphQL endpoints, displaying live metrics and allowing operators to drain Nodes gracefully.

Key Design Patterns

HTTP-First: All components communicate via HTTP, enabling deployment across networks
Stateless Routing: The Router holds no session state of its own; session locations live in the Session Map and pending requests in the New Session Queue
Event-Driven: Components emit events (session created, node removed) for monitoring and coordination
Slot-Based Allocation: Nodes expose discrete slots; each slot can run one session
Capability Matching: Requests are matched to Nodes based on browser capabilities and stereotypes
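Slot-based allocation with capability matching might look like the following sketch. Slot, slot_matches, allocate, and the choice to compare only browserName, browserVersion, and platformName (case-insensitively) are assumptions for illustration; Grid's real slot matcher handles more capabilities and vendor extensions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical: the capability names this sketch compares.
MATCH_KEYS = ("browserName", "browserVersion", "platformName")


@dataclass
class Slot:
    node: str
    stereotype: Dict[str, Any]
    busy: bool = False


def slot_matches(stereotype: Dict[str, Any], requested: Dict[str, Any]) -> bool:
    """A requested value must agree with the stereotype when both define it."""
    for key in MATCH_KEYS:
        if key in requested and key in stereotype:
            if str(requested[key]).lower() != str(stereotype[key]).lower():
                return False
    return True


def allocate(slots: List[Slot], requested: Dict[str, Any]) -> Optional[Slot]:
    """Reserve the first idle slot whose stereotype fits the request."""
    for slot in slots:
        if not slot.busy and slot_matches(slot.stereotype, requested):
            slot.busy = True  # each slot runs at most one session
            return slot
    return None
```

Returning None rather than raising lets the caller leave the request in the queue and retry on the next poll.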

Selenium Manager & Driver Management

Relevant Files

rust/src/lib.rs
java/src/org/openqa/selenium/manager/SeleniumManager.java
py/selenium/webdriver/common/selenium_manager.py
javascript/selenium-webdriver/common/seleniumManager.js
rb/lib/selenium/webdriver/common/selenium_manager.rb
py/selenium/webdriver/common/driver_finder.py

Selenium Manager is a cross-platform binary tool that automatically detects, downloads, and manages WebDriver binaries and browser installations. It eliminates the need for manual driver management and provides a unified interface across all language bindings.


Core Components

Selenium Manager Binary is a Rust-based executable distributed with each language binding. It handles platform-specific logic for locating browsers and drivers across Windows, macOS, and Linux.

Language Bindings wrap the binary with language-specific APIs. Each binding (Python, Java, JavaScript, Ruby) provides a SeleniumManager class that:

Locates the platform-appropriate binary
Constructs command-line arguments
Executes the binary as a subprocess
Parses JSON output
Returns driver and browser paths

Execution Flow

Binary Discovery: Each binding searches for the Selenium Manager binary in this order:
- Environment variable SE_MANAGER_PATH (if set)
- Bundled binary in the language package
- Platform-specific subdirectory (windows/, macos/, linux/)
Command Execution: The binary is invoked with arguments specifying:
- Browser name or driver name
- Language binding identifier
- Output format (JSON)
- Debug flag (if logging enabled)
Result Parsing: The binary returns JSON containing:
- logs: Array of debug/info/warning messages
- result: An object whose driver_path and browser_path fields hold the full paths to the WebDriver executable and the browser installation
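A Python sketch of the discovery, invocation, and parsing steps. resolve_binary, build_args, and parse_output are hypothetical helpers; --browser, --language-binding, --output, and --debug are Selenium Manager command-line flags, and the JSON shape follows the description above. A real binding would pass the argument list to subprocess.run(..., capture_output=True, text=True) and feed its stdout to the parser.

```python
import json
import os
from pathlib import Path
from typing import Dict, List


def resolve_binary(bundled: Path) -> Path:
    """Prefer SE_MANAGER_PATH when it is set, otherwise use the bundled binary."""
    override = os.environ.get("SE_MANAGER_PATH")
    return Path(override) if override else bundled


def build_args(binary: Path, browser: str, debug: bool = False) -> List[str]:
    """Command line for one lookup; the language binding is fixed to python here."""
    args = [str(binary), "--browser", browser,
            "--language-binding", "python", "--output", "json"]
    if debug:
        args.append("--debug")
    return args


def parse_output(stdout: str) -> Dict[str, object]:
    """Extract both paths from "result" and the messages from "logs"."""
    output = json.loads(stdout)
    result = output["result"]
    return {
        "driver_path": result.get("driver_path"),
        "browser_path": result.get("browser_path"),
        "logs": [entry.get("message") for entry in output.get("logs", [])],
    }
```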

Language-Specific Implementations

Python (driver_finder.py): The DriverFinder class integrates Selenium Manager into the driver initialization flow. If no explicit driver path is provided via Service, it queries Selenium Manager to locate both driver and browser.

Java: Uses a singleton pattern to ensure only one binary instance exists per JVM. The binary is extracted from a JAR file to a temporary location on first use and cleaned up on shutdown.

JavaScript: Invokes the binary synchronously using spawnSync, parsing the JSON output to extract driver and browser paths.

Ruby: Executes the binary via Open3, capturing stdout and parsing the result as JSON.

Configuration & Environment Variables

SE_MANAGER_PATH: Override the default binary location
SE_CACHE_PATH: Custom cache directory for downloaded drivers
Other SE_* variables: Selenium Manager reads further settings (such as the browser version or offline mode) from SE_-prefixed environment variables
Debug logging: Automatically enabled when the language binding's logger is set to DEBUG level

Key Design Patterns

Lazy Initialization: Binaries are located and validated only when first needed, reducing startup overhead.

Caching: Downloaded drivers and browser metadata are cached locally to avoid redundant downloads.

Error Handling: Failures include detailed error messages from the binary's execution, helping diagnose missing browsers or network issues.

Platform Abstraction: The Rust binary encapsulates all platform-specific logic, allowing language bindings to remain simple and maintainable.

Testing Infrastructure & Test Utilities

Relevant Files

java/test/org/openqa/selenium/testing/drivers/WebDriverBuilder.java
py/conftest.py
rb/spec/integration/selenium/webdriver/spec_helper.rb
javascript/selenium-webdriver/test
java/private/selenium_test.bzl
java/private/junit5_test.bzl
javascript/private/mocha_test.bzl

Selenium uses a multi-language testing infrastructure built on Bazel, with language-specific test runners and utilities. Each binding provides fixtures, annotations, and helpers to manage WebDriver instances, browser selection, and test isolation.

Test Runners & Frameworks

Java uses JUnit 5 via junit5_test.bzl, which wraps the JUnit5Runner. Tests are organized by size (small, medium, large) and browser. The selenium_test() macro in selenium_test.bzl automatically generates test variants for multiple browsers (Chrome, Firefox, Edge, Safari, IE) with appropriate JVM flags and dependencies.

Python uses pytest with a comprehensive conftest.py that provides fixtures and CLI options. Tests support multiple drivers via --driver flag (chrome, firefox, edge, safari, remote, etc.) and can be filtered by driver type. The framework includes headless mode, BiDi support, and LAN IP configuration.

Ruby uses RSpec with a custom spec_helper.rb that configures test environment, browser guards, and driver lifecycle. Tests are organized into unit and integration suites with automatic driver reset on failures.

JavaScript uses Mocha for WebDriver tests and Jest for Grid UI. The mocha_test.bzl rule configures timeouts and XML output. Tests use the suite() wrapper from selenium-webdriver/testing for browser-specific test filtering.

Driver Management & Fixtures

The WebDriverBuilder class (Java) implements a Supplier pattern for creating managed WebDriver instances. It detects the target browser, tries multiple suppliers (External, Grid, Remote, IE, Default), and registers shutdown hooks to clean up drivers. Log levels are configurable via system properties.

Python's driver fixture (conftest.py) manages a global driver instance with platform validation, remote test filtering, and automatic cleanup. The clean_driver fixture provides fresh instances for isolation-sensitive tests. Options and services can be customized per test.

Ruby's TestEnvironment manages driver lifecycle with automatic reset on non-assertion failures. The SeleniumTestListener hooks into RSpec to detect exceptions and reset state appropriately.

JavaScript's Environment class (from selenium-webdriver/testing) provides a builder with browser-specific configuration, binary paths, and service setup.

Test Annotations & Guards

Java provides annotations for conditional test execution:

@Ignore(Browser.CHROME) - Skip on specific browsers
@NeedsFreshDriver - Require driver isolation
@NoDriverAfterTest - Don't reuse driver
@NotYetImplemented - Mark incomplete tests

Python uses pytest markers:

@pytest.mark.xfail_chrome - Expected failures per browser
@pytest.mark.no_driver_after_test - Fresh driver requirement
@pytest.mark.needs_fresh_driver - Isolation marker

Ruby uses Guards with conditions for driver, browser, platform, CI environment, headless mode, and BiDi support. Guards can skip or mark tests as pending.

JavaScript provides ignore() wrapper for conditional test suppression based on browser or environment predicates.

Multi-Browser Test Generation

Bazel rules automatically generate test variants across browsers. The selenium_test() macro creates separate test targets for each browser with appropriate:

JVM flags (e.g., -Dselenium.browser=chrome)
Browser-specific dependencies
Data files (driver binaries, test resources)
Tags for filtering (chrome, firefox, remote-browser, etc.)

Remote variants are generated with -Dselenium.browser.remote=true and Grid server dependencies. This enables comprehensive cross-browser testing without duplicating test code.
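Starlark, the language of .bzl files, is close to Python, so the expansion logic can be sketched in plain Python. selenium_test_variants, the target naming scheme, and the dictionary fields are hypothetical; the real macro in java/private/selenium_test.bzl declares actual Bazel test rules with more attributes.

```python
from typing import Dict, List

DEFAULT_BROWSERS = ["chrome", "firefox", "edge"]  # hypothetical subset


def selenium_test_variants(name: str, browsers: List[str] = DEFAULT_BROWSERS) -> List[Dict]:
    """Expand one test definition into per-browser local and remote targets."""
    targets = []
    for browser in browsers:
        base_flags = [f"-Dselenium.browser={browser}"]
        # Local variant: runs the browser directly.
        targets.append({
            "name": f"{name}-{browser}",
            "jvm_flags": base_flags,
            "tags": [browser],
        })
        # Remote variant: same test routed through a Grid server.
        targets.append({
            "name": f"{name}-{browser}-remote",
            "jvm_flags": base_flags + ["-Dselenium.browser.remote=true"],
            "tags": [browser, "remote-browser"],
        })
    return targets
```

Generating both variants from one loop is what keeps the test source single while the target count grows with the browser list.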

Test Data & Resources

Tests access shared resources via Bazel's runfiles system. JavaScript tests use @bazel/runfiles to resolve browser binaries and driver executables. Python and Ruby tests similarly resolve paths for Selenium Manager and browser binaries. Test data (HTML pages, test files) are bundled as Bazel data dependencies.

Language Bindings & Ecosystem Integration

Relevant Files

java/src/org/openqa/selenium - Java WebDriver API and implementations
py/selenium/webdriver - Python WebDriver bindings
javascript/selenium-webdriver - JavaScript/Node.js bindings
rb/lib/selenium/webdriver - Ruby bindings
dotnet/src/webdriver - .NET bindings
rust/src - Rust Selenium Manager

Selenium provides language bindings for Java, Python, JavaScript, Ruby, and .NET, all implementing the W3C WebDriver specification. Each binding offers a consistent API while leveraging language-specific idioms and libraries.

Binding Architecture

All language bindings follow a three-layer architecture:

Public API Layer - Language-specific interfaces (WebDriver, WebElement, Options)
Command Codec Layer - Translates API calls to W3C protocol commands
Transport Layer - HTTP/WebSocket communication with WebDriver servers

The WebDriver interface is the entry point in every binding. It defines methods for browser control, element finding, and session management. Implementations like RemoteWebDriver (Java), WebDriver (Python), and WebDriver (JavaScript) handle the actual protocol communication.

Cross-Language Consistency

All bindings implement the same core concepts:

Capabilities - Browser configuration passed during session creation
Locators - Element finding strategies (By.id(), By.xpath(), etc.)
Actions - User interactions (click, type, drag-and-drop)
Waits - Explicit and implicit wait mechanisms
Error Handling - Standardized exception hierarchy

Protocol Communication

Bindings communicate with WebDriver servers using two protocols:

W3C WebDriver Protocol - The standard request-response model for commands like navigation, element interaction, and script execution. Each binding maintains a CommandExecutor that serializes commands to JSON and deserializes responses.

WebDriver BiDi Protocol - Enables bidirectional communication via WebSocket for real-time events and server-initiated messages. Bindings detect BiDi support through the webSocketUrl capability and create specialized bridges (BiDiBridge in Ruby, BiDi in Java) for enhanced functionality.

Browser-Specific Extensions

Each binding provides browser-specific options and features:

Chrome/Chromium - CDP support, casting, network conditions, permissions
Firefox - Profile management, extensions, full-page screenshots
Safari - Debugging, permissions

Beyond browser-specific features, the .NET binding emphasizes strong typing and async/await patterns.

Ecosystem Integration

Selenium Manager (Rust) - Automatically downloads and manages browser drivers across all language bindings, eliminating manual driver setup.

Package Distribution - Bindings are published to language-specific repositories (Maven Central, PyPI, npm, RubyGems, NuGet) with consistent versioning.

Testing Frameworks - Each binding integrates with popular testing frameworks (JUnit, pytest, Mocha, RSpec, xUnit) through adapters and utilities.

Development Workflow

The repository uses Bazel for unified builds across all languages. Each binding has:

BUILD.bazel files defining compilation targets
Language-specific dependency management (pom.xml, pyproject.toml, package.json, Gemfile, .csproj)
Comprehensive test suites validating protocol compliance
Documentation and examples that follow each language's conventions