microsoft/LightGBM

LightGBM Wiki

Last updated on Dec 18, 2025 (Commit: 2e7e35e)

Overview

Relevant Files
  • README.md
  • docs/Features.rst
  • docs/Quick-Start.rst

LightGBM (Light Gradient Boosting Machine) is a fast, distributed, and efficient gradient boosting framework developed by Microsoft. It uses tree-based learning algorithms and is designed for both speed and accuracy, making it ideal for large-scale machine learning tasks.

What is LightGBM?

LightGBM is a gradient boosting framework that builds an ensemble of decision trees sequentially, where each new tree corrects errors made by previous trees. Unlike traditional boosting methods, LightGBM employs histogram-based algorithms that bucket continuous feature values into discrete bins, dramatically reducing memory usage and training time while maintaining or improving accuracy.

Key Advantages

LightGBM offers several critical advantages over competing frameworks:

  • Faster training speed with histogram-based algorithms that cut the cost of split finding from O(#data) to O(#bins)
  • Lower memory consumption by replacing continuous values with discrete bins and eliminating pre-sorting overhead
  • Better accuracy through leaf-wise (best-first) tree growth that optimizes loss reduction at each split
  • Distributed and GPU learning support for scaling to massive datasets
  • Categorical feature support without one-hot encoding, providing up to 8x speedup on high-cardinality features

Optimization Techniques

Histogram-based Learning: Instead of sorting all data points, LightGBM constructs histograms of feature values. This reduces the cost of calculating split gains and enables histogram subtraction for further speedup.

Leaf-wise Tree Growth: Most boosting frameworks grow trees level-by-level (depth-wise), creating symmetric trees. LightGBM grows trees leaf-wise, always splitting the leaf with maximum loss reduction, resulting in deeper but more accurate trees.
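
Because leaf-wise growth can produce deep, unbalanced trees, the leaf count and depth are usually constrained in practice. A minimal sketch of doing so through the Python package (the parameter values are illustrative, not recommendations, and train_data is assumed to be an existing lgb.Dataset):

import lightgbm as lgb

params = {
    "objective": "binary",
    "num_leaves": 31,   # caps the number of leaves grown leaf-wise
    "max_depth": -1,    # -1 leaves depth unconstrained; set > 0 to bound it
}
# booster = lgb.train(params, train_data)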

Categorical Feature Handling: LightGBM sorts the histogram of a categorical feature by each category's accumulated gradient statistics (sum_gradient / sum_hessian) and finds the optimal partition over the sorted histogram in O(k * log(k)) time, avoiding the inefficiency of one-hot encoding.
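
In the Python package this only requires declaring which columns are categorical; the variable and column names below are hypothetical:

import lightgbm as lgb

# Mark columns as categorical instead of one-hot encoding them; LightGBM
# bins each category and searches partitions over the sorted histogram.
train_data = lgb.Dataset(X_train, label=y_train,
                         categorical_feature=["city", "device_type"])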

Sparse Data Optimization: Sparse features require only O(2 * #non_zero_data) operations for histogram construction, making LightGBM efficient for sparse datasets.

Supported Tasks

LightGBM supports multiple machine learning tasks:

  • Binary classification
  • Multi-class classification
  • Regression
  • Ranking (LambdaRank)
  • Custom objectives and metrics
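
For example, ranking with LambdaRank is an objective switch plus query grouping; a sketch assuming X, y, and the per-query document counts are already prepared:

import lightgbm as lgb

# group gives the number of consecutive rows belonging to each query
train_data = lgb.Dataset(X, label=y, group=[10, 20, 15])
booster = lgb.train({"objective": "lambdarank", "metric": "ndcg"},
                    train_data, num_boost_round=50)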

Language Support

LightGBM provides official packages for:

  • Python (pip, conda)
  • R (CRAN)
  • C/C++ (native library)
  • Java, Scala, Julia, Rust and other languages via community bindings

The framework is widely used in machine learning competitions and production systems, with numerous integrations available for AutoML, model serving, and deployment platforms.

Architecture & Core Components

Relevant Files
  • include/LightGBM/application.h
  • include/LightGBM/boosting.h
  • include/LightGBM/tree_learner.h
  • include/LightGBM/dataset.h
  • include/LightGBM/config.h
  • src/application/application.cpp
  • src/boosting/gbdt.h
  • src/boosting/gbdt.cpp

LightGBM's architecture follows a modular, layered design that separates concerns between data management, model training, and prediction. The system is built around a small set of core components that work together to enable efficient gradient boosting.

Application Layer

The Application class serves as the main entry point and orchestrator. It manages the complete workflow: loading parameters from command-line arguments and config files, loading training and validation datasets, initializing the boosting engine, and executing either training or prediction tasks. The Run() method dispatches to the appropriate task handler based on configuration.
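
A typical CLI run drives Application through a config file; a minimal sketch (file names are placeholders):

# train.conf
task = train
objective = binary
data = train.txt
valid = validation.txt
num_iterations = 100
output_model = model.txt

Running lightgbm config=train.conf then invokes Run(), which reads these parameters and dispatches to the training task.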

Data Management

The Dataset class is the central data structure, storing feature data organized into feature groups for efficient access patterns. It maintains:

  • Feature Groups: Features are grouped together to optimize memory layout and cache efficiency during tree learning
  • Metadata: Labels, weights, query boundaries (for ranking tasks), and initial scores
  • Bin Mappers: Convert continuous features into discrete bins for faster histogram construction

The Metadata class handles non-feature data including labels, sample weights, query information for LambdaRank, and initial scores. This separation allows flexible data loading and manipulation.

Boosting Engine

The Boosting interface defines the contract for boosting algorithms. The primary implementation is GBDT (Gradient Boosting Decision Trees), which orchestrates the iterative training process:

  1. Initialization: Sets up tree learners, metrics, and objective functions
  2. Iteration Loop: For each iteration, computes gradients/hessians, trains a new tree, and updates predictions
  3. Validation: Evaluates metrics on validation sets and implements early stopping
  4. Model Management: Handles model serialization, merging, and prediction
// Schematic training loop (simplified; not LightGBM's literal code)
for (int iter = 0; iter < num_iterations; ++iter) {
  // 1. Compute gradients and hessians at the current scores
  objective_function_->GetGradients(scores, gradients, hessians);
  // 2. Fit a new tree to those statistics
  Tree* tree = tree_learner_->Train(gradients, hessians);
  // 3. Fold the (shrunken) tree predictions into the scores
  UpdateScore(tree);
  // 4. Evaluate validation metrics and check early stopping
  EvaluateMetrics();
}

Tree Learning

The TreeLearner interface abstracts tree construction. Implementations support:

  • Serial Learning: Single-machine tree building
  • Parallel Learning: Distributed tree construction across machines
  • GPU Learning: CUDA-accelerated tree building
  • Linear Trees: Trees that fit linear models in their leaves (the linear_tree option)

Tree learners receive gradients and hessians, then perform greedy leaf-wise tree growth by finding optimal splits that minimize the loss function.

Configuration System

The Config struct centralizes all hyperparameters and settings. It provides:

  • Parameter parsing from command-line and config files
  • Type-safe accessors for strings, integers, doubles, and booleans
  • Validation and constraint checking
  • Support for parameter aliases for backward compatibility
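
Thanks to aliases, older or sklearn-style parameter names resolve to the same setting; for instance, these two dictionaries are equivalent (the alias names shown are among the documented ones):

params_a = {"num_iterations": 100, "learning_rate": 0.1}
params_b = {"n_estimators": 100, "shrinkage_rate": 0.1}  # resolved via aliases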

Key Design Patterns

Factory Pattern: Boosting::CreateBoosting() and TreeLearner::CreateTreeLearner() instantiate implementations based on configuration, enabling runtime selection of algorithms.

Strategy Pattern: Objective functions and metrics are pluggable, allowing different loss functions and evaluation criteria without modifying core training logic.

Template Methods: The GBDT class defines the training loop structure while delegating specific operations (tree building, gradient computation) to specialized components.

This modular architecture enables LightGBM to support diverse use cases—from single-machine training to distributed learning, CPU to GPU acceleration, and various tree-building strategies—while maintaining clean separation of concerns.

Tree Learning & Histogram Algorithms

Relevant Files
  • src/treelearner/serial_tree_learner.cpp
  • src/treelearner/feature_histogram.hpp
  • src/treelearner/feature_histogram.cpp
  • include/LightGBM/tree.h
  • src/io/bin.cpp
  • src/treelearner/leaf_splits.hpp
  • src/treelearner/data_partition.hpp

LightGBM uses a histogram-based approach to efficiently find optimal splits during tree learning. This strategy dramatically reduces memory usage and computation time compared to scanning all feature values.

Histogram Construction

The core workflow begins with FeatureHistogram, which aggregates gradient and Hessian statistics into discrete bins for each feature. During tree training, the system constructs histograms by iterating through data points and accumulating their gradients and Hessians into corresponding feature bins.

Key components:

  • FeatureMetainfo: Stores metadata for each feature including bin count, missing type, and monotone constraints
  • FeatureHistogram: Holds aggregated gradient/Hessian pairs for each bin
  • Histogram Pool: Caches histograms to avoid redundant computation across tree levels

The histogram construction process supports both floating-point and quantized (integer) gradients. Quantized gradients reduce memory footprint and enable faster computation through bit-level operations.

Split Finding Algorithm

Once histograms are constructed, FindBestSplitsFromHistograms scans each histogram to identify the optimal split threshold. The algorithm:

  1. Iterates through histogram bins sequentially
  2. Accumulates left-side gradients and Hessians
  3. Computes right-side statistics by subtraction
  4. Calculates split gain using the gain formula: gain = (grad_left² / (hess_left + lambda_l2)) + (grad_right² / (hess_right + lambda_l2)) - (grad_total² / (hess_total + lambda_l2))
  5. Tracks the threshold with maximum gain

The algorithm respects constraints like min_data_in_leaf and min_sum_hessian_in_leaf to ensure valid splits.
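
The scan can be sketched in a few lines of Python; this mirrors the gain formula above but is an illustration, not LightGBM's actual implementation (the min_data_in_leaf count check is omitted for brevity):

import numpy as np

def best_split(grad_hist, hess_hist, lambda_l2=0.0, min_sum_hessian=1e-3):
    """Scan one feature histogram for the bin threshold with maximum gain."""
    g_total, h_total = grad_hist.sum(), hess_hist.sum()
    parent_score = g_total ** 2 / (h_total + lambda_l2)
    best_gain, best_bin = -np.inf, -1
    g_left = h_left = 0.0
    for b in range(len(grad_hist) - 1):      # split between bin b and b + 1
        g_left += grad_hist[b]
        h_left += hess_hist[b]
        g_right = g_total - g_left           # right side by subtraction
        h_right = h_total - h_left
        if h_left < min_sum_hessian or h_right < min_sum_hessian:
            continue
        gain = (g_left ** 2 / (h_left + lambda_l2)
                + g_right ** 2 / (h_right + lambda_l2) - parent_score)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain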

Leaf Output Calculation

After identifying a split, leaf outputs are computed using CalculateSplittedLeafOutput, which applies regularization:

output = -sum_gradients / (sum_hessians + lambda_l2)

With L1 regularization, gradients are soft-thresholded. Path smoothing adjusts outputs based on parent node values for improved generalization.
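
A sketch of this calculation (again illustrative, not the library's code; path smoothing is omitted):

def leaf_output(sum_gradients, sum_hessians, lambda_l1=0.0, lambda_l2=0.0):
    # Soft-threshold the gradient sum for L1, then apply the closed form above
    if sum_gradients > lambda_l1:
        sum_gradients -= lambda_l1
    elif sum_gradients < -lambda_l1:
        sum_gradients += lambda_l1
    else:
        return 0.0
    return -sum_gradients / (sum_hessians + lambda_l2)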

Data Partitioning

DataPartition maintains the mapping of data points to leaves. When a split occurs, it repartitions data using the split threshold, efficiently reordering indices so each leaf's data is contiguous in memory.

Optimization Techniques

  • Histogram Subtraction: For parent-child leaf pairs, compute one histogram directly and derive the other by subtraction, saving ~50% computation
  • Quantized Gradients: Integer representation reduces memory and enables SIMD optimizations
  • Parallel Histogram Construction: Multiple threads build histograms for different features simultaneously
  • Bin Compression: Automatically selects appropriate bin widths (8, 16, or 32 bits) based on data range
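
The subtraction trick is easy to demonstrate: build the parent histogram and one child histogram from data, and the sibling's falls out for free (synthetic data, gradient sums only):

import numpy as np

rng = np.random.default_rng(0)
bins = rng.integers(0, 16, size=1000)   # binned feature values per row
grad = rng.normal(size=1000)            # per-row gradients
in_left = rng.random(1000) < 0.4        # rows routed to the left child

def hist(mask):
    return np.bincount(bins[mask], weights=grad[mask], minlength=16)

parent = hist(np.ones(1000, dtype=bool))
right = parent - hist(in_left)          # sibling histogram, no second data pass
assert np.allclose(right, hist(~in_left))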

This histogram-based approach enables LightGBM to train on datasets with millions of rows efficiently while maintaining competitive accuracy.

Boosting Algorithms

Relevant Files
  • src/boosting/gbdt.h
  • src/boosting/gbdt.cpp
  • src/boosting/dart.hpp
  • src/boosting/rf.hpp
  • src/boosting/bagging.hpp
  • src/boosting/goss.hpp
  • src/boosting/sample_strategy.cpp

LightGBM implements multiple boosting algorithms, each optimized for different use cases. The core architecture uses a base GBDT class that other algorithms extend, combined with pluggable sampling strategies.

Core Boosting Algorithms

GBDT (Gradient Boosting Decision Trees) is the foundation. It iteratively trains decision trees to minimize a loss function by fitting residuals. Each iteration computes gradients and hessians from the objective function, then trains a new tree to reduce the loss. Trees are added with a shrinkage rate (learning rate) to prevent overfitting.

DART (Dropouts meet Additive Regression Trees) extends GBDT by randomly dropping previously trained trees during training. This regularization technique reduces overfitting by forcing the model to learn from diverse tree combinations. Trees are weighted, and during each iteration, a subset is randomly dropped based on a configurable drop rate. The learning rate is adjusted based on the number of dropped trees.

Random Forest (RF) reuses the GBDT infrastructure but operates differently: each tree is trained on a random subset of data and features, and the tree outputs are averaged rather than summed as shrunken increments. Unlike GBDT, RF sets shrinkage_rate = 1.0 (no shrinkage) and uses average_output = true, training one tree per iteration on a fresh bagging sample.
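
Selecting among these algorithms is a parameter change in the Python API; the values below are illustrative, and train_data is assumed to be an existing lgb.Dataset:

import lightgbm as lgb

dart_params = {"objective": "binary", "boosting": "dart", "drop_rate": 0.1}
rf_params = {"objective": "binary", "boosting": "rf",
             "bagging_freq": 1, "bagging_fraction": 0.8,
             "feature_fraction": 0.8}  # rf requires bagging to be enabled
booster = lgb.train(dart_params, train_data, num_boost_round=100)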

Sampling Strategies

LightGBM decouples sampling from boosting via the SampleStrategy interface:

Bagging randomly samples data with replacement at each iteration. It supports balanced bagging for ranking tasks and query-based bagging for LambdaRank. Sampling frequency is controlled by bagging_freq and bagging_fraction.

GOSS (Gradient-based One-Side Sampling) keeps instances with large gradients (top instances) and randomly samples from instances with small gradients. This reduces data size while preserving training signal. GOSS cannot be combined with bagging and skips sampling for the first 1 / learning_rate iterations.
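
In recent LightGBM versions (4.0+), GOSS is selected through the sampling-strategy parameter rather than as a boosting type; the rates shown are the documented defaults:

goss_params = {
    "objective": "binary",
    "data_sample_strategy": "goss",  # was boosting="goss" in older releases
    "top_rate": 0.2,    # fraction of large-gradient instances kept
    "other_rate": 0.1,  # fraction sampled from the remainder
}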

Key Implementation Details

  • Score Updates: The ScoreUpdater class maintains cumulative predictions for training and validation data, updated after each tree is trained.
  • Gradient Computation: Gradients and hessians are computed by the objective function based on current predictions. GOSS modifies gradient values for sampled instances.
  • Multi-class Support: num_tree_per_iteration trees are trained per iteration for multi-class problems, one per class.
  • GPU Support: GBDT supports CUDA acceleration for gradient/hessian computation and score updates via CUDAVector and cuda_score_updater.
  • Early Stopping: Validation metrics are tracked, and training stops if no improvement occurs within early_stopping_round iterations.

Data Loading & I/O

Relevant Files
  • include/LightGBM/dataset.h
  • include/LightGBM/dataset_loader.h
  • include/LightGBM/feature_group.h
  • src/io/dataset.cpp
  • src/io/dataset_loader.cpp

LightGBM's data loading pipeline transforms raw input files into optimized binary representations for efficient training. The system handles multiple input formats, supports distributed learning, and applies intelligent feature binning.

Core Components

Dataset Class stores the complete training data structure, including feature groups, metadata (labels, weights, queries), and bin mappings. It provides methods to push data incrementally or load from files, supporting both dense and sparse representations.

DatasetLoader orchestrates the loading process. It reads configuration parameters, parses input files, constructs bin mappers for feature discretization, and manages distributed data partitioning across machines.

Metadata manages non-feature data: labels for training targets, sample weights for weighted learning, query boundaries for ranking tasks, and initial scores for warm-starting models.

FeatureGroup organizes features into groups for efficient memory access and computation. Each group contains bin mappers (for value-to-bin conversion) and bin data structures (dense or sparse).

Loading Modes

File-based Loading: Reads from text or binary files. The loader samples data to determine optimal bin boundaries, then extracts all features in a second pass. Supports header detection and column name mapping.

Binary Loading: Loads pre-serialized datasets for faster initialization. Includes schema validation to ensure compatibility with training configuration.

Streaming Loading: Accepts data incrementally via PushOneRow() or PushOneValue() APIs. Useful for online learning or real-time data ingestion. Requires manual FinishLoad() call or automatic completion when the last row is pushed.

Distributed Loading: Partitions data across machines by rank. Each machine loads its subset, and the loader handles query-level partitioning for ranking tasks.

Feature Binning

Features are discretized into bins to reduce memory usage and accelerate tree learning. The BinMapper class converts continuous values to bin indices. Categorical features map directly to bins, while numerical features use quantile-based boundaries.

Sparse features (with many missing values) use sparse bin storage. Dense features use dense arrays. The system automatically selects the optimal representation based on sparsity thresholds.
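
Binning behavior is controlled through dataset parameters; a sketch with the commonly used knobs (the file name is a placeholder):

import lightgbm as lgb

train_data = lgb.Dataset(
    "train.txt",
    params={"max_bin": 255,          # maximum bins per feature (default 255)
            "min_data_in_bin": 3})   # minimum samples per bin (default 3)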

Metadata Handling

Labels, weights, and queries are loaded separately from features. Query boundaries define document groups for ranking tasks. Query weights are auto-calculated from sample weights and query boundaries. Initial scores allow warm-starting from a previous model.

Binary Serialization

Datasets can be saved to binary format for fast reloading. The SerializeReference() method saves only the schema (bin mappers, feature names), while SaveBinaryFile() includes all data. This enables efficient model deployment and distributed training setup.
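
From Python, the round trip looks like the following (file names are placeholders):

import lightgbm as lgb

ds = lgb.Dataset("train.txt")
ds.construct()                       # force parsing and binning now
ds.save_binary("train.bin")          # full data in binary form
fast_ds = lgb.Dataset("train.bin")   # binary files are detected automatically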

Key Optimizations

  • Feature Grouping: Combines compatible features to reduce bin count and improve cache locality
  • Sparse Representation: Stores only non-zero values for sparse features
  • Parallel Parsing: Uses OpenMP to parse and extract features concurrently
  • Lazy Initialization: Defers expensive operations until data is complete

GPU Acceleration

Relevant Files
  • src/treelearner/cuda/cuda_single_gpu_tree_learner.hpp
  • src/treelearner/gpu_tree_learner.h
  • src/cuda/cuda_algorithms.cu
  • include/LightGBM/cuda/cuda_tree.hpp
  • src/treelearner/cuda/cuda_histogram_constructor.hpp
  • src/treelearner/cuda/cuda_best_split_finder.hpp
  • src/treelearner/cuda/cuda_data_partition.hpp

LightGBM supports GPU acceleration through two complementary backends: CUDA for NVIDIA GPUs and OpenCL for broader GPU compatibility. GPU acceleration dramatically speeds up the most computationally intensive operations in tree learning: histogram construction, split finding, and data partitioning.

CUDA Backend (CUDASingleGPUTreeLearner)

The CUDA implementation provides the highest performance for NVIDIA GPUs. It accelerates the entire tree training pipeline by offloading key operations to the GPU:

  • Histogram Construction: Computes feature histograms in parallel across thousands of GPU threads, aggregating gradients and Hessians into bins
  • Split Finding: Scans histograms on GPU to identify optimal split thresholds using gain calculations
  • Data Partitioning: Repartitions data indices after each split using GPU kernels
  • Leaf Value Computation: Calculates leaf outputs directly on GPU using accumulated statistics

The CUDASingleGPUTreeLearner class manages GPU memory, launches kernels, and synchronizes with CPU-side tree construction logic. It maintains separate components for each operation: CUDAHistogramConstructor, CUDABestSplitFinder, CUDADataPartition, and CUDALeafSplits.

OpenCL Backend (GPUTreeLearner)

The OpenCL implementation provides broader GPU support across different vendors. It focuses primarily on histogram construction, which is the bottleneck in tree learning:

  • Kernel Compilation: Compiles OpenCL kernels at runtime with bin-specific optimizations (16, 64, or 256 bins)
  • Workgroup Tuning: Dynamically selects the number of workgroups per feature based on leaf size for optimal occupancy
  • Asynchronous Operations: Overlaps data transfers with kernel execution using pinned memory and async queues
  • Feature Masking: Supports selective feature processing to skip unused features

GPU Memory Management

Both backends use efficient memory allocation strategies:

  • Pinned Memory: Enables faster CPU-GPU transfers for gradients, Hessians, and indices
  • Device Buffers: Pre-allocates GPU memory for histograms, feature data, and intermediate results
  • Stream Management: Uses CUDA streams to parallelize independent operations

Quantized Gradient Training

GPU acceleration supports quantized (integer) gradients to reduce memory bandwidth:

  • Gradients and Hessians are discretized to 16-bit integers
  • Histogram construction uses reduced precision arithmetic
  • Split finding applies scaling factors to recover original gain values
  • Significantly reduces GPU memory usage and improves cache efficiency
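
Enabling GPU training and quantized gradients is a configuration choice; a sketch (whether use_quantized_grad applies depends on version and device support):

params = {
    "objective": "binary",
    "device_type": "cuda",        # "gpu" selects the OpenCL backend instead
    "use_quantized_grad": True,   # integer gradient histograms
    "num_grad_quant_bins": 4,     # documented default
}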

Performance Considerations

GPU acceleration provides 10-50x speedups on large datasets, but overhead becomes significant on small datasets. The system automatically selects GPU or CPU execution based on data size and feature count. Categorical features require additional GPU memory for bitset construction, and feature selection by node can reduce GPU utilization.

Distributed & Parallel Learning

Relevant Files
  • src/treelearner/parallel_tree_learner.h
  • src/treelearner/data_parallel_tree_learner.cpp
  • src/treelearner/feature_parallel_tree_learner.cpp
  • src/treelearner/voting_parallel_tree_learner.cpp
  • src/network/network.cpp
  • include/LightGBM/network.h

LightGBM supports three distributed learning algorithms to scale training across multiple machines. Each strategy optimizes for different data and feature distributions.

Feature Parallel Learning

Feature parallel partitions features across machines. Each worker finds the best split on its local feature set, then machines communicate to identify the global best split.

Key characteristics:

  • Each machine holds the full dataset but processes different features
  • Communication cost: small, because machines exchange only their local best splits (not histograms) via collective operations
  • Best for: Small datasets with many features
  • Implementation: FeatureParallelTreeLearner synchronizes split information via Network::Allreduce()
// Feature distribution across machines
std::vector<std::vector<int>> feature_distribution(num_machines_);
// Each machine finds local best split, then syncs globally
SyncUpGlobalBestSplit(input_buffer, output_buffer, &smaller_best_split, &larger_best_split);

Data Parallel Learning

Data parallel distributes data across machines. Workers construct local histograms, then use Reduce Scatter to merge histograms for different (non-overlapping) features across machines.

Key characteristics:

  • Each machine processes different data samples
  • Communication cost: O(0.5 * #features * #bins) via Reduce Scatter
  • Best for: Large datasets with fewer features
  • Implementation: DataParallelTreeLearner uses Network::ReduceScatter() for efficient histogram aggregation
// Reduce scatter merges histograms for different features
Network::ReduceScatter(input_buffer, reduce_scatter_size, sizeof(hist_t),
                       block_start, block_len, output_buffer, &HistogramSumReducer);

Voting Parallel Learning

Voting parallel further reduces communication by using two-stage voting to select only the most promising features for histogram aggregation.

Key characteristics:

  • Combines data parallelism with feature selection via voting
  • Communication cost: Constant, independent of feature count
  • Best for: Large datasets with many features
  • Implementation: VotingParallelTreeLearner performs local voting, then global voting to select top features
// Global voting selects top features based on weighted gain
GlobalVoting(leaf_idx, local_splits, &selected_features);
// Only aggregate histograms for selected features
CopyLocalHistogram(smaller_top_features, larger_top_features);

Network Communication Layer

The Network class provides collective communication primitives:

  • Allreduce: Uses reduce scatter followed by allgather for large payloads, and an allgather-based implementation for small ones
  • Allgather: Implements Bruck algorithm (O(log n) rounds), recursive doubling, or ring topology
  • ReduceScatter: Uses recursive halving or ring algorithm to distribute partial results

Configuration

Select the parallel strategy via the tree_learner parameter:

  • serial: Single machine (default)
  • feature: Feature parallel
  • data: Data parallel
  • voting: Voting parallel

The Network::Init() method initializes communication infrastructure with machine rank and count, supporting both built-in socket communication and custom external communication backends.
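
A data-parallel setup from the Python side might look like the following, with placeholder addresses; each machine runs the same script with the full machine list:

params = {
    "objective": "binary",
    "tree_learner": "data",
    "num_machines": 2,
    "machines": "10.0.0.1:12400,10.0.0.2:12400",  # placeholder hosts
    "local_listen_port": 12400,
}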

Python API & Language Bindings

Relevant Files
  • python-package/lightgbm/basic.py
  • python-package/lightgbm/engine.py
  • python-package/lightgbm/sklearn.py
  • python-package/lightgbm/callback.py
  • python-package/lightgbm/libpath.py

LightGBM provides a comprehensive Python API with multiple interfaces for different use cases. The API is built as a ctypes-based wrapper around the C library, enabling high-performance gradient boosting while maintaining Pythonic interfaces.

Core Architecture

The Python API consists of four main layers:

  1. C Library Binding (libpath.py, basic.py): Uses ctypes to load and interact with the compiled C library (lib_lightgbm.{dll,dylib,so}). The _LIB object provides direct access to C API functions.

  2. Low-Level API (basic.py): Implements Dataset and Booster classes that wrap C handles. These classes manage memory and provide direct control over training and prediction.

  3. Training API (engine.py): Provides train() and cv() functions for model training with support for custom objectives, evaluation metrics, and callbacks.

  4. Scikit-learn API (sklearn.py): Offers LGBMClassifier, LGBMRegressor, and LGBMRanker classes compatible with scikit-learn's estimator interface.

Data Handling

The Dataset class preprocesses raw data into LightGBM's internal histogram-based representation. It supports multiple input formats:

  • NumPy arrays and pandas DataFrames
  • Sparse matrices (CSR, CSC formats)
  • PyArrow Tables
  • Direct file loading

Key features include automatic handling of missing values, categorical features, and feature binning. The free_raw_data parameter controls whether raw input is retained after preprocessing.

Training Workflow

import lightgbm as lgb

# Create datasets
train_data = lgb.Dataset(X_train, label=y_train)
valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)

# Train with custom callbacks
booster = lgb.train(
    params={'objective': 'binary', 'metric': 'auc'},
    train_set=train_data,
    num_boost_round=100,
    valid_sets=[valid_data],
    callbacks=[lgb.early_stopping(10), lgb.log_evaluation()]
)

# Make predictions
predictions = booster.predict(X_test)

Callbacks System

Callbacks enable monitoring and control during training. The CallbackEnv dataclass provides access to the model, parameters, iteration count, and evaluation results. Built-in callbacks include:

  • early_stopping(): Stop training when validation metric plateaus
  • log_evaluation(): Print evaluation metrics at regular intervals
  • record_evaluation(): Store evaluation history in a dictionary
  • reset_parameter(): Dynamically adjust hyperparameters during training

Custom callbacks implement __call__(env: CallbackEnv) to access training state.
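
A minimal custom callback, assuming the documented CallbackEnv attributes named in the comments:

def log_every_five(env):
    # env carries model, params, iteration, begin_iteration,
    # end_iteration, and evaluation_result_list
    if (env.iteration + 1) % 5 == 0:
        print(f"iter {env.iteration + 1}: {env.evaluation_result_list}")

# booster = lgb.train(params, train_data, callbacks=[log_every_five])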

Scikit-learn Integration

The sklearn API provides familiar interfaces for model selection and pipelines:

from lightgbm import LGBMClassifier

clf = LGBMClassifier(n_estimators=100, learning_rate=0.05)
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
predictions = clf.predict(X_test)

This layer handles parameter validation, data preprocessing, and cross-validation compatibility while delegating training to the core train() function.

Type System

The API uses comprehensive type hints with custom type aliases for common patterns:

  • _LGBM_TrainDataType: Input data formats (arrays, DataFrames, sparse matrices)
  • _LGBM_CustomObjectiveFunction: Custom loss function signature
  • _LGBM_CustomEvalFunction: Custom metric function signature

These types ensure type safety and enable IDE autocompletion across the API.

Objectives & Metrics

Relevant Files
  • include/LightGBM/objective_function.h
  • include/LightGBM/metric.h
  • src/objective/objective_function.cpp
  • src/metric/metric.cpp
  • src/objective/regression_objective.hpp
  • src/objective/binary_objective.hpp
  • src/objective/rank_objective.hpp
  • src/metric/regression_metric.hpp

LightGBM separates the concepts of objectives (loss functions for training) and metrics (evaluation functions for monitoring). This design allows flexible combinations of training objectives with evaluation metrics.

Objective Functions

Objective functions define the loss to minimize during tree learning. Each objective computes gradients and hessians (first and second derivatives) that guide the boosting algorithm.

Key Interface Methods:

  • GetGradients() — Computes gradients and hessians for a batch of predictions
  • IsConstantHessian() — Optimization flag for constant hessian objectives
  • IsRenewTreeOutput() — Indicates if leaf values need post-processing
  • ConvertOutput() — Transforms raw scores to final predictions

Supported Objectives:

  • Regression: regression (L2), regression_l1 (L1), quantile, huber, fair, poisson, mape, gamma, tweedie
  • Binary Classification: binary (logistic loss)
  • Multiclass: multiclass (softmax), multiclassova (one-vs-all)
  • Ranking: lambdarank (NDCG-based), rank_xendcg (XE-NDCG)
  • Cross-Entropy: cross_entropy, cross_entropy_lambda
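
In the Python package, a custom objective is a callable placed in params (for LightGBM 4.0+) that returns per-row gradients and Hessians; a sketch for binary logistic loss, assuming train_data is an existing lgb.Dataset:

import numpy as np
import lightgbm as lgb

def logistic_objective(preds, train_data):
    # preds are raw scores for the native API; labels come from the Dataset
    y = train_data.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    return p - y, p * (1.0 - p)   # gradient, hessian

def error_rate(preds, eval_data):
    # custom metric: (name, value, is_higher_better)
    y = eval_data.get_label()
    return "error", float(np.mean((preds > 0) != y)), False

booster = lgb.train({"objective": logistic_objective, "metric": "None"},
                    train_data, valid_sets=[train_data], feval=error_rate)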

Metrics

Metrics evaluate model performance on validation/test sets. Unlike objectives, metrics are not used for training—they provide interpretable performance indicators.

Key Interface Methods:

  • Init() — Initializes metric with metadata (labels, weights, query info)
  • Eval() — Computes metric value(s) given predictions
  • GetName() — Returns metric name(s) for logging
  • factor_to_bigger_better() — Returns 1.0 if higher is better, -1.0 if lower is better

Supported Metrics:

  • Regression: l2, rmse, l1, quantile, huber, fair, poisson, mape, gamma, gamma_deviance, tweedie, r2
  • Binary: binary_logloss, binary_error, auc, average_precision, auc_mu
  • Multiclass: multi_logloss, multi_error
  • Ranking: ndcg, map, ndcg@k (with position parameter)
  • Cross-Entropy: cross_entropy, cross_entropy_lambda, kullback_leibler

Factory Pattern

Both objectives and metrics use factory methods for instantiation:

ObjectiveFunction* obj = ObjectiveFunction::CreateObjectiveFunction("regression", config);
Metric* metric = Metric::CreateMetric("rmse", config);

The factory automatically selects CPU or CUDA implementations based on device configuration. CUDA implementations exist for most common objectives and metrics, with automatic fallback to CPU for unsupported combinations.

Gradient Computation

Objectives compute gradients and hessians in parallel:

void GetGradients(const double* score, score_t* gradients, score_t* hessians) const;

For ranking tasks, a specialized overload handles query-level sampling:

void GetGradients(const double* score, data_size_t num_sampled_queries,
                  const data_size_t* sampled_query_indices,
                  score_t* gradients, score_t* hessians) const;

DCG Calculator

For ranking metrics (NDCG, MAP), LightGBM provides a DCGCalculator utility that computes Discounted Cumulative Gain scores. It supports:

  • Configurable label gains (default: 2^label - 1)
  • Position-specific discounts
  • Batch DCG computation at multiple positions
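
As a worked example, NDCG with the default gains and standard log2 position discounts can be computed as follows (an illustration of the formula, not the DCGCalculator code):

import numpy as np

def dcg_at_k(labels, k):
    labels = np.asarray(labels, dtype=float)[:k]
    gains = 2.0 ** labels - 1.0                           # default label_gain
    discounts = 1.0 / np.log2(np.arange(2, labels.size + 2))
    return float((gains * discounts).sum())

ranked = [3, 2, 0]  # relevance labels in predicted order
ideal = sorted(ranked, reverse=True)
print(dcg_at_k(ranked, 3) / dcg_at_k(ideal, 3))           # NDCG@3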