Overview
Relevant Files
- README.rst
- sklearn/__init__.py
- doc/getting_started.rst
- doc/user_guide.rst
- pyproject.toml
scikit-learn is a mature, production-ready Python machine learning library built on NumPy, SciPy, and joblib. It provides a comprehensive suite of supervised and unsupervised learning algorithms, along with tools for model evaluation, selection, and data preprocessing. The library emphasizes a consistent API design where all estimators follow the same fit-predict pattern.
Core Purpose
scikit-learn aims to make machine learning accessible and practical for both researchers and practitioners. It integrates classical ML algorithms into the scientific Python ecosystem, offering simple yet efficient solutions for learning problems across science and engineering domains.
Key Features
- Unified API: All estimators inherit from `BaseEstimator`, providing consistent `fit()`, `predict()`, and `transform()` methods
- Comprehensive Algorithms: Classification, regression, clustering, dimensionality reduction, and feature selection
- Data Preprocessing: Transformers for scaling, encoding, imputation, and feature engineering
- Model Selection: Cross-validation, hyperparameter tuning, and evaluation metrics
- Pipelines: Chain preprocessing and estimators to prevent data leakage and simplify workflows
- Inspection Tools: Feature importance, partial dependence, and model introspection utilities
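The unified API means every estimator is used the same way regardless of algorithm. A minimal sketch (dataset and hyperparameters chosen here purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Construct with hyperparameters, then fit and predict -- the same
# pattern applies to any scikit-learn estimator
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
predictions = clf.predict(X)
```

Swapping in a different estimator (a random forest, an SVM) changes only the constructor line; the `fit`/`predict` calls stay identical.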
Architecture Overview
Module Organization
The library is organized into functional submodules accessible via lazy imports:
- `cluster` – Clustering algorithms (KMeans, DBSCAN, etc.)
- `ensemble` – Ensemble methods (RandomForest, GradientBoosting, etc.)
- `linear_model` – Linear regression and classification
- `tree` – Decision trees
- `svm` – Support Vector Machines
- `preprocessing` – Data transformation and scaling
- `model_selection` – Cross-validation and hyperparameter tuning
- `metrics` – Evaluation metrics and scoring functions
- `decomposition` – Dimensionality reduction (PCA, NMF, etc.)
- `neighbors` – Nearest neighbors methods
- `neural_network` – Multi-layer perceptron models
Dependencies
Required: Python (>=3.11), NumPy (>=1.24.1), SciPy (>=1.10.0), joblib (>=1.3.0), threadpoolctl (>=3.2.0)
Optional: Matplotlib (plotting), pandas (data handling), scikit-image (image processing)
Development Status
scikit-learn is actively maintained by a volunteer team and is in production/stable status. The codebase emphasizes code quality, comprehensive testing, and backward compatibility while continuously adding new algorithms and features.
Architecture & Estimator Interface
Relevant Files
- sklearn/base.py
- sklearn/pipeline.py
- sklearn/utils/validation.py
- sklearn/utils/_param_validation.py
- sklearn/utils/metadata_routing.py
Core Estimator Architecture
Scikit-learn's architecture is built on a hierarchy of base classes that define the contract for all estimators. The BaseEstimator class is the foundation, providing parameter management, serialization, and validation capabilities. All estimators inherit from this class and must follow the scikit-learn API convention: estimators are objects with fit() and predict() (or transform()) methods.
The estimator hierarchy uses mixin classes to specify estimator type and behavior:
- `ClassifierMixin` – Adds `score()` method (accuracy by default) and marks estimator as a classifier
- `RegressorMixin` – Adds `score()` method (R² by default) and marks estimator as a regressor
- `TransformerMixin` – Adds `fit_transform()` method and output formatting capabilities
- `ClusterMixin` – Adds `fit_predict()` method for clustering algorithms
- `OutlierMixin` – Adds `fit_predict()` for outlier detection
Parameter Management
BaseEstimator provides two critical methods for parameter handling:
- `get_params(deep=True)` – Returns all constructor parameters as a dictionary. With `deep=True`, recursively retrieves nested estimator parameters using `__` notation (e.g., `pipeline__step__param`)
- `set_params(**params)` – Sets parameters on the estimator and nested estimators. Enables grid search and hyperparameter tuning
Parameters are introspected from the __init__ signature, so all estimator parameters must be explicit keyword arguments (no *args or **kwargs).
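The `__` notation can be seen directly on a small pipeline (estimators chosen here for illustration):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# deep=True exposes nested parameters under step__param keys,
# e.g. "clf__C" for the C parameter of the "clf" step
params = pipe.get_params(deep=True)

# set_params() reaches into nested estimators using the same notation --
# this is exactly the mechanism grid search relies on
pipe.set_params(clf__C=0.5)
```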
Pipeline & Composition
The Pipeline class chains transformers and a final estimator sequentially. Intermediate steps must implement fit() and transform(), while the final step only needs fit(). Pipelines support:
- Caching – Intermediate transformer results can be cached via the `memory` parameter
- Parameter routing – Metadata (sample weights, groups) can be routed to specific steps
- Shortcut methods – `fit_predict()` and `fit_transform()` avoid redundant passes over the training data
FeatureUnion combines multiple transformers in parallel, concatenating their outputs.
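A sketch combining both: a `FeatureUnion` running two transformers in parallel, feeding a final classifier through a `Pipeline` (the iris dataset and component choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Two transformers run in parallel; their outputs are concatenated column-wise
union = FeatureUnion([("scaled", StandardScaler()), ("pca", PCA(n_components=2))])

# Intermediate steps must transform; only the final step needs predict()
pipe = Pipeline([("features", union), ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

# 4 scaled columns + 2 PCA components = 6 features reaching the classifier
n_features_out = union.transform(X).shape[1]
```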
Validation & Constraints
Parameter validation occurs through _parameter_constraints class attributes. The validate_parameter_constraints() function checks types and values against constraint specifications:
_parameter_constraints = {
"C": [Interval(Real, 0, None, closed="neither")],
"kernel": [StrOptions({"linear", "rbf", "poly"})],
"random_state": ["random_state"],
}
Data validation uses check_array(), check_is_fitted(), and feature name validation to ensure inputs meet requirements.
Metadata Routing
Modern scikit-learn supports metadata routing to safely pass metadata (sample weights, groups, etc.) through pipelines and meta-estimators. The MetadataRouter and _MetadataRequester classes manage this flow, allowing estimators to declare which metadata they consume and how it should be routed.
Estimator Lifecycle
- Initialization – Constructor sets parameters; no data is processed
- Fitting – `fit()` learns from training data; fitted attributes end with `_`
- Prediction/Transform – `predict()` or `transform()` applies the learned model to new data
- Validation – `check_is_fitted()` ensures the estimator has been fitted before prediction
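The lifecycle can be traced on a tiny example (the data here is illustrative):

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted

# 1. Initialization: no data processed, no fitted attributes yet
est = LinearRegression()
try:
    check_is_fitted(est)
    fitted_before = True
except NotFittedError:
    fitted_before = False

# 2. Fitting: learned attributes appear with a trailing underscore
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])  # y = 2x + 1
est.fit(X, y)

# 3. Validation now passes, and est.coef_ holds the learned slope
check_is_fitted(est)
slope = est.coef_[0]
```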
Supervised Learning Algorithms
Relevant Files
- sklearn/linear_model/__init__.py
- sklearn/tree/__init__.py
- sklearn/ensemble/__init__.py
- sklearn/svm/__init__.py
- sklearn/neighbors/__init__.py
- sklearn/neural_network/__init__.py
- sklearn/gaussian_process/__init__.py
- sklearn/naive_bayes.py
- sklearn/discriminant_analysis.py
Supervised learning algorithms learn patterns from labeled training data to make predictions on new, unseen data. Scikit-learn provides a comprehensive collection of algorithms for both classification and regression tasks, organized into distinct modules based on their underlying mathematical principles.
Linear Models
Linear models form the foundation of many machine learning applications. They assume a linear relationship between input features and the target variable.
Regression: LinearRegression fits ordinary least squares, minimizing the residual sum of squares. Ridge and Lasso add L2 and L1 regularization respectively to prevent overfitting. ElasticNet combines both penalties. Specialized variants like BayesianRidge, HuberRegressor, and QuantileRegressor handle different data distributions and outliers.
Classification: LogisticRegression performs binary and multiclass classification using regularized logistic loss. Perceptron and PassiveAggressiveClassifier offer online learning capabilities. SGDClassifier and SGDRegressor enable stochastic gradient descent optimization for large datasets.
Tree-Based Methods
Decision trees recursively partition the feature space, creating interpretable models.
DecisionTreeClassifier and DecisionTreeRegressor support multiple splitting criteria (Gini impurity, entropy, MSE). ExtraTreeClassifier and ExtraTreeRegressor use random thresholds for faster training. Trees are prone to overfitting but serve as building blocks for ensemble methods.
Ensemble Methods
Ensemble methods combine multiple base learners to improve generalization.
Bagging: BaggingClassifier and BaggingRegressor train independent models on random subsets. RandomForestClassifier and RandomForestRegressor are specialized bagging ensembles using decision trees with feature subsampling.
Boosting: AdaBoostClassifier and AdaBoostRegressor sequentially train models, emphasizing misclassified samples. GradientBoostingClassifier and GradientBoostingRegressor fit trees to residuals. HistGradientBoostingClassifier and HistGradientBoostingRegressor use histogram-based learning for efficiency.
Stacking & Voting: StackingClassifier and StackingRegressor train meta-learners on base model predictions. VotingClassifier and VotingRegressor combine predictions via averaging or majority voting.
Support Vector Machines
SVMs find optimal decision boundaries by maximizing the margin between classes.
SVC and SVR support kernel methods (linear, RBF, polynomial). LinearSVC and LinearSVR are optimized for linear kernels. NuSVC and NuSVR use alternative parameterizations. OneClassSVM detects outliers.
Nearest Neighbors
Instance-based methods classify by finding similar training examples.
KNeighborsClassifier and KNeighborsRegressor use k-nearest neighbors voting. RadiusNeighborsClassifier and RadiusNeighborsRegressor use fixed-radius neighborhoods. Efficient spatial indexing via KDTree and BallTree accelerates queries.
Neural Networks
MLPClassifier and MLPRegressor implement multi-layer perceptrons with backpropagation. They support multiple activation functions and solvers (SGD, Adam, L-BFGS) for flexible deep learning on tabular data.
Gaussian Processes
GaussianProcessClassifier and GaussianProcessRegressor provide probabilistic predictions with uncertainty estimates. They use kernel functions to define covariance structures and support various kernels via the kernels module.
Probabilistic Classifiers
GaussianNB, MultinomialNB, BernoulliNB, and CategoricalNB implement Naive Bayes variants assuming feature independence. LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis model class-conditional distributions using covariance estimation.
All estimators follow scikit-learn's consistent API: fit(X, y) for training, predict(X) for inference, and score(X, y) for evaluation. Cross-validation utilities in model_selection help select optimal hyperparameters and assess generalization performance.
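The shared API in practice, on a held-out split (the forest and dataset are chosen here for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)              # training
preds = clf.predict(X_test)            # inference on unseen data
accuracy = clf.score(X_test, y_test)   # mean accuracy (ClassifierMixin default)
```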
Unsupervised Learning & Decomposition
Relevant Files
- sklearn/cluster/__init__.py
- sklearn/decomposition/__init__.py
- sklearn/manifold/__init__.py
- sklearn/mixture/__init__.py
- sklearn/covariance/__init__.py
Unsupervised learning discovers patterns in unlabeled data through clustering, decomposition, and manifold learning. scikit-learn provides a comprehensive toolkit for these tasks, organized into five main modules.
Clustering Algorithms
Clustering partitions data into groups based on similarity. The module offers diverse algorithms suited to different data geometries and scales:
- K-Means & Variants: Fast, scalable centroid-based clustering. `KMeans` works well for convex clusters; `MiniBatchKMeans` handles large datasets; `BisectingKMeans` uses hierarchical bisection for efficiency.
- Hierarchical Clustering: `AgglomerativeClustering` builds dendrograms via bottom-up merging with configurable linkage criteria (Ward, complete, average). `FeatureAgglomeration` clusters features instead of samples.
- Density-Based Methods: `DBSCAN` finds arbitrary-shaped clusters and identifies outliers; `OPTICS` extends this with multi-scale analysis; `HDBSCAN` adds hierarchical structure.
- Graph-Based: `SpectralClustering` uses normalized Laplacian eigenvectors for non-convex clusters (e.g., nested circles).
- Other Approaches: `MeanShift` finds modes in the feature space; `AffinityPropagation` uses message passing; `Birch` provides online clustering with memory efficiency.
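The contrast between centroid-based and density-based clustering in a few lines (blob parameters chosen for illustration):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# Centroid-based: the number of clusters is chosen up front
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km_labels = km.fit_predict(X)

# Density-based: cluster count is inferred from the data; label -1 marks noise
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
```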
Matrix Decomposition
Decomposition factorizes data into interpretable components for dimensionality reduction and feature extraction:
- PCA Family: `PCA` performs linear dimensionality reduction via SVD; `IncrementalPCA` processes data in batches; `KernelPCA` enables non-linear reduction through kernels.
- Non-Negative Factorization: `NMF` and `MiniBatchNMF` decompose into non-negative factors, useful for topic modeling and source separation.
- Independent Components: `FastICA` extracts statistically independent sources from mixed signals.
- Sparse Methods: `SparsePCA` and `DictionaryLearning` learn sparse representations; `SparseCoder` encodes data using learned dictionaries.
- Probabilistic Models: `FactorAnalysis` assumes Gaussian latent factors; `LatentDirichletAllocation` models discrete topics in text.
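A minimal PCA sketch showing both the reduced representation and the variance bookkeeping (iris is used for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project 4-dimensional iris data onto its top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of total variance retained by the kept components
explained = pca.explained_variance_ratio_.sum()
```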
Manifold Learning
Manifold learning uncovers low-dimensional structure in high-dimensional data by preserving local or global geometry:
- Distance-Preserving: `MDS` preserves pairwise distances (metric and non-metric variants); `Isomap` preserves geodesic distances along manifolds.
- Neighborhood-Based: `LocallyLinearEmbedding` reconstructs each point from local neighbors; `SpectralEmbedding` uses graph Laplacian eigenvectors.
- Probabilistic Embedding: `TSNE` minimizes Kullback-Leibler divergence for visualization, excelling at revealing local cluster structure.
Mixture Models
Probabilistic clustering via Gaussian mixtures:
`GaussianMixture` fits soft clusters with the EM algorithm; `BayesianGaussianMixture` adds Bayesian priors for automatic model selection.
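Unlike hard clustering, a mixture model yields per-sample membership probabilities. A sketch (blob parameters chosen for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
hard_labels = gm.predict(X)        # hard assignment, like a clusterer
soft_probs = gm.predict_proba(X)   # soft responsibilities; each row sums to 1
```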
Covariance Estimation
Robust covariance and precision matrix estimation for Gaussian graphical models:
- `EmpiricalCovariance` computes standard covariance; `ShrunkCovariance`, `LedoitWolf`, and `OAS` apply shrinkage for stability.
- `GraphicalLasso` learns sparse precision matrices via L1 regularization.
- `MinCovDet` and `EllipticEnvelope` detect outliers using robust covariance.
Choosing an Algorithm
Select based on your data and goal: use K-Means for speed on large datasets with convex clusters; DBSCAN for arbitrary shapes and outlier detection; hierarchical clustering for dendrograms; manifold learning for visualization; NMF for interpretable non-negative factors; PCA for fast linear reduction.
Preprocessing & Feature Engineering
Relevant Files
- sklearn/preprocessing/__init__.py
- sklearn/preprocessing/_data.py
- sklearn/feature_extraction/__init__.py
- sklearn/feature_selection/__init__.py
- sklearn/impute/__init__.py
- sklearn/compose/__init__.py
- sklearn/pipeline.py
Preprocessing and feature engineering are foundational steps in machine learning pipelines. scikit-learn provides a comprehensive suite of tools organized into five main modules that handle data transformation, feature extraction, feature selection, missing value imputation, and pipeline composition.
Data Scaling & Normalization
The preprocessing module offers multiple scalers for normalizing feature ranges. StandardScaler applies z-score normalization (mean=0, std=1), while MinMaxScaler rescales features to a fixed range like [0, 1]. RobustScaler uses median and interquartile range, making it resistant to outliers. MaxAbsScaler scales by the maximum absolute value, preserving sparsity. Normalizer applies L1 or L2 normalization per sample. QuantileTransformer maps features to uniform or normal distributions, useful for skewed data.
Encoding & Discretization
Categorical features require encoding before model training. OneHotEncoder converts categorical variables into binary columns, with options for handling unknown categories and sparse output. OrdinalEncoder maps categories to integers, suitable for ordinal data. LabelEncoder encodes target labels. KBinsDiscretizer bins continuous features into discrete intervals using equal-width, equal-frequency, or k-means strategies. TargetEncoder encodes categories based on target statistics, reducing dimensionality while capturing predictive information.
Feature Extraction
The feature_extraction module handles raw data conversion. TfidfVectorizer and CountVectorizer (in the text submodule) extract features from text documents. DictVectorizer converts dictionaries to sparse matrices. FeatureHasher uses hashing for memory-efficient feature extraction. Image utilities like img_to_graph and grid_to_graph extract spatial features from images.
Missing Value Imputation
The impute module provides strategies for handling missing data. SimpleImputer fills missing values using mean, median, most frequent, or constant strategies. KNNImputer uses k-nearest neighbors to estimate missing values, preserving local structure. MissingIndicator creates binary features indicating missing values, useful for capturing missingness patterns.
Feature Selection
The feature_selection module reduces dimensionality by identifying relevant features. Univariate methods like SelectKBest and SelectPercentile rank features using statistical tests (f_classif, f_regression, chi2, mutual_info_classif). VarianceThreshold removes low-variance features. RFE and RFECV recursively eliminate features based on model weights. SelectFromModel selects features with importance above a threshold. SequentialFeatureSelector uses forward or backward selection.
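Univariate selection in a few lines (iris and `k=2` chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-statistic w.r.t. the target
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

# Indices of the retained columns (for iris, the two petal measurements)
kept = selector.get_support(indices=True)
```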
Pipeline Composition
ColumnTransformer applies different transformers to different column subsets, essential for heterogeneous data. Pipeline chains transformers sequentially, ensuring fit/transform consistency and preventing data leakage. TransformedTargetRegressor applies transformations to target variables. Helper functions like make_column_transformer and make_pipeline simplify construction.
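A heterogeneous-data sketch combining `ColumnTransformer` with `make_pipeline`, assuming pandas is installed (the toy DataFrame is illustrative):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["NY", "SF", "NY", "LA"],
})
y = [0, 1, 1, 0]

# Scale the numeric column, one-hot encode the categorical one
ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

pipe = make_pipeline(ct, LogisticRegression())
pipe.fit(df, y)

# 1 scaled column + 3 one-hot columns = 4 features reach the classifier
n_cols = ct.transform(df).shape[1]
```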
Design Patterns
All transformers inherit from TransformerMixin and BaseEstimator, implementing fit(), transform(), and fit_transform() methods. This consistent interface enables composition in pipelines. Transformers support both dense and sparse matrices. The _fit_context decorator manages state validation. Metadata routing enables parameter passing through pipelines for cross-validation and sample weighting.
Model Selection & Evaluation
Relevant Files
- sklearn/model_selection/__init__.py
- sklearn/model_selection/_search.py
- sklearn/model_selection/_split.py
- sklearn/model_selection/_validation.py
- sklearn/metrics/__init__.py
- sklearn/metrics/_scorer.py
- sklearn/inspection/__init__.py
- sklearn/calibration.py
Model selection and evaluation are critical for building robust machine learning systems. scikit-learn provides comprehensive tools for hyperparameter tuning, cross-validation, performance metrics, and model inspection.
Cross-Validation Strategies
Cross-validation splits data into multiple train-test folds to assess model generalization. The framework supports various splitting strategies:
- K-Fold variants: `KFold`, `StratifiedKFold` (preserves class distribution), `RepeatedKFold`
- Group-aware splits: `GroupKFold`, `StratifiedGroupKFold` for grouped data
- Leave-One-Out: `LeaveOneOut`, `LeaveOneGroupOut` for exhaustive evaluation
- Shuffle splits: `ShuffleSplit`, `StratifiedShuffleSplit` for random partitions
- Time series: `TimeSeriesSplit` for temporal data respecting order
Use cross_val_score() for quick evaluation or cross_validate() for multiple metrics and detailed results.
Hyperparameter Tuning
Two primary search strategies optimize model parameters:
Grid Search (GridSearchCV): Exhaustively evaluates all parameter combinations. Best for small parameter spaces.
Randomized Search (RandomizedSearchCV): Samples random combinations. More efficient for large spaces.
Both support parallel execution via n_jobs, early stopping, and custom scoring functions. Results include best parameters, cross-validation scores, and fitted estimators.
Performance Metrics
Classification metrics include accuracy, precision, recall, F1-score, ROC-AUC, and confusion matrices. Regression metrics cover MSE, MAE, R², and specialized losses. Clustering metrics evaluate unsupervised quality (silhouette, Davies-Bouldin, adjusted Rand index).
Use make_scorer() to wrap custom metrics for use in search and validation functions.
Model Inspection
Understand model decisions through:
- Permutation importance: Feature importance via prediction degradation
- Partial dependence: Feature-target relationships
- Decision boundaries: Visual classification regions
- Calibration: Probability reliability assessment via `CalibratedClassifierCV`
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score, GridSearchCV

# Binary dataset and estimator chosen for illustration
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Quick evaluation
scores = cross_val_score(model, X, y, cv=5, scoring='f1')

# Hyperparameter tuning
grid = GridSearchCV(
    model,
    {'C': [0.1, 1, 10]},
    cv=5,
    scoring=make_scorer(f1_score)
)
grid.fit(X, y)
Utilities & Infrastructure
Relevant Files
- sklearn/utils/__init__.py
- sklearn/utils/validation.py
- sklearn/utils/_param_validation.py
- sklearn/_config.py
- sklearn/exceptions.py
- sklearn/datasets/__init__.py
Scikit-learn provides a comprehensive utilities and infrastructure layer that underpins the entire library. This layer handles data validation, configuration management, exception handling, and dataset loading—critical functions that ensure consistency and reliability across all estimators and algorithms.
Data Validation & Input Checking
The validation module (sklearn/utils/validation.py) is the backbone of scikit-learn's input handling. Key functions include:
- `check_array()` – Validates and converts input arrays, ensuring they meet requirements (2D, finite values, correct dtype, etc.)
- `check_X_y()` – Validates feature matrix X and target y together, enforcing consistent length and proper shapes
- `column_or_1d()` – Ensures 1D arrays or column vectors for target variables
- `assert_all_finite()` – Checks for NaN and infinite values
- `check_consistent_length()` – Verifies all inputs have matching sample counts
These functions accept parameters like accept_sparse, dtype, ensure_2d, and ensure_min_samples to customize validation behavior.
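Both the happy path and the rejection path of `check_array()` in a few lines (inputs chosen for illustration):

```python
import numpy as np
from sklearn.utils.validation import check_array

# A list of lists is converted to a validated 2-D float64 ndarray
X = check_array([[1, 2], [3, 4]], dtype="float64")

# NaN values are rejected by default (a ValueError is raised)
try:
    check_array(np.array([[1.0, np.nan]]))
    rejected = False
except ValueError:
    rejected = True
```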
Parameter Validation & Constraints
The _param_validation.py module provides a decorator-based system for validating function and method parameters:
- `@validate_params` – Decorator that enforces parameter type and value constraints
- Constraint types – `Interval`, `StrOptions`, `Options`, `HasMethods`, `MissingValues`, and more
- `InvalidParameterError` – Custom exception for invalid parameters
This system enables early error detection with clear, user-friendly messages.
Global Configuration
The _config.py module manages scikit-learn's global settings via thread-local storage:
- `get_config()` – Retrieve current configuration
- `set_config()` – Modify global settings (e.g., `assume_finite`, `working_memory`, `display`)
- `config_context()` – Context manager for temporary configuration changes
Key settings include transform_output (pandas/polars support), enable_metadata_routing, and skip_parameter_validation.
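The scoped nature of `config_context()` is easy to demonstrate (this assumes the default `assume_finite=False` configuration, i.e. no overriding environment variable):

```python
from sklearn import config_context, get_config

# assume_finite=True skips NaN/inf input checks inside the context only
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]

# On exit the previous value is restored automatically
outside = get_config()["assume_finite"]
```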
Exception Hierarchy
Custom exceptions in sklearn/exceptions.py provide semantic error handling:
- `NotFittedError` – Raised when using unfitted estimators
- `ConvergenceWarning` – Convergence issues in iterative algorithms
- `DataConversionWarning` – Implicit type conversions
- `EfficiencyWarning` – Inefficient computation patterns
- `UnsetMetadataPassedError` – Metadata routing violations
Dataset Loading & Generation
The sklearn/datasets module provides utilities for loading real and synthetic datasets:
- Loaders – `load_iris()`, `load_digits()`, `load_wine()`, `load_breast_cancer()`
- Fetchers – `fetch_openml()`, `fetch_california_housing()`, `fetch_20newsgroups()`
- Generators – `make_classification()`, `make_regression()`, `make_blobs()`, `make_moons()`
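A loader and a generator side by side (the generator parameters are illustrative):

```python
from sklearn.datasets import load_iris, make_classification

# Bundled dataset: returned as a Bunch with .data, .target, and metadata
iris = load_iris()

# Synthetic dataset with controlled difficulty: 5 of the 10 features
# carry real signal, the rest are noise or redundant combinations
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)
```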
Utility Functions
Additional utilities in sklearn/utils/__init__.py:
- `Bunch` – Dictionary-like object for dataset containers
- `get_tags()` – Retrieve estimator capability tags
- `compute_class_weight()` – Balance class weights for imbalanced data
- `resample()`, `shuffle()` – Data manipulation helpers
- `estimator_html_repr()` – HTML representation for Jupyter notebooks
This infrastructure ensures that all estimators operate on clean, validated data with consistent configuration, making scikit-learn robust and user-friendly.
Specialized Techniques & Extensions
Relevant Files
- sklearn/multiclass.py
- sklearn/multioutput.py
- sklearn/semi_supervised/__init__.py
- sklearn/semi_supervised/_label_propagation.py
- sklearn/semi_supervised/_self_training.py
- sklearn/frozen/__init__.py
- sklearn/experimental/__init__.py
Multiclass Classification Strategies
Scikit-learn provides three meta-estimators for extending binary classifiers to multiclass problems, each with distinct trade-offs:
One-vs-Rest (OvR) trains n_classes binary classifiers, where each classifier distinguishes one class from all others. This is the most commonly used strategy due to its computational efficiency (O(n_classes) complexity) and interpretability. Each class has exactly one dedicated classifier, making it easy to inspect class-specific patterns.
One-vs-One (OvO) trains n_classes * (n_classes - 1) / 2 binary classifiers, one for each class pair. At prediction time, the class receiving the most votes wins. While slower (O(n_classes²) complexity), OvO is advantageous for kernel-based algorithms that don't scale well with sample size, since each binary problem uses only a subset of data.
Error-Correcting Output Codes (ECOC) represents each class as a binary code and trains one classifier per bit. The code_size parameter controls the number of classifiers: values between 0 and 1 compress the model, while values > 1 add redundancy for error correction. This provides flexible trade-offs between model size and robustness.
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier, OutputCodeClassifier
from sklearn.svm import LinearSVC
# One-vs-Rest: n_classes classifiers
ovr = OneVsRestClassifier(LinearSVC(random_state=0))
# One-vs-One: n_classes * (n_classes - 1) / 2 classifiers
ovo = OneVsOneClassifier(LinearSVC(random_state=0))
# Error-Correcting Output Codes: code_size * n_classes classifiers
ecoc = OutputCodeClassifier(LinearSVC(random_state=0), code_size=1.5)
Multi-Output Learning
The multioutput module extends single-output estimators to handle multiple targets simultaneously:
MultiOutputClassifier and MultiOutputRegressor fit one independent estimator per target variable. This is useful when targets are unrelated and can be predicted independently.
ClassifierChain and RegressorChain model target dependencies by training estimators sequentially, where each estimator uses previous predictions as additional features. The order parameter controls the sequence; 'random' or custom arrays enable experimentation with different orderings.
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.ensemble import RandomForestClassifier
# Independent targets
multi = MultiOutputClassifier(RandomForestClassifier())
# Dependent targets with learned ordering
chain = ClassifierChain(RandomForestClassifier(), order='random')
Semi-Supervised Learning
Semi-supervised algorithms leverage unlabeled data alongside limited labeled data:
LabelPropagation and LabelSpreading construct a graph connecting all samples and propagate labels through it. They support RBF and KNN kernels; KNN is faster for large datasets. The alpha parameter controls label clamping: hard-clamping (1.0) prevents label changes, while soft-clamping (<1.0) allows gradual adjustments.
SelfTrainingClassifier wraps any supervised classifier with predict_proba to iteratively add high-confidence pseudo-labels. The criterion parameter selects labels by threshold or k-best; max_iter controls iterations until convergence.
from sklearn.semi_supervised import LabelPropagation, SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression
# Graph-based propagation
lp = LabelPropagation(kernel='knn', n_neighbors=7)
# Iterative pseudo-labeling
st = SelfTrainingClassifier(LogisticRegression(), threshold=0.75)
Frozen Estimators
FrozenEstimator wraps a fitted estimator to prevent re-fitting. Calling fit() becomes a no-op, and fit_predict/fit_transform are disabled. This is essential when using pre-trained models as transformers in pipelines—it ensures pipeline.fit() doesn't accidentally retrain the frozen step.
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(random_state=0)
X_new, y_new = make_classification(random_state=1)

clf = LogisticRegression().fit(X_train, y_train)
frozen = FrozenEstimator(clf)
frozen.fit(X_new, y_new)  # No-op; clf remains unchanged
Experimental Features
The experimental module provides access to unstable features not yet ready for production. These estimators may change without deprecation cycles. Always check documentation before using experimental features in production systems.