Supported values are FeatureImportance, PredictionValuesChange, LossFunctionChange, PredictionDiff
If fstrType is PredictionDiff, this parameter is required and must contain exactly 2 samples. If fstrType is PredictionValuesChange, this parameter is required only if the model was explicitly trained with the flag to store no leaf weights. Otherwise it can be null.
Used only for PredictionValuesChange. Possible values:
array of feature importances (index corresponds to the order of features in the model)
array of feature interaction scores
Supported values are FeatureImportance, PredictionValuesChange, LossFunctionChange, PredictionDiff
If fstrType is PredictionDiff, this parameter is required and must contain exactly 2 samples. If fstrType is PredictionValuesChange, this parameter is required only if the model was explicitly trained with the flag to store no leaf weights. Otherwise it can be null.
Used only for PredictionValuesChange. Possible values:
array of feature importances sorted in descending order by importance
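The feature importance parameters above can be sketched together in a short example. This is a hedged illustration: it assumes a trained model `model` and an evaluation `Pool` named `pool` already exist, and that `getFeatureImportance` and the `EFstrType` enum are available as described in this API.

```scala
import ai.catboost.spark._

// PredictionValuesChange: no dataset needed if the model stores leaf weights
val importances: Array[Double] =
  model.getFeatureImportance(EFstrType.PredictionValuesChange)

// LossFunctionChange: a dataset is required
val lossChange: Array[Double] =
  model.getFeatureImportance(EFstrType.LossFunctionChange, pool)
```

The returned array is indexed in the order of features in the model, matching the description above.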
SHAP interaction values are calculated for all feature pairs if neither featureIndices nor featureNames is specified.
SHAP interaction values are calculated for all feature pairs if neither featureIndices nor featureNames is specified.
dataset to calculate SHAP interaction values
(optional) pair of feature indices to calculate SHAP interaction values for.
(optional) pair of feature names to calculate SHAP interaction values for.
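A hedged sketch of the two ways to call the SHAP interaction computation described above; `model` and `pool` are assumed to exist, and the method name `getFeatureImportanceShapInteractionValues` follows the catboost-spark API.

```scala
// No feature pair specified: interaction values are computed for all feature pairs
val allPairs = model.getFeatureImportanceShapInteractionValues(pool.data)

// Restrict the computation to one pair of features by index (indices are illustrative)
val onePair = model.getFeatureImportanceShapInteractionValues(
  pool.data,
  featureIndices = Some((0, 1))
)
```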
Possible values:
Possible values:
columns from data to add to the output DataFrame; if null, all columns are added
dataset to calculate SHAP values for
Possible values:
Possible values:
Reference data for Independent Tree SHAP values (https://arxiv.org/abs/1905.04610v1). If referenceData is not null, Independent Tree SHAP values are calculated.
columns from data to add to the output DataFrame; if null, all columns are added
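The referenceData parameter above switches between the two SHAP modes. A hedged sketch, assuming `model`, `pool`, and a hypothetical reference `Pool` named `referencePool` exist:

```scala
// Regular tree SHAP values: no reference data
val shapValues = model.getFeatureImportanceShapValues(pool.data)

// Independent Tree SHAP values: pass a non-null reference dataset
val independentShap = model.getFeatureImportanceShapValues(
  pool.data,
  referenceData = referencePool.data  // hypothetical reference Pool
)
```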
Prefer batch computations operating on datasets as a whole for efficiency
Prefer batch computations operating on datasets as a whole for efficiency
Prefer batch computations operating on datasets as a whole for efficiency
Prefer batch computations operating on datasets as a whole for efficiency
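The batch-computation advice above corresponds to the standard Spark ML pattern of transforming a whole DataFrame in one pass rather than predicting row by row. A sketch, assuming a trained `model` and a `pool`:

```scala
// Batch scoring: one transform over the whole dataset instead of
// per-row prediction calls in a loop.
val predictions = model.transform(pool.data)
predictions.show()
```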
Save the model to a local file.
Save the model to a local file.
The path to the output model.
The output format of the model. Possible values:

| Value | Description |
|-------|-------------|
| CatboostBinary | CatBoost binary format (default). |
| AppleCoreML | Apple CoreML format (only datasets without categorical features are currently supported). |
| Cpp | Standalone C++ code (multiclassification models are not currently supported). See the C++ section for details on applying the resulting model. |
| Python | Standalone Python code (multiclassification models are not currently supported). See the Python section for details on applying the resulting model. |
| Json | JSON format. Refer to the CatBoost JSON model tutorial for format details. |
| Onnx | ONNX-ML format (only datasets without categorical features are currently supported). Refer to https://onnx.ai for details. |
| Pmml | PMML version 4.3 format. Categorical features must be interpreted as one-hot encoded during training if present in the training dataset. This can be accomplished by setting the --one-hot-max-size/one_hot_max_size parameter to a value greater than the maximum number of unique categorical feature values among all categorical features in the dataset. Note: multiclassification models are not currently supported. See the PMML section for details on applying the resulting model. |
Additional format-dependent parameters for the AppleCoreML, Onnx, or Pmml formats. See the Python API documentation for details.
The dataset previously used for training. This parameter is required if the model contains categorical features and the output format is Cpp, Python, or Json.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("testSaveLocalModel")
  .getOrCreate()

val pool = Pool.load(
  spark,
  "dsv:///home/user/datasets/my_dataset/train.dsv",
  columnDescription = "/home/user/datasets/my_dataset/cd"
)

val regressor = new CatBoostRegressor()
val model = regressor.fit(pool)

// save in CatBoostBinary format
model.saveNativeModel("/home/user/model/model.cbm")

// save in ONNX format with metadata
model.saveNativeModel(
  "/home/user/model/model.onnx",
  EModelType.Onnx,
  Map(
    "onnx_domain" -> "ai.catboost",
    "onnx_model_version" -> 1,
    "onnx_doc_string" -> "test model for regression",
    "onnx_graph_name" -> "CatBoostModel_for_regression"
  )
)
This function is useful when the dataset has already been quantized, but it works with any Pool.
This function is useful when the dataset has already been quantized, but it works with any Pool.
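For context, a Pool can be quantized up front so that such functions operate on already-binarized data. A hedged sketch, assuming an existing `pool` and the `quantize`/`QuantizationParams` names from the catboost-spark API:

```scala
// Quantize once and reuse the quantized pool; per the note above,
// a non-quantized Pool is also accepted.
val quantizedPool = pool.quantize(new QuantizationParams)
```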
Regression model trained by CatBoost. Use CatBoostRegressor to train it.
Serialization
Supports standard Spark MLlib serialization. Data can be saved to a distributed filesystem such as HDFS or to local files. When saved to path, two files are created:
- <path>/metadata, which contains Spark-specific metadata in JSON format
- <path>/model, which contains the model in the usual CatBoost format and can be read using other local CatBoost APIs (if stored in a distributed filesystem, it has to be copied to the local filesystem first)

Saving to and loading from local files in standard CatBoost model formats is also supported.
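Both serialization paths described above can be sketched as follows. This is a hedged illustration: the class name `CatBoostRegressionModel`, the `loadNativeModel` method, and all paths are assumptions based on the catboost-spark API.

```scala
// Standard Spark MLlib serialization: creates <path>/metadata and <path>/model
model.write.overwrite().save("hdfs://cluster/models/my_model")
val restored = CatBoostRegressionModel.load("hdfs://cluster/models/my_model")

// Native CatBoost format on the local filesystem
model.saveNativeModel("/home/user/model/model.cbm")
val native = CatBoostRegressionModel.loadNativeModel("/home/user/model/model.cbm")
```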
Load native model
Save as a native model
Load model
Save model