Load a dataset in one of CatBoost's natively supported formats, such as dsv or libsvm.
SparkSession
Path with scheme to dataset in CatBoost format.
For example, dsv:///home/user/datasets/my_dataset/train.dsv or libsvm:///home/user/datasets/my_dataset/train.libsvm
Path to the column description file.
Additional params specifying data format.
(optional) Path with scheme to dataset pairs in CatBoost format.
Only the "dsv-grouped" format is currently supported.
For example, dsv-grouped:///home/user/datasets/my_dataset/train_pairs.dsv
Pool containing loaded data.
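Both the dataset path and the pairs path above follow a scheme://path convention, where the scheme (dsv, libsvm, dsv-grouped) names the data format. The sketch below only illustrates how such a string decomposes. PathWithScheme and its split method are hypothetical helpers written for this illustration; they are not part of the CatBoost API, and CatBoost's own parsing may differ.

```scala
// Hypothetical helper (not part of CatBoost): splits "scheme://path"
// into the format scheme and the remaining path.
object PathWithScheme {
  private val Separator = "://"

  // Returns Some((scheme, path)) when a non-empty scheme is present, None otherwise.
  def split(pathWithScheme: String): Option[(String, String)] = {
    val idx = pathWithScheme.indexOf(Separator)
    if (idx <= 0) None
    else Some((
      pathWithScheme.substring(0, idx),
      pathWithScheme.substring(idx + Separator.length)
    ))
  }
}
```

With this helper, splitting dsv:///home/user/datasets/my_dataset/train.dsv yields the scheme dsv and the path /home/user/datasets/my_dataset/train.dsv. The third slash belongs to the absolute path, not to the separator.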
import org.apache.spark.sql.SparkSession
import ai.catboost.spark.Pool

// Local Spark session for the example.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("testLoadDSVSimple")
  .getOrCreate()

// Load a dsv dataset together with its column description file.
val pool = Pool.load(
  spark,
  "dsv:///home/user/datasets/my_dataset/train.dsv",
  columnDescription = "/home/user/datasets/my_dataset/cd"
)

// Load a dataset together with its pairs file (dsv-grouped format).
val poolWithPairs = Pool.load(
  spark,
  "dsv:///home/user/datasets/my_dataset_with_pairs/train.dsv",
  columnDescription = "/home/user/datasets/my_dataset_with_pairs/cd",
  pairsDataPathWithScheme = "dsv-grouped:///home/user/datasets/my_dataset_with_pairs/train_pairs.dsv"
)
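The example passes a cd file as columnDescription. A CatBoost column description file is a tab-separated text file with one line per described column: the zero-based column index, the column type (for example Label, Num, or Categ), and an optional feature name. The following is a minimal sketch of producing such a file with the standard library only. ColumnDescriptionSketch and the feature names used with it are hypothetical and exist only for this illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of CatBoost): renders and writes a
// column description file from (index, type, optional name) entries.
object ColumnDescriptionSketch {
  type Column = (Int, String, Option[String])

  // One tab-separated line per column; the name field is omitted when absent.
  def render(columns: Seq[Column]): String =
    columns.map { case (index, columnType, name) =>
      (Seq(index.toString, columnType) ++ name).mkString("\t")
    }.mkString("\n") + "\n"

  // Writes the rendered description to a fresh temporary file and returns its path.
  def writeTemp(columns: Seq[Column]): Path = {
    val path = Files.createTempFile("cd", ".tsv")
    Files.write(path, render(columns).getBytes(StandardCharsets.UTF_8))
    path
  }
}
```

For instance, the entries (0, "Label", None), (1, "Num", Some("age")), and (2, "Categ", Some("city")) describe a dataset whose first column is the label, followed by one numerical and one categorical feature.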
Companion object for the Pool class, which is CatBoost's abstraction of a dataset.