package spark
Type Members
- class DatasetFromStream extends AnyRef
- class DatasetMapper[T] extends Serializable
A generic class for mapping and transforming datasets from text-based input to structured data of type T using Apache Spark. This class enables parsing and conversion of raw textual data into Spark Datasets, applying a user-defined mapping function with error handling for missing or malformed data.
- T: the type of the elements in the resulting Spark Dataset.
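The mapping-with-fallback behaviour described for DatasetMapper can be illustrated with a minimal, Spark-free sketch. LineMapper, LineMapperDemo, and the "name,year" input format are hypothetical and not part of this package; the sketch only assumes, as the description says, that a row which fails to parse should yield a caller-supplied default instead of an error.

```scala
import scala.util.Try

// Hypothetical sketch of a DatasetMapper-style helper (not this package's code).
// It extends Serializable because Spark serializes functions used in
// transformations such as Dataset.map and ships them to executors.
class LineMapper[T](parse: String => T, missing: T) extends Serializable {

  // Parse one raw line; any non-fatal exception (missing or malformed fields)
  // is replaced by the caller-supplied default value.
  def mapLine(line: String): T = Try(parse(line)).getOrElse(missing)

  // Map a collection of lines. With Spark, the analogous call would be
  // map(mapLine) on a Dataset[String], given an implicit Encoder[T].
  def mapAll(lines: Seq[String]): Seq[T] = lines.map(mapLine)
}

object LineMapperDemo {
  def main(args: Array[String]): Unit = {
    // Parse "name,year" lines into (String, Int); malformed lines become ("?", -1).
    val mapper = new LineMapper[(String, Int)](
      line => {
        line.split(",", -1) match {
          case Array(name, year) => (name.trim, year.trim.toInt)
          case _                 => throw new IllegalArgumentException(s"malformed line: $line")
        }
      },
      missing = ("?", -1)
    )
    // prints (Avatar,2009), (?,-1), (Up,2009) on separate lines
    mapper.mapAll(Seq("Avatar, 2009", "broken line", "Up,2009")).foreach(println)
  }
}
```

Extending Serializable matters once the mapper is captured by a Spark transformation; a plain local collection, as in the demo, does not need it.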
- class DatasetParser[T] extends AnyRef
The DatasetParser class is a utility designed to parse structured data and create a Spark Dataset of a specific type T. It relies on an implicit Spark session and an implicit TableParser for transforming raw data into a Table representation before converting it into a Spark Dataset. This class provides a safe and reusable way to load and parse datasets from resources such as URLs: as the example shows, createDataset reports its outcome as a Success or a Failure rather than throwing.
- T: the type of the elements in the resulting Dataset. It should have an implicit Encoder available for serialization.
Usage example:

    import org.apache.spark.sql.SparkSession
    import scala.util.{Failure, Success}

    implicit val spark: SparkSession = SparkSession.builder.appName("DatasetParser").master("local[*]").getOrCreate()
    import spark.implicits._

    // TableParser instance for the specific data type
    implicit val movieTableParser: StringTableParser[Table[Movie]] = implicitly[MovieTableParser]

    val parser = new DatasetParser[Movie]()

    parser.createDataset[DatasetParser[_]]("movie_metadata.csv") match {
      case Success(ds) => ds.show(10)
      case Failure(error) => throw error
    }
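The usage example passes a type argument to createDataset alongside a resource name. One plausible reading, which is an assumption not confirmed by this page, is that the type is used to locate the resource on the classpath. The hypothetical ResourceLoader below sketches that idea with a ClassTag and returns a Try, mirroring the Success/Failure handling in the example; ResourceAnchor and ResourceLoaderDemo are likewise illustrative names.

```scala
import scala.io.Source
import scala.reflect.{ClassTag, classTag}
import scala.util.{Failure, Success, Try, Using}

// Hypothetical helper, not this package's API.
object ResourceLoader {
  // Resolve `name` relative to the package of X (a leading "/" makes it absolute,
  // per Class.getResource), then read all lines, closing the source afterwards.
  def resourceLines[X: ClassTag](name: String): Try[List[String]] =
    Option(classTag[X].runtimeClass.getResource(name)) match {
      case Some(url) => Using(Source.fromURL(url))(_.getLines().toList)
      case None      => Failure(new java.io.FileNotFoundException(s"resource not found: $name"))
    }
}

// Hypothetical anchor class: resources are looked up relative to its package.
final class ResourceAnchor

object ResourceLoaderDemo {
  def main(args: Array[String]): Unit =
    ResourceLoader.resourceLines[ResourceAnchor]("movie_metadata.csv") match {
      case Success(lines) => lines.take(3).foreach(println)
      case Failure(error) => println(s"could not load: ${error.getMessage}")
    }
}
```

Returning Try instead of throwing lets the caller decide, as in the usage example, whether a missing resource is fatal.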
Value Members
- object DatasetMapper extends App with Serializable
The DatasetMapper object is the entry point for mapping and processing datasets using Spark. It leverages the DatasetMapper generic class to parse and transform text-based data into structured datasets. This object is specifically tailored to process data of type Movie using the provided parser and configuration. It sets up the necessary implicit Spark session and encoders, and executes the main logic for dataset transformation.
Functionality Overview:
  - Initializes a Spark session configured for local execution.
  - Configures the DatasetMapper to use the MovieDatabase parser for processing rows of movie data.
  - Defines default handling for missing or malformed data using the Movie.missing value.
  - Invokes the dataset processing logic with a specified input file containing raw text data, displaying the first 20 rows of the resulting dataset.
Note: This object is designed to run as a standalone Spark application to demonstrate dataset mapping functionality.
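The default handling described above (substitute Movie.missing for a missing or malformed row, then show the first 20 rows) can be sketched without Spark. MovieRow, its three-column "title,year,score" format, and its missing sentinel are hypothetical stand-ins for Movie and Movie.missing; on a real Dataset the final step would be ds.show(20).

```scala
import scala.util.Try

// Hypothetical stand-in for Movie; the real type has more fields.
final case class MovieRow(title: String, year: Int, score: Double)

object MovieRow {
  // Sentinel for missing or malformed rows (modelled on Movie.missing).
  // Avoids NaN so that equality checks against the sentinel work.
  val missing: MovieRow = MovieRow("", -1, -1.0)

  // Parse "title,year,score"; wrong arity, an empty title, or a number
  // that fails to parse all fall back to `missing`.
  def parse(line: String): MovieRow =
    Try {
      line.split(",", -1).map(_.trim) match {
        case Array(title, year, score) if title.nonEmpty => MovieRow(title, year.toInt, score.toDouble)
        case _                                            => missing
      }
    }.getOrElse(missing)
}

object MovieRowDemo {
  def main(args: Array[String]): Unit = {
    val raw  = Seq("Avatar, 2009, 7.9", "Spectre,2015,6.8", "no fields here", ",2012,")
    val rows = raw.map(MovieRow.parse)
    // Count sentinel rows so malformed input is visible rather than silently lost.
    println(s"malformed rows: ${rows.count(_ == MovieRow.missing)}") // prints "malformed rows: 2"
    // Local analogue of Dataset.show(20): print at most the first 20 rows.
    rows.take(20).foreach(println)
  }
}
```

Keeping one output row per input line (with a sentinel) preserves row counts; filtering with rows.filterNot(_ == MovieRow.missing) is the alternative when downstream code should only see valid rows.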
- object DatasetParser extends App
The DatasetParser object demonstrates how to parse and process a CSV file containing movie metadata into a structured dataset using Spark and an implicit StringTableParser. It shows parsing, error handling, and display of the results. This object extends the App trait, making it directly executable as a Scala program.
- object MovieDatabase
The MovieDatabase object provides utility functions and constants for handling movie metadata. It serves as a central repository for defining parsers and file paths required for processing movie-related data.
This object includes mechanisms for parsing movie data and manages access to a CSV file containing movie metadata.
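As a rough illustration of an object that centralizes file names and parsing helpers, the hypothetical MovieCatalog below keeps the CSV file name in one place and turns header-plus-row text into typed fields. The column names movie_title and title_year are assumed from the widely used movie_metadata.csv dataset and may differ from what MovieDatabase actually uses; the CSV splitting is deliberately naive and ignores quoted commas.

```scala
import scala.util.Try

// Hypothetical sketch of a MovieDatabase-style registry (not this package's code).
object MovieCatalog {
  // File name taken from the DatasetParser usage example; its location is an assumption.
  val metadataFile: String = "movie_metadata.csv"

  // Pair header names with row values; None when the field counts differ.
  // Naive: does not handle quoted fields that contain commas.
  def toRecord(header: String, row: String): Option[Map[String, String]] = {
    val names  = header.split(",", -1).map(_.trim)
    val values = row.split(",", -1).map(_.trim)
    if (names.length == values.length) Some(names.zip(values).toMap) else None
  }

  // Extract typed fields by (assumed) column name; None if absent or unparsable.
  def titleAndYear(record: Map[String, String]): Option[(String, Int)] =
    for {
      title <- record.get("movie_title")
      year  <- record.get("title_year").flatMap(y => Try(y.toInt).toOption)
    } yield (title, year)
}

object MovieCatalogDemo {
  def main(args: Array[String]): Unit = {
    val header = "movie_title,title_year,imdb_score"
    val rows   = Seq("Avatar,2009,7.9", "Spectre,not-a-year,6.8", "too,few")
    // Only the first row yields a complete record; prints (Avatar,2009)
    rows.flatMap(MovieCatalog.toRecord(header, _)).flatMap(MovieCatalog.titleAndYear).foreach(println)
  }
}
```

Looking fields up by header name, rather than by position, keeps the parser working if the CSV's column order changes.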