(Since version 0.3.4) Use the constructor with no SparkSession.
Get the basePath of the current path. If the current path points to a file, its basePath is its parent directory's path. Otherwise it is the current path itself.
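The parent-path rule above can be sketched in plain Scala. This is a hypothetical helper, not the connector's implementation; the real connector resolves paths through the Hadoop FileSystem API rather than java.nio.

```scala
import java.nio.file.Paths

// Hypothetical sketch of the basePath rule: a file's basePath is its
// parent directory's path; a directory's basePath is the directory itself.
def basePathOf(path: String, isFile: Boolean): String =
  if (isFile) Paths.get(path).getParent.toString
  else path
```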
Delete the current file or directory.
Get the boolean value of dropUserDefinedSuffix.
true if the column will be dropped, false otherwise
Set to true to drop the column containing the user-defined suffix (default name: _user_defined_suffix).
true to drop, false to keep
List files to be loaded.
If the current connector has a non-empty filename pattern, return the list of file paths that match the pattern.
When the filename pattern is not set and the absolute path of this connector is a directory: return the directory path itself if detailed is false; otherwise return the list of file paths inside the directory.
true to return a list of file paths if the current absolute path is a directory
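The pattern-matching branch described above can be sketched as a plain filter. The helper name and the idea of matching against bare file names are assumptions for illustration; the real connector matches against paths returned by the underlying FileSystem.

```scala
// Hypothetical sketch of the pattern-matching branch of listFiles:
// keep only the file names that fully match the configured pattern.
def matchPattern(fileNames: Seq[String], pattern: String): Seq[String] = {
  val regex = pattern.r
  fileNames.filter(name => regex.pattern.matcher(name).matches())
}
```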
Get the current FileSystem based on the path URI.
Get the total size of the files.
Get the value of the user-defined suffix column name.
List ALL the file paths (as strings) of the current path of the connector.
List all the file paths (as strings) to be loaded.
If the current connector has a non-empty filename pattern, return the list of file paths that match the pattern; when the filename pattern IS set, a list of file paths is always returned.
When the filename pattern is not set and the absolute path of this connector is a directory: return the directory path itself if detailed is false; otherwise return the list of file paths inside the directory.
true to list all file paths when the absolute path points to a directory; otherwise return only the directory path
List ALL the file paths of the current path of the connector.
Read a DataFrame from the file at the path defined during instantiation.
DataFrameReader for the current path of the connector.
Reset the suffix to None.
Set to true to skip the validity check of the suffix value.
The current version of FileConnector doesn't support mixing suffix and non-suffix writes when the DataFrame is partitioned.
In the case of a partitioned table, this method detects whether the user tries to use both suffix and non-suffix writes.
an optional suffix in string format
Set the name of the user-defined suffix column (default: _user_defined_suffix).
name of the new key
Write a DataFrame into a file.
dataframe to be written
optional String; if set, write the DataFrame in a sub-directory of the defined path
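How the optional suffix selects a sub-directory of the configured output path can be sketched as follows. The helper is hypothetical; the actual directory layout (and any interaction with the _user_defined_suffix column) is internal to the connector.

```scala
// Hypothetical sketch: an optional suffix maps the write target to a
// sub-directory of the base output path; no suffix means the base path.
def outputPath(basePath: String, suffix: Option[String]): String =
  suffix match {
    case Some(s) => s"$basePath/$s"
    case None    => basePath
  }
```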
Write a JSON file in the standard format.
This method collects all the DataFrame partitions to the Spark driver, so it may impact performance when the amount of data to write is huge.
DataFrame to be written
Write a DataFrame into the given path with the given save mode.
Initialize a DataFrame writer. A new writer is initialized only if the hashcode of the input DataFrame differs from that of the last written DataFrame.
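The re-initialization rule can be sketched with a cached hashcode. The names below are hypothetical, and the counter stands in for building an actual Spark DataFrameWriter:

```scala
// Hypothetical sketch: remember the hashcode of the last written DataFrame
// and rebuild the writer only when a different DataFrame arrives.
var lastHash: Option[Int] = None
var writerInitCount = 0

def initWriterIfNeeded(dataHash: Int): Unit =
  if (!lastHash.contains(dataHash)) {
    lastHash = Some(dataHash)
    writerInitCount += 1 // stands in for initializing a new DataFrameWriter
  }
```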
Connector that loads JSON files and returns the results as a DataFrame.

You can set the following JSON-specific options to deal with non-standard JSON files:

- primitivesAsString (default false): infers all primitive values as a string type
- prefersDecimal (default false): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.
- allowComments (default false): ignores Java/C++ style comments in JSON records
- allowUnquotedFieldNames (default false): allows unquoted JSON field names
- allowSingleQuotes (default true): allows single quotes in addition to double quotes
- allowNumericLeadingZeros (default false): allows leading zeros in numbers (e.g. 00012)
- allowBackslashEscapingAnyCharacter (default false): allows accepting quoting of all characters using the backslash quoting mechanism
- allowUnquotedControlChars (default false): allows JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters) or not
- mode (default PERMISSIVE): allows a mode for dealing with corrupt records during parsing.
  - PERMISSIVE: when it meets a corrupted record, puts the malformed string into a field configured by columnNameOfCorruptRecord, and sets other fields to null. To keep corrupt records, a user can set a string type field named columnNameOfCorruptRecord in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field in an output schema.
  - DROPMALFORMED: ignores the whole corrupted records.
  - FAILFAST: throws an exception when it meets corrupted records.
- columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field having the malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord.
- dateFormat (default yyyy-MM-dd): sets the string that indicates a date format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to date type.
- timestampFormat (default yyyy-MM-dd'T'HH:mm:ss.SSSXXX): sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type.
- multiLine (default false): parse one record, which may span multiple lines, per file
- encoding (by default it is not set): allows forcibly setting one of the standard basic or extended encodings for the JSON files, for example UTF-16BE, UTF-32LE. If the encoding is not specified and multiLine is set to true, it will be detected automatically.
- lineSep (default covers all \r, \r\n and \n): defines the line separator that should be used for parsing
- samplingRatio (default 1.0): defines the fraction of input JSON objects used for schema inferring
- dropFieldIfAllNull (default false): whether to ignore a column of all null values or empty array/struct during schema inference
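These options are plain string key-value pairs. The option names below come straight from the list above; how they are handed to the reader (for example via spark.read.options(...)) depends on the surrounding code and is assumed here.

```scala
// A sample of the JSON options above, as the string key-value pairs a
// DataFrameReader (or a connector configuration) would receive.
val jsonOptions: Map[String, String] = Map(
  "multiLine"          -> "true",        // records may span multiple lines
  "mode"               -> "PERMISSIVE",  // keep malformed records instead of failing
  "dateFormat"         -> "yyyy-MM-dd",
  "dropFieldIfAllNull" -> "true"
)
// With Spark available, this would typically be passed as:
//   spark.read.options(jsonOptions).json(path)
```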