Package org.hpccsystems.spark
Class HpccFileWriter

java.lang.Object
    org.hpccsystems.spark.HpccFileWriter

All Implemented Interfaces:
Serializable

public class HpccFileWriter extends Object implements Serializable

A helper class that creates a Spark job to write a given RDD to HPCC Systems.

See Also:
Serialized Form
Constructor Summary

HpccFileWriter(String connectionString, String user, String pass)
    Attempts to open a connection to the specified HPCC cluster and validates the user.

HpccFileWriter(org.hpccsystems.ws.client.utils.Connection espconninfo)
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description org.apache.spark.sql.types.StructType
inferSchema(List<PySparkField> exampleFields)
Generates an inferred schema based on an example Map of FieldNames to Example Field Objects.long
saveToHPCC(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helperlong
saveToHPCC(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helperlong
saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helperlong
saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.sql.types.StructType rddSchema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> rdd, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helperlong
saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helperlong
saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
Saves the provided RDD to the specified file within the specified cluster.long
saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
Saves the provided RDD to the specified file within the specified cluster Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java Schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper
-
-
-
Constructor Detail

HpccFileWriter
public HpccFileWriter(org.hpccsystems.ws.client.utils.Connection espconninfo) throws org.hpccsystems.commons.errors.HpccFileException
Throws:
org.hpccsystems.commons.errors.HpccFileException

HpccFileWriter
public HpccFileWriter(String connectionString, String user, String pass) throws Exception
Attempts to open a connection to the specified HPCC cluster and validates the user.
Parameters:
connectionString - of the form {http|https}://{HOST}:{PORT}; the host and port are the same as the ECL Watch host and port
user - a valid ECL Watch account
pass - the password for the provided user
Throws:
Exception - general exception
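
For illustration, a minimal sketch of constructing a writer with the connection-string form; the endpoint and credentials below are placeholders, not values from this documentation:

    import org.hpccsystems.spark.HpccFileWriter;

    public class WriterSetup
    {
        public static void main(String[] args) throws Exception
        {
            // ECL Watch endpoint and credentials are placeholders;
            // substitute your own cluster's values. The constructor
            // connects and validates the user, throwing on failure.
            HpccFileWriter writer = new HpccFileWriter(
                    "http://eclwatch.example.com:8010", "myuser", "mypass");
        }
    }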

Method Detail

saveToHPCC
public long saveToHPCC(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper
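
A hedged sketch of calling this overload from Java; the session setup, endpoint, cluster name, and logical file name are all placeholders. A Dataset<Row> exposes the underlying Scala RDD via rdd():

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper;

    public class ScalaRddWrite
    {
        public static void main(String[] args)
                throws Exception, ArrayOfEspExceptionWrapper
        {
            SparkSession spark = SparkSession.builder().appName("hpcc-write").getOrCreate();

            // Placeholder input; any Dataset<Row> works.
            Dataset<Row> df = spark.read().parquet("/path/to/input.parquet");

            HpccFileWriter writer = new HpccFileWriter(
                    "http://eclwatch.example.com:8010", "myuser", "mypass");

            // Dataset.rdd() yields the org.apache.spark.rdd.RDD<Row> this
            // overload expects; "mythor" and the file name are placeholders.
            long written = writer.saveToHPCC(df.rdd(), "mythor", "example::output::file");
            System.out.println("Wrote " + written + " records");
        }
    }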

saveToHPCC
public long saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
schema - the schema of the provided RDD
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
javaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper
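
A minimal sketch of the same write through the JavaRDD overload; the writer, cluster, and file names are placeholders:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper;

    public class JavaRddWrite
    {
        // Dataset.javaRDD() adapts a Dataset<Row> to the JavaRDD<Row>
        // this overload expects.
        static long write(HpccFileWriter writer, Dataset<Row> df)
                throws Exception, ArrayOfEspExceptionWrapper
        {
            JavaRDD<Row> rows = df.javaRDD();
            return writer.saveToHPCC(rows, "mythor", "example::output::file");
        }
    }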

saveToHPCC
public long saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
schema - the schema of the provided RDD
javaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper
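
Where the Row schema is not carried by the RDD itself, this overload takes it explicitly. A sketch with illustrative field names and types:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper;

    public class SchemaWrite
    {
        static long write(HpccFileWriter writer, JavaRDD<Row> rows)
                throws Exception, ArrayOfEspExceptionWrapper
        {
            // Explicit schema for the rows being written; the fields here
            // are illustrative only.
            StructType schema = DataTypes.createStructType(Arrays.asList(
                    DataTypes.createStructField("name", DataTypes.StringType, false),
                    DataTypes.createStructField("age", DataTypes.IntegerType, false)));
            return writer.saveToHPCC(schema, rows, "mythor", "example::people");
        }
    }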

saveToHPCC
public long saveToHPCC(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
schema - the schema of the provided RDD
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
javaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper
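
A sketch of an explicit-compression, overwriting write. CompressionAlgorithm.DEFAULT is an assumed constant name, not taken from this page; check the org.hpccsystems.dfs.client.CompressionAlgorithm enum in your hpcc4j version for the values it actually offers:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Row;
    import org.hpccsystems.dfs.client.CompressionAlgorithm;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper;

    public class CompressedWrite
    {
        static long write(HpccFileWriter writer, JavaRDD<Row> rows)
                throws Exception, ArrayOfEspExceptionWrapper
        {
            // true = replace the logical file if it already exists.
            // CompressionAlgorithm.DEFAULT is assumed; substitute the
            // algorithm your cluster should use.
            return writer.saveToHPCC(rows, "mythor", "example::output::file",
                                     CompressionAlgorithm.DEFAULT, true);
        }
    }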

saveToHPCC
public long saveToHPCC(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
schema - the schema of the provided RDD
javaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
sc - the current SparkContext
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster, using the HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
sc - the current SparkContext
javaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper
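
A sketch of the SparkContext-taking overload; the context comes straight from the active session, and the cluster and file names remain placeholders:

    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper;

    public class ContextWrite
    {
        static long write(SparkSession spark, HpccFileWriter writer, JavaRDD<Row> rows)
                throws Exception, ArrayOfEspExceptionWrapper
        {
            // Hand the current SparkContext to the writer explicitly,
            // as this overload requires.
            SparkContext sc = spark.sparkContext();
            return writer.saveToHPCC(sc, rows, "mythor", "example::output::file");
        }
    }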

saveToHPCC
public long saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
sc - the current SparkContext
scalaRDD - the RDD to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

saveToHPCC
public long saveToHPCC(org.apache.spark.SparkContext sc, org.apache.spark.sql.types.StructType rddSchema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> rdd, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite) throws Exception, org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
Parameters:
sc - the current SparkContext
rddSchema - the schema of the provided RDD
rdd - the JavaRDD of Rows to save to HPCC
clusterName - the name of the cluster to save to
fileName - the name of the logical file in HPCC to create; follows HPCC file name conventions
fileCompression - the compression algorithm to use on the file
overwrite - overwrite flag
Returns:
the number of records written
Throws:
Exception - general exception
org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - array of ESP exception wrapper

inferSchema
public org.apache.spark.sql.types.StructType inferSchema(List<PySparkField> exampleFields) throws Exception
Generates an inferred schema from an example list of field names paired with example field values. This function is targeted primarily at helping PySpark users write datasets back to HPCC.
Parameters:
exampleFields - a list of PySpark fields
Returns:
a valid Spark schema based on the example fields
Throws:
Exception - general exception
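
A hedged Java sketch of schema inference; the PySparkField (String, Object) constructor shown is an assumption, not documented on this page:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.types.StructType;
    import org.hpccsystems.spark.HpccFileWriter;
    import org.hpccsystems.spark.PySparkField;

    public class InferSchemaExample
    {
        static StructType infer(HpccFileWriter writer) throws Exception
        {
            // Assumption: PySparkField pairs a field name with an example
            // value via a (String, Object) constructor.
            List<PySparkField> example = Arrays.asList(
                    new PySparkField("name", "Jane Doe"),
                    new PySparkField("age", 42));
            return writer.inferSchema(example);
        }
    }

From PySpark, the resulting schema plus the _py2java() conversion noted above lets the dataset flow through the JavaRDD overloads of saveToHPCC.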