Class HpccFileWriter

    • Constructor Summary

      Constructors 
      Constructor Description
      HpccFileWriter​(String connectionString, String user, String pass)
      Attempts to open a connection to the specified HPCC cluster and validates the user.
      HpccFileWriter​(org.hpccsystems.ws.client.utils.Connection espconninfo)
      Constructs an HpccFileWriter from the provided ESP connection information.
    • Method Summary

      Modifier and Type Method Description
      org.apache.spark.sql.types.StructType inferSchema​(List<PySparkField> exampleFields)
      Generates an inferred schema from a list of example fields, each pairing a field name with an example value.
      long saveToHPCC​(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
      long saveToHPCC​(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
      long saveToHPCC​(org.apache.spark.SparkContext sc, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.SparkContext sc, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
      long saveToHPCC​(org.apache.spark.SparkContext sc, org.apache.spark.sql.types.StructType rddSchema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> rdd, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
      long saveToHPCC​(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.sql.types.StructType schema, org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
      long saveToHPCC​(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName)
      Saves the provided RDD to the specified file within the specified cluster.
      long saveToHPCC​(org.apache.spark.sql.types.StructType schema, org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD, String clusterName, String fileName, org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression, boolean overwrite)
      Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
    • Constructor Detail

      • HpccFileWriter

        public HpccFileWriter​(org.hpccsystems.ws.client.utils.Connection espconninfo)
                       throws org.hpccsystems.commons.errors.HpccFileException
        Constructs an HpccFileWriter from the provided ESP connection information.
        Throws:
        org.hpccsystems.commons.errors.HpccFileException
      • HpccFileWriter

        public HpccFileWriter​(String connectionString,
                              String user,
                              String pass)
                       throws Exception
        Attempts to open a connection to the specified HPCC cluster and validates the user.
        Parameters:
        connectionString - A connection string of the form {http|https}://{HOST}:{PORT}. The host and port are the same as the ECL Watch host and port.
        user - a valid ECL Watch account
        pass - the password for the provided user
        Throws:
        Exception - general exception
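
        For example, a minimal connection sketch; the endpoint and credentials below are placeholders, and the import assumes the connector's org.hpccsystems.spark package:

        import org.hpccsystems.spark.HpccFileWriter;

        public class ConnectExample {
            public static void main(String[] args) throws Exception {
                // Placeholder ECL Watch endpoint (default port 8010) and credentials.
                HpccFileWriter writer = new HpccFileWriter("http://127.0.0.1:8010",
                                                           "hpccuser", "hpccpass");
            }
        }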
    • Method Detail

      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
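
        In Java code, the Scala RDD expected here is typically obtained from an existing Dataset; a brief sketch, assuming a connected writer and an active SparkSession, with placeholder cluster and file names:

        // 'writer' is a connected HpccFileWriter; 'spark' is an active SparkSession.
        org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df =
                spark.read().json("people.json");           // hypothetical input file
        long written = writer.saveToHPCC(df.rdd(),          // Dataset -> Scala RDD<Row>
                                         "mythor",          // placeholder cluster name
                                         "example::people"); // placeholder file name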
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.sql.types.StructType schema,
                               org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        schema - The Schema of the provided RDD
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        javaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.sql.types.StructType schema,
                               org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        schema - The Schema of the provided RDD
        javaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
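
        A sketch of this explicit-schema variant, assuming a connected writer and a JavaSparkContext named jsc; cluster and file names are placeholders:

        import java.util.Arrays;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.RowFactory;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructType;

        // Build a two-column schema by hand instead of relying on Row metadata.
        StructType schema = new StructType()
                .add("name", DataTypes.StringType)
                .add("age", DataTypes.IntegerType);

        JavaRDD<Row> rows = jsc.parallelize(Arrays.asList(
                RowFactory.create("Alice", 34),
                RowFactory.create("Bob", 28)));

        long written = writer.saveToHPCC(schema, rows, "mythor", "example::people");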
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.sql.types.StructType schema,
                               org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        schema - The Schema of the provided RDD
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        javaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
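
        A sketch of the compressed, overwriting call, reusing the rows RDD from the earlier schema example; the LZ4 constant is an assumption about CompressionAlgorithm's available values:

        import org.hpccsystems.dfs.client.CompressionAlgorithm;

        // Overwrite any existing logical file, compressing with LZ4 (assumed constant).
        long written = writer.saveToHPCC(rows, "mythor", "example::people",
                                         CompressionAlgorithm.LZ4, true);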
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.sql.types.StructType schema,
                               org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        schema - The Schema of the provided RDD
        javaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.SparkContext sc,
                               org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        sc - The current SparkContext
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.SparkContext sc,
                               org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRDD,
                               String clusterName,
                               String fileName)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Will use HPCC default file compression. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        sc - The current SparkContext
        javaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.SparkContext sc,
                               org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> scalaRDD,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        sc - The current SparkContext
        scalaRDD - The RDD to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • saveToHPCC

        public long saveToHPCC​(org.apache.spark.SparkContext sc,
                               org.apache.spark.sql.types.StructType rddSchema,
                               org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> rdd,
                               String clusterName,
                               String fileName,
                               org.hpccsystems.dfs.client.CompressionAlgorithm fileCompression,
                               boolean overwrite)
                        throws Exception,
                               org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper
        Saves the provided RDD to the specified file within the specified cluster. Note: PySpark datasets can be written to HPCC by first calling inferSchema to generate a valid Java schema and converting the PySpark RDD to a JavaRDD via the _py2java() helper.
        Parameters:
        sc - The current SparkContext
        rddSchema - The schema of the provided RDD
        rdd - The JavaRDD of Rows to save to HPCC
        clusterName - The name of the cluster to save to.
        fileName - The name of the logical file in HPCC to create. Follows HPCC file name conventions.
        fileCompression - compression algorithm to use on files
        overwrite - overwrite flag
        Returns:
        Returns the number of records written
        Throws:
        Exception - general exception
        org.hpccsystems.ws.client.wrappers.ArrayOfEspExceptionWrapper - wrapper for an array of ESP exceptions
      • inferSchema

        public org.apache.spark.sql.types.StructType inferSchema​(List<PySparkField> exampleFields)
                                                          throws Exception
        Generates an inferred schema from a list of example fields, each pairing a field name with an example value. This function is targeted primarily at helping PySpark users write datasets back to HPCC.
        Parameters:
        exampleFields - a list of PySparkField examples pairing field names with example values
        Returns:
        Returns a valid Spark schema based on the provided example fields
        Throws:
        Exception - general exception
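
        A sketch of schema inference; the (name, exampleValue) constructor shown for PySparkField and its org.hpccsystems.spark package are assumptions about the API:

        import java.util.Arrays;
        import java.util.List;
        import org.apache.spark.sql.types.StructType;
        import org.hpccsystems.spark.PySparkField;

        // Each PySparkField pairs a field name with an example value from
        // which the Spark type is inferred (constructor shape assumed).
        List<PySparkField> exampleFields = Arrays.asList(
                new PySparkField("name", "Alice"),
                new PySparkField("age", 34));

        StructType inferred = writer.inferSchema(exampleFields);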