Class HpccFile

  • All Implemented Interfaces:
    Serializable

    public class HpccFile
    extends org.hpccsystems.dfs.client.HPCCFile
    implements Serializable
    Access to file content on a collection of one or more HPCC clusters.
    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      HpccFile​(String fileName, String connectionString, String user, String pass)
      Constructor for the HpccFile.
      HpccFile​(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo)
      Constructor for the HpccFile.
      HpccFile​(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo, String targetColumnList, String filter, org.hpccsystems.dfs.cluster.RemapInfo remap_info, int maxParts, String targetfilecluster)
      Constructor for the HpccFile.
    • Method Summary

      Modifier and Type Method Description
      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getDataframe​(org.apache.spark.sql.SparkSession session)
      Make a Spark Dataframe (Dataset<Row>) of THOR data available.
      int getFilePartRecordLimit()
      Returns the current file part record limit
      HpccRDD getRDD()
      Make a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
      HpccRDD getRDD​(org.apache.spark.SparkContext sc)
      Make a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
      void setFilePartRecordLimit​(int limit)
      Sets the file part record limit.
      • Methods inherited from class org.hpccsystems.dfs.client.HPCCFile

        findMatchingPartitions, getClusterRemapInfo, getFileAccessExpirySecs, getFileName, getFileParts, getFilter, getOriginalFileMetadata, getPartitionProcessor, getProjectedRecordDefinition, getProjectList, getRecordDefinition, getTargetfilecluster, isIndex, setClusterRemapInfo, setFileAccessExpirySecs, setFilter, setFilter, setProjectList, setTargetfilecluster
    • Constructor Detail

      • HpccFile

        public HpccFile​(String fileName,
                        org.hpccsystems.ws.client.utils.Connection espconninfo)
                 throws org.hpccsystems.commons.errors.HpccFileException
        Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the Connection.
        Parameters:
        fileName - The HPCC file name
        espconninfo - The ESP connection info (protocol,address,port,user,pass)
        Throws:
        org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
      • HpccFile

        public HpccFile​(String fileName,
                        String connectionString,
                        String user,
                        String pass)
                 throws MalformedURLException,
                        org.hpccsystems.commons.errors.HpccFileException
        Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the Connection.
        Parameters:
        fileName - The HPCC file name
        connectionString - connection string to ECL Watch. Format: {http|https}://{HOST}:{PORT}.
        user - username
        pass - password
        Throws:
        MalformedURLException - Malformed URL exception
        org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
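The connectionString format documented above ({http|https}://{HOST}:{PORT}) can be checked up front with java.net.URL, which throws the same MalformedURLException this constructor declares. The helper below is purely illustrative and not part of the HpccFile API; the host name is a placeholder.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ConnectionStringCheck {
    /**
     * Illustrative check of the {http|https}://{HOST}:{PORT} format expected by
     * the HpccFile(String, String, String, String) constructor. Not part of the API.
     */
    static boolean looksLikeEspConnection(String connectionString) {
        try {
            URL url = new URL(connectionString);
            boolean schemeOk = url.getProtocol().equals("http")
                            || url.getProtocol().equals("https");
            // getPort() is -1 when no explicit port was given; the format requires one.
            return schemeOk && !url.getHost().isEmpty() && url.getPort() != -1;
        } catch (MalformedURLException e) {
            return false; // same failure the constructor itself would surface
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEspConnection("https://esp.example.com:8010"));
        System.out.println(looksLikeEspConnection("esp.example.com:8010")); // no scheme
    }
}
```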
      • HpccFile

        public HpccFile​(String fileName,
                        org.hpccsystems.ws.client.utils.Connection espconninfo,
                        String targetColumnList,
                        String filter,
                        org.hpccsystems.dfs.cluster.RemapInfo remap_info,
                        int maxParts,
                        String targetfilecluster)
                 throws org.hpccsystems.commons.errors.HpccFileException
        Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the IP address and re-maps the address information for the THOR nodes to visible addresses when the THOR clusters are virtual.
        Parameters:
        fileName - The HPCC file name
        espconninfo - esp connection information object
        targetColumnList - a comma-separated list of column names; use dotted notation to select fields within compound columns.
        filter - a file filter to select records of interest
        remap_info - address and port re-mapping info for THOR cluster
        maxParts - optional; the maximum number of partitions, or zero for no maximum
        targetfilecluster - optional; the HPCC cluster on which the target file resides
        Throws:
        org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
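The targetColumnList syntax described above can be sketched with a small validator: each comma-separated entry is a dotted chain of identifiers, e.g. "name.first" to select a child of a hypothetical compound column "name". The field names and the regular expression are illustrative assumptions, not part of the connector.

```java
import java.util.regex.Pattern;

public class TargetColumnListDemo {
    // Illustrative only: accepts entries like "id" or "name.first" (dotted notation for
    // fields within compound columns). The exact grammar the connector accepts may differ.
    private static final Pattern ENTRY = Pattern.compile("[A-Za-z_]\\w*(\\.[A-Za-z_]\\w*)*");

    static boolean isValidColumnList(String targetColumnList) {
        for (String entry : targetColumnList.split(",")) {
            if (!ENTRY.matcher(entry.trim()).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidColumnList("id,name.first,name.last"));
        System.out.println(isValidColumnList("id,,name.")); // empty entry and trailing dot
    }
}
```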
    • Method Detail

      • setFilePartRecordLimit

        public void setFilePartRecordLimit​(int limit)
        Sets the file part record limit.
        Parameters:
        limit - file part record limit
      • getFilePartRecordLimit

        public int getFilePartRecordLimit()
        Returns the current file part record limit
        Returns:
        the current file part record limit
      • getRDD

        public HpccRDD getRDD()
                       throws org.hpccsystems.commons.errors.HpccFileException
        Make a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets. Uses the existing SparkContext, which allows this method to be called from PySpark.
        Returns:
        An RDD of THOR data.
        Throws:
        org.hpccsystems.commons.errors.HpccFileException - When there are errors reaching the THOR data
      • getRDD

        public HpccRDD getRDD​(org.apache.spark.SparkContext sc)
                       throws org.hpccsystems.commons.errors.HpccFileException
        Make a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
        Parameters:
        sc - Spark Context
        Returns:
        An RDD of THOR data.
        Throws:
        org.hpccsystems.commons.errors.HpccFileException - When there are errors reaching the THOR data
      • getDataframe

        public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getDataframe​(org.apache.spark.sql.SparkSession session)
                                                                            throws org.hpccsystems.commons.errors.HpccFileException
        Make a Spark Dataframe (Dataset<Row>) of THOR data available.
        Parameters:
        session - the Spark Session object
        Returns:
        a Dataframe of THOR data
        Throws:
        org.hpccsystems.commons.errors.HpccFileException - when there are errors reaching the THOR data
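The methods above combine into a typical read path: construct an HpccFile, optionally cap records per file part, then materialize the data as a Dataframe or an RDD. This is a sketch only, not a runnable sample: it assumes the Spark-HPCC connector jar on the classpath (the org.hpccsystems.spark package is assumed here) and a reachable cluster; the ESP address, credentials, and logical file name are placeholders.

```java
// Sketch only: requires the Spark-HPCC connector and a live HPCC cluster.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.hpccsystems.spark.HpccFile; // assumed connector package
import org.hpccsystems.spark.HpccRDD;

public class HpccFileUsage {
    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .appName("hpcc-file-demo")
                .master("local[*]")
                .getOrCreate();

        // Constructor documented above: file name, ECL Watch connection string, user, password.
        HpccFile file = new HpccFile("example::thor::people",
                                     "http://esp.example.com:8010", "user", "pass");

        // Optionally cap the records read per file part before materializing the data.
        file.setFilePartRecordLimit(1000);

        // THOR data as a Dataframe (Dataset<Row>) ...
        Dataset<Row> df = file.getDataframe(session);
        df.printSchema();

        // ... or as an RDD, passing the existing SparkContext.
        HpccRDD rdd = file.getRDD(session.sparkContext());
        System.out.println(rdd.count());
    }
}
```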