Package org.hpccsystems.spark
Class HpccFile
java.lang.Object
- org.hpccsystems.dfs.client.HPCCFile
  - org.hpccsystems.spark.HpccFile
All Implemented Interfaces:
- Serializable

public class HpccFile extends org.hpccsystems.dfs.client.HPCCFile implements Serializable
Access to file content on a collection of one or more HPCC clusters.
See Also:
- Serialized Form
Constructor Summary
Constructors
HpccFile(String fileName, String connectionString, String user, String pass)
  Constructor for the HpccFile.
HpccFile(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo)
  Constructor for the HpccFile.
HpccFile(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo, String targetColumnList, String filter, org.hpccsystems.dfs.cluster.RemapInfo remap_info, int maxParts, String targetfilecluster)
  Constructor for the HpccFile.
-
Method Summary
All Methods | Instance Methods | Concrete Methods
org.apache.spark.sql.Dataset&lt;org.apache.spark.sql.Row&gt; getDataframe(org.apache.spark.sql.SparkSession session)
  Makes a Spark Dataframe (Dataset&lt;Row&gt;) of THOR data available.
int getFilePartRecordLimit()
  Returns the current file part record limit.
HpccRDD getRDD()
  Makes a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
HpccRDD getRDD(org.apache.spark.SparkContext sc)
  Makes a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
void setFilePartRecordLimit(int limit)
  Sets the file part record limit.
Methods inherited from class org.hpccsystems.dfs.client.HPCCFile
findMatchingPartitions, getClusterRemapInfo, getFileAccessExpirySecs, getFileName, getFileParts, getFilter, getOriginalFileMetadata, getPartitionProcessor, getProjectedRecordDefinition, getProjectList, getRecordDefinition, getTargetfilecluster, isIndex, setClusterRemapInfo, setFileAccessExpirySecs, setFilter, setFilter, setProjectList, setTargetfilecluster
Constructor Detail
-
HpccFile
public HpccFile(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo) throws org.hpccsystems.commons.errors.HpccFileException
Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the Connection.
Parameters:
fileName - the HPCC logical file name
espconninfo - the ESP connection info (protocol, address, port, user, pass)
Throws:
org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
-
HpccFile
public HpccFile(String fileName, String connectionString, String user, String pass) throws MalformedURLException, org.hpccsystems.commons.errors.HpccFileException
Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the connection string.
Parameters:
fileName - the HPCC logical file name
connectionString - the connection string to ECL Watch. Format: {http|https}://{HOST}:{PORT}
user - username
pass - password
Throws:
MalformedURLException - malformed URL exception
org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
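The connectionString must parse as a URL with an explicit scheme and port; a malformed value yields MalformedURLException before any cluster is contacted. A minimal sketch of that format check, using only java.net.URL (no HPCC cluster required; the hostnames below are hypothetical examples, and this helper is illustrative, not part of the HpccFile API):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ConnectionStringCheck {
    // Returns true when the string matches the {http|https}://{HOST}:{PORT} form.
    static boolean isValidConnectionString(String s) {
        try {
            URL u = new URL(s);
            String protocol = u.getProtocol();
            boolean httpish = "http".equals(protocol) || "https".equals(protocol);
            return httpish && u.getPort() != -1; // -1 means no explicit port
        } catch (MalformedURLException e) {
            return false; // same condition HpccFile surfaces as MalformedURLException
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidConnectionString("https://eclwatch.example.com:8010")); // true
        System.out.println(isValidConnectionString("eclwatch.example.com:8010"));        // false: no scheme
    }
}
```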
-
HpccFile
public HpccFile(String fileName, org.hpccsystems.ws.client.utils.Connection espconninfo, String targetColumnList, String filter, org.hpccsystems.dfs.cluster.RemapInfo remap_info, int maxParts, String targetfilecluster) throws org.hpccsystems.commons.errors.HpccFileException
Constructor for the HpccFile. Captures HPCC logical file information from the DALI Server for the clusters behind the ESP named by the IP address, and re-maps the address information for the THOR nodes to visible addresses when the THOR clusters are virtual.
Parameters:
fileName - the HPCC logical file name
espconninfo - the ESP connection information object
targetColumnList - a comma-separated list of column names, in dotted notation for columns within compound columns
filter - a file filter to select records of interest
remap_info - address and port re-mapping info for the THOR cluster
maxParts - optional; the maximum number of partitions, or zero for no maximum
targetfilecluster - optional; the HPCC cluster the target file resides in
Throws:
org.hpccsystems.commons.errors.HpccFileException - HPCC file exception
-
-
Method Detail
-
setFilePartRecordLimit
public void setFilePartRecordLimit(int limit)
Sets the file part record limit.
Parameters:
limit - file part record limit
-
getFilePartRecordLimit
public int getFilePartRecordLimit()
Returns the current file part record limit.
Returns:
the file part record limit
-
getRDD
public HpccRDD getRDD() throws org.hpccsystems.commons.errors.HpccFileException
Makes a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets. Uses the existing SparkContext, which allows this method to be used from PySpark.
Returns:
an RDD of THOR data
Throws:
org.hpccsystems.commons.errors.HpccFileException - when there are errors reaching the THOR data
-
getRDD
public HpccRDD getRDD(org.apache.spark.SparkContext sc) throws org.hpccsystems.commons.errors.HpccFileException
Makes a Spark Resilient Distributed Dataset (RDD) that provides access to THOR-based datasets.
Parameters:
sc - the Spark Context
Returns:
an RDD of THOR data
Throws:
org.hpccsystems.commons.errors.HpccFileException - when there are errors reaching the THOR data
-
getDataframe
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getDataframe(org.apache.spark.sql.SparkSession session) throws org.hpccsystems.commons.errors.HpccFileException
Makes a Spark Dataframe (Dataset&lt;Row&gt;) of THOR data available.
Parameters:
session - the Spark Session object
Returns:
a Dataframe of THOR data
Throws:
org.hpccsystems.commons.errors.HpccFileException - when there are errors reaching the THOR data
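Taken together, the constructors and getDataframe support the typical read path: connect to an ESP, wrap the logical file, and materialize it as a Dataframe. The sketch below assumes a reachable HPCC cluster and a local Spark session; the endpoint, credentials, and file name are placeholders, so it cannot run as-is:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.hpccsystems.spark.HpccFile;

public class ReadThorFile {
    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .appName("hpcc-read")
                .master("local[*]")
                .getOrCreate();

        // Placeholder ESP endpoint and credentials; substitute your own.
        HpccFile file = new HpccFile("example::dataset",
                "http://127.0.0.1:8010", "user", "pass");
        file.setFilePartRecordLimit(1000); // cap records read per file part

        Dataset<Row> df = file.getDataframe(session);
        df.printSchema();
        df.show(10);
    }
}
```

The same HpccFile instance could instead produce an HpccRDD via getRDD(), which reuses the SparkContext backing the session.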