Class MetadataTransfer
- java.lang.Object
-
- com.digitalpebble.stormcrawler.util.MetadataTransfer
-
public class MetadataTransfer extends Object
Implements the logic that determines which metadata is passed to the outlinks and which is stored back in the persistence layer.
-
-
Field Summary
Fields
static String depthKeyName
Metadata key name for tracking the depth.
static String maxDepthKeyName
Metadata key name for tracking a non-default max depth.
static String metadataPersistParamName
Parameter name indicating which metadata to persist for a given document but not transfer to outlinks.
static String metadataTransferClassParamName
Class to use for transferring metadata to outlinks.
static String metadataTransferParamName
Parameter name indicating which metadata to transfer to the outlinks and persist for a given document.
static String trackDepthParamName
Parameter name indicating whether to track the depth from seed.
static String trackPathParamName
Parameter name indicating whether to track the URL path or not.
static String urlPathKeyName
Metadata key name for tracking the source URLs.
-
Constructor Summary
Constructors
MetadataTransfer()
-
Method Summary
Methods
protected void configure(Map<String,Object> conf)
Metadata filter(Metadata metadata)
Determine which metadata should be persisted for a given document, including those which are not necessarily transferred to the outlinks.
static MetadataTransfer getInstance(Map<String,Object> conf)
Metadata getMetaForOutlink(String targetURL, String sourceURL, Metadata parentMD)
Determine which metadata should be transferred to an outlink.
-
-
-
Field Detail
-
metadataTransferClassParamName
public static final String metadataTransferClassParamName
Class to use for transferring metadata to outlinks. Must extend the class MetadataTransfer.
- See Also:
- Constant Field Values
-
metadataTransferParamName
public static final String metadataTransferParamName
Parameter name indicating which metadata to transfer to the outlinks and persist for a given document. Value is either a vector or a single-valued String.
- See Also:
- Constant Field Values
-
metadataPersistParamName
public static final String metadataPersistParamName
Parameter name indicating which metadata to persist for a given document but not transfer to outlinks. Value is either a vector or a single-valued String.
- See Also:
- Constant Field Values
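Since the transfer and persist parameters accept either a vector or a single-valued String, code reading them has to normalise both shapes. The following is a minimal sketch of that normalisation, not StormCrawler's own code: the class `ConfValues`, its method `asList`, and the keys used with it are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of StormCrawler. */
class ConfValues {

    /**
     * Returns the values stored under the given key as a list, whether the
     * configuration holds a single String or a collection of values.
     */
    static List<String> asList(Map<String, Object> conf, String key) {
        Object value = conf.get(key);
        if (value == null) {
            // Key not configured: nothing to transfer or persist.
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        if (value instanceof Collection) {
            // Vector form: keep every entry, in order.
            for (Object item : (Collection<?>) value) {
                out.add(String.valueOf(item));
            }
        } else {
            // Single-valued form: wrap the one entry.
            out.add(String.valueOf(value));
        }
        return out;
    }
}
```

With this helper, a configuration holding the String "title" and one holding a list with the single entry "title" are treated the same way by the caller.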
-
trackPathParamName
public static final String trackPathParamName
Parameter name indicating whether to track the URL path or not. Boolean value, true by default.
- See Also:
- Constant Field Values
-
trackDepthParamName
public static final String trackDepthParamName
Parameter name indicating whether to track the depth from seed. Boolean value, true by default.
- See Also:
- Constant Field Values
-
urlPathKeyName
public static final String urlPathKeyName
Metadata key name for tracking the source URLs.
- See Also:
- Constant Field Values
-
depthKeyName
public static final String depthKeyName
Metadata key name for tracking the depth.
- See Also:
- Constant Field Values
-
maxDepthKeyName
public static final String maxDepthKeyName
Metadata key name for tracking a non-default max depth.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getInstance
public static MetadataTransfer getInstance(Map<String,Object> conf)
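This page does not describe how getInstance works internally. Given that metadataTransferClassParamName names a class that must extend MetadataTransfer, one plausible reading is a reflection-based factory that falls back to the default class. The sketch below illustrates that pattern with stand-in types only: `TransferBase`, `CustomTransfer`, `TransferFactory` and the key `example.transfer.class` are all hypothetical and are not StormCrawler classes or configuration names.

```java
import java.util.Map;

/** Stand-in for the base class; hypothetical, not the real MetadataTransfer. */
class TransferBase {
    protected void configure(Map<String, Object> conf) {
        // A real implementation would read its settings from conf here.
    }
}

/** Stand-in for a user-supplied subclass. */
class CustomTransfer extends TransferBase {
}

/** Hypothetical factory showing the assumed lookup-and-instantiate pattern. */
class TransferFactory {

    // Hypothetical configuration key, chosen for this example only.
    static final String CLASS_PARAM = "example.transfer.class";

    static TransferBase getInstance(Map<String, Object> conf) {
        Object name = conf.get(CLASS_PARAM);
        TransferBase instance;
        if (name == null || name.toString().trim().isEmpty()) {
            // No class configured: use the default implementation.
            instance = new TransferBase();
        } else {
            try {
                Class<?> clazz = Class.forName(name.toString());
                // Enforce the "must extend" requirement before instantiating.
                if (!TransferBase.class.isAssignableFrom(clazz)) {
                    throw new IllegalArgumentException(
                            name + " must extend " + TransferBase.class.getName());
                }
                instance = (TransferBase) clazz.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + name, e);
            }
        }
        // Configure the instance before handing it to the caller.
        instance.configure(conf);
        return instance;
    }
}
```

Checking the class hierarchy before instantiation turns a misconfigured class name into an immediate, descriptive error instead of a ClassCastException later on.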
-
getMetaForOutlink
public Metadata getMetaForOutlink(String targetURL, String sourceURL, Metadata parentMD)
Determine which metadata should be transferred to an outlink. Adds additional metadata like the URL path.
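To make the described behaviour concrete, here is a minimal, self-contained sketch of how metadata for an outlink might be derived from the parent document. It is an illustration under assumptions, not the library's implementation: a plain `Map<String, List<String>>` stands in for Metadata, the key names `example.path` and `example.depth` are hypothetical, and the rules that the path grows by the source URL and the depth grows by one are assumed from the field descriptions above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; a plain map stands in for the Metadata class. */
class OutlinkMetadataSketch {

    // Hypothetical key names, chosen for this example only.
    static final String PATH_KEY = "example.path";
    static final String DEPTH_KEY = "example.depth";

    static Map<String, List<String>> metaForOutlink(
            String targetURL,
            String sourceURL,
            Map<String, List<String>> parent,
            Set<String> transferKeys) {
        // targetURL is accepted to mirror the documented signature;
        // this sketch does not need it.
        Map<String, List<String>> out = new HashMap<>();

        // Copy only the keys selected for transfer to outlinks.
        for (String key : transferKeys) {
            List<String> values = parent.get(key);
            if (values != null) {
                out.put(key, new ArrayList<>(values));
            }
        }

        // Path tracking (assumed rule): parent's path plus the source URL.
        List<String> path = new ArrayList<>();
        List<String> parentPath = parent.get(PATH_KEY);
        if (parentPath != null) {
            path.addAll(parentPath);
        }
        path.add(sourceURL);
        out.put(PATH_KEY, path);

        // Depth tracking (assumed rule): parent's depth plus one,
        // treating a missing depth as 0.
        int parentDepth = 0;
        List<String> depth = parent.get(DEPTH_KEY);
        if (depth != null && !depth.isEmpty()) {
            parentDepth = Integer.parseInt(depth.get(0));
        }
        List<String> newDepth = new ArrayList<>();
        newDepth.add(Integer.toString(parentDepth + 1));
        out.put(DEPTH_KEY, newDepth);

        return out;
    }
}
```

In this sketch, keys that are not listed for transfer never reach the outlink, which matches the distinction the class draws between transferred and persisted-only metadata.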
-
-