Package org.archive.modules.recrawl
Class FetchHistoryProcessor
java.lang.Object
  org.archive.modules.Processor
    org.archive.modules.recrawl.FetchHistoryProcessor
- All Implemented Interfaces:
  org.archive.checkpointing.Checkpointable, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle
public class FetchHistoryProcessor extends Processor
Maintain a history of fetch information inside the CrawlURI's attributes.
- Version:
  $Date: 2006-09-25 20:19:54 +0000 (Mon, 25 Sep 2006) $, $Revision: 4654 $
- Author:
  gojomo

Field Summary
- protected int historyLength
  Desired history array length.

Constructor Summary
- FetchHistoryProcessor()

Method Summary
- int getHistoryLength()
- static boolean hasIdenticalDigest(CrawlURI curi)
  Utility method for testing whether a CrawlURI's last two history entries (one being the most recent fetch) have identical content-digest information.
- protected HashMap<String,Object>[] historyRealloc(CrawlURI curi)
  Get or create a properly sized history array.
- protected void innerProcess(CrawlURI puri)
  Actually performs the process.
- protected void saveHeader(CrawlURI curi, Map<String,Object> map, String key)
  Save a header from the given HTTP operation into the Map.
- void setHistoryLength(int length)
- protected boolean shouldProcess(CrawlURI curi)
  Determines whether the given URI should be processed by this processor.

Methods inherited from class org.archive.modules.Processor
doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson

Field Details

historyLength
protected int historyLength
Desired history array length.

Constructor Details

FetchHistoryProcessor
public FetchHistoryProcessor()

Method Details

getHistoryLength
public int getHistoryLength()

setHistoryLength
public void setHistoryLength(int length)

innerProcess
protected void innerProcess(CrawlURI puri) throws InterruptedException
Description copied from class: Processor
Actually performs the process. By the time this method is invoked, it is known that the given URI passes the Processor.getEnabled(), the Processor.getShouldProcessRule() and the Processor.shouldProcess(CrawlURI) tests.
- Specified by:
  innerProcess in class Processor
- Parameters:
  puri - the URI to process
- Throws:
  InterruptedException - if the thread is interrupted
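The history this processor maintains can be pictured as a fixed-length, most-recent-first array. The following is a minimal standalone sketch of that rotation, not the Heritrix implementation. It assumes the newest fetch record sits at index 0, older records shift one slot toward the end on each fetch, and the oldest is dropped once `historyLength` entries are held. The names `HistoryRotationSketch` and `recordFetch` are hypothetical, and a plain `Map<String,Object>[]` stands in for the array kept in the CrawlURI's attributes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a fixed-length, most-recent-first fetch history.
// Not the Heritrix implementation; a plain array stands in for CrawlURI attributes.
public class HistoryRotationSketch {

    /**
     * Returns a new array of length historyLength with latest at index 0 and
     * the previous entries shifted down one slot; entries that no longer fit
     * are dropped.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object>[] recordFetch(Map<String, Object>[] history,
                                                    int historyLength,
                                                    Map<String, Object> latest) {
        if (historyLength < 1) {
            throw new IllegalArgumentException("historyLength must be at least 1");
        }
        Map<String, Object>[] result = new Map[historyLength];
        if (history != null) {
            // Keep at most historyLength - 1 older entries, shifted one slot down.
            int keep = Math.min(history.length, historyLength - 1);
            System.arraycopy(history, 0, result, 1, keep);
        }
        result[0] = latest; // the most recent fetch always occupies index 0
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object>[] history = null;
        for (int i = 1; i <= 3; i++) {
            Map<String, Object> fetch = new HashMap<>();
            fetch.put("fetch-number", i); // hypothetical key, for illustration only
            history = recordFetch(history, 2, fetch);
        }
        // With historyLength = 2, only fetches 3 and 2 remain, newest first.
        System.out.println(history[0].get("fetch-number") + ", "
                + history[1].get("fetch-number")); // prints "3, 2"
    }
}
```

Allocating a fresh array on every fetch keeps the sketch simple; an in-place shift would avoid the allocation at the cost of mutating the caller's array.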

hasIdenticalDigest
public static boolean hasIdenticalDigest(CrawlURI curi)
Utility method for testing whether a CrawlURI's last two history entries (one being the most recent fetch) have identical content-digest information.
- Parameters:
  curi - CrawlURI to test
- Returns:
  true if the last two history entries have identical digests, otherwise false
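A minimal sketch of the comparison `hasIdenticalDigest` describes, under stated assumptions: the history is a most-recent-first array, each entry stores its digest under a key called `content-digest` here (a hypothetical name; the real attribute key may differ), and a missing entry or missing digest counts as "not identical". The class name `DigestComparisonSketch` is also hypothetical, and the method takes the history array directly instead of a CrawlURI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compare the digests of the two most recent history entries.
public class DigestComparisonSketch {

    // Assumed key name, for illustration only.
    public static final String CONTENT_DIGEST_KEY = "content-digest";

    public static boolean hasIdenticalDigest(Map<String, Object>[] history) {
        // Need both the most recent fetch (index 0) and the one before it (index 1).
        if (history == null || history.length < 2
                || history[0] == null || history[1] == null) {
            return false;
        }
        Object latest = history[0].get(CONTENT_DIGEST_KEY);
        Object previous = history[1].get(CONTENT_DIGEST_KEY);
        // A missing digest never counts as a match.
        return latest != null && latest.equals(previous);
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> current = new HashMap<>();
        Map<String, Object> earlier = new HashMap<>();
        current.put(CONTENT_DIGEST_KEY, "sha1:EXAMPLEDIGEST"); // placeholder value
        earlier.put(CONTENT_DIGEST_KEY, "sha1:EXAMPLEDIGEST");
        Map<String, Object>[] history = new Map[] { current, earlier };
        System.out.println(hasIdenticalDigest(history)); // prints "true"
    }
}
```

Treating an absent digest as a mismatch is the conservative choice: the check never reports unchanged content without evidence from both fetches.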

historyRealloc
protected HashMap<String,Object>[] historyRealloc(CrawlURI curi)
Get or create a properly sized history array.
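One plausible reading of "get or create a properly sized history array", sketched under assumptions rather than taken from the Heritrix source: reuse the existing array if its length already equals `historyLength`; otherwise allocate a new array of that length and copy over as many of the most recent (lowest-index) entries as fit. The class `HistoryReallocSketch`, its default length of 2, and the array-in/array-out signature are hypothetical; the real method reads and stores the array through the CrawlURI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "get or create a properly sized history array".
public class HistoryReallocSketch {

    private int historyLength = 2; // assumed default for this sketch only

    public int getHistoryLength() { return historyLength; }

    public void setHistoryLength(int length) { this.historyLength = length; }

    /**
     * Returns existing unchanged if it already has the configured length;
     * otherwise returns a new array of that length holding as many of the
     * most recent entries (lowest indexes) as fit.
     */
    @SuppressWarnings("unchecked")
    public Map<String, Object>[] historyRealloc(Map<String, Object>[] existing) {
        if (existing != null && existing.length == historyLength) {
            return existing;
        }
        Map<String, Object>[] resized = new Map[historyLength];
        if (existing != null) {
            // Entries are most-recent-first, so truncating drops the oldest ones.
            System.arraycopy(existing, 0, resized, 0,
                    Math.min(existing.length, historyLength));
        }
        return resized;
    }

    public static void main(String[] args) {
        HistoryReallocSketch sketch = new HistoryReallocSketch();
        Map<String, Object>[] history = sketch.historyRealloc(null);
        history[0] = new HashMap<>();
        sketch.setHistoryLength(4);
        Map<String, Object>[] grown = sketch.historyRealloc(history);
        // The newest entry survives the resize; the new slots are empty.
        System.out.println(grown.length + " " + (grown[0] == history[0])
                + " " + grown[3]); // prints "4 true null"
    }
}
```

Reusing a correctly sized array avoids an allocation on every fetch, while still letting a changed `historyLength` setting take effect the next time a URI is processed.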

saveHeader
protected void saveHeader(CrawlURI curi, Map<String,Object> map, String key)
Save a header from the given HTTP operation into the Map.
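A minimal sketch of saving one response header into a history entry. It assumes the response headers are available as a `Map<String,String>` (the real method reads them from the HTTP operation attached to the CrawlURI) and that a header is stored only when present. `SaveHeaderSketch` is a hypothetical name. ETag and Last-Modified are used as illustrative keys because they are the usual validators for conditional recrawls; this sketch does not claim which headers Heritrix actually records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: copy one response header into a fetch-history entry.
public class SaveHeaderSketch {

    public static void saveHeader(Map<String, String> responseHeaders,
                                  Map<String, Object> historyEntry,
                                  String key) {
        String value = responseHeaders.get(key);
        // Record only headers that were actually present, so absence stays observable.
        if (value != null) {
            historyEntry.put(key, value);
        }
    }

    public static void main(String[] args) {
        // HTTP header names are case-insensitive, so look them up that way.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("ETag", "\"abc123\"");
        headers.put("Last-Modified", "Mon, 25 Sep 2006 20:19:54 GMT");

        Map<String, Object> entry = new HashMap<>();
        saveHeader(headers, entry, "etag");
        saveHeader(headers, entry, "last-modified");
        saveHeader(headers, entry, "content-md5"); // absent: nothing stored
        System.out.println(entry.size()); // prints "2"
    }
}
```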

shouldProcess
protected boolean shouldProcess(CrawlURI curi)
Description copied from class: Processor
Determines whether the given URI should be processed by this processor. For instance, a processor that only works on HTML content might reject the URI if its content type is not "text/html", if its content length is zero, and so on.
- Specified by:
  shouldProcess in class Processor
- Parameters:
  curi - the URI to test
- Returns:
  true if this processor should process that URI; false if not
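The example in the inherited description, an HTML-only processor that rejects other content types and empty content, can be sketched as a standalone predicate. This illustrates the generic Processor contract only and is not FetchHistoryProcessor's own rule. `HtmlOnlyFilterSketch` and its plain-value parameters are hypothetical; a real processor would read the content type and length from the CrawlURI.

```java
import java.util.Locale;

// Hypothetical sketch of the HTML-only shouldProcess example from the Processor docs.
public class HtmlOnlyFilterSketch {

    public static boolean shouldProcess(String contentType, long contentLength) {
        // A zero (or negative) length or a missing type means nothing to process.
        if (contentLength <= 0 || contentType == null) {
            return false;
        }
        // Compare only the media type, ignoring parameters such as "; charset=UTF-8".
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html");
    }

    public static void main(String[] args) {
        System.out.println(shouldProcess("text/html; charset=UTF-8", 512)); // prints "true"
        System.out.println(shouldProcess("image/png", 512));               // prints "false"
        System.out.println(shouldProcess("text/html", 0));                 // prints "false"
    }
}
```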