Package org.archive.crawler.postprocessor


  • Classes
    Class
    Description
    Processor which sends all candidate outlinks through the CandidateChain, scheduling those with non-negative status codes to the frontier.
    A step, late in the processing of a CrawlURI, for marking up the CrawlURI with values to affect frontier disposition, and updating information that may have been affected by the fetch.
    Deprecated. Is highly system dependent.
    The simplest forced-rescheduling step possible: use a local setting (perhaps overlaid to vary based on the URI) to set an exact future reschedule time, as a delay from now.
    Run CrawlURI links carried in the passed CrawlURI through a filter and 'handle' rejections.
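
Two of the behaviors described above (scheduling only candidates with non-negative status codes, and computing a reschedule time as a delay from now) can be illustrated with a minimal, standalone sketch. It does not use the Heritrix API: `PostprocessorSketch`, `Candidate`, `schedulable`, and `rescheduleTimeMs` are hypothetical names introduced only for illustration, and the real processors may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PostprocessorSketch {

    /** Hypothetical stand-in for a candidate URI; NOT the Heritrix CrawlURI class. */
    static final class Candidate {
        final String uri;
        final int status;

        Candidate(String uri, int status) {
            this.uri = uri;
            this.status = status;
        }
    }

    /** Keeps only candidates whose status code is non-negative. */
    static List<Candidate> schedulable(List<Candidate> candidates) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.status >= 0) {
                kept.add(c);
            }
        }
        return kept;
    }

    /** Computes an exact future reschedule time as a delay from "now". */
    static long rescheduleTimeMs(long nowMs, long delaySeconds) {
        return nowMs + delaySeconds * 1000L;
    }

    public static void main(String[] args) {
        // Assumed sample data: one candidate carries a negative (rejected) status.
        List<Candidate> candidates = Arrays.asList(
                new Candidate("http://example.com/a", 0),
                new Candidate("http://example.com/b", -5000),
                new Candidate("http://example.com/c", 200));

        for (Candidate c : schedulable(candidates)) {
            System.out.println("schedule: " + c.uri);
        }

        long now = System.currentTimeMillis();
        System.out.println("reschedule at: " + rescheduleTimeMs(now, 3600L));
    }
}
```

In this sketch the delay is passed in as a plain argument; in the crawler it would come from a configurable setting, possibly overlaid per URI as the description above notes.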