Class | Description |
---|---|
BdbServerCache | ServerCache backed by BDB big maps; the usual choice for crawls. |
CrawlHost | Represents a single remote "host". |
CrawlServer | Represents a single remote "server". |
CustomRobotsPolicy | Follows a custom-written robots policy rather than the site's own declarations. Does not support overlays of different custom robots directives; instead, it is recommended that each custom policy be declared as a separate bean with a distinct name. |
DefaultTempDirProvider | |
FirstNamedRobotsPolicy | Works from an ordered list of potential User-Agents, consisting first of the regularly-configured User-Agent and then those in the candidateUserAgents list, considering each potential agent in order. |
IgnoreRobotsPolicy | Policy to ignore robots.txt directives. |
MostFavoredRobotsPolicy | Follows a most-favored robots policy, allowing a URL if either the conventionally-configured User-Agent or any of a number of alternate User-Agents (from the candidateUserAgents list) would be allowed. |
ObeyRobotsPolicy | Classic obey-robots-as-declared policy. |
RobotsDirectives | Represents the directives that apply to a user-agent (or set of user-agents). |
RobotsPolicy | Represents the strategy used by the crawler for determining how robots.txt files will be honored. |
Robotstxt | Utility class for parsing and representing 'robots.txt' format directives as a list of named user-agents and a map from user-agents to RobotsDirectives (see the sketch after this table). |
ServerCache | Abstract class for the crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects. |
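A minimal sketch of how two of these classes fit together: parsing a robots.txt body with Robotstxt and checking a path against the RobotsDirectives for a given user-agent. The signatures used here (a Robotstxt constructor taking a BufferedReader, getDirectivesFor(String), and RobotsDirectives.allows(String)) are assumptions based on the Heritrix 3 API; consult the individual class pages to confirm them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

import org.archive.modules.net.RobotsDirectives;
import org.archive.modules.net.Robotstxt;

public class RobotstxtSketch {
    public static void main(String[] args) throws IOException {
        // A small robots.txt body; in a real crawl this would be the fetched file.
        String body =
            "User-agent: *\n" +
            "Disallow: /private/\n";

        // Robotstxt parses the directives into per-user-agent RobotsDirectives.
        Robotstxt robots = new Robotstxt(new BufferedReader(new StringReader(body)));

        // Look up the directives applying to a crawler user-agent; with no
        // agent-specific section present, this should fall back to the '*' rules.
        RobotsDirectives directives = robots.getDirectivesFor("heritrix");

        // Test individual paths against the directives.
        System.out.println(directives.allows("/private/page.html")); // expected: false
        System.out.println(directives.allows("/index.html"));        // expected: true
    }
}
```

A policy class such as ObeyRobotsPolicy would consult the same RobotsDirectives when deciding whether a discovered URL may be fetched.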
Copyright © 2003–2021 Internet Archive. All rights reserved.