Package | Description |
---|---|
org.archive.modules.canonicalize |

Modifier and Type | Class and Description |
---|---|
class | BaseRule: Base class for all URL canonicalization rules that are configurable via the Heritrix settings system. |
class | FixupQueryString: Strip any trailing question mark. |
class | LowercaseRule: Lowercases the URL. |
class | RegexRule: General conversion rule. |
class | StripExtraSlashes: Strip any extra slashes, '/', found in the path. |
class | StripSessionCFIDs: Strip ColdFusion session IDs. |
class | StripSessionIDs: Strip known session IDs. |
class | StripUserinfoRule: Strip any 'userinfo' found on http/https URLs. |
class | StripWWWNRule: Strip any 'www[0-9]*' found on http/https URLs, if they have some path/query component (content after the third slash). |
class | StripWWWRule: Strip any 'www' found on http/https URLs, if they have some path/query component (content after the third slash). |
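
To make the rule descriptions above concrete, here is a minimal, self-contained sketch of what four of them (lowercasing, 'www' stripping, userinfo stripping, and trailing question mark removal) might do to a URL string. It does not use the Heritrix classes: the class name `CanonicalizeSketch`, the method names, and both regular expressions are assumptions made for illustration, and the actual rule implementations may differ in detail.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative stand-in only; not part of org.archive.modules.canonicalize.
class CanonicalizeSketch {
    // Assumed pattern: "www." right after the scheme, then a host, a slash,
    // and at least one more character (the path/query component).
    private static final Pattern WWW =
            Pattern.compile("^(https?://)www\\.([^/]*/.+)$", Pattern.CASE_INSENSITIVE);

    // Assumed pattern: userinfo ("user:password@") right after the scheme.
    private static final Pattern USERINFO =
            Pattern.compile("^(https?://)[^/@]*@(.*)$", Pattern.CASE_INSENSITIVE);

    // In the spirit of LowercaseRule.
    static String lowercase(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    // In the spirit of StripWWWRule: only strips when content follows the third slash.
    static String stripWww(String url) {
        return WWW.matcher(url).replaceFirst("$1$2");
    }

    // In the spirit of StripUserinfoRule.
    static String stripUserinfo(String url) {
        return USERINFO.matcher(url).replaceFirst("$1$2");
    }

    // In the spirit of FixupQueryString.
    static String stripTrailingQuestionMark(String url) {
        return url.endsWith("?") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        System.out.println(lowercase("HTTP://Example.COM/Path"));           // http://example.com/path
        System.out.println(stripWww("http://www.example.com/index.html"));  // http://example.com/index.html
        System.out.println(stripWww("http://www.example.com/"));            // unchanged: nothing after third slash
        System.out.println(stripUserinfo("http://user:pw@example.com/a"));  // http://example.com/a
        System.out.println(stripTrailingQuestionMark("http://example.com/page?")); // http://example.com/page
    }
}
```

The 'www' example shows the "content after the third slash" condition: a bare `http://www.example.com/` is left alone, while the same host with a path is rewritten.
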

Modifier and Type | Method and Description |
---|---|
static List<CanonicalizationRule> | RulesCanonicalizationPolicy.getDefaultRules(): A reasonable set of default rules to use if no others are provided by operator configuration. |
List<CanonicalizationRule> | RulesCanonicalizationPolicy.getRules() |

Modifier and Type | Method and Description |
---|---|
void | RulesCanonicalizationPolicy.setRules(List<CanonicalizationRule> rules) |

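
The method signatures above suggest a policy object that holds a list of rules, falls back to a default list, and lets the operator replace it. The sketch below illustrates that shape under stated assumptions: `RulesPolicySketch` is a hypothetical class, `UnaryOperator<String>` stands in for `CanonicalizationRule`, the two default rules are arbitrary examples, and applying the rules in list order inside a `canonicalize` method is an assumption rather than something this page documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Illustrative stand-in only; not the Heritrix RulesCanonicalizationPolicy.
class RulesPolicySketch {
    // Starts with the defaults until setRules() supplies another list.
    private List<UnaryOperator<String>> rules = getDefaultRules();

    // Hypothetical defaults: lowercase, then drop a trailing question mark.
    static List<UnaryOperator<String>> getDefaultRules() {
        List<UnaryOperator<String>> defaults = new ArrayList<>();
        defaults.add(url -> url.toLowerCase(Locale.ROOT));
        defaults.add(url -> url.endsWith("?") ? url.substring(0, url.length() - 1) : url);
        return defaults;
    }

    List<UnaryOperator<String>> getRules() {
        return rules;
    }

    void setRules(List<UnaryOperator<String>> rules) {
        this.rules = rules;
    }

    // Assumed behavior: feed the output of each rule into the next, in list order.
    String canonicalize(String url) {
        String result = url;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        RulesPolicySketch policy = new RulesPolicySketch();
        System.out.println(policy.canonicalize("HTTP://Example.com/Page?")); // http://example.com/page

        // An empty rule list leaves the URL untouched.
        policy.setRules(new ArrayList<>());
        System.out.println(policy.canonicalize("HTTP://Example.com/Page?")); // HTTP://Example.com/Page?
    }
}
```

Because each rule sees the previous rule's output, the order of the list matters in a design like this one.
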
Copyright © 2003–2019 Internet Archive. All rights reserved.