public final class URLCanonicalizer extends Object
Modifier and Type | Method and Description |
---|---|
static String | buildCleanedParametersURIRepresentation(org.apache.commons.httpclient.URI uri, SpiderParam.HandleParametersOption handleParameters, boolean handleODataParametersVisited) - Builds a String representation of the URI with cleaned parameters, which can be used when checking whether a URI was already visited. |
static String | getCanonicalURL(String url) - Gets the canonical url. |
static String | getCanonicalURL(String url, String baseURL) - Gets the canonical url, starting from a relative or absolute url found in a given context (baseURL). |
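
To illustrate what resolving a url "found in a given context (baseURL)" can involve, here is a minimal sketch built only on `java.net.URI`. It is not the implementation behind `getCanonicalURL`; the class name `CanonicalUrlSketch`, the method `canonicalize`, and the specific normalization rules (lower-casing scheme and host, dropping default ports and fragments, returning null for unusable input) are assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of URLCanonicalizer.
final class CanonicalUrlSketch {

    /** Resolves url against baseURL (may be null) and normalizes it; returns null if unusable. */
    static String canonicalize(String url, String baseURL) {
        try {
            URI uri = new URI(url.trim());
            if (baseURL != null) {
                URI base = new URI(baseURL);
                // Give a host-only base an explicit "/" path before resolving against it.
                if (base.getHost() != null && "".equals(base.getRawPath())) {
                    base = base.resolve("/");
                }
                uri = base.resolve(uri);
            }
            uri = uri.normalize(); // collapses "." and ".." path segments
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null; // relative without a base, or opaque (e.g. mailto:)
            }
            String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
            String host = uri.getHost().toLowerCase(Locale.ROOT);
            int port = uri.getPort();
            boolean defaultPort = (port == 80 && scheme.equals("http"))
                    || (port == 443 && scheme.equals("https"));
            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }
            StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
            if (port != -1 && !defaultPort) {
                sb.append(':').append(port);
            }
            sb.append(path);
            if (uri.getRawQuery() != null) {
                sb.append('?').append(uri.getRawQuery());
            }
            return sb.toString(); // the fragment is intentionally left out
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("../d.html?x=1#frag", "http://Example.com/a/b/c.html"));
        System.out.println(canonicalize("HTTP://Example.COM:80", null));
    }
}
```

In this sketch a relative reference and its absolute equivalent map to the same string, which is the property a crawler needs before comparing URLs.
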
public static String getCanonicalURL(String url)
Parameters:
url - the url

public static String getCanonicalURL(String url, String baseURL)
Parameters:
url - the url string defining the reference
baseURL - the context in which this url was found

public static String buildCleanedParametersURIRepresentation(org.apache.commons.httpclient.URI uri, SpiderParam.HandleParametersOption handleParameters, boolean handleODataParametersVisited) throws org.apache.commons.httpclient.URIException
See also: getCanonicalURL(String).
When building the URI representation, the same format should be used in all cases, because the format may affect how many times pages are visited and reported if the HandleParametersOption option is changed while the spider is running.
Parameters:
uri - the uri
handleParameters - the handle parameters option
handleODataParametersVisited - whether specific OData parameters should be handled
Throws:
org.apache.commons.httpclient.URIException - the URI exception
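
The note about using one format for every case is easier to see with a concrete "visited" key. The sketch below is an assumption-laden illustration, not the code behind `buildCleanedParametersURIRepresentation`: the enum `ParamHandling` is a hypothetical stand-in for `SpiderParam.HandleParametersOption` (whose constants are not listed here), `visitedKey` is an invented name, and OData-specific handling is left out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of URLCanonicalizer.
final class VisitedKeySketch {

    // Assumed stand-in for SpiderParam.HandleParametersOption.
    enum ParamHandling { USE_ALL, IGNORE_VALUE, IGNORE_COMPLETELY }

    /** Builds a comparison key from an already canonical URL string. */
    static String visitedKey(String canonicalUrl, ParamHandling handling) {
        int q = canonicalUrl.indexOf('?');
        if (q < 0) {
            return canonicalUrl; // no query string, nothing to clean
        }
        String base = canonicalUrl.substring(0, q);
        if (handling == ParamHandling.IGNORE_COMPLETELY) {
            return base;
        }
        // Sorted set: parameter order does not matter and exact duplicates collapse.
        Set<String> parts = new TreeSet<>();
        for (String pair : canonicalUrl.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            boolean nameOnly = handling == ParamHandling.IGNORE_VALUE && eq >= 0;
            parts.add(nameOnly ? pair.substring(0, eq) : pair);
        }
        return parts.isEmpty() ? base : base + "?" + String.join("&", parts);
    }

    public static void main(String[] args) {
        Set<String> visited = new HashSet<>();
        String url = "http://example.com/p?b=2&a=1";
        visited.add(visitedKey(url, ParamHandling.USE_ALL));
        // Same page, different option: the key no longer matches the stored one.
        System.out.println(visited.contains(visitedKey(url, ParamHandling.IGNORE_VALUE)));
    }
}
```

Because each option yields a differently shaped key for the same page, keys stored under one option do not match keys computed under another, so in this sketch switching options mid-crawl makes already visited pages look new.
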