Class HostnameQueueAssignmentPolicyWithLimits

All Implemented Interfaces:
Serializable, org.archive.spring.HasKeyedProperties

public class HostnameQueueAssignmentPolicyWithLimits extends HostnameQueueAssignmentPolicy
A variation on HostnameQueueAssignmentPolicy that allows the operator to specify, per sheet, the maximum number of domains and sub-domains to use for the queue name.
  • Field Details

  • Constructor Details

    • HostnameQueueAssignmentPolicyWithLimits

      public HostnameQueueAssignmentPolicyWithLimits()
  • Method Details

    • setLimit

      public void setLimit(int limit)
      Set the maximum number of domains and sub-domains to include in the queue name.

      E.g. if limit is set to 2, then the following assignments are made:
      example.com -> example.com
      www.example.com -> example.com
      subdomain.example.com -> example.com
      www.subdomain.example.com -> example.com
      otherdomain.com -> otherdomain.com

      Note: No accommodation is made for TLDs like .co.uk that always use two levels. Operators crawling a mixture of TLDs with and without the mandatory second level should use SurtPrefixesSheetAssociation sheets to apply these limits appropriately, or apply the limit only to specific domains.

      Parameters:
      limit - The maximum number of domains and sub-domains to use when assigning a queue name to a URI.
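
      The assignments listed above can be reproduced with a minimal, self-contained sketch. This is not the Heritrix implementation: the class HostnameLimitSketch and the helper limitHostname are hypothetical names, the helper simply keeps the last `limit` dot-separated labels, and it assumes a plain hostname with no port or scheme. Treating a non-positive limit as "no limit" is also an assumption made here for illustration.

```java
class HostnameLimitSketch {

    // Hypothetical helper: keep only the last `limit` dot-separated labels
    // of a hostname. Not taken from the Heritrix source.
    static String limitHostname(String hostname, int limit) {
        if (limit <= 0) {
            // Assumption: a non-positive limit leaves the hostname unchanged.
            return hostname;
        }
        String[] labels = hostname.split("\\.");
        if (labels.length <= limit) {
            // Already within the limit, nothing to trim.
            return hostname;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - limit; i < labels.length; i++) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append(labels[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] hosts = {
            "example.com",
            "www.example.com",
            "subdomain.example.com",
            "www.subdomain.example.com",
            "otherdomain.com",
            "www.example.co.uk" // illustrates the two-level TLD caveat
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + limitHostname(host, 2));
        }
    }
}
```

      Under this sketch, www.example.co.uk with a limit of 2 collapses to co.uk, which is why the note above recommends a separate sheet (for example, with a limit of 3) for domains under two-level TLDs.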
    • getLimit

      public int getLimit()
    • getCoreKey

      protected String getCoreKey(org.archive.net.UURI basis)
      Overrides:
      getCoreKey in class HostnameQueueAssignmentPolicy
    • getLimitedHostname

      protected String getLimitedHostname(String hostname, int limit)