java.lang.Object

com.metreeca.xml.actions.Crawl

All Implemented Interfaces: Function<String,Stream<String>>

public final class Crawl extends Object implements Function<String,Stream<String>>

Site crawling.

Maps site root URLs to streams of URLs for HTML site pages.

Constructor Summary

| Constructor | Description |
| --- | --- |
| Crawl() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Stream<String> | apply(String root) | Crawls a site. |
| Crawl | fetch(Fetch fetch) | Configures the fetch action (defaults to Fetch. |
| Crawl | focus(Function<? super Node,Optional<Node>> focus) | Configures the content focus action (defaults to the identity function). |
| Crawl | prune(BiPredicate<String,String> prune) | Configures the prune action (defaults to always pass). |
| Crawl | threads(int threads) | Configures the number of concurrent requests (defaults to the number of processors). |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.function.Function
andThen, compose

Constructor Details
- Crawl
  
  public Crawl()
Method Details
- threads
  
  public Crawl threads(int threads)
  
  Configures the number of concurrent requests (defaults to the number of processors).
  
  Parameters:
  
  threads - the maximum number of concurrent resource fetches; if zero, the number of system processors is used
  
  Returns:
  
  this action
  
  Throws:
  
  IllegalArgumentException - if threads is negative
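
The contract above (zero selects the processor count, negative values are rejected) can be sketched as follows. This is only an illustration of the documented behaviour, not the library's code: the class `ThreadsSketch` and the helper `effectiveThreads` are hypothetical names.

```java
// Sketch of the documented threads contract; not the library's actual code.
final class ThreadsSketch {

    // Hypothetical helper: maps the configured value to an effective thread count.
    static int effectiveThreads(final int threads) {
        if (threads < 0) {
            throw new IllegalArgumentException("negative thread count <" + threads + ">");
        }
        // Zero selects the number of processors available to the JVM.
        return threads == 0 ? Runtime.getRuntime().availableProcessors() : threads;
    }

    public static void main(final String[] args) {
        System.out.println(effectiveThreads(0)); // machine-dependent
        System.out.println(effectiveThreads(4));
    }
}
```

Given the documented signature, the corresponding configuration call would be `new Crawl().threads(4)`.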
- fetch
  
  public Crawl fetch(Fetch fetch)
  
  Configures the fetch action (defaults to Fetch.
  
  Parameters:
  
  fetch - the action used to fetch pages
  
  Returns:
  
  this action
  
  Throws:
  
  NullPointerException - if fetch is null
- focus
  
  public Crawl focus(Function<? super Node,Optional<Node>> focus)
  
  Configures the content focus action (defaults to the identity function).
  
  Parameters:
  
  focus - a function taking as argument an element and returning an optional partial/restructured focus element, if one was identified, or an empty optional, otherwise
  
  Returns:
  
  this action
  
  Throws:
  
  NullPointerException - if focus is null
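
A focus function of this shape might look like the sketch below, which narrows a page to its first `<main>` descendant. It assumes that `Node` is `org.w3c.dom.Node`, which this page does not state; `FocusSketch`, `MAIN` and `parse` are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.Optional;
import java.util.function.Function;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Assumes Node is org.w3c.dom.Node; this is not confirmed by the page itself.
final class FocusSketch {

    // Hypothetical focus function: returns the first <main> descendant, if any.
    static final Function<Node, Optional<Node>> MAIN = node -> {
        final NodeList matches = node instanceof Document ? ((Document) node).getElementsByTagName("main")
                : node instanceof Element ? ((Element) node).getElementsByTagName("main")
                : null;
        if (matches == null || matches.getLength() == 0) {
            return Optional.empty(); // no focus element identified
        }
        return Optional.of(matches.item(0));
    };

    // Parses a well-formed XML/XHTML string; used only to build demo input.
    static Document parse(final String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (final Exception e) {
            throw new IllegalArgumentException("malformed markup", e);
        }
    }

    public static void main(final String[] args) {
        final Document page = parse("<html><body><nav>menu</nav><main>text</main></body></html>");
        System.out.println(MAIN.apply(page).map(Node::getTextContent).orElse("(no focus)"));
    }
}
```

Under the same assumption, the function would be registered with `new Crawl().focus(FocusSketch.MAIN)`.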
- prune
  
  public Crawl prune(BiPredicate<String,String> prune)
  
  Configures the prune action (defaults to always pass).
  
  Parameters:
  
  prune - a bi-predicate taking as arguments the site root URL and a link URL and returning true if the link targets a site page or false otherwise
  
  Returns:
  
  this action
  
  Throws:
  
  NullPointerException - if prune is null
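
As an illustration of the bi-predicate described above, the sketch below accepts only links on the same host as the site root and under the root path. The rule itself, and the names `PruneSketch` and `SAME_SITE`, are assumptions made for the example; the library does not prescribe them.

```java
import java.net.URI;
import java.util.function.BiPredicate;

final class PruneSketch {

    // Hypothetical prune rule: (site root URL, link URL) -> true if the link targets a site page.
    static final BiPredicate<String, String> SAME_SITE = (root, link) -> {
        try {
            final URI base = URI.create(root).normalize();
            final URI target = URI.create(link).normalize();
            // Same host (case-insensitive) and link path nested under the root path.
            return base.getHost() != null
                    && base.getHost().equalsIgnoreCase(target.getHost())
                    && target.getPath().startsWith(base.getPath());
        } catch (final IllegalArgumentException e) {
            return false; // malformed URLs are not treated as site pages
        }
    };

    public static void main(final String[] args) {
        System.out.println(SAME_SITE.test("https://example.com/docs/", "https://example.com/docs/guide.html"));
        System.out.println(SAME_SITE.test("https://example.com/docs/", "https://other.org/docs/guide.html"));
    }
}
```

Given the documented signature, such a predicate would be passed as `new Crawl().prune(PruneSketch.SAME_SITE)`.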
- apply
  
  public Stream<String> apply(String root)
  
  Crawls a site.
  
  Specified by:
  
  apply in interface Function<String,Stream<String>>
  
  Parameters:
  
  root - the root URL of the site to be crawled
  
  Returns:
  
  a stream of links to nested HTML pages reachable from root; empty if root is null or empty
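
To make the overall contract concrete, here is a toy model of a crawl over an in-memory link graph. It is not the library's implementation: `CrawlSketch` is a hypothetical class, the `links` map stands in for fetched pages, the traversal is sequential, and including the root itself in the result is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.stream.Stream;

// Toy model of the documented contract; NOT the library's implementation.
final class CrawlSketch implements Function<String, Stream<String>> {

    private final Map<String, List<String>> links; // hypothetical stand-in for fetched pages
    private final BiPredicate<String, String> prune; // (root, link) -> true if link is a site page

    CrawlSketch(final Map<String, List<String>> links, final BiPredicate<String, String> prune) {
        this.links = links;
        this.prune = prune;
    }

    @Override public Stream<String> apply(final String root) {
        if (root == null || root.isEmpty()) {
            return Stream.empty(); // documented behaviour for null or empty roots
        }

        final Set<String> visited = new LinkedHashSet<>(); // keeps discovery order
        final Deque<String> pending = new ArrayDeque<>();
        pending.add(root);

        // Breadth-first traversal; each page is visited at most once.
        while (!pending.isEmpty()) {
            final String page = pending.remove();
            if (visited.add(page)) {
                for (final String link : links.getOrDefault(page, Collections.emptyList())) {
                    if (prune.test(root, link)) {
                        pending.add(link);
                    }
                }
            }
        }

        return visited.stream(); // this sketch includes the root; the library may differ
    }
}
```

With the real class, the equivalent call chain would be along the lines of `new Crawl().prune(...).apply(root)`, consuming the returned stream as needed.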
