Package ai.preferred.venom.fetcher
Class AsyncFetcher.Builder
- java.lang.Object
  - ai.preferred.venom.fetcher.AsyncFetcher.Builder
- Enclosing class:
- AsyncFetcher
public static final class AsyncFetcher.Builder extends java.lang.Object
A builder for the AsyncFetcher class.
-
-
Method Summary
Modifier and Type | Method | Description
AsyncFetcher | build() | Builds the fetcher with the options specified.
AsyncFetcher.Builder | disableCompression() | Disables requesting compressed pages and decompressing pages after they are fetched.
AsyncFetcher.Builder | disableCookies() | Disables cookie storage.
AsyncFetcher.Builder | register(@NotNull Callback callback) | Registers callbacks that will be called when a page has been fetched.
AsyncFetcher.Builder | setConnectionRequestTimeout(int connectionRequestTimeout) | Sets the timeout in milliseconds used when requesting a connection from the connection manager.
AsyncFetcher.Builder | setConnectTimeout(int connectTimeout) | Sets the timeout in milliseconds until a connection is established.
AsyncFetcher.Builder | setFileManager(FileManager fileManager) | Sets the FileManager to be used.
AsyncFetcher.Builder | setHeaders(@NotNull java.util.Map<java.lang.String,java.lang.String> headers) | Sets the headers to be used when fetching items.
AsyncFetcher.Builder | setMaxConnections(int maxConnections) | Sets the maximum number of connections allowed at any one time.
AsyncFetcher.Builder | setMaxRouteConnections(int maxRouteConnections) | Sets the maximum number of connections allowed at any one time for a particular route (host).
AsyncFetcher.Builder | setNumIoThreads(int numIoThreads) | Sets the number of httpclient dispatcher threads.
AsyncFetcher.Builder | setProxyProvider(ProxyProvider proxyProvider) | Sets the ProxyProvider to be used.
AsyncFetcher.Builder | setRedirectStrategy(org.apache.http.client.RedirectStrategy redirectStrategy) | Sets the redirection strategy for a response received by the fetcher.
AsyncFetcher.Builder | setSocketTimeout(int socketTimeout) | Sets the socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
AsyncFetcher.Builder | setSslContext(javax.net.ssl.SSLContext sslContext) | Sets the SSL context used for encrypted responses.
AsyncFetcher.Builder | setStopCodes(@javax.validation.constraints.NotNull int... codes) | Sets a list of stop codes that will interrupt crawling.
AsyncFetcher.Builder | setThreadFactory(@NotNull java.util.concurrent.ThreadFactory threadFactory) | Sets the thread factory that creates the httpclient dispatcher threads.
AsyncFetcher.Builder | setUserAgent(@NotNull UserAgent userAgent) | Sets the UserAgent to be used. If not set, a default will be chosen.
AsyncFetcher.Builder | setValidator(@NotNull Validator validator) | Sets the Validator to be used.
AsyncFetcher.Builder | setValidator(@NotNull Validator... validators) | Sets multiple validators to be used.
AsyncFetcher.Builder | setValidatorRouter(ValidatorRouter router) | Sets the ValidatorRouter to be used.
-
-
-
Method Detail
-
register
public AsyncFetcher.Builder register(@NotNull Callback callback)
Registers callbacks that will be called when a page has been fetched.
Please note that blocking callbacks will significantly reduce the rate at which requests are processed. Run I/O-blocking callbacks on your own executors.
- Parameters:
callback - the Callback to register.
- Returns:
- this
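The advice about blocking callbacks can be illustrated without the library itself. The following is a minimal sketch, not Venom's own mechanism: PageHandler is a hypothetical stand-in for the Callback type (whose methods are not shown on this page), and OffloadingHandler is an illustrative wrapper that hands slow work to an executor so the calling thread returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class OffloadDemo {

  // Hypothetical stand-in for the library's Callback type; the real
  // interface is not documented on this page and may look different.
  interface PageHandler {
    void onFetched(String content);
  }

  // Wraps a slow handler so the calling thread only enqueues the work.
  static final class OffloadingHandler implements PageHandler {
    private final PageHandler delegate;
    private final ExecutorService executor;

    OffloadingHandler(PageHandler delegate, ExecutorService executor) {
      this.delegate = delegate;
      this.executor = executor;
    }

    @Override
    public void onFetched(String content) {
      // Returns immediately; the blocking work runs on the executor's thread.
      executor.execute(() -> delegate.onFetched(content));
    }
  }

  // Sends one page through the wrapper and reports which thread handled it.
  static String handlerThreadName() {
    ExecutorService executor = Executors.newSingleThreadExecutor(
        task -> new Thread(task, "callback-worker"));
    AtomicReference<String> seen = new AtomicReference<>();
    CountDownLatch done = new CountDownLatch(1);
    PageHandler slowHandler = content -> {
      seen.set(Thread.currentThread().getName());
      done.countDown();
    };
    try {
      new OffloadingHandler(slowHandler, executor).onFetched("<html></html>");
      done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      executor.shutdown();
    }
    return seen.get();
  }

  public static void main(String[] args) {
    System.out.println(handlerThreadName()); // prints "callback-worker"
  }
}
```

The same wrapping idea would apply to a real Callback implementation: keep the method invoked by the fetcher short and move file or database writes onto a separately sized pool.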
-
disableCookies
public AsyncFetcher.Builder disableCookies()
Disables cookie storage.
- Returns:
- this
-
setFileManager
public AsyncFetcher.Builder setFileManager(FileManager fileManager)
Sets the FileManager to be used. Defaults to none.
If a FileManager is set, all fetched items will be saved to storage.
- Parameters:
fileManager - file manager to be used.
- Returns:
- this
-
setHeaders
public AsyncFetcher.Builder setHeaders(@NotNull java.util.Map<java.lang.String,java.lang.String> headers)
Sets the headers to be used when fetching items. Defaults to none.
- Parameters:
headers - a map of headers to be used.
- Returns:
- this
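As a small illustration of the argument shape, the sketch below builds a Map<String, String> of header names to values using only the JDK. The class name HeaderExample and the specific header values are illustrative assumptions, not defaults of the library.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderExample {

  // Builds a header map of the type that setHeaders accepts.
  // The header names and values here are examples only.
  static Map<String, String> exampleHeaders() {
    Map<String, String> headers = new LinkedHashMap<>();
    headers.put("Accept", "text/html,application/xhtml+xml");
    headers.put("Accept-Language", "en-US,en;q=0.9");
    // Unmodifiable view so the caller cannot change it through this reference.
    return Collections.unmodifiableMap(headers);
  }

  public static void main(String[] args) {
    exampleHeaders().forEach((name, value) -> System.out.println(name + ": " + value));
    // With the library on the classpath, this map would be passed as
    // builder.setHeaders(exampleHeaders()), where builder is an AsyncFetcher.Builder.
  }
}
```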
-
setNumIoThreads
public AsyncFetcher.Builder setNumIoThreads(int numIoThreads)
Sets the number of httpclient dispatcher threads.
- Parameters:
numIoThreads - number of threads.
- Returns:
- this
-
setMaxConnections
public AsyncFetcher.Builder setMaxConnections(int maxConnections)
Sets the maximum number of connections allowed at any one time.
- Parameters:
maxConnections - the maximum number of connections allowed.
- Returns:
- this
-
setMaxRouteConnections
public AsyncFetcher.Builder setMaxRouteConnections(int maxRouteConnections)
Sets the maximum number of connections allowed at any one time for a particular route (host).
- Parameters:
maxRouteConnections - the maximum number of connections allowed per route.
- Returns:
- this
-
setProxyProvider
public AsyncFetcher.Builder setProxyProvider(ProxyProvider proxyProvider)
Sets the ProxyProvider to be used. Defaults to none.
- Parameters:
proxyProvider - proxy provider to be used.
- Returns:
- this
-
setSslContext
public AsyncFetcher.Builder setSslContext(javax.net.ssl.SSLContext sslContext)
Sets the SSL context used for encrypted responses.
- Parameters:
sslContext - SSLContext to be used.
- Returns:
- this
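For readers unsure how to obtain a javax.net.ssl.SSLContext, the sketch below creates one with the JVM's default key and trust material using only JDK calls. The class name SslContextExample is illustrative, and whether the default trust store suits a particular crawl is an assumption to check.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

final class SslContextExample {

  // Creates a TLS context backed by the JVM's default trust store.
  static SSLContext defaultTlsContext() {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      // Passing null selects the default key managers, trust managers
      // and source of randomness.
      context.init(null, null, null);
      return context;
    } catch (NoSuchAlgorithmException | KeyManagementException e) {
      throw new IllegalStateException("TLS is not available in this JVM", e);
    }
  }

  public static void main(String[] args) {
    SSLContext context = defaultTlsContext();
    System.out.println(context.getProtocol());
    // With the library on the classpath, the context would be passed as
    // builder.setSslContext(context), where builder is an AsyncFetcher.Builder.
  }
}
```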
-
setStopCodes
public AsyncFetcher.Builder setStopCodes(@NotNull int... codes)
Sets a list of stop codes that will interrupt crawling.
- Parameters:
codes - a list of stop codes.
- Returns:
- this
-
setThreadFactory
public AsyncFetcher.Builder setThreadFactory(@NotNull java.util.concurrent.ThreadFactory threadFactory)
Sets the thread factory that creates the httpclient dispatcher threads.
- Parameters:
threadFactory - an instance of ThreadFactory.
- Returns:
- this
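A custom thread factory is mostly useful for giving the dispatcher threads recognisable names in logs and thread dumps. The following is a minimal JDK-only sketch; the class name NamedThreadFactory and the "venom-io" prefix are illustrative choices, and marking the threads as daemon is an assumption that may not suit every application.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger(1);

  NamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    // Names threads prefix-1, prefix-2, ... in creation order.
    Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
    // Daemon threads do not keep the JVM alive on their own (a choice made
    // for this sketch, not a requirement stated by the library).
    thread.setDaemon(true);
    return thread;
  }

  public static void main(String[] args) {
    ThreadFactory factory = new NamedThreadFactory("venom-io");
    System.out.println(factory.newThread(() -> { }).getName()); // prints "venom-io-1"
    System.out.println(factory.newThread(() -> { }).getName()); // prints "venom-io-2"
    // With the library on the classpath, the factory would be passed as
    // builder.setThreadFactory(factory), where builder is an AsyncFetcher.Builder.
  }
}
```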
-
setUserAgent
public AsyncFetcher.Builder setUserAgent(@NotNull UserAgent userAgent)
Sets the UserAgent to be used. If not set, a default will be chosen.
- Parameters:
userAgent - user agent generator to be used.
- Returns:
- this
-
setValidator
public AsyncFetcher.Builder setValidator(@NotNull Validator validator)
Sets the Validator to be used. Defaults to StatusOkValidator and EmptyContentValidator.
The validator checks each fetched page, and the fetch is retried if the page is not consistent with the specification set by the validator.
- Parameters:
validator - validator to be used.
- Returns:
- this
-
setValidator
public AsyncFetcher.Builder setValidator(@NotNull Validator... validators)
Sets multiple validators to be used. Defaults to StatusOkValidator and EmptyContentValidator.
The validators check each fetched page, and the fetch is retried if the page is not consistent with the specification set by the validators.
- Parameters:
validators - validators to be used.
- Returns:
- this
-
setRedirectStrategy
public AsyncFetcher.Builder setRedirectStrategy(org.apache.http.client.RedirectStrategy redirectStrategy)
Sets the redirection strategy for a response received by the fetcher.
- Parameters:
redirectStrategy - redirection strategy to be used.
- Returns:
- this
-
setValidatorRouter
public AsyncFetcher.Builder setValidatorRouter(ValidatorRouter router)
Sets the ValidatorRouter to be used. Defaults to none. Validators set through setValidator will always be used.
- Parameters:
router - the ValidatorRouter to be used.
- Returns:
- this
-
setConnectionRequestTimeout
public AsyncFetcher.Builder setConnectionRequestTimeout(int connectionRequestTimeout)
Sets the timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout.
- Parameters:
connectionRequestTimeout - timeout in milliseconds.
- Returns:
- this
-
setConnectTimeout
public AsyncFetcher.Builder setConnectTimeout(int connectTimeout)
Sets the timeout in milliseconds until a connection is established. A timeout value of zero is interpreted as an infinite timeout.
- Parameters:
connectTimeout - timeout in milliseconds.
- Returns:
- this
-
setSocketTimeout
public AsyncFetcher.Builder setSocketTimeout(int socketTimeout)
Sets the socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
- Parameters:
socketTimeout - timeout in milliseconds.
- Returns:
- this
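All three timeout setters take an int number of milliseconds, so a small conversion helper can make call sites easier to read. This is a sketch under that assumption only; TimeoutMillis and toMillis are illustrative names, not part of the library.

```java
import java.time.Duration;

final class TimeoutMillis {

  // Converts a Duration to the int millisecond value the timeout setters take.
  // Math.toIntExact throws ArithmeticException if the value does not fit in an int.
  static int toMillis(Duration timeout) {
    return Math.toIntExact(timeout.toMillis());
  }

  public static void main(String[] args) {
    System.out.println(toMillis(Duration.ofSeconds(30))); // prints 30000
    System.out.println(toMillis(Duration.ofMillis(500))); // prints 500
    // With the library on the classpath, the result would be passed as, e.g.,
    // builder.setConnectTimeout(toMillis(Duration.ofSeconds(30))),
    // where builder is an AsyncFetcher.Builder.
  }
}
```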
-
disableCompression
public AsyncFetcher.Builder disableCompression()
Disables requesting compressed pages and decompressing pages after they are fetched. Compression is enabled by default.
- Returns:
- this
-
build
public AsyncFetcher build()
Builds the fetcher with the options specified.
- Returns:
- an instance of Fetcher.
-
-