public class FetchHTTPRequest extends Object
Modifier and Type | Class and Description |
---|---|
protected static class | FetchHTTPRequest.RecordingHttpClientConnection |
protected static class | FetchHTTPRequest.ServerCacheResolver: Implementation of DnsResolver that uses the server cache which is normally expected to have been populated by FetchDNS. |
Modifier and Type | Field and Description |
---|---|
protected boolean | addedCredentials |
protected org.apache.http.conn.HttpClientConnectionManager | connMan |
protected CrawlURI | curi |
protected FetchHTTP | fetcher |
protected org.apache.http.impl.client.HttpClientBuilder | httpClientBuilder |
protected org.apache.http.client.protocol.HttpClientContext | httpClientContext |
protected org.apache.http.HttpHost | proxyHost |
protected org.apache.http.client.methods.AbstractExecutionAwareRequest | request |
protected org.apache.http.client.config.RequestConfig.Builder | requestConfigBuilder |
protected static org.apache.http.conn.routing.HttpRoutePlanner | ROUTE_PLANNER |
protected org.apache.http.HttpHost | targetHost |
Constructor and Description |
---|
FetchHTTPRequest(FetchHTTP fetcher, CrawlURI curi) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.http.conn.HttpClientConnectionManager | buildConnectionManager() |
protected org.apache.http.HttpEntity | buildPostRequestEntity(CrawlURI curi) |
protected void | configureHttpClientBuilder() |
protected void | configureRequest() |
protected void | configureRequestHeaders() |
static String | escapeForMultipart(String str): Returns a copy of the string with non-ascii characters replaced by their html numeric character reference in decimal (e.g. |
org.apache.http.HttpResponse | execute() |
protected void | initHttpClientBuilder() |
boolean | isDisableSNI() |
protected void | maybeAddConditionalGetHeader(boolean conditional, String sourceHeader, String targetHeader): Add the given conditional-GET header, if the setting is enabled and a suitable value is available in the URI history. |
protected boolean | populateHtmlFormCredential(HtmlFormCredential cred) |
protected void | populateHttpCredential(org.apache.http.HttpHost host, org.apache.http.auth.AuthScheme authScheme, String user, String password) |
protected void | populateHttpProxyCredential() |
protected boolean | populateTargetCredential(): Add credentials if any to passed method. |
void | setDisableSNI(boolean disableSNI) |
protected FetchHTTP fetcher
protected CrawlURI curi
protected org.apache.http.impl.client.HttpClientBuilder httpClientBuilder
protected org.apache.http.client.config.RequestConfig.Builder requestConfigBuilder
protected org.apache.http.client.protocol.HttpClientContext httpClientContext
protected org.apache.http.client.methods.AbstractExecutionAwareRequest request
protected org.apache.http.HttpHost targetHost
protected boolean addedCredentials
protected org.apache.http.HttpHost proxyHost
protected org.apache.http.conn.HttpClientConnectionManager connMan
protected static final org.apache.http.conn.routing.HttpRoutePlanner ROUTE_PLANNER
public boolean isDisableSNI()
public void setDisableSNI(boolean disableSNI)
public static String escapeForMultipart(String str)
The purpose of this is to produce a multipart/form-data submission that any server should be able to handle, based on experiments using a modern browser (chromium 47.0.2526.106 for mac). What chromium posts depends on what it considers the character encoding of the page containing the form, and maybe other factors. It would be too complicated to try to simulate that behavior in heritrix.
Instead, what we do is approximately what the browser does when the form page is plain ascii. In that case chromium html-escapes characters outside of the latin1/cp1252 range, and encodes characters in the U+0080-U+00FF range in latin1/cp1252. That is the one way that we differ from chromium: we html-escape those characters (U+0080-U+00FF) as well. That way the http post is plain ascii, and should work regardless of which encoding the server expects.
N.b. chromium doesn't indicate the encoding of the request in any way (no charset in the content-type or anything like that). Also of note is that when it considers the form page to be utf-8, it submits in utf-8. That's part of the complicated behavior we don't want to try to simulate.
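To make the escaping rule concrete, here is a minimal sketch of the behavior described above. It is an illustration, not the Heritrix source: the class name `MultipartEscapeSketch` and the method name `escapeNonAscii` are hypothetical, and the sketch assumes that every code point above U+007F (including supplementary characters) is replaced by its decimal numeric character reference.

```java
final class MultipartEscapeSketch {

    /**
     * Returns a copy of the input in which every non-ascii code point is
     * replaced by a decimal numeric character reference such as "&#233;".
     * Hypothetical helper, written only to illustrate the rule.
     */
    static String escapeNonAscii(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            if (codePoint < 0x80) {
                sb.append((char) codePoint);                  // plain ascii passes through
            } else {
                sb.append("&#").append(codePoint).append(';'); // decimal reference
            }
            i += Character.charCount(codePoint);              // step over surrogate pairs
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is in the latin1 range, yet it is escaped as well.
        System.out.println(escapeNonAscii("caf\u00e9")); // prints caf&#233;
    }
}
```

Because the result contains only ascii, the bytes sent are identical whether the server decodes the body as latin1, cp1252, or utf-8.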
protected org.apache.http.HttpEntity buildPostRequestEntity(CrawlURI curi)
protected void configureRequestHeaders()
protected void maybeAddConditionalGetHeader(boolean conditional, String sourceHeader, String targetHeader)
Parameters:
conditional - true/false enablement setting name to consult
sourceHeader - header to consult in URI history
targetHeader - header to set if possible
protected void configureRequest()
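The conditional-GET logic documented for maybeAddConditionalGetHeader can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Heritrix implementation: the URI history is modeled as a plain `Map` of header values from a previous fetch, the outgoing request headers as another `Map`, and the class name `ConditionalGetSketch` is hypothetical. The header pairs used in `main` (ETag with If-None-Match, Last-Modified with If-Modified-Since) are the standard HTTP validator pairs; which pairs the fetcher actually passes in is assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConditionalGetSketch {

    /**
     * Copies a validator recorded during a previous fetch into the new request.
     * Does nothing when the setting is off or no usable value was recorded.
     */
    static void maybeAddConditionalGetHeader(boolean conditional,
            String sourceHeader, String targetHeader,
            Map<String, String> previousFetch,     // hypothetical history entry
            Map<String, String> requestHeaders) {  // hypothetical outgoing headers
        if (!conditional) {
            return; // feature disabled
        }
        String value = previousFetch.get(sourceHeader);
        if (value == null || value.isEmpty()) {
            return; // nothing suitable in the history
        }
        requestHeaders.put(targetHeader, value);
    }

    public static void main(String[] args) {
        Map<String, String> previousFetch = new LinkedHashMap<>();
        previousFetch.put("etag", "\"abc123\"");

        Map<String, String> requestHeaders = new LinkedHashMap<>();
        maybeAddConditionalGetHeader(true, "etag", "If-None-Match",
                previousFetch, requestHeaders);
        maybeAddConditionalGetHeader(true, "last-modified", "If-Modified-Since",
                previousFetch, requestHeaders);

        // Only If-None-Match is set, because no last-modified value was recorded.
        System.out.println(requestHeaders);
    }
}
```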
protected boolean populateTargetCredential()
Add credentials, if any, to the passed method.
Do credential handling. Credentials are in two places: 1. Credentials that succeeded are added to the CrawlServer (or rather, avatars for credentials are what is added, because it is not safe to keep references to credentials around). 2. Credentials to be tried are in the curi. Returns true if credentials to be tried were found.
Returns:
true if the method was populated with credentials AND the credentials came from the curi, not from the CrawlServer. Credentials from the curi are special in that, if they succeed, the caller needs to promote them from the CrawlURI to the CrawlServer so they are available for all subsequent CrawlURIs on this server.
protected void populateHttpProxyCredential()
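The two-place credential handling described for populateTargetCredential() can be pictured with the following sketch. It is an illustration only, not the Heritrix code: plain strings stand in for credential avatars, lists stand in for the CrawlServer and CrawlURI credential sets, and the names `CredentialPromotionSketch` and `promote` are hypothetical. Applying server credentials before curi credentials is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class CredentialPromotionSketch {

    /**
     * Applies known-good server credentials and then the candidates carried by
     * the URI. Returns true only when at least one applied credential came
     * from the URI, which tells the caller a promotion may be needed later.
     */
    static boolean populateTargetCredential(List<String> serverCredentials,
            List<String> uriCredentials, List<String> appliedToRequest) {
        appliedToRequest.addAll(serverCredentials); // already proven on this server
        boolean appliedFromUri = false;
        for (String credential : uriCredentials) {
            appliedToRequest.add(credential);       // still to be tried
            appliedFromUri = true;
        }
        return appliedFromUri;
    }

    /** After a successful fetch, makes the URI's credentials available server-wide. */
    static void promote(List<String> uriCredentials, List<String> serverCredentials) {
        for (String credential : uriCredentials) {
            if (!serverCredentials.contains(credential)) {
                serverCredentials.add(credential);
            }
        }
    }

    public static void main(String[] args) {
        List<String> server = new ArrayList<>();
        List<String> uri = new ArrayList<>(List.of("login-form-avatar"));
        List<String> applied = new ArrayList<>();

        boolean needsPromotion = populateTargetCredential(server, uri, applied);
        if (needsPromotion) {
            promote(uri, server); // in practice only after the credentials succeed
        }
        System.out.println(server);
    }
}
```

The boolean return value is what lets the caller tell "credentials were sent" apart from "credentials were sent and still need to be promoted".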
protected boolean populateHtmlFormCredential(HtmlFormCredential cred)
protected void populateHttpCredential(org.apache.http.HttpHost host, org.apache.http.auth.AuthScheme authScheme, String user, String password)
protected void configureHttpClientBuilder() throws org.apache.commons.httpclient.URIException
Throws:
org.apache.commons.httpclient.URIException
protected org.apache.http.conn.HttpClientConnectionManager buildConnectionManager()
protected void initHttpClientBuilder()
public org.apache.http.HttpResponse execute() throws org.apache.http.client.ClientProtocolException, IOException
Throws:
org.apache.http.client.ClientProtocolException
IOException
Copyright © 2003–2021 Internet Archive. All rights reserved.