A Browser implementation based on HtmlUnit, a GUI-less browser for Java programs. HtmlUnitBrowser thoroughly simulates a web browser: besides parsing and modelling the HTML content of pages, it executes their JavaScript code. It supports several compatibility modes, allowing it to emulate browsers such as Internet Explorer.
Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code can change the attributes and content of elements at any time, and those changes are reflected both in queries to the Document and in previously stored references to Elements.
The Document instance always represents the current page in the browser's "window". This means that the Document's location value can change, together with its root element, when a client-side page refresh or redirection occurs. Element instances, however, belong to a fixed DOM tree and stop being meaningful as soon as they are removed from the DOM or a client-side page reload occurs.
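Because a Document tracks the current page while Elements can go stale, one reasonable pattern is to re-query the Document each time a value is needed rather than keep Element references around. The following is a minimal sketch of that idea, not the library's prescribed usage. It assumes scala-scraper is on the classpath and that the members are named `get`, `location`, `root`, `select` and `closeAll`, since the signatures are not shown on this page. `LiveDocumentSketch` and `currentHeading` are hypothetical names.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser
import net.ruippeixotog.scalascraper.model.Document

object LiveDocumentSketch {
  // Re-queries the Document on every call instead of caching an Element,
  // so the result reflects whatever page is currently in the window.
  def currentHeading(doc: Document): Option[String] =
    doc.root.select("h1").headOption.map(_.text)

  def main(args: Array[String]): Unit = {
    // The URL comes from the command line; nothing is fetched without one.
    args.headOption.foreach { url =>
      val browser = new HtmlUnitBrowser()
      try {
        val doc = browser.get(url)
        println(doc.location)        // may differ from `url` after a redirection
        println(currentHeading(doc)) // looked up at call time, not stored
      } finally browser.closeAll()
    }
  }
}
```

Calling `currentHeading(doc)` again after a client-side reload queries the new DOM tree, whereas an Element stored before the reload would no longer be meaningful.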
Value parameters
- browserType: the browser type and version to simulate
- proxy: an optional proxy configuration to use
Attributes
- Companion: object
Type members
Types
The concrete type of documents created by this browser.
Value members
Concrete methods
Clears the cookie store of this browser.
Closes all windows opened in this browser.
Returns the current set of cookies stored in this browser for a given URL.
Value parameters
- url: the URL whose stored cookies are to be returned
Attributes
- Returns: a mapping of cookie names to their respective values.
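As an illustration of the cookie operations described above, the sketch below loads a page, prints the cookies stored for its URL, and then clears the cookie store. It assumes scala-scraper is on the classpath and that the methods are named `get`, `cookies`, `clearCookies` and `closeAll` (the signatures are not shown on this page). `CookieSketch` and `render` are hypothetical helpers introduced only for the example.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object CookieSketch {
  // Renders a cookie map as "name=value; name2=value2", sorted by name.
  def render(cookies: Map[String, String]): String =
    cookies.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("; ")

  def main(args: Array[String]): Unit = {
    // The URL comes from the command line; nothing is fetched without one.
    args.headOption.foreach { url =>
      val browser = new HtmlUnitBrowser()
      try {
        browser.get(url)                      // the response may set cookies
        println(render(browser.cookies(url))) // cookies stored for this URL
        browser.clearCookies()                // empties the cookie store
        println(browser.cookies(url).isEmpty)
      } finally browser.closeAll()
    }
  }
}
```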
Retrieves and parses a web page using a GET request.
Value parameters
- url: the URL of the page to retrieve
Attributes
- Returns: a Document containing the retrieved web page.
Parses a local HTML file with a specified charset.
Value parameters
- charset: the charset of the file
- file: the HTML file to parse
Attributes
- Returns: a Document containing the parsed web page.
Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.
Value parameters
- charset: the charset of the input stream content
- inputStream: the input stream to parse
Attributes
- Returns: a Document containing the parsed web page.
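Since the stream is closed by the parsing call itself, the caller does not need its own close logic. A minimal sketch under the assumption that the method is named `parseInputStream` and takes the stream followed by the charset name (the signature is not shown on this page). `ParseStreamSketch` and `titleOf` are hypothetical names, and scala-scraper is assumed to be on the classpath.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object ParseStreamSketch {
  // Parses HTML supplied through an in-memory stream and returns its title.
  def titleOf(html: String): String = {
    val browser = new HtmlUnitBrowser()
    val in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))
    // No explicit in.close(): per the description above, the call closes it.
    try browser.parseInputStream(in, "UTF-8").title
    finally browser.closeAll()
  }

  def main(args: Array[String]): Unit =
    println(titleOf("<html><head><title>Stream</title></head><body></body></html>"))
}
```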
Parses an HTML string.
Value parameters
- html: the HTML string to parse
Attributes
- Returns: a Document containing the parsed web page.
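Parsing a string is handy for tests and for quick experiments without network access. The sketch below assumes the method is named `parseString` and that elements expose `select` and `text`, which are not shown on this page. `ParseStringSketch` and `greeting` are hypothetical names, and scala-scraper is assumed to be on the classpath.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object ParseStringSketch {
  private val html =
    """<html><head><title>Hello</title></head>
      |<body><p id="greeting">Hi there</p></body></html>""".stripMargin

  // Returns the text of the element with id "greeting", if present.
  def greeting(source: String): Option[String] = {
    val browser = new HtmlUnitBrowser()
    try {
      val doc = browser.parseString(source)
      doc.root.select("#greeting").headOption.map(_.text)
    } finally browser.closeAll()
  }

  def main(args: Array[String]): Unit =
    println(greeting(html))
}
```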
Submits a form via a POST request and parses the resulting page.
Value parameters
- form: a map containing the form fields to submit with their respective values
- url: the URL of the page to retrieve
Attributes
- Returns: a Document containing the resulting web page.
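A form submission could look roughly like the sketch below. It assumes the method is named `post` and takes the URL followed by the form map (the signature is not shown on this page). The field names `q` and `page`, as well as `PostSketch` and `searchForm`, are hypothetical and only illustrate the shape of the map. scala-scraper is assumed to be on the classpath.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object PostSketch {
  // Builds the form fields to submit; the field names are made up for this example.
  def searchForm(query: String, page: Int): Map[String, String] =
    Map("q" -> query, "page" -> page.toString)

  def main(args: Array[String]): Unit = {
    // The target URL comes from the command line; nothing is sent without one.
    args.headOption.foreach { url =>
      val browser = new HtmlUnitBrowser()
      try {
        val doc = browser.post(url, searchForm("scala", 1))
        println(doc.title) // title of the page returned by the server
      } finally browser.closeAll()
    }
  }
}
```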
The user agent used by this browser to retrieve HTML pages from the web.
Returns a new browser that uses the provided proxy for all connections.
Inherited methods
Parses a local HTML file encoded in UTF-8.
Value parameters
- path: the path in the local filesystem where the HTML file is located
Attributes
- Returns: a Document containing the parsed web page.
- Inherited from: Browser
Parses a local HTML file with a specified charset.
Value parameters
- charset: the charset of the file
- path: the path in the local filesystem where the HTML file is located
Attributes
- Returns: a Document containing the parsed web page.
- Inherited from: Browser
Parses a local HTML file encoded in UTF-8.
Value parameters
- file: the HTML file to parse
Attributes
- Returns: a Document containing the parsed web page.
- Inherited from: Browser
Parses a resource with a specified charset.
Value parameters
- charset: the charset of the resource
- name: the name of the resource to parse
Attributes
- Returns: a Document containing the parsed web page.
- Inherited from: Browser