HtmlUnitBrowser

net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser
See the HtmlUnitBrowser companion object
class HtmlUnitBrowser(browserType: BrowserVersion, proxy: Option[ProxyConfig]) extends Browser

A Browser implementation based on HtmlUnit, a GUI-less browser for Java programs. HtmlUnitBrowser thoroughly simulates a web browser: besides parsing and modelling the HTML content of a page, it executes the page's JavaScript code. It supports several compatibility modes, allowing it to emulate browsers such as Internet Explorer.

Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code can change the attributes and content of elements at any time, and those changes are reflected both in queries to the Document and in previously stored references to Elements. The Document instance always represents the current page in the browser's "window", so its location value can change, together with its root element, on client-side page refreshes or redirections. Element instances, however, belong to a fixed DOM tree and stop being meaningful as soon as they are removed from the DOM or a client-side page reload occurs.

Value parameters

browserType

the browser type and version to simulate

proxy

an optional proxy configuration to use

Attributes

Companion
object
Supertypes
trait Browser
class Object
trait Matchable
class Any

Members list

Type members

Types

The concrete type of documents created by this browser.

Value members

Concrete methods

def clearCookies(): Unit

Clears the cookie store of this browser.

def closeAll(): Unit

Closes all windows opened in this browser.

def cookies(url: String): Map[String, String]

Returns the current set of cookies stored in this browser for a given URL.

Value parameters

url

the URL whose stored cookies are to be returned

Attributes

Returns

a mapping of cookie names to their respective values.
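As a small sketch of how `clearCookies` and `cookies` fit together, the helper below resets a browser's cookie store and then reads back the cookies for one URL. It assumes the companion object provides a `typed()` factory that returns an `HtmlUnitBrowser` with default settings, which this page does not show; `cookiesAfterClear` is a hypothetical name.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

// Hypothetical helper: clears the cookie store, then returns the cookies
// stored for `url` as a name -> value map.
def cookiesAfterClear(url: String): Map[String, String] = {
  // Assumption: HtmlUnitBrowser.typed() builds a browser with default settings.
  val browser = HtmlUnitBrowser.typed()
  try {
    browser.clearCookies()
    browser.cookies(url)
  } finally browser.closeAll() // close any windows the browser opened
}

@main def cookiesDemo(): Unit =
  println(cookiesAfterClear("http://example.com/"))
```

In a real session the same `cookies(url)` call would typically follow one or more `get` or `post` requests, so that the store has something to report.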

def exec(req: WebRequest): HtmlUnitDocument

def get(url: String): HtmlUnitDocument

Retrieves and parses a web page using a GET request.

Value parameters

url

the URL of the page to retrieve

Attributes

Returns

a Document containing the retrieved web page.

def parseFile(file: File, charset: String): HtmlUnitDocument

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.
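The following sketch shows one way `parseFile(file, charset)` might be used: it writes an HTML string to a temporary file, parses it with an explicit charset, and returns the page title. The `typed()` factory on the companion object and the `title` member on the returned document are assumptions not documented on this page, and `titleFromTempFile` is a hypothetical helper.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

// Hypothetical helper: round-trips `html` through a temporary .html file.
def titleFromTempFile(html: String): String = {
  val path = Files.createTempFile("page", ".html")
  try {
    Files.write(path, html.getBytes(StandardCharsets.UTF_8))
    // Assumption: HtmlUnitBrowser.typed() builds a browser with default settings.
    val browser = HtmlUnitBrowser.typed()
    // Assumption: the returned document exposes a `title` member.
    try browser.parseFile(path.toFile, "UTF-8").title
    finally browser.closeAll()
  } finally Files.deleteIfExists(path) // always remove the temporary file
}

@main def parseFileDemo(): Unit =
  println(titleFromTempFile("<html><head><title>From file</title></head><body></body></html>"))
```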

def parseInputStream(inputStream: InputStream, charset: String): HtmlUnitDocument

Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

Value parameters

charset

the charset of the input stream content

inputStream

the input stream to parse

Attributes

Returns

a Document containing the parsed web page.
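A minimal sketch of `parseInputStream`, assuming the companion object's `typed()` factory and a `title` member on the returned document (neither appears on this page). Because the method is documented to close the stream itself, the hypothetical helper `titleFromStream` does not close it explicitly.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

// Hypothetical helper: parses HTML supplied as a UTF-8 byte stream.
def titleFromStream(html: String): String = {
  // Assumption: HtmlUnitBrowser.typed() builds a browser with default settings.
  val browser = HtmlUnitBrowser.typed()
  val in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))
  // parseInputStream closes `in` before returning or throwing.
  try browser.parseInputStream(in, "UTF-8").title
  finally browser.closeAll()
}

@main def parseInputStreamDemo(): Unit =
  println(titleFromStream("<html><head><title>From stream</title></head><body></body></html>"))
```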

def parseString(html: String): HtmlUnitDocument

Parses an HTML string.

Value parameters

html

the HTML string to parse

Attributes

Returns

a Document containing the parsed web page.
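As a minimal sketch of parsing an in-memory page, the snippet below builds a browser and reads the title of the parsed document. It assumes the companion object offers a `typed()` factory returning an `HtmlUnitBrowser` with default settings and that the returned document exposes a `title` member; neither is shown on this page, and `pageTitle` is a hypothetical helper.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

// Hypothetical helper: parses an HTML string and returns the page title.
def pageTitle(html: String): String = {
  // Assumption: HtmlUnitBrowser.typed() builds a browser with default settings.
  val browser = HtmlUnitBrowser.typed()
  // The title is read before closeAll() runs in the finally clause.
  try browser.parseString(html).title
  finally browser.closeAll()
}

@main def parseStringDemo(): Unit =
  println(pageTitle("<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"))
```

Closing the browser's windows once the needed values have been extracted keeps the sketch free of lingering HtmlUnit resources; a long-lived scraper would more likely reuse a single browser instance.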

def post(url: String, form: Map[String, String]): HtmlUnitDocument

Submits a form via a POST request and parses the resulting page.

Value parameters

form

a map containing the form fields to submit with their respective values

url

the URL to which the form is submitted

Attributes

Returns

a Document containing the resulting web page.

def userAgent: String

The user agent used by this browser to retrieve HTML pages from the web.

Returns a new browser that uses the provided proxy for all connections.

Inherited methods

def parseFile(path: String): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.

Inherited from: Browser

def parseFile(path: String, charset: String): DocumentType

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.

Inherited from: Browser

def parseFile(file: File): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.

Inherited from: Browser

def parseResource(name: String, charset: String): DocumentType

Parses a resource with a specified charset.

Value parameters

charset

the charset of the resource

name

the name of the resource to parse

Attributes

Returns

a Document containing the parsed web page.

Inherited from: Browser

Concrete fields

lazy val underlying: WebClient