Browser

net.ruippeixotog.scalascraper.browser.Browser
trait Browser

A client able to retrieve and parse HTML pages from the web and from local resources.

An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page and return a net.ruippeixotog.scalascraper.model.Document instance, which can be queried via the scraper DSL or using its methods.

Different net.ruippeixotog.scalascraper.browser.Browser implementations may expose pages with different runtime behavior. For example, some browsers may limit themselves to parsing the HTML content of the page without executing any scripts, while others may run JavaScript and allow for Document instances with dynamic content. Consult the documentation of each implementation for the semantics of its Document and net.ruippeixotog.scalascraper.model.Element implementations.
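As a rough illustration of the two query styles mentioned above, the following sketch parses an in-memory page and reads it once through a Document method and once through the scraper DSL. It assumes the JsoupBrowser implementation from the same library is on the classpath; the markup and the object name BrowserOverview are made up for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object BrowserOverview {
  // JsoupBrowser is one Browser implementation (an assumption of this sketch);
  // it parses static HTML and does not execute scripts.
  val browser = JsoupBrowser()

  // Illustrative markup; any HTML string could be used here.
  val doc = browser.parseString(
    "<html><head><title>Demo</title></head><body><h1>Hello</h1></body></html>"
  )

  // Query through one of the Document's own methods...
  val pageTitle: String = doc.title

  // ...or through the scraper DSL with a CSS selector.
  val heading: String = doc >> text("h1")

  def main(args: Array[String]): Unit =
    println(s"$pageTitle / $heading")
}
```

Swapping in another Browser implementation should leave the querying code unchanged, although the content of the resulting Document may differ if that implementation runs scripts.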

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Type members

Types

type DocumentType <: Document

The concrete type of documents created by this browser.

Value members

Abstract methods

def clearCookies(): Unit

Clears the cookie store of this browser.

def cookies(url: String): Map[String, String]

Returns the current set of cookies stored in this browser for a given URL.

Value parameters

url

the URL whose stored cookies are to be returned

Attributes

Returns

a mapping of cookie names to their respective values.
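A minimal sketch of how cookies and clearCookies might be combined: the hypothetical helper drainCookies takes a snapshot of the cookies stored for a URL and then empties the store. JsoupBrowser and the placeholder URL are assumptions of the example, not part of this trait.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object CookieSketch {
  // Hypothetical helper: returns the cookies currently held for `url`,
  // then clears the browser's cookie store.
  def drainCookies(browser: Browser, url: String): Map[String, String] = {
    val snapshot = browser.cookies(url)
    browser.clearCookies()
    snapshot
  }

  def main(args: Array[String]): Unit = {
    val browser = JsoupBrowser()
    val url = "https://example.com/" // placeholder URL
    val before = drainCookies(browser, url)
    println(s"had ${before.size} cookie(s), now ${browser.cookies(url).size}")
  }
}
```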

def get(url: String): DocumentType

Retrieves and parses a web page using a GET request.

Value parameters

url

the URL of the page to retrieve

Attributes

Returns

a Document containing the retrieved web page.
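The sketch below shows one way get might be called. This page does not say how network or URL errors are reported, so the hypothetical helper fetchTitle wraps the call in Try to capture whatever exception the implementation throws. JsoupBrowser and the example URL are assumptions.

```scala
import scala.util.{Failure, Success, Try}
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object GetSketch {
  // Hypothetical helper: fetch a page with GET and return its title,
  // capturing any exception raised while retrieving or parsing.
  def fetchTitle(browser: Browser, url: String): Try[String] =
    Try(browser.get(url).title)

  def main(args: Array[String]): Unit =
    fetchTitle(JsoupBrowser(), "https://example.com") match {
      case Success(title) => println(s"title: $title")
      case Failure(error) => println(s"request failed: ${error.getMessage}")
    }
}
```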

def parseFile(file: File, charset: String): DocumentType

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.

def parseInputStream(inputStream: InputStream, charset: String): DocumentType

Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

Value parameters

charset

the charset of the input stream content

inputStream

the input stream to parse

Attributes

Returns

a Document containing the parsed web page.
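A small sketch of parseInputStream using an in-memory stream, assuming JsoupBrowser as the implementation. Since the method is documented to close the stream itself, the example does not close it again. The markup is invented for illustration.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object InputStreamSketch {
  val browser = JsoupBrowser()

  // Illustrative page content, encoded as UTF-8 bytes.
  private val html =
    "<html><head><title>Streamed</title></head><body><p>café</p></body></html>"
  private val in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))

  // The charset argument must match the encoding of the bytes in the stream.
  val doc = browser.parseInputStream(in, "UTF-8")
  val pageTitle: String = doc.title

  def main(args: Array[String]): Unit =
    println(pageTitle)
}
```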

def parseString(html: String): DocumentType

Parses an HTML string.

Value parameters

html

the HTML string to parse

Attributes

Returns

a Document containing the parsed web page.

def post(url: String, form: Map[String, String]): DocumentType

Submits a form via a POST request and parses the resulting page.

Value parameters

form

a map containing the form fields to submit with their respective values

url

the URL to which the form is submitted

Attributes

Returns

a Document containing the resulting web page.
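As a sketch of a form submission, the hypothetical helper submitSearch below builds the form map and posts it. The field names "q" and "lang", the target URL, and the use of JsoupBrowser are all assumptions; Try is used only because this page does not describe error reporting.

```scala
import scala.util.Try
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object PostSketch {
  // Hypothetical helper: submit a two-field form and return the title
  // of the page sent back by the server.
  def submitSearch(browser: Browser, url: String, term: String): Try[String] = {
    val form = Map("q" -> term, "lang" -> "en") // made-up field names
    Try(browser.post(url, form).title)
  }

  def main(args: Array[String]): Unit =
    println(submitSearch(JsoupBrowser(), "https://example.com/search", "scala"))
}
```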

def userAgent: String

The user agent used by this browser to retrieve HTML pages from the web.

def withProxy(proxy: Proxy): Browser

Returns a new browser that uses the provided proxy for all connections.

Concrete methods

def parseFile(file: File): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.

def parseFile(path: String, charset: String): DocumentType

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.

def parseFile(path: String): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.
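The parseFile overloads can be compared with a short sketch that writes a throwaway UTF-8 file and parses it once by path (UTF-8 implied) and once as a File with an explicit charset. JsoupBrowser, the temporary file, and its contents are assumptions made to keep the example self-contained.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object ParseFileSketch {
  val browser = JsoupBrowser()

  // Write a temporary UTF-8 HTML file so the example does not depend
  // on any existing file.
  private val path = Files.createTempFile("page", ".html")
  Files.write(
    path,
    "<html><head><title>Local</title></head><body></body></html>"
      .getBytes(StandardCharsets.UTF_8)
  )

  // Path overload: the file is read as UTF-8.
  val viaPath = browser.parseFile(path.toString)

  // File overload with an explicit charset.
  val viaFile = browser.parseFile(path.toFile, "UTF-8")

  def main(args: Array[String]): Unit =
    println(s"${viaPath.title} / ${viaFile.title}")
}
```

For files in another encoding, the two-argument overloads let the caller name the charset instead of relying on the UTF-8 default.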

def parseResource(name: String, charset: String): DocumentType

Parses a resource with a specified charset.

Value parameters

charset

the charset of the resource

name

the name of the resource to parse

Attributes

Returns

a Document containing the parsed web page.