JsoupBrowser

net.ruippeixotog.scalascraper.browser.JsoupBrowser
See the JsoupBrowser companion object
class JsoupBrowser(val userAgent: String, val proxy: Proxy) extends Browser

A Browser implementation based on jsoup, a Java HTML parser library. JsoupBrowser provides powerful and efficient document querying, but it does not run JavaScript in the pages. As such, it is limited to working strictly with the HTML sent in the page source.

Currently, JsoupBrowser does not keep separate cookie stores for different domains and paths. Each request sends all previously set cookies, regardless of the domain they were set on. If you make requests to different domains and do not want this behavior, use a separate JsoupBrowser instance for each domain.

As the documents parsed by JsoupBrowser instances are not changed after loading, Document and Element instances obtained from them are guaranteed to be immutable.
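
As a minimal sketch of the limitation described above, the following parses a page whose content is partly filled in by a script. The user agent string, the HTML, and the object name are placeholders chosen for illustration, and the query syntax assumes the scala-scraper DSL imports shown.

```scala
import java.net.Proxy
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object StaticHtmlExample {
  // Both constructor arguments are passed explicitly; the user agent is a placeholder.
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  // The <script> below is parsed as data but never executed.
  val html: String =
    """<html><body>
      |  <h1 id="title">Hello</h1>
      |  <div id="late"></div>
      |  <script>document.getElementById("late").textContent = "set by JS";</script>
      |</body></html>""".stripMargin

  val doc = browser.parseString(html)

  // Text that is present in the page source is available.
  val title: String = doc >> text("#title")

  // Text that a real browser would add via JavaScript is not.
  val lateText: String = doc >> text("#late")

  def main(args: Array[String]): Unit = {
    println(title)            // Hello
    println(lateText.isEmpty) // true
  }
}
```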

Value parameters

proxy

an optional proxy configuration to use

userAgent

the user agent with which requests should be made
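
The two constructor parameters might be supplied as in the sketch below. The proxy host, port, and user agent string are hypothetical values; `InetSocketAddress.createUnresolved` is used only so that the example does not trigger a DNS lookup.

```scala
import java.net.{InetSocketAddress, Proxy}
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object ProxyConfigExample {
  // Hypothetical HTTP proxy; replace host and port with real values.
  val httpProxy: Proxy =
    new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("proxy.example.com", 8080))

  // Hypothetical user agent string.
  val browser = new JsoupBrowser("example-agent/1.0", httpProxy)

  def main(args: Array[String]): Unit = {
    // Both values are exposed as fields of the browser.
    println(browser.userAgent)
    println(browser.proxy)
  }
}
```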

Attributes

Companion
object
Supertypes
trait Browser
class Object
trait Matchable
class Any

Members list

Type members

Types

type DocumentType = JsoupDocument

The concrete type of documents created by this browser.

Value members

Concrete methods

def clearCookies(): Unit

Clears the cookie store of this browser.

def cookies(url: String): Map[String, String]

Returns the current set of cookies stored in this browser for a given URL.

Value parameters

url

the URL whose stored cookies are to be returned

Attributes

Returns

a mapping of cookie names to their respective values.
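
A small sketch of the cookie methods working together, using the `setCookie`, `cookies`, and `clearCookies` members listed on this page. The URLs, cookie name, and value are made up for illustration. Because the class description says the store is not separated by domain, a lookup for a different domain may return the same cookies.

```scala
import java.net.Proxy
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object CookieStoreExample {
  // Hypothetical URL used only as the key for the cookie lookup.
  val siteA = "https://a.example.com"

  /** Stores a cookie, reads it back, then clears the store.
    * Returns the value read back and whether the store is empty afterwards.
    */
  def demo(): (Option[String], Boolean) = {
    val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

    browser.setCookie(siteA, "session", "abc123")
    val stored = browser.cookies(siteA).get("session")

    // Per the class description, cookies are not scoped by domain, so
    // browser.cookies("https://b.example.com") may expose "session" as well.

    browser.clearCookies()
    (stored, browser.cookies(siteA).isEmpty)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // (Some(abc123),true)
}
```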

def get(url: String): JsoupDocument

Retrieves and parses a web page using a GET request.

Value parameters

url

the URL of the page to retrieve

Attributes

Returns

a Document containing the retrieved web page.
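
Since `get` performs a network request, callers may want to guard against failures. The sketch below wraps the call in `Try`; `fetchTitle` is a hypothetical helper, not part of the library, and the URL in `main` is only an example.

```scala
import java.net.Proxy
import scala.util.Try
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object GetExample {
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  /** Hypothetical helper: returns the page title, or None if the request
    * or the parsing fails with a non-fatal exception.
    */
  def fetchTitle(url: String): Option[String] =
    Try(browser.get(url).title).toOption

  def main(args: Array[String]): Unit =
    // Requires network access; the result depends on the remote page.
    println(fetchTitle("https://example.com"))
}
```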

def parseFile(file: File, charset: String): JsoupDocument

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.
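
The following sketch shows why the charset argument matters: it writes a temporary file encoded in ISO-8859-1 and parses it with the matching charset. The file contents and the helper name `titleOfTempFile` are illustrative assumptions.

```scala
import java.net.Proxy
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object ParseFileExample {
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  /** Writes a small ISO-8859-1 encoded page to a temporary file,
    * parses it with the same charset, and returns its title.
    */
  def titleOfTempFile(): String = {
    val path = Files.createTempFile("page", ".html")
    try {
      val html = "<html><head><title>Caf\u00e9</title></head><body></body></html>"
      Files.write(path, html.getBytes(StandardCharsets.ISO_8859_1))
      browser.parseFile(path.toFile, "ISO-8859-1").title
    } finally {
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit =
    println(titleOfTempFile())
}
```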

def parseInputStream(inputStream: InputStream, charset: String): JsoupDocument

Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

Value parameters

charset

the charset of the input stream content

inputStream

the input stream to parse

Attributes

Returns

a Document containing the parsed web page.
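
To illustrate the closing behavior described above, the sketch below parses an in-memory stream and records whether `close()` was called. `TrackingStream` is a hypothetical helper class written for this example.

```scala
import java.io.ByteArrayInputStream
import java.net.Proxy
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object ParseInputStreamExample {
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  // Hypothetical stream subclass that records whether close() was called.
  final class TrackingStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
    @volatile var closed = false
    override def close(): Unit = {
      closed = true
      super.close()
    }
  }

  /** Parses an in-memory page and reports its title and whether the
    * stream was closed by the time parseInputStream returned.
    */
  def demo(): (String, Boolean) = {
    val html = "<html><head><title>Streamed</title></head><body></body></html>"
    val in = new TrackingStream(html.getBytes(StandardCharsets.UTF_8))
    val doc = browser.parseInputStream(in, "UTF-8")
    (doc.title, in.closed)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

Because the method closes the stream itself, callers do not need their own `try`/`finally` around the stream for this call.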

def parseString(html: String): JsoupDocument

Parses an HTML string.

Value parameters

html

the HTML string to parse

Attributes

Returns

a Document containing the parsed web page.
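
`parseString` is convenient for tests and for HTML obtained by other means. The sketch below extracts list items and link targets from a literal string; the markup is invented for illustration and the queries assume the scala-scraper DSL imports shown.

```scala
import java.net.Proxy
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object ParseStringExample {
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  val doc = browser.parseString(
    """<ul>
      |  <li><a href="/a">First</a></li>
      |  <li><a href="/b">Second</a></li>
      |</ul>""".stripMargin
  )

  // Text content of each list item, in document order.
  val labels: List[String] = (doc >> elementList("li")).map(_.text)

  // The href attribute of each link, in document order.
  val links: List[String] = (doc >> elementList("a")).map(_.attr("href"))

  def main(args: Array[String]): Unit = {
    println(labels) // List(First, Second)
    println(links)  // List(/a, /b)
  }
}
```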

def post(url: String, form: Map[String, String]): JsoupDocument

Submits a form via a POST request and parses the resulting page.

Value parameters

form

a map containing the form fields to submit with their respective values

url

the URL to which the form is submitted

Attributes

Returns

a Document containing the resulting web page.
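
A form submission might look like the sketch below. The endpoint URL and field names are hypothetical, and `submit` is an illustrative helper that captures failures in a `Try` rather than letting them propagate.

```scala
import java.net.Proxy
import scala.util.Try
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object PostExample {
  val browser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY)

  /** Hypothetical helper: posts the given fields and returns the title
    * of the resulting page, or a Failure if the request fails.
    */
  def submit(url: String, fields: Map[String, String]): Try[String] =
    Try(browser.post(url, fields).title)

  def main(args: Array[String]): Unit = {
    // Hypothetical endpoint and field names; requires network access.
    val result = submit(
      "https://example.com/search",
      Map("q" -> "scala", "lang" -> "en")
    )
    println(result)
  }
}
```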

def requestSettings(conn: Connection): Connection
def setCookie(url: String, key: String, value: String): Map[String, String]
def setCookies(url: String, m: Map[String, String]): Map[String, String]
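
`requestSettings` is not described on this page. The sketch below assumes it is a hook through which the browser passes each jsoup `Connection` before executing a request, so overriding it is one plausible way to adjust settings such as the timeout. That reading is an assumption; the example only verifies that the override itself changes the connection it is given.

```scala
import java.net.Proxy
import org.jsoup.{Connection, Jsoup}
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object RequestSettingsExample {
  // Anonymous subclass that sets a 5 second timeout on each connection it is given.
  // Whether the browser applies this hook to every request is assumed, not documented here.
  val browser: JsoupBrowser = new JsoupBrowser("example-agent/1.0", Proxy.NO_PROXY) {
    override def requestSettings(conn: Connection): Connection =
      conn.timeout(5000)
  }

  /** Applies the hook to a fresh connection (no request is sent)
    * and returns the configured timeout in milliseconds.
    */
  def configuredTimeout(): Int =
    browser.requestSettings(Jsoup.connect("https://example.com")).request().timeout()

  def main(args: Array[String]): Unit =
    println(configuredTimeout()) // 5000
}
```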

def withProxy(proxy: Proxy): JsoupBrowser

Returns a new browser that uses the provided proxy for all connections.

Inherited methods

def parseFile(path: String): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.

Inherited from:
Browser
def parseFile(path: String, charset: String): DocumentType

Parses a local HTML file with a specified charset.

Value parameters

charset

the charset of the file

path

the path in the local filesystem where the HTML file is located

Attributes

Returns

a Document containing the parsed web page.

Inherited from:
Browser
def parseFile(file: File): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters

file

the HTML file to parse

Attributes

Returns

a Document containing the parsed web page.

Inherited from:
Browser
def parseResource(name: String, charset: String): DocumentType

Parses a resource with a specified charset.

Value parameters

charset

the charset of the resource

name

the name of the resource to parse

Attributes

Returns

a Document containing the parsed web page.

Inherited from:
Browser

Concrete fields

val proxy: Proxy
val userAgent: String

The user agent used by this browser to retrieve HTML pages from the web.
