Trait

net.ruippeixotog.scalascraper.browser

Browser

trait Browser extends AnyRef

A client able to retrieve and parse HTML pages from the web and from local resources.

An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page, and return a net.ruippeixotog.scalascraper.model.Document instance, which can be queried through the scraper DSL or through its own methods.

Different net.ruippeixotog.scalascraper.browser.Browser implementations may load pages with different runtime behavior. For example, some browsers may limit themselves to parsing the HTML content of the page without executing any of its scripts, while others may run JavaScript and allow for Document instances with dynamic content. Read the documentation of each implementation for more information on the semantics of its Document and net.ruippeixotog.scalascraper.model.Element implementations.
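
As a minimal sketch of the workflow described above, the following parses an in-memory HTML string and queries the resulting Document through the scraper DSL. It assumes the library's jsoup-backed implementation, JsoupBrowser, which is not listed on this page; the object name, the helper name and the sample HTML are hypothetical.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object BrowserOverviewSketch {
  // Hypothetical helper: parses `html` and extracts the text of its first <h1> via the DSL.
  def firstHeading(html: String): String = {
    val browser = JsoupBrowser()          // assumption: the jsoup-backed Browser implementation
    val doc = browser.parseString(html)   // no network access is needed for parseString
    doc >> text("h1")
  }

  def main(args: Array[String]): Unit =
    println(firstHeading("<html><body><h1>Hello</h1></body></html>"))
}
```

Replacing parseString with get(url) would fetch the page over HTTP instead; the query on the returned document stays the same.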

Linear Supertypes
AnyRef, Any

Type Members

  1. abstract type DocumentType <: Document

    The concrete type of documents created by this browser.
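
Because DocumentType is an abstract type member, code written against Browser can return the browser-specific document type through a path-dependent type. The sketch below illustrates this under the assumption that JsoupBrowser is the implementation in use; parseWith and the object name are hypothetical.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object DocumentTypeSketch {
  // Hypothetical helper: the result type depends on the browser passed in,
  // so callers keep the concrete document type of their implementation.
  def parseWith(browser: Browser)(html: String): browser.DocumentType =
    browser.parseString(html)

  def main(args: Array[String]): Unit = {
    val jsoup = new JsoupBrowser()        // assumption: jsoup-backed implementation
    // The static type here is jsoup.DocumentType rather than the general Document.
    val doc: jsoup.DocumentType = parseWith(jsoup)("<html><head><title>T</title></head></html>")
    println(doc.title)
  }
}
```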

Abstract Value Members

  1. abstract def clearCookies(): Unit

    Clears the cookie store of this browser.

  2. abstract def cookies(url: String): Map[String, String]

    Returns the current set of cookies stored in this browser for a given URL.

    url: the URL whose stored cookies are to be returned

    returns: a mapping of cookie names to their respective values.

  3. abstract def get(url: String): DocumentType

    Retrieves and parses a web page using a GET request.

    url: the URL of the page to retrieve

    returns: a Document containing the retrieved web page.

  4. abstract def parseFile(file: File, charset: String): DocumentType

    Parses a local HTML file with a specified charset.

    file: the HTML file to parse

    charset: the charset of the file

    returns: a Document containing the parsed web page.

  5. abstract def parseInputStream(inputStream: InputStream, charset: String = "UTF-8"): DocumentType

    Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

    inputStream: the input stream to parse

    charset: the charset of the input stream content

    returns: a Document containing the parsed web page.

  6. abstract def parseString(html: String): DocumentType

    Parses an HTML string.

    html: the HTML string to parse

    returns: a Document containing the parsed web page.

  7. abstract def post(url: String, form: Map[String, String]): DocumentType

    Submits a form via a POST request and parses the resulting page.

    url: the URL of the page to retrieve

    form: a map containing the form fields to submit with their respective values

    returns: a Document containing the resulting web page.

  8. abstract def userAgent: String

    The user agent used by this browser to retrieve HTML pages from the web.
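
The abstract members above can be exercised roughly as follows. This is a sketch only: it assumes JsoupBrowser as the implementation and assumes its constructor accepts the user agent string as its first argument; the object name, the helper names and the user agent value are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object AbstractMembersSketch {
  // Hypothetical helper: parses in-memory HTML through parseInputStream and returns its title.
  // The stream is not closed here because parseInputStream is documented to close it.
  def titleFromBytes(browser: Browser, html: String): String = {
    val in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))
    browser.parseInputStream(in, "UTF-8").title
  }

  // Hypothetical helper: submits a form, returns the cookies stored for `url` afterwards,
  // and always clears the cookie store so later requests start from a clean session.
  def submitAndCollectCookies(
      browser: Browser,
      url: String,
      form: Map[String, String]): Map[String, String] =
    try {
      browser.post(url, form)
      browser.cookies(url)
    } finally browser.clearCookies()

  def main(args: Array[String]): Unit = {
    // Assumption: the first constructor parameter of JsoupBrowser is the user agent.
    val browser = new JsoupBrowser("example-agent/0.1")
    println(browser.userAgent)
    println(titleFromBytes(browser, "<html><head><title>Hi</title></head><body></body></html>"))
  }
}
```

submitAndCollectCookies needs network access and a real form endpoint, so main does not call it.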

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes: AnyRef → Any

  2. final def ##(): Int

    Definition Classes: AnyRef → Any

  3. final def ==(arg0: Any): Boolean

    Definition Classes: AnyRef → Any

  4. final def asInstanceOf[T0]: T0

    Definition Classes: Any

  5. def clone(): AnyRef

    Attributes: protected[java.lang]

    Definition Classes: AnyRef

    Annotations: @throws( ... )

  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes: AnyRef

  7. def equals(arg0: Any): Boolean

    Definition Classes: AnyRef → Any

  8. def finalize(): Unit

    Attributes: protected[java.lang]

    Definition Classes: AnyRef

    Annotations: @throws( classOf[java.lang.Throwable] )

  9. final def getClass(): Class[_]

    Definition Classes: AnyRef → Any

  10. def hashCode(): Int

    Definition Classes: AnyRef → Any

  11. final def isInstanceOf[T0]: Boolean

    Definition Classes: Any

  12. final def ne(arg0: AnyRef): Boolean

    Definition Classes: AnyRef

  13. final def notify(): Unit

    Definition Classes: AnyRef

  14. final def notifyAll(): Unit

    Definition Classes: AnyRef

  15. def parseFile(path: String): DocumentType

    Parses a local HTML file encoded in UTF-8.

    path: the path in the local filesystem where the HTML file is located

    returns: a Document containing the parsed web page.

  16. def parseFile(path: String, charset: String): DocumentType

    Parses a local HTML file with a specified charset.

    path: the path in the local filesystem where the HTML file is located

    charset: the charset of the file

    returns: a Document containing the parsed web page.

  17. def parseFile(file: File): DocumentType

    Parses a local HTML file encoded in UTF-8.

    file: the HTML file to parse

    returns: a Document containing the parsed web page.

  18. def parseResource(name: String, charset: String = "UTF-8"): DocumentType

    Parses a resource with a specified charset.

    name: the name of the resource to parse

    charset: the charset of the resource

    returns: a Document containing the parsed web page.

  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes: AnyRef

  20. def toString(): String

    Definition Classes: AnyRef → Any

  21. final def wait(): Unit

    Definition Classes: AnyRef

    Annotations: @throws( ... )

  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes: AnyRef

    Annotations: @throws( ... )

  23. final def wait(arg0: Long): Unit

    Definition Classes: AnyRef

    Annotations: @throws( ... )
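
The parseFile overloads listed above differ only in how the file is named (File or path string) and whether a charset is given. The sketch below writes a temporary UTF-8 file and parses it through all four variants; it assumes JsoupBrowser as the implementation, and the object and helper names are hypothetical.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object ParseFileSketch {
  // Hypothetical helper: stores `html` in a temporary file, parses it with each
  // parseFile overload and returns the document titles in call order.
  def titlesFromAllOverloads(html: String): Seq[String] = {
    val browser = JsoupBrowser()          // assumption: jsoup-backed implementation
    val path = Files.createTempFile("page", ".html")
    try {
      Files.write(path, html.getBytes(StandardCharsets.UTF_8))
      val file: File = path.toFile
      Seq(
        browser.parseFile(file).title,                  // File, UTF-8 implied
        browser.parseFile(file, "UTF-8").title,         // File, explicit charset
        browser.parseFile(file.getPath).title,          // path string, UTF-8 implied
        browser.parseFile(file.getPath, "UTF-8").title  // path string, explicit charset
      )
    } finally Files.delete(path)          // remove the temporary file in all cases
  }

  def main(args: Array[String]): Unit =
    titlesFromAllOverloads("<html><head><title>Local</title></head><body></body></html>")
      .foreach(println)
}
```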
