Class ITER_CSSPath

  • All Implemented Interfaces:
    IteratorFunction

    public class ITER_CSSPath
    extends IteratorFunctionBase
    Iterator function iter:CSSPath extracts parts of an HTML document, using CSS-Selector-like queries.

    See Live example

    • Param 1: (html): the URI of the HTML document (a URI), or the HTML document itself (a String);
    • Param 2: (cssSelector): the CSS Selector. See https://jsoup.org/apidocs/org/jsoup/select/Selector.html for the base syntax specification.
    • Param 3 .. N: (auxCssSelector ...): other CSS Selectors, each of which is executed over every result of cssSelector, exactly as if the binding function fun:CSSPath were applied. By default, the output is the outer HTML of the first matched element. However, two additions to the CSS Selector syntax can change this behaviour:
      • (if the selector ends with /text()) the output is the combined text of the first matched element and all its children. Whitespace is normalized and trimmed.
      • (if the selector ends with @attributeName) the output is the value of the attribute attributeName for the first matched element.
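
    The two suffix conventions above can be illustrated with a minimal Java sketch. It is not the SPARQL-Generate implementation: the class SelectorSuffix, its fields, and the attribute-name pattern are hypothetical names and assumptions introduced only to show how a selector string could be split into a base CSS Selector and an output mode.

```java
// Illustrative sketch only; SelectorSuffix is a hypothetical helper,
// not part of SPARQL-Generate or jsoup.
final class SelectorSuffix {
    enum Mode { OUTER_HTML, TEXT, ATTRIBUTE }

    final String selector;   // the selector with any suffix removed
    final Mode mode;         // which output the suffix asks for
    final String attribute;  // attribute name, or null unless mode == ATTRIBUTE

    private SelectorSuffix(String selector, Mode mode, String attribute) {
        this.selector = selector;
        this.mode = mode;
        this.attribute = attribute;
    }

    static SelectorSuffix parse(String raw) {
        String s = raw.trim();
        String textSuffix = "/text()";
        if (s.endsWith(textSuffix)) {
            // "div p/text()" -> base selector "div p", combined text output
            return new SelectorSuffix(
                s.substring(0, s.length() - textSuffix.length()), Mode.TEXT, null);
        }
        int at = s.lastIndexOf('@');
        // Assumption: an attribute suffix is '@' followed by a plain name
        // running to the end of the string.
        if (at >= 0 && s.substring(at + 1).matches("[A-Za-z_][A-Za-z0-9_:.-]*")) {
            return new SelectorSuffix(
                s.substring(0, at), Mode.ATTRIBUTE, s.substring(at + 1));
        }
        // No suffix: the default output is the outer HTML.
        return new SelectorSuffix(s, Mode.OUTER_HTML, null);
    }
}
```

    Under these assumptions, SelectorSuffix.parse("a.link@href") yields the base selector a.link with the attribute href, while a selector without either suffix keeps the default outer-HTML output.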
    The following variables may be bound:
    • Output 1: (string) outer HTML of the matched element;
    • Output 2 .. N-1: (string) results of executing the auxiliary CSS Selector queries on Output 1, encoded as literals;
    • Output N: (integer) the position of the result in the list;
    • Output N+1: (boolean) true if this result has a next result in the list.
    Output N and N+1 can be used to generate RDF lists from the input, but using the keyword LIST( ?var ) as the object of a triple pattern covers most cases more elegantly.
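
    The role of Output N and Output N+1 can be sketched as follows. This is a hedged illustration, not the library's code: IndexedResults and Row are hypothetical names, and the sketch assumes positions start at 0, which the text above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; IndexedResults and Row are hypothetical.
final class IndexedResults {
    static final class Row {
        final String value;     // stands in for Output 1 (outer HTML)
        final int position;     // stands in for Output N (assumed 0-based)
        final boolean hasNext;  // stands in for Output N+1

        Row(String value, int position, boolean hasNext) {
            this.value = value;
            this.position = position;
            this.hasNext = hasNext;
        }
    }

    // Pairs each match with its position and a flag telling whether
    // another match follows it in the list.
    static List<Row> index(List<String> matches) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < matches.size(); i++) {
            rows.add(new Row(matches.get(i), i, i < matches.size() - 1));
        }
        return rows;
    }
}
```

    With these two values, a consumer can link each element to its successor and close the list on the row whose hasNext flag is false, which is the information needed to build an RDF list by hand.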
    Author:
    Noorani Bakerally, Maxime Lefrançois