Class HTMLConfiguration

  • All Implemented Interfaces:
    org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration

    public class HTMLConfiguration
    extends org.apache.xerces.util.ParserConfigurationSettings
    implements org.apache.xerces.xni.parser.XMLPullParserConfiguration
    An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML documents, or in conjunction with any XNI-based tools, such as the Xerces2 implementation.

    This configuration recognizes the following features:

    • http://cyberneko.org/html/features/augmentations
    • http://cyberneko.org/html/features/report-errors
    • http://cyberneko.org/html/features/report-errors/simple
    • http://cyberneko.org/html/features/balance-tags
    • and the features supported by the scanner and tag balancer components.

    This configuration recognizes the following properties:

    • http://cyberneko.org/html/properties/names/elems
    • http://cyberneko.org/html/properties/names/attrs
    • http://cyberneko.org/html/properties/filters
    • http://cyberneko.org/html/properties/error-reporter
    • and the properties supported by the scanner and tag balancer.

    For complete usage information, refer to the documentation.
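    As an illustration of the two names properties listed above, the following is a minimal stand-alone sketch of how the documented values "upper", "lower" and "default" could be applied to an element or attribute name. The class NamesSetting and its apply method are hypothetical and not part of this library. Treating "default" as "leave the name unchanged" and rejecting unknown values are assumptions made for the sketch, not confirmed behavior.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the library. */
final class NamesSetting {

    /** Property identifiers as listed in the class description. */
    static final String NAMES_ELEMS = "http://cyberneko.org/html/properties/names/elems";
    static final String NAMES_ATTRS = "http://cyberneko.org/html/properties/names/attrs";

    private NamesSetting() {
    }

    /**
     * Applies a names setting to an element or attribute name.
     * Assumption: "default" leaves the name as it was written in the source.
     */
    static String apply(String setting, String name) {
        switch (setting) {
            case "upper":
                return name.toUpperCase(Locale.ENGLISH);
            case "lower":
                return name.toLowerCase(Locale.ENGLISH);
            case "default":
                return name;
            default:
                // Assumption: this sketch rejects unknown values outright.
                throw new IllegalArgumentException("Unknown names setting: " + setting);
        }
    }

    public static void main(String[] args) {
        System.out.println(NAMES_ELEMS + " = upper: " + apply("upper", "body"));
        System.out.println(NAMES_ATTRS + " = lower: " + apply("lower", "HREF"));
    }
}
```

    With the real configuration, the corresponding step would be a call to setProperty with one of the property identifiers above and one of the three string values.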

    Version:
    $Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
    Author:
    Andy Clark
    See Also:
    HTMLScanner, HTMLTagBalancer, HTMLErrorReporter
    • Field Detail

      • NAMES_ELEMS

        protected static final String NAMES_ELEMS
        Property identifier for modifying HTML element names. Accepted values: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • NAMES_ATTRS

        protected static final String NAMES_ATTRS
        Property identifier for modifying HTML attribute names. Accepted values: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • fDocumentHandler

        protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
        Document handler.
      • fDTDHandler

        protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
        DTD handler.
      • fDTDContentModelHandler

        protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
        DTD content model handler.
      • fErrorHandler

        protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
        Error handler.
      • fEntityResolver

        protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
        Entity resolver.
      • fLocale

        protected Locale fLocale
        Locale.
      • fCloseStream

        protected boolean fCloseStream
        Indicates that the stream was opened by the parser and must therefore be closed explicitly when parsing terminates.
      • fDocumentScanner

        protected final HTMLScanner fDocumentScanner
        Document scanner.
      • fTagBalancer

        protected final HTMLTagBalancer fTagBalancer
        HTML tag balancer.
      • fNamespaceBinder

        protected final NamespaceBinder fNamespaceBinder
        Namespace binder.
    • Constructor Detail

      • HTMLConfiguration

        public HTMLConfiguration()
        Default constructor.
    • Method Detail

      • createDocumentScanner

        protected HTMLScanner createDocumentScanner()
      • pushInputSource

        public void pushInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
        Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time the input source was pushed.

        Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        evaluateInputSource(XMLInputSource)
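        The entity stack behavior described above can be modeled with a few lines of plain Java. This is a toy sketch, not the scanner's implementation: the class EntityStackSketch, its String-based pushInputSource and the '|' marker standing in for a script's insertion point are all hypothetical. It only shows that pushed content is read first and that reading then resumes in the outer entity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an entity stack; not the library's scanner. */
final class EntityStackSketch {

    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(String document) {
        stack.push(new StringReader(document));
    }

    /** Hypothetical String-based counterpart of pushInputSource. */
    void pushInputSource(String content) {
        stack.push(new StringReader(content));
    }

    /** Returns the next character, or -1 once every entity is exhausted. */
    int read() {
        try {
            while (!stack.isEmpty()) {
                int c = stack.peek().read();
                if (c != -1) {
                    return c;
                }
                // Current entity is finished: drop it and resume the outer one.
                stack.pop().close();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Scans a document in which '|' marks where script output is inserted. */
    static String demo() {
        EntityStackSketch scanner = new EntityStackSketch("<p>|</p>");
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = scanner.read()) != -1) {
            if (c == '|') {
                // The whole output is buffered first, then pushed in one piece.
                scanner.pushInputSource("<b>hi</b>");
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        Pushing the complete buffered output in one call, as in demo(), follows the hint above about keeping events in sequence.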
      • evaluateInputSource

        public void evaluateInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
        Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        pushInputSource(XMLInputSource)
      • setFeature

        public void setFeature​(String featureId,
                               boolean state)
        Sets a feature.
        Specified by:
        setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Overrides:
        setFeature in class org.apache.xerces.util.ParserConfigurationSettings
      • setProperty

        public void setProperty​(String propertyId,
                                Object value)
        Sets a property.
        Specified by:
        setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Overrides:
        setProperty in class org.apache.xerces.util.ParserConfigurationSettings
      • setDocumentHandler

        public void setDocumentHandler​(org.apache.xerces.xni.XMLDocumentHandler handler)
        Sets the document handler.
        Specified by:
        setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDocumentHandler

        public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
        Returns the document handler.
        Specified by:
        getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setDTDHandler

        public void setDTDHandler​(org.apache.xerces.xni.XMLDTDHandler handler)
        Sets the DTD handler.
        Specified by:
        setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDTDHandler

        public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
        Returns the DTD handler.
        Specified by:
        getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setDTDContentModelHandler

        public void setDTDContentModelHandler​(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
        Sets the DTD content model handler.
        Specified by:
        setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDTDContentModelHandler

        public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
        Returns the DTD content model handler.
        Specified by:
        getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setErrorHandler

        public void setErrorHandler​(org.apache.xerces.xni.parser.XMLErrorHandler handler)
        Sets the error handler.
        Specified by:
        setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getErrorHandler

        public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
        Returns the error handler.
        Specified by:
        getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setEntityResolver

        public void setEntityResolver​(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
        Sets the entity resolver.
        Specified by:
        setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getEntityResolver

        public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
        Returns the entity resolver.
        Specified by:
        getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setLocale

        public void setLocale​(Locale locale)
        Sets the locale.
        Specified by:
        setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getLocale

        public Locale getLocale()
        Returns the locale.
        Specified by:
        getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • parse

        public void parse​(org.apache.xerces.xni.parser.XMLInputSource source)
                   throws IOException
        Parses a document.
        Specified by:
        parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Throws:
        IOException
      • setInputSource

        public void setInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
                            throws IOException
        Sets the input source for the document to parse.
        Specified by:
        setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
        Parameters:
        inputSource - The document's input source.
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
        IOException - Thrown on I/O error.
        See Also:
        parse(boolean)
      • parse

        public boolean parse​(boolean complete)
                      throws IOException
        Parses the document in a pull parsing fashion.
        Specified by:
        parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
        Parameters:
        complete - True if the pull parser should parse the remaining document completely.
        Returns:
        True if there is more document to parse.
        Throws:
        org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
        IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
        See Also:
        setInputSource(org.apache.xerces.xni.parser.XMLInputSource)
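        The calling pattern for pull parsing, setting the input source once and then calling parse(boolean) until it returns false, can be sketched as follows. The PullParser interface and the TokenParser class are hypothetical stand-ins that mimic only the documented contract (return true while more of the document remains). They take a String instead of an XMLInputSource and report whitespace-separated tokens instead of XNI events.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the pull-parsing contract; not the library's parser. */
final class PullLoopSketch {

    /** Hypothetical stand-in for the two pull-parsing methods. */
    interface PullParser {
        void setInputSource(String document);

        boolean parse(boolean complete);
    }

    /** Reports one whitespace-separated token per incremental step. */
    static final class TokenParser implements PullParser {
        final List<String> events = new ArrayList<>();
        private String[] tokens = new String[0];
        private int next;

        @Override
        public void setInputSource(String document) {
            String trimmed = document.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            next = 0;
        }

        @Override
        public boolean parse(boolean complete) {
            do {
                if (next >= tokens.length) {
                    return false; // nothing left to parse
                }
                events.add(tokens[next++]);
            } while (complete); // complete == true: keep going to the end
            return next < tokens.length;
        }
    }

    /** Drives the parser one step at a time and returns the reported events. */
    static List<String> parseIncrementally(String document) {
        TokenParser parser = new TokenParser();
        parser.setInputSource(document);
        while (parser.parse(false)) {
            // The application can do other work between steps.
        }
        return parser.events;
    }

    public static void main(String[] args) {
        System.out.println(parseIncrementally("html body p"));
    }
}
```

        Passing true instead of false makes a single call consume the rest of the input, which corresponds to the complete parameter described above.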
      • cleanup

        public void cleanup()
        If the application decides to terminate parsing before the document is fully parsed, it should call this method to free any resources allocated during parsing, for example by closing all opened streams.
        Specified by:
        cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
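        A common way to honor this contract is to call cleanup() from a finally block, so that resources are released even when parsing is abandoned part way through. The sketch below is a toy model under stated assumptions: ToyPullParser, TrackedStream and the closeStream flag (which plays the role of the documented fCloseStream field) are hypothetical and do not reflect the library's internals.

```java
import java.io.Closeable;

/** Toy model of early termination; not the library's implementation. */
final class CleanupSketch {

    /** Stand-in for a stream that the parser opened itself. */
    static final class TrackedStream implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical pull parser that owns its stream. */
    static final class ToyPullParser {
        private TrackedStream stream;
        private boolean closeStream; // role of the documented fCloseStream flag
        private int remainingSteps;

        TrackedStream open(int steps) {
            stream = new TrackedStream();
            closeStream = true; // the parser opened the stream, so it must close it
            remainingSteps = steps;
            return stream;
        }

        /** One step per call unless complete is true; true while work remains. */
        boolean parse(boolean complete) {
            if (remainingSteps > 0) {
                remainingSteps = complete ? 0 : remainingSteps - 1;
            }
            return remainingSteps > 0;
        }

        /** Releases the stream if this parser opened it; safe to call twice. */
        void cleanup() {
            if (closeStream && stream != null) {
                stream.close();
            }
            stream = null;
            closeStream = false;
        }
    }

    /** Parses one step, abandons the rest, and reports whether the stream was closed. */
    static boolean stopEarlyAndCleanUp() {
        ToyPullParser parser = new ToyPullParser();
        TrackedStream stream = parser.open(5);
        try {
            parser.parse(false); // consume one step, then give up
        } finally {
            parser.cleanup();
        }
        return stream.closed;
    }

    public static void main(String[] args) {
        System.out.println("stream closed: " + stopEarlyAndCleanUp());
    }
}
```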
      • addComponent

        protected void addComponent​(HTMLComponent component)
        Adds a component.
      • reset

        protected void reset()
        Resets the parser configuration.