Package org.ccil.cowan.tagsoup
Class Parser
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.ccil.cowan.tagsoup.Parser
All Implemented Interfaces: ScanHandler, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLReader
public class Parser extends org.xml.sax.helpers.DefaultHandler implements ScanHandler, org.xml.sax.XMLReader, org.xml.sax.ext.LexicalHandler
The SAX parser class.
-
-
Field Summary
Fields
Modifier and Type / Field / Description
static java.lang.String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
static java.lang.String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
static java.lang.String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially.
static java.lang.String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
static java.lang.String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
static java.lang.String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
static java.lang.String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
static java.lang.String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
static java.lang.String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only.
static java.lang.String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
static java.lang.String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
static java.lang.String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
static java.lang.String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
static java.lang.String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
static java.lang.String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
static java.lang.String rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
static java.lang.String scannerProperty
Specifies the Scanner object this Parser uses.
static java.lang.String schemaProperty
Specifies the Schema object this Parser uses.
static java.lang.String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
static java.lang.String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
static java.lang.String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
static java.lang.String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
static java.lang.String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
static java.lang.String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
static java.lang.String validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
static java.lang.String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0.
static java.lang.String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
-
Constructor Summary
Constructors
Parser()
-
Method Summary
All Methods (Instance Methods, Concrete Methods)
Modifier and Type / Method / Description
void adup(char[] buff, int offset, int length)
Reports an attribute name without a value.
void aname(char[] buff, int offset, int length)
Reports an attribute name; a value will follow.
void aval(char[] buff, int offset, int length)
Reports an attribute value.
void cdsect(char[] buff, int offset, int length)
Reports the content of a CDATA section (not a CDATA element).
void cmnt(char[] buff, int offset, int length)
Reports a comment.
void comment(char[] ch, int start, int length)
void decl(char[] buff, int offset, int length)
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
void endCDATA()
void endDTD()
void endEntity(java.lang.String name)
void entity(char[] buff, int offset, int length)
Reports an entity reference or character reference.
void eof(char[] buff, int offset, int length)
Reports EOF.
void etag(char[] buff, int offset, int length)
Reports an end-tag.
void etag_basic(char[] buff, int offset, int length)
boolean etag_cdata(char[] buff, int offset, int length)
org.xml.sax.ContentHandler getContentHandler()
org.xml.sax.DTDHandler getDTDHandler()
int getEntity()
Returns the value of the last entity or character reference reported.
org.xml.sax.EntityResolver getEntityResolver()
org.xml.sax.ErrorHandler getErrorHandler()
boolean getFeature(java.lang.String name)
java.lang.Object getProperty(java.lang.String name)
void gi(char[] buff, int offset, int length)
Reports the general identifier (element type name) of a start-tag.
void parse(java.lang.String systemid)
void parse(org.xml.sax.InputSource input)
void pcdata(char[] buff, int offset, int length)
Reports character content.
void pi(char[] buff, int offset, int length)
Reports the data part of a processing instruction.
void pitarget(char[] buff, int offset, int length)
Reports the target part of a processing instruction.
void setContentHandler(org.xml.sax.ContentHandler handler)
void setDTDHandler(org.xml.sax.DTDHandler handler)
void setEntityResolver(org.xml.sax.EntityResolver resolver)
void setErrorHandler(org.xml.sax.ErrorHandler handler)
void setFeature(java.lang.String name, boolean value)
void setProperty(java.lang.String name, java.lang.Object value)
void stagc(char[] buff, int offset, int length)
Reports the close of a start-tag.
void stage(char[] buff, int offset, int length)
Reports the close of an empty-tag.
void startCDATA()
void startDTD(java.lang.String name, java.lang.String publicid, java.lang.String systemid)
void startEntity(java.lang.String name)
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
-
-
-
-
Field Detail
-
namespacesFeature
public static final java.lang.String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
See Also: Constant Field Values
-
namespacePrefixesFeature
public static final java.lang.String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.
See Also: Constant Field Values
-
externalGeneralEntitiesFeature
public static final java.lang.String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
See Also: Constant Field Values
-
externalParameterEntitiesFeature
public static final java.lang.String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
See Also: Constant Field Values
-
isStandaloneFeature
public static final java.lang.String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)
See Also: Constant Field Values
-
lexicalHandlerParameterEntitiesFeature
public static final java.lang.String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
See Also: Constant Field Values
-
resolveDTDURIsFeature
public static final java.lang.String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)
See Also: Constant Field Values
-
stringInterningFeature
public static final java.lang.String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
See Also: Constant Field Values
-
useAttributes2Feature
public static final java.lang.String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)
See Also: Constant Field Values
-
useLocator2Feature
public static final java.lang.String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)
See Also: Constant Field Values
-
useEntityResolver2Feature
public static final java.lang.String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)
See Also: Constant Field Values
-
validationFeature
public static final java.lang.String validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
See Also: Constant Field Values
-
unicodeNormalizationCheckingFeature
public static final java.lang.String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)
See Also: Constant Field Values
-
xmlnsURIsFeature
public static final java.lang.String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)
See Also: Constant Field Values
-
XML11Feature
public static final java.lang.String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)
See Also: Constant Field Values
-
ignoreBogonsFeature
public static final java.lang.String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
See Also: Constant Field Values
-
bogonsEmptyFeature
public static final java.lang.String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
See Also: Constant Field Values
-
rootBogonsFeature
public static final java.lang.String rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
See Also: Constant Field Values
-
defaultAttributesFeature
public static final java.lang.String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
See Also: Constant Field Values
-
translateColonsFeature
public static final java.lang.String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
See Also: Constant Field Values
-
restartElementsFeature
public static final java.lang.String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
See Also: Constant Field Values
-
ignorableWhitespaceFeature
public static final java.lang.String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
See Also: Constant Field Values
-
CDATAElementsFeature
public static final java.lang.String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.
See Also: Constant Field Values
-
lexicalHandlerProperty
public static final java.lang.String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
See Also: Constant Field Values
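A lexical handler is registered through setProperty. The following is a minimal sketch, assuming lexicalHandlerProperty holds the standard SAX2 property name http://xml.org/sax/properties/lexical-handler (the constant's value is not shown on this page). To stay runnable with only the JDK it uses the JDK's built-in SAX parser as a stand-in XMLReader, so the sample input must be well-formed XML; with TagSoup on the classpath, new org.ccil.cowan.tagsoup.Parser() would take its place. The class name CommentCollector and the method collect are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects the text of every comment reported to a LexicalHandler.
class CommentCollector extends DefaultHandler2 {
    // Assumption: the standard SAX2 name behind Parser.lexicalHandlerProperty.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of the buffer belongs to this comment.
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(String markup) {
        try {
            // Stand-in XMLReader from the JDK; swap in the TagSoup Parser for real HTML input.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CommentCollector handler = new CommentCollector();
            reader.setContentHandler(handler);
            // The property value must implement org.xml.sax.ext.LexicalHandler.
            reader.setProperty(LEXICAL_HANDLER, handler);
            reader.parse(new InputSource(new StringReader(markup)));
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<p><!-- first -->text<!--second--></p>"));
    }
}
```

DefaultHandler2 is used here only because it already supplies empty implementations of the other LexicalHandler callbacks.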
-
scannerProperty
public static final java.lang.String scannerProperty
Specifies the Scanner object this Parser uses.
See Also: Constant Field Values
-
schemaProperty
public static final java.lang.String schemaProperty
Specifies the Schema object this Parser uses.
See Also: Constant Field Values
-
autoDetectorProperty
public static final java.lang.String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
See Also: Constant Field Values
-
-
Method Detail
-
getFeature
public boolean getFeature(java.lang.String name) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: getFeature in interface org.xml.sax.XMLReader
- Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
-
setFeature
public void setFeature(java.lang.String name, boolean value) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: setFeature in interface org.xml.sax.XMLReader
- Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
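Both methods take a feature name and signal unknown or unsettable names through the two declared exceptions. The following is a minimal sketch of setting and then reading back a feature, assuming namespacesFeature holds the standard SAX2 name http://xml.org/sax/features/namespaces (the constant's value is not shown on this page). To stay runnable with only the JDK it uses the JDK's built-in SAX parser as a stand-in XMLReader; with TagSoup on the classpath, new org.ccil.cowan.tagsoup.Parser() would take its place, and the results for individual features may differ. FeatureProbe, its methods, and the example.invalid feature name are hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper: tries to enable a feature, then reports what the reader says about it.
class FeatureProbe {
    // Assumption: the standard SAX2 name behind Parser.namespacesFeature.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static XMLReader newReader() {
        try {
            // Stand-in XMLReader from the JDK; swap in the TagSoup Parser in real use.
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns "true", "false", "unrecognized" or "unsupported" for the given feature name.
    static String describe(XMLReader reader, String feature) {
        try {
            return Boolean.toString(reader.getFeature(feature));
        } catch (SAXNotRecognizedException e) {
            return "unrecognized";
        } catch (SAXNotSupportedException e) {
            return "unsupported";
        }
    }

    static String enableAndDescribe(String feature) {
        XMLReader reader = newReader();
        try {
            reader.setFeature(feature, true);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Ignored on purpose: describe() below reports the reader's view of the feature.
        }
        return describe(reader, feature);
    }

    public static void main(String[] args) {
        System.out.println(enableAndDescribe(NAMESPACES));
        System.out.println(enableAndDescribe("http://example.invalid/features/no-such-feature"));
    }
}
```

Catching the two exceptions separately keeps "this reader has never heard of the name" apart from "the name is known but the value cannot be used right now".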
-
getProperty
public java.lang.Object getProperty(java.lang.String name) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: getProperty in interface org.xml.sax.XMLReader
- Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
-
setProperty
public void setProperty(java.lang.String name, java.lang.Object value) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: setProperty in interface org.xml.sax.XMLReader
- Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
-
setEntityResolver
public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Specified by: setEntityResolver in interface org.xml.sax.XMLReader
-
getEntityResolver
public org.xml.sax.EntityResolver getEntityResolver()
Specified by: getEntityResolver in interface org.xml.sax.XMLReader
-
setDTDHandler
public void setDTDHandler(org.xml.sax.DTDHandler handler)
Specified by: setDTDHandler in interface org.xml.sax.XMLReader
-
getDTDHandler
public org.xml.sax.DTDHandler getDTDHandler()
Specified by: getDTDHandler in interface org.xml.sax.XMLReader
-
setContentHandler
public void setContentHandler(org.xml.sax.ContentHandler handler)
Specified by: setContentHandler in interface org.xml.sax.XMLReader
-
getContentHandler
public org.xml.sax.ContentHandler getContentHandler()
Specified by: getContentHandler in interface org.xml.sax.XMLReader
-
setErrorHandler
public void setErrorHandler(org.xml.sax.ErrorHandler handler)
Specified by: setErrorHandler in interface org.xml.sax.XMLReader
-
getErrorHandler
public org.xml.sax.ErrorHandler getErrorHandler()
Specified by: getErrorHandler in interface org.xml.sax.XMLReader
-
parse
public void parse(org.xml.sax.InputSource input) throws java.io.IOException, org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
- Throws:
java.io.IOException
org.xml.sax.SAXException
-
parse
public void parse(java.lang.String systemid) throws java.io.IOException, org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
- Throws:
java.io.IOException
org.xml.sax.SAXException
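Both parse overloads push SAX events to whatever ContentHandler was registered beforehand. The following is a minimal sketch of that flow using parse(InputSource) on an in-memory string. It relies only on the XMLReader interface; to stay runnable with only the JDK it uses the JDK's built-in SAX parser as a stand-in, so the sample input must be well-formed. With TagSoup on the classpath, new org.ccil.cowan.tagsoup.Parser() would take its place. ElementLister and list are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the qualified name of every start-tag in document order.
class ElementLister extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }

    static List<String> list(String markup) {
        try {
            // Stand-in XMLReader from the JDK; swap in the TagSoup Parser for real HTML input.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementLister handler = new ElementLister();
            // Handlers must be registered before parse() is called.
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(list("<html><body><p>Hi</p></body></html>"));
    }
}
```

Wrapping a Reader in an InputSource avoids any encoding detection; the parse(String systemid) overload instead takes a system identifier for the parser to open itself.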
-
adup
public void adup(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name without a value.
Specified by: adup in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
aname
public void aname(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.
Specified by: aname in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
aval
public void aval(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute value.
Specified by: aval in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
entity
public void entity(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an entity reference or character reference.
Specified by: entity in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
eof
public void eof(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports EOF.
Specified by: eof in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
etag
public void etag(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an end-tag.
Specified by: etag in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
etag_cdata
public boolean etag_cdata(char[] buff, int offset, int length) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
etag_basic
public void etag_basic(char[] buff, int offset, int length) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
decl
public void decl(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
Specified by: decl in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
gi
public void gi(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.
Specified by: gi in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
cdsect
public void cdsect(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).
Specified by: cdsect in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
pcdata
public void pcdata(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports character content.
Specified by: pcdata in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
pitarget
public void pitarget(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.
Specified by: pitarget in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
pi
public void pi(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.
Specified by: pi in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
stagc
public void stagc(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of a start-tag.
Specified by: stagc in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
stage
public void stage(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of an empty-tag.
Specified by: stage in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
cmnt
public void cmnt(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports a comment.
Specified by: cmnt in interface ScanHandler
- Throws:
org.xml.sax.SAXException
-
getEntity
public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.
Specified by: getEntity in interface ScanHandler
-
comment
public void comment(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by: comment in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
endCDATA
public void endCDATA() throws org.xml.sax.SAXException
Specified by: endCDATA in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
endDTD
public void endDTD() throws org.xml.sax.SAXException
Specified by: endDTD in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
endEntity
public void endEntity(java.lang.String name) throws org.xml.sax.SAXException
Specified by: endEntity in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
startCDATA
public void startCDATA() throws org.xml.sax.SAXException
Specified by: startCDATA in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
startDTD
public void startDTD(java.lang.String name, java.lang.String publicid, java.lang.String systemid) throws org.xml.sax.SAXException
Specified by: startDTD in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
startEntity
public void startEntity(java.lang.String name) throws org.xml.sax.SAXException
Specified by: startEntity in interface org.xml.sax.ext.LexicalHandler
- Throws:
org.xml.sax.SAXException
-
-