Package org.ccil.cowan.tagsoup
Class HTMLScanner
java.lang.Object
    org.ccil.cowan.tagsoup.HTMLScanner

All Implemented Interfaces:
Scanner, org.xml.sax.Locator
public class HTMLScanner extends java.lang.Object implements Scanner, org.xml.sax.Locator
This class implements a table-driven scanner for HTML that tolerates many defects in its input. It implements the Scanner interface, which accepts a Reader object to fetch characters from and a ScanHandler object to report lexical events to.
-
Constructor Summary
Constructor      Description
HTMLScanner()
-
Method Summary
Modifier and Type, Method, Description

int getColumnNumber()
int getLineNumber()
java.lang.String getPublicId()
java.lang.String getSystemId()
static void main(java.lang.String[] argv)
    Test procedure.
void resetDocumentLocator(java.lang.String publicid, java.lang.String systemid)
    Reset document locator, supplying systemid and publicid.
void scan(java.io.Reader r0, ScanHandler h)
    Scan HTML source, reporting lexical events.
void startCDATA()
    A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).
-
Method Detail
-
getLineNumber
public int getLineNumber()
Specified by: getLineNumber in interface org.xml.sax.Locator
-
getColumnNumber
public int getColumnNumber()
Specified by: getColumnNumber in interface org.xml.sax.Locator
-
getPublicId
public java.lang.String getPublicId()
Specified by: getPublicId in interface org.xml.sax.Locator
-
getSystemId
public java.lang.String getSystemId()
Specified by: getSystemId in interface org.xml.sax.Locator
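
The four accessors above make up the org.xml.sax.Locator contract. As an illustration only, the following sketch shows one way a character-level scanner could keep that information up to date while reading. It is not TagSoup's implementation: the class name TrackingLocator, its read() method, and the 1-based position convention are assumptions made for this example, and only '\n' is treated as a line break.

```java
import java.io.IOException;
import java.io.Reader;
import org.xml.sax.Locator;

/** Hypothetical helper (not part of TagSoup): wraps a Reader and tracks position. */
class TrackingLocator implements Locator {
    private final Reader in;
    private final String publicId;
    private final String systemId;
    private int line = 1;    // 1-based line of the next character to be read
    private int column = 1;  // 1-based column of the next character to be read

    TrackingLocator(Reader in, String publicId, String systemId) {
        this.in = in;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    /** Reads one character and updates the position; returns -1 at end of input. */
    int read() throws IOException {
        int ch = in.read();
        if (ch == '\n') {
            // Assumption: only '\n' starts a new line in this sketch.
            line++;
            column = 1;
        } else if (ch != -1) {
            column++;
        }
        return ch;
    }

    @Override public int getLineNumber() { return line; }
    @Override public int getColumnNumber() { return column; }
    @Override public String getPublicId() { return publicId; }
    @Override public String getSystemId() { return systemId; }
}
```

A handler that receives such an object can call getLineNumber() and getColumnNumber() when it reports an error, so that the message points at the offending place in the input.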
-
resetDocumentLocator
public void resetDocumentLocator(java.lang.String publicid, java.lang.String systemid)
Reset document locator, supplying systemid and publicid.
Specified by: resetDocumentLocator in interface Scanner
Parameters:
systemid - System id
publicid - Public id
-
scan
public void scan(java.io.Reader r0, ScanHandler h) throws java.io.IOException, org.xml.sax.SAXException
Scan HTML source, reporting lexical events.
-
startCDATA
public void startCDATA()
A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).
Specified by: startCDATA in interface Scanner
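
To make the callback idea concrete, here is a toy lexer, unrelated to TagSoup's table-driven code, in which the handler switches the lexer into a CDATA mode when it sees a particular start tag. Everything in it is hypothetical: the ToyLexer class, its Handler interface, the event strings, and the choice of "script" as the element that triggers the switch. In CDATA mode the sketch treats only "</" as the start of markup.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lexer (NOT TagSoup's): a handler can force raw-text mode via startCDATA(). */
class ToyLexer {
    /** Hypothetical callback interface, loosely analogous to a ScanHandler. */
    interface Handler {
        void startTag(String name);
        void endTag(String name);
        void text(String chars);
    }

    private boolean cdata = false;

    /** Called back by the handler: treat what follows as raw text until an end tag. */
    void startCDATA() { cdata = true; }

    void scan(String src, Handler h) {
        int i = 0;
        while (i < src.length()) {
            // In CDATA mode only "</" counts as markup; otherwise any "<" does.
            int lt = cdata ? src.indexOf("</", i) : src.indexOf('<', i);
            if (lt < 0) lt = src.length();
            if (lt > i) h.text(src.substring(i, lt));
            if (lt == src.length()) break;
            int gt = src.indexOf('>', lt);
            if (gt < 0) {            // unterminated tag: report the rest as text
                h.text(src.substring(lt));
                break;
            }
            if (src.startsWith("</", lt)) {
                cdata = false;       // an end tag always leaves CDATA mode
                h.endTag(src.substring(lt + 2, gt));
            } else {
                h.startTag(src.substring(lt + 1, gt));
            }
            i = gt + 1;
        }
    }

    /** Runs the lexer with a handler that requests CDATA mode inside "script". */
    static List<String> events(String src) {
        List<String> out = new ArrayList<>();
        ToyLexer lexer = new ToyLexer();
        lexer.scan(src, new Handler() {
            public void startTag(String name) {
                out.add("start:" + name);
                if (name.equals("script")) lexer.startCDATA();
            }
            public void endTag(String name) { out.add("end:" + name); }
            public void text(String chars) { out.add("text:" + chars); }
        });
        return out;
    }
}
```

With this sketch, ToyLexer.events("<script>a<b</script>") reports the text a<b as a single text event instead of treating <b as a tag, because the handler called startCDATA() when it saw the script start tag.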
-
main
public static void main(java.lang.String[] argv) throws java.io.IOException, org.xml.sax.SAXException
Test procedure. Reads HTML from the standard input and writes PYX to the standard output.
Throws:
java.io.IOException
org.xml.sax.SAXException
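
For readers unfamiliar with PYX, it is a line-oriented notation in which the first character of each line names the event type. The sketch below formats a few events under the conventional prefixes: "(" for a start tag, "A" for an attribute, "-" for character data, and ")" for an end tag. The PyxSketch class and its method names are hypothetical, and escaping only newlines as \n is an assumption of this example rather than a statement about what main actually emits.

```java
/** Hypothetical formatter for PYX-style lines; not TagSoup code. */
class PyxSketch {
    /** Start tag: "(" followed by the element name. */
    static String startTag(String name) { return "(" + name; }

    /** Attribute: "A", the attribute name, a space, then the value. */
    static String attribute(String name, String value) {
        return "A" + name + " " + escape(value);
    }

    /** End tag: ")" followed by the element name. */
    static String endTag(String name) { return ")" + name; }

    /** Character data: "-" followed by the text on a single line. */
    static String text(String chars) { return "-" + escape(chars); }

    /** Assumption: only newlines are escaped, as backslash followed by n. */
    private static String escape(String s) { return s.replace("\n", "\\n"); }
}
```

Under these assumptions, an anchor element with an href attribute of x.html and the content "hi" would be written as the four lines (a, Ahref x.html, -hi, and )a.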
-