- java.lang.Object
  - org.codelibs.nekohtml.HTMLScanner.ContentScanner
- All Implemented Interfaces: HTMLScanner.Scanner
- Enclosing class: HTMLScanner
public class HTMLScanner.ContentScanner extends Object implements HTMLScanner.Scanner
The primary HTML document scanner.
- Author: Andy Clark
-
Constructor Summary
- ContentScanner()
-
Method Summary
- protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index): Adds location augmentations to the specified attribute.
- protected String nextContent(int len): Reads the next characters WITHOUT impacting the buffer content up to current offset.
- boolean scan(boolean complete): Scan.
- protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty): Scans a real attribute.
- protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc): Scans an attribute, pseudo or real.
- protected void scanCDATA(): Scans a CDATA section.
- protected void scanCharacters(): Scans characters.
- protected void scanComment(): Scans a comment.
- protected void scanEndElement(): Scans an end element.
- protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend): Scans markup content.
- protected void scanPI(): Scans a processing instruction.
- protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes): Scans a pseudo attribute.
- protected String scanStartElement(boolean[] empty): Scans a start element.
-
Method Detail
-
scan
public boolean scan(boolean complete) throws IOException
Scan.
- Specified by: scan in interface HTMLScanner.Scanner
- Parameters:
  complete - True if the scanner should not return until scanning is complete.
- Returns: True if additional scanning is required.
- Throws: IOException - Thrown if an I/O error occurs.
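The return-value contract described above (true means more scanning is required) suggests a simple driver loop. The following is a minimal sketch of that contract only: `Scanner`, `TokenScanner`, and `drive` are hypothetical stand-ins written for illustration, not NekoHTML types, and the toy scanner consumes one string token per step instead of real HTML.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanLoopSketch {
    /** Hypothetical stand-in for the scan(boolean) contract; not the NekoHTML interface. */
    interface Scanner {
        boolean scan(boolean complete);
    }

    /** Toy scanner that consumes one token per step. */
    static final class TokenScanner implements Scanner {
        private final String[] tokens;
        private int next = 0;
        final List<String> seen = new ArrayList<>();

        TokenScanner(String... tokens) {
            this.tokens = tokens;
        }

        @Override
        public boolean scan(boolean complete) {
            do {
                if (next >= tokens.length) {
                    return false; // nothing left to scan
                }
                seen.add(tokens[next++]);
            } while (complete); // keep going only when asked to run to completion
            return next < tokens.length; // true if additional scanning is required
        }
    }

    /** Calls scan(false) until it reports that no more scanning is required. */
    static int drive(Scanner scanner) {
        int calls = 1;
        while (scanner.scan(false)) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        TokenScanner stepwise = new TokenScanner("<p>", "text", "</p>");
        System.out.println("calls: " + drive(stepwise) + ", seen: " + stepwise.seen);

        TokenScanner oneShot = new TokenScanner("<p>", "text", "</p>");
        System.out.println("more needed: " + oneShot.scan(true) + ", seen: " + oneShot.seen);
    }
}
```

Passing `complete = true` makes a single call run to the end, while `complete = false` hands control back to the caller after each step.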
-
nextContent
protected String nextContent(int len) throws IOException
Reads the next characters WITHOUT impacting the buffer content up to current offset.
- Parameters:
  len - the number of characters to read
- Returns: the read string (length may be smaller if EOF is encountered)
- Throws: IOException
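To illustrate the "read ahead without consuming" idea, here is a minimal sketch that uses the standard `java.io.Reader` `mark`/`reset` mechanism. This is an assumption made for illustration: the scanner works on its own buffer, and the class name `PeekSketch` and method `peek` are hypothetical, not part of NekoHTML. The sketch wraps `IOException` in `UncheckedIOException` only to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PeekSketch {
    /**
     * Returns up to len characters from the reader without consuming them.
     * The result is shorter than len if EOF is reached first.
     */
    static String peek(Reader in, int len) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        try {
            in.mark(len); // remember the current position
            char[] buf = new char[len];
            int total = 0;
            while (total < len) {
                int n = in.read(buf, total, len - total);
                if (n < 0) {
                    break; // EOF: return what was read so far
                }
                total += n;
            }
            in.reset(); // rewind so the caller sees the same content again
            return new String(buf, 0, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader in = new StringReader("<!--x-->");
        System.out.println(peek(in, 4));   // look at the first four characters
        System.out.println(peek(in, 4));   // same characters again, nothing was consumed
        System.out.println(peek(in, 100)); // shorter than requested because of EOF
    }
}
```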
-
scanCharacters
protected void scanCharacters() throws IOException
Scans characters.
- Throws: IOException
-
scanCDATA
protected void scanCDATA() throws IOException
Scans a CDATA section.
- Throws: IOException
-
scanComment
protected void scanComment() throws IOException
Scans a comment.
- Throws: IOException
-
scanMarkupContent
protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) throws IOException
Scans markup content.
- Throws: IOException
-
scanPI
protected void scanPI() throws IOException
Scans a processing instruction.
- Throws: IOException
-
scanStartElement
protected String scanStartElement(boolean[] empty) throws IOException
Scans a start element.
- Parameters:
  empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws: IOException
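The `boolean[] empty` parameter follows a common Java idiom: a one-element array acts as an out-parameter so a method can report a second result alongside its return value. The sketch below shows that idiom in isolation. `StartTagSketch` and `parseStartTag` are hypothetical names, and the string handling is deliberately simplistic; it is not how the scanner itself parses markup.

```java
public class StartTagSketch {
    /**
     * Hypothetical helper: returns the element name of a start tag such as "<br/>"
     * and stores in empty[0] whether the tag ends with "/>".
     */
    static String parseStartTag(String tag, boolean[] empty) {
        if (!tag.startsWith("<") || !tag.endsWith(">")) {
            throw new IllegalArgumentException("not a start tag: " + tag);
        }
        empty[0] = tag.endsWith("/>"); // second return value
        int end = tag.length() - (empty[0] ? 2 : 1); // drop "/>" or ">"
        String body = tag.substring(1, end).trim();
        int space = body.indexOf(' ');
        return space < 0 ? body : body.substring(0, space); // name ends at first space
    }

    public static void main(String[] args) {
        boolean[] empty = { false };

        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);

        name = parseStartTag("<p class=\"x\">", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```

The caller allocates the array once and reads `empty[0]` after each call, which avoids creating a wrapper object for the pair of results.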
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) throws IOException
Scans a real attribute.
- Parameters:
  attributes - The list of attributes.
  empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws: IOException
-
scanPseudoAttribute
protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) throws IOException
Scans a pseudo attribute.
- Parameters:
  attributes - The list of attributes.
- Throws: IOException
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) throws IOException
Scans an attribute, pseudo or real.
- Parameters:
  attributes - The list of attributes.
  empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
  endc - The end character that appears before the closing angle bracket ('>').
- Returns:
- Throws: IOException
-
addLocationItem
protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.
- Parameters:
  attributes -
  index -
-
scanEndElement
protected void scanEndElement() throws IOException
Scans an end element.
- Throws: IOException
-
-