java.lang.Object
  org.htmlparser.visitors.NodeVisitor
    org.opencms.util.StringBean
public class StringBean extends org.htmlparser.visitors.NodeVisitor
Extracts the HTML page content.
| Field Summary | |
|---|---|
protected StringBuffer |
m_buffer
The buffer in which text is stored while traversing the HTML. |
protected boolean |
m_collapse
If true, sequences of whitespace characters are replaced
with a single space character. |
protected boolean |
m_isPre
Set true when traversing a PRE tag. |
protected boolean |
m_isScript
Set true when traversing a SCRIPT tag. |
protected boolean |
m_isStyle
Set true when traversing a STYLE tag. |
protected boolean |
m_links
If true, the link URLs are embedded in the text output. |
protected String |
m_strings
The strings extracted from the URL. |
| Constructor Summary | |
|---|---|
StringBean()
Create a StringBean object. |
|
| Method Summary | |
|---|---|
protected void |
carriageReturn()
Appends a newline to the buffer if there isn't one there already. |
protected void |
carriageReturn(boolean check)
Appends a newline to the buffer if there isn't one there already. |
protected void |
collapse(StringBuffer buffer,
String string)
Add the given text collapsing whitespace. |
boolean |
getCollapse()
Get the current 'collapse whitespace' state. |
boolean |
getLinks()
Get the current 'include links' state. |
String |
getStrings()
Return the textual contents of the URL. |
void |
setCollapse(boolean collapse)
Set the current 'collapse whitespace' state. |
void |
setLinks(boolean links)
Set the 'include links' state. |
protected void |
setStrings()
Fetch the URL contents. |
protected void |
updateStrings(String strings)
Assign the Strings property, firing the property change. |
void |
visitEndTag(org.htmlparser.Tag tag)
Resets the state of the PRE and SCRIPT flags. |
void |
visitStringNode(org.htmlparser.Text string)
Appends the text to the output. |
void |
visitTag(org.htmlparser.Tag tag)
Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags. |
| Methods inherited from class org.htmlparser.visitors.NodeVisitor |
|---|
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected StringBuffer m_buffer
The buffer in which text is stored while traversing the HTML.
protected boolean m_collapse
If true, sequences of whitespace characters are replaced
with a single space character.
protected boolean m_isPre
Set true when traversing a PRE tag.
protected boolean m_isScript
Set true when traversing a SCRIPT tag.
protected boolean m_isStyle
Set true when traversing a STYLE tag.
protected boolean m_links
If true, the link URLs are embedded in the text output.
protected String m_strings
The strings extracted from the URL.
| Constructor Detail |
|---|
public StringBean()
Create a StringBean object with the following defaults:
Links is set false, so text appears as a browser would display it, albeit without the colour or underline cues normally associated with a link.
ReplaceNonBreakingSpaces is set true, so that printing the text works, but the extra information regarding these formatting marks is available if you set it false.
Collapse is set true, so text appears compact, as a browser would display it.
| Method Detail |
|---|
public boolean getCollapse()
Get the current 'collapse whitespace' state. If set to true, this emulates the operation of browsers in interpreting text, where user agents should collapse input white space sequences when producing output inter-word space. See HTML specification section 9.1 White space: http://www.w3.org/TR/html4/struct/text.html#h-9.1.
Returns:
true if sequences of whitespace (space '\u0020', tab '\u0009', form feed '\u000C', zero-width space '\u200B', carriage-return '\r' and NEWLINE '\n') are to be replaced with a single space.
public boolean getLinks()
Get the current 'include links' state.
Returns:
true if link text is included in the text extracted from the URL, false otherwise.
public String getStrings()
Return the textual contents of the URL.
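public void setCollapse(boolean collapse)
Set the current 'collapse whitespace' state.
Parameters:
collapse - If true, sequences of whitespace will be reduced to a single space.
public void setLinks(boolean links)
Set the 'include links' state.
Parameters:
links - Use true if link text is to be included in the text extracted from the URL, false otherwise.
public void visitEndTag(org.htmlparser.Tag tag)
Resets the state of the PRE and SCRIPT flags.
Overrides:
visitEndTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The end tag to process.
public void visitStringNode(org.htmlparser.Text string)
Appends the text to the output.
Overrides:
visitStringNode in class org.htmlparser.visitors.NodeVisitor
Parameters:
string - The text node.
public void visitTag(org.htmlparser.Tag tag)
Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags.
Overrides:
visitTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The tag to examine.
protected void carriageReturn()
Appends a newline to the buffer if there isn't one there already.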
protected void carriageReturn(boolean check)
Appends a newline to the buffer if there isn't one there already.
Parameters:
check - a parameter the developer forgot to comment
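The "append a newline only if there isn't one there already" behaviour can be illustrated with a small standalone sketch. This is not the class's actual implementation: the class name `CarriageReturnSketch`, the `NEWLINE` constant and the choice to append to an empty buffer are assumptions made for illustration, and the undocumented `check` parameter is deliberately left out.

```java
final class CarriageReturnSketch {

    // Assumption: the platform line separator is used as the newline sequence.
    static final String NEWLINE = System.lineSeparator();

    // Appends NEWLINE unless the buffer already ends with it.
    // Assumption: an empty buffer also receives a newline.
    static void carriageReturn(StringBuffer buffer) {
        int length = buffer.length();
        int size = NEWLINE.length();
        boolean endsWithNewline =
            length >= size && buffer.substring(length - size).equals(NEWLINE);
        if (!endsWithNewline) {
            buffer.append(NEWLINE);
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("first line");
        carriageReturn(buffer); // appends one newline
        carriageReturn(buffer); // no effect, buffer already ends with a newline
        buffer.append("second line");
        System.out.println(buffer);
    }
}
```

Calling the method twice in a row leaves a single newline, which is what keeps consecutive flow-breaking tags from producing runs of blank lines in the extracted text.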
protected void collapse(StringBuffer buffer,
String string)
Add the given text collapsing whitespace. Uses a small state machine:
state 0: whitespace was last emitted character
state 1: in whitespace
state 2: in word
A whitespace character moves us to state 1 and any other character moves us to state 2, except that state 0 stays in state 0 until a non-whitespace character arrives, and when going from whitespace to word we emit a space before the character:
| state \ input | whitespace | other-character |
|---|---|---|
| 0 | 0 | 2 |
| 1 | 1 | space then 2 |
| 2 | 1 | 2 |
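The state table above can be sketched as a standalone method. This is an illustrative re-implementation, not the class's own code: the class name `WhitespaceCollapseSketch` and the helper `isCollapsible` are hypothetical, the whitespace set is the one listed for getCollapse(), and the rule for choosing the initial state from the buffer's last character is an assumption.

```java
final class WhitespaceCollapseSketch {

    // Whitespace characters listed in the getCollapse() documentation:
    // space, tab, form feed, zero-width space, carriage return and newline.
    static boolean isCollapsible(char c) {
        return c == ' ' || c == '\t' || c == '\f'
            || c == '\u200B' || c == '\r' || c == '\n';
    }

    // Appends text to buffer while following the documented state table.
    static void collapse(StringBuffer buffer, String text) {
        int length = buffer.length();
        // Assumption: start in state 0 if the buffer is empty or ends in
        // whitespace, otherwise start in state 2 (in word).
        int state = (length == 0 || isCollapsible(buffer.charAt(length - 1))) ? 0 : 2;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isCollapsible(c)) {
                // State 0 stays 0; states 1 and 2 move to 1. Nothing is emitted.
                if (state != 0) {
                    state = 1;
                }
            } else {
                // Going from whitespace to word emits one space first.
                if (state == 1) {
                    buffer.append(' ');
                }
                buffer.append(c);
                state = 2;
            }
        }
    }

    public static void main(String[] args) {
        StringBuffer out = new StringBuffer();
        collapse(out, "  Hello \t\n world  ");
        System.out.println("[" + out + "]"); // prints "[Hello world]"
        collapse(out, "\nagain");
        System.out.println("[" + out + "]"); // prints "[Hello world again]"
    }
}
```

Because a space is emitted only on the transition from whitespace to a word, leading and trailing whitespace never reach the buffer, and any interior run of whitespace shrinks to a single space.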
Parameters:
buffer - The buffer to append to.
string - The string to append.
protected void setStrings()
Fetch the URL contents.
protected void updateStrings(String strings)
Assign the Strings property, firing the property change.
Parameters:
strings - The new value of the Strings property.