org.opencms.util
public class CmsHtmlParser extends org.htmlparser.visitors.NodeVisitor implements I_CmsHtmlNodeVisitor
Base class for NodeVisitor implementations that provides some frequently used utility functions.
This base implementation is only a "pass through" class: the content is parsed, but the generated result is identical to the input.
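The "pass through" idea can be illustrated without the OpenCms or htmlparser libraries. The following is a minimal sketch only: the class `EchoVisitorSketch`, the nested `PassThroughVisitor`, and the `render` helper are hypothetical names invented for illustration, and the real class may handle nodes differently. It only shows how an echo flag and a result buffer (loosely mirroring `m_echo` and `m_result`) could interact.

```java
import java.util.Arrays;
import java.util.List;

public class EchoVisitorSketch {

    /** Hypothetical visitor: appends each chunk to a buffer only when echo is on. */
    static class PassThroughVisitor {
        // Loosely mirrors m_echo; an assumption, not the OpenCms implementation.
        private final boolean echo;
        // Loosely mirrors m_result.
        private final StringBuilder result = new StringBuilder();

        PassThroughVisitor(boolean echo) {
            this.echo = echo;
        }

        /** Callback for one already-parsed chunk of markup or text. */
        void visit(String rawChunk) {
            if (echo) {
                result.append(rawChunk);
            }
        }

        String getResult() {
            return result.toString();
        }
    }

    /** Feeds all chunks through a fresh visitor and returns what it collected. */
    static String render(boolean echo, List<String> chunks) {
        PassThroughVisitor visitor = new PassThroughVisitor(echo);
        for (String chunk : chunks) {
            visitor.visit(chunk);
        }
        return visitor.getResult();
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("<p>", "Hello", "</p>");
        System.out.println(render(true, chunks));
        System.out.println(render(false, chunks).isEmpty());
    }
}
```

In this sketch, a subclass would override `visit` to filter or transform chunks, while the echo flag decides whether unhandled content reaches the result by default.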
Modifier and Type | Field and Description |
---|---|
protected boolean | m_echo: Indicates if "echo" mode is on, that is, all content is written to the result by default. |
protected java.util.List<java.lang.String> | m_noAutoCloseTags: List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing. |
protected java.lang.StringBuffer | m_result: The buffer to write the output to. |
protected static java.lang.String[] | TAG_ARRAY: The array of supported tag names. |
protected static java.util.List<java.lang.String> | TAG_LIST: The list of supported tag names. |
Constructor and Description |
---|
CmsHtmlParser(): Creates a new instance of the html converter with echo mode set to false. |
CmsHtmlParser(boolean echo): Creates a new instance of the html converter. |
Modifier and Type | Method and Description |
---|---|
protected java.lang.String | collapse(java.lang.String string): Collapses HTML whitespace in the given String. |
protected org.htmlparser.PrototypicalNodeFactory | configureNoAutoCorrectionTags(): Internally degrades Composite tags that do have children in the DOM tree to simple single tags. |
java.lang.String | getConfiguration(): Returns the configuration String of this visitor, or the empty String if it was not provided before. |
java.util.List<java.lang.String> | getNoAutoCloseTags(): Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags. |
java.lang.String | getResult(): Returns the text extraction result. |
java.lang.String | getTagHtml(org.htmlparser.Tag tag): Returns the HTML for the given tag itself (not the tag content). |
java.lang.String | process(java.lang.String html, java.lang.String encoding): Extracts the text from the given html content, assuming the given html encoding. |
void | setConfiguration(java.lang.String configuration): Sets a configuration String for this visitor. |
void | setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList): Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags. |
void | visitEndTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a closing Tag is encountered. |
void | visitRemarkNode(org.htmlparser.Remark remark): Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered. |
void | visitStringNode(org.htmlparser.Text text): Visitor method (callback) invoked when a text node is encountered. |
void | visitTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a starting Tag is encountered. |
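The summary describes collapse(String) only as collapsing HTML whitespace. As a rough illustration of what such a helper might do, the sketch below reduces every run of spaces, tabs, and line breaks to a single space and drops leading and trailing whitespace. This is an assumption about the intended behavior, not the OpenCms implementation; `CollapseSketch` is a hypothetical class name, and the real method may treat characters such as non-breaking spaces differently.

```java
public class CollapseSketch {

    /**
     * Collapses runs of HTML whitespace (space, tab, CR, LF, form feed)
     * into a single space and removes leading and trailing whitespace.
     * Assumed behavior for illustration only.
     */
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean pendingSpace = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f') {
                // Remember that whitespace was seen, but emit nothing yet.
                pendingSpace = true;
            } else {
                // Emit one separator only between two non-whitespace runs.
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("  Hello \n\t world  "));
    }
}
```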
protected java.util.List<java.lang.String> m_noAutoCloseTags
protected static final java.lang.String[] TAG_ARRAY
protected static final java.util.List<java.lang.String> TAG_LIST
protected boolean m_echo
protected java.lang.StringBuffer m_result
public CmsHtmlParser()
Creates a new instance of the html converter with echo mode set to false.
public CmsHtmlParser(boolean echo)
Parameters: echo - indicates if "echo" mode is on, that is, all content is written to the result
protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags()
See Also: setNoAutoCloseTags(List)
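public java.lang.String getConfiguration()
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the configuration String of this visitor, or the empty String if it was not provided before.
Specified by: getConfiguration in interface I_CmsHtmlNodeVisitor
See Also: I_CmsHtmlNodeVisitor.getConfiguration()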
public java.lang.String getResult()
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the text extraction result.
Specified by: getResult in interface I_CmsHtmlNodeVisitor
See Also: I_CmsHtmlNodeVisitor.getResult()
public java.lang.String getTagHtml(org.htmlparser.Tag tag)
Parameters: tag - the tag to create the HTML for
public java.lang.String process(java.lang.String html, java.lang.String encoding) throws org.htmlparser.util.ParserException
Description copied from interface: I_CmsHtmlNodeVisitor
Extracts the text from the given html content, assuming the given html encoding.
Specified by: process in interface I_CmsHtmlNodeVisitor
Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
Throws: org.htmlparser.util.ParserException - if something goes wrong
See Also: I_CmsHtmlNodeVisitor.process(java.lang.String, java.lang.String)
public void setConfiguration(java.lang.String configuration)
Description copied from interface: I_CmsHtmlNodeVisitor
Sets a configuration String for this visitor. This will most likely be done with data from an xsd, custom jsp tag, ...
Specified by: setConfiguration in interface I_CmsHtmlNodeVisitor
Parameters: configuration - the configuration of this visitor to set.
See Also: I_CmsHtmlNodeVisitor.setConfiguration(java.lang.String)
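public void visitEndTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
Specified by: visitEndTag in interface I_CmsHtmlNodeVisitor
Overrides: visitEndTag in class org.htmlparser.visitors.NodeVisitor
Parameters: tag - the tag that is ended.
See Also: I_CmsHtmlNodeVisitor.visitEndTag(org.htmlparser.Tag)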
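public void visitRemarkNode(org.htmlparser.Remark remark)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
Specified by: visitRemarkNode in interface I_CmsHtmlNodeVisitor
Overrides: visitRemarkNode in class org.htmlparser.visitors.NodeVisitor
Parameters: remark - the remark Tag to visit.
See Also: I_CmsHtmlNodeVisitor.visitRemarkNode(org.htmlparser.Remark)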
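public void visitStringNode(org.htmlparser.Text text)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a text node is encountered.
Specified by: visitStringNode in interface I_CmsHtmlNodeVisitor
Overrides: visitStringNode in class org.htmlparser.visitors.NodeVisitor
Parameters: text - the text that is visited.
See Also: I_CmsHtmlNodeVisitor.visitStringNode(org.htmlparser.Text)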
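public void visitTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
Specified by: visitTag in interface I_CmsHtmlNodeVisitor
Overrides: visitTag in class org.htmlparser.visitors.NodeVisitor
Parameters: tag - the tag that is visited.
See Also: I_CmsHtmlNodeVisitor.visitTag(org.htmlparser.Tag)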
protected java.lang.String collapse(java.lang.String string)
Parameters: string - the string to collapse
public java.util.List<java.lang.String> getNoAutoCloseTags()
public void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
Specified by: setNoAutoCloseTags in interface I_CmsHtmlNodeVisitor
Parameters: noAutoCloseTagList - the list of upper case tag names to set, for which parsing / visiting should not correct missing closing tags.
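Because the no-auto-close list is documented as holding upper case tag names, callers may want to normalize names before passing them in. The sketch below shows one way to do that with the standard library only. `NoAutoCloseTagsSketch`, `toUpperCaseTagNames`, and `isNoAutoClose` are hypothetical helpers, not part of OpenCms, and whether the parser itself normalizes the case of its input is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NoAutoCloseTagsSketch {

    /** Trims each name, skips null or empty entries, and converts the rest to upper case. */
    static List<String> toUpperCaseTagNames(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name == null) {
                continue;
            }
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                // Locale.ROOT avoids locale-specific case mappings.
                result.add(trimmed.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    /** Checks a tag name against an already upper-cased list, ignoring the case of the name. */
    static boolean isNoAutoClose(List<String> upperCaseTags, String tagName) {
        return upperCaseTags.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> tags = toUpperCaseTagNames(java.util.Arrays.asList("div", " p ", "", "LI"));
        System.out.println(tags);
        System.out.println(isNoAutoClose(tags, "Div"));
    }
}
```

A list prepared this way could then be handed to setNoAutoCloseTags(List), assuming the parser expects exactly the upper case form described above.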