Package org.opencms.util
Class CmsHtmlParser
- java.lang.Object
- org.htmlparser.visitors.NodeVisitor
- org.opencms.util.CmsHtmlParser
- All Implemented Interfaces:
I_CmsHtmlNodeVisitor
- Direct Known Subclasses:
CmsHtml2TextConverter, CmsHtmlDecorator, CmsLinkProcessor
public class CmsHtmlParser extends org.htmlparser.visitors.NodeVisitor implements I_CmsHtmlNodeVisitor
Base utility class for OpenCms NodeVisitor implementations, which provides some often-used utility functions. This base implementation is only a "pass through" class: the content is parsed, but the generated result is exactly identical to the input.
- Since:
- 6.2.0
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected boolean m_echo: Indicates if "echo" mode is on, that is, all content is written to the result by default.
- protected java.util.List<java.lang.String> m_noAutoCloseTags: List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
- protected java.lang.StringBuffer m_result: The buffer to write the output to.
- protected static java.lang.String[] TAG_ARRAY: The array of supported tag names.
- protected static java.util.List<java.lang.String> TAG_LIST: The list of supported tag names.
-
Constructor Summary
Constructors (Constructor, Description):
- CmsHtmlParser(): Creates a new instance of the HTML converter with echo mode set to false.
- CmsHtmlParser(boolean echo): Creates a new instance of the HTML converter.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected java.lang.String collapse(java.lang.String string): Collapses HTML whitespace in the given String.
- protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags(): Internally degrades Composite tags that do have children in the DOM tree to simple single tags.
- java.lang.String getConfiguration(): Returns the configuration String of this visitor, or the empty String if none was provided before.
- java.util.List<java.lang.String> getNoAutoCloseTags(): Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
- java.lang.String getResult(): Returns the text extraction result.
- java.lang.String getTagHtml(org.htmlparser.Tag tag): Returns the HTML for the given tag itself (not the tag content).
- java.lang.String process(java.lang.String html, java.lang.String encoding): Extracts the text from the given html content, assuming the given html encoding.
- void setConfiguration(java.lang.String configuration): Sets a configuration String for this visitor.
- void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList): Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
- void visitEndTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a closing Tag is encountered.
- void visitRemarkNode(org.htmlparser.Remark remark): Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
- void visitStringNode(org.htmlparser.Text text): Visitor method (callback) invoked when a text node is encountered.
- void visitTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a starting Tag is encountered.
-
-
-
Field Detail
-
m_noAutoCloseTags
protected java.util.List<java.lang.String> m_noAutoCloseTags
List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing.
-
TAG_ARRAY
protected static final java.lang.String[] TAG_ARRAY
The array of supported tag names.
-
TAG_LIST
protected static final java.util.List<java.lang.String> TAG_LIST
The list of supported tag names.
-
m_echo
protected boolean m_echo
Indicates if "echo" mode is on, that is all content is written to the result by default.
-
m_result
protected java.lang.StringBuffer m_result
The buffer to write the output to.
-
-
Constructor Detail
-
CmsHtmlParser
public CmsHtmlParser()
Creates a new instance of the HTML converter with echo mode set to false.
-
CmsHtmlParser
public CmsHtmlParser(boolean echo)
Creates a new instance of the HTML converter.
- Parameters:
echo - indicates if "echo" mode is on, that is, all content is written to the result
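To illustrate what echo mode means for the visitor callbacks, here is a minimal sketch that does not use the OpenCms or org.htmlparser classes. EchoVisitorSketch and all of its methods are hypothetical stand-ins. The assumption, based only on the description above, is that each callback appends what it visits to the result buffer when echo is on and appends nothing otherwise.

```java
// Hypothetical stand-in for a visitor with an "echo" flag; this is not the OpenCms class.
class EchoVisitorSketch {

    // Plays the role of the documented m_echo field.
    private final boolean echo;

    // Plays the role of the documented m_result buffer.
    private final StringBuilder result = new StringBuilder();

    EchoVisitorSketch(boolean echo) {
        this.echo = echo;
    }

    // Callback for an opening or closing tag, passed here as its raw HTML.
    void visitTag(String tagHtml) {
        if (echo) {
            result.append(tagHtml);
        }
    }

    // Callback for a text node.
    void visitText(String text) {
        if (echo) {
            result.append(text);
        }
    }

    String getResult() {
        return result.toString();
    }

    // Feeds a fixed sequence of nodes to a visitor and returns what it collected.
    static String run(boolean echo) {
        EchoVisitorSketch visitor = new EchoVisitorSketch(echo);
        visitor.visitTag("<p>");
        visitor.visitText("Hello");
        visitor.visitTag("</p>");
        return visitor.getResult();
    }
}
```

Under this assumption, a subclass that wants output different from the input would override individual callbacks and write its own content to the buffer.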
-
-
Method Detail
-
configureNoAutoCorrectionTags
protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags()
Internally degrades Composite tags that do have children in the DOM tree to simple single tags. This avoids auto-correction of unclosed HTML tags.
- Returns:
- A node factory that will not autocorrect open tags specified via setNoAutoCloseTags(List)
-
getConfiguration
public java.lang.String getConfiguration()
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the configuration String of this visitor, or the empty String if none was provided before.
- Specified by:
getConfiguration in interface I_CmsHtmlNodeVisitor
- Returns:
- the configuration String of this visitor; by contract never null, but an empty String if not provided.
- See Also:
I_CmsHtmlNodeVisitor.getConfiguration()
-
getResult
public java.lang.String getResult()
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the text extraction result.
- Specified by:
getResult in interface I_CmsHtmlNodeVisitor
- Returns:
- the text extraction result
- See Also:
I_CmsHtmlNodeVisitor.getResult()
-
getTagHtml
public java.lang.String getTagHtml(org.htmlparser.Tag tag)
Returns the HTML for the given tag itself (not the tag content).
- Parameters:
tag - the tag to create the HTML for
- Returns:
- the HTML for the given tag
-
process
public java.lang.String process(java.lang.String html, java.lang.String encoding) throws org.htmlparser.util.ParserException
Description copied from interface: I_CmsHtmlNodeVisitor
Extracts the text from the given html content, assuming the given html encoding.
- Specified by:
process in interface I_CmsHtmlNodeVisitor
- Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
- Returns:
- the text extracted from the given html content
- Throws:
org.htmlparser.util.ParserException - if something goes wrong
- See Also:
I_CmsHtmlNodeVisitor.process(java.lang.String, java.lang.String)
-
setConfiguration
public void setConfiguration(java.lang.String configuration)
Description copied from interface: I_CmsHtmlNodeVisitor
Sets a configuration String for this visitor. This will most likely be done with data from an xsd, a custom jsp tag, and so on.
- Specified by:
setConfiguration in interface I_CmsHtmlNodeVisitor
- Parameters:
configuration - the configuration of this visitor to set.
- See Also:
I_CmsHtmlNodeVisitor.setConfiguration(java.lang.String)
-
visitEndTag
public void visitEndTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
- Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitEndTag in class org.htmlparser.visitors.NodeVisitor
- Parameters:
tag - the tag that is ended.
- See Also:
I_CmsHtmlNodeVisitor.visitEndTag(org.htmlparser.Tag)
-
visitRemarkNode
public void visitRemarkNode(org.htmlparser.Remark remark)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
- Specified by:
visitRemarkNode in interface I_CmsHtmlNodeVisitor
- Overrides:
visitRemarkNode in class org.htmlparser.visitors.NodeVisitor
- Parameters:
remark - the remark Tag to visit.
- See Also:
I_CmsHtmlNodeVisitor.visitRemarkNode(org.htmlparser.Remark)
-
visitStringNode
public void visitStringNode(org.htmlparser.Text text)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a text node is encountered.
- Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
- Overrides:
visitStringNode in class org.htmlparser.visitors.NodeVisitor
- Parameters:
text - the text that is visited.
- See Also:
I_CmsHtmlNodeVisitor.visitStringNode(org.htmlparser.Text)
-
visitTag
public void visitTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
- Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
- Overrides:
visitTag in class org.htmlparser.visitors.NodeVisitor
- Parameters:
tag - the tag that is visited.
- See Also:
I_CmsHtmlNodeVisitor.visitTag(org.htmlparser.Tag)
-
collapse
protected java.lang.String collapse(java.lang.String string)
Collapses HTML whitespace in the given String.
- Parameters:
string - the string to collapse
- Returns:
- the input String with all HTML whitespace collapsed
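The following is a minimal standalone sketch of what collapsing HTML whitespace can look like. It is not the OpenCms implementation: the class HtmlWhitespaceSketch is hypothetical, and it assumes that "HTML whitespace" means space, tab, form feed, carriage return and line feed, and that every run of such characters becomes a single space. How the actual method treats leading or trailing whitespace is not stated here, so the sketch simply reduces those runs to one space as well.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating whitespace collapsing; not part of OpenCms.
class HtmlWhitespaceSketch {

    // One or more consecutive characters assumed to count as HTML whitespace.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("[ \\t\\f\\r\\n]+");

    // Replaces every run of whitespace characters with a single space.
    static String collapse(String input) {
        return WHITESPACE_RUN.matcher(input).replaceAll(" ");
    }
}
```

With these assumptions, a string such as "a", two spaces, a line break, a tab and "b" is reduced to "a b".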
-
getNoAutoCloseTags
public java.util.List<java.lang.String> getNoAutoCloseTags()
Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.
- Returns:
- a List of upper case tag names for which parsing / visiting will not correct missing closing tags
-
setNoAutoCloseTags
public void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
- Specified by:
setNoAutoCloseTags in interface I_CmsHtmlNodeVisitor
- Parameters:
noAutoCloseTagList - the list of upper case tag names for which parsing / visiting should not correct missing closing tags.
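Both the getter and the setter are documented in terms of upper case tag names, but this page does not say whether the setter normalizes its argument. A caller may therefore want to normalize the names first. The sketch below shows one way to do that; NoAutoCloseTagNames and toUpperCaseTagNames are hypothetical names that do not exist in OpenCms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for preparing a tag name list; not part of OpenCms.
class NoAutoCloseTagNames {

    // Trims each name, drops empty entries, and converts the rest to upper case.
    static List<String> toUpperCaseTagNames(List<String> tagNames) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                // Locale.ROOT avoids locale-specific case mappings.
                result.add(trimmed.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }
}
```

The returned list could then be passed to setNoAutoCloseTags(List) on a parser instance.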
-
-