java.lang.Object
org.openpdf.resource.HtmlParserConfig
Configuration options for the htmlunit-neko HTML parser.
This class provides a builder-style API for configuring the behavior of the HTML5 parser. The configuration options correspond to features and properties of the htmlunit-neko parser.
Example usage:
HtmlParserConfig config = HtmlParserConfig.builder()
    .reportErrors(true)
    .allowSelfClosingTags(true)
    .elementNameCase("lower")
    .encoding("UTF-8")
    .build();
Nested Class Summary
Nested Classes
Modifier and Type    Class                     Description
static final class   HtmlParserConfig.Builder  Builder for creating HtmlParserConfig instances.
Field Summary
Fields: CASE_UPPER, CASE_LOWER, CASE_DEFAULT
Method Summary
Modifier and Type                Method                        Description
static HtmlParserConfig.Builder  builder()                     Creates a new configuration builder.
static HtmlParserConfig          defaults()                    Returns the default configuration.
@Nullable String                 getAttributeNameCase()        Get the attribute name case handling setting.
@Nullable String                 getElementNameCase()          Get the element name case handling setting.
@Nullable String                 getEncoding()                 Get the default character encoding.
boolean                          isAllowSelfClosingIframe()    Whether to allow self-closing iframe tags.
boolean                          isAllowSelfClosingTags()      Whether to allow XHTML-style self-closing tags for all elements.
boolean                          isParseNoScriptContent()      Whether to parse content within <noscript> tags as HTML markup.
boolean                          isReportErrors()              Whether to report parsing errors.
boolean                          isScriptStripCommentDelims()  Whether to strip HTML comment delimiters from script content.
boolean                          isStyleStripCommentDelims()   Whether to strip HTML comment delimiters from style content.
-
Field Details
-
CASE_UPPER
Element name case values for configuring name handling.
-
CASE_LOWER
-
CASE_DEFAULT
-
-
Method Details
-
defaults
Returns the default configuration.
Default settings:
- reportErrors: false
- allowSelfClosingTags: false
- allowSelfClosingIframe: false
- parseNoScriptContent: true
- scriptStripCommentDelims: false
- styleStripCommentDelims: false
- elementNameCase: null (parser default)
- attributeNameCase: null (parser default)
- encoding: null (auto-detect)
- Returns:
- the default configuration
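The relationship between defaults() and builder() can be illustrated with a simplified stand-in. The sketch below is not the library's source: SketchConfig is a hypothetical class that models only three of the options listed above, and it assumes that a fresh builder starts from the documented default values, which this page does not state explicitly.

```java
// Hypothetical stand-in for illustration only; not org.openpdf.resource.HtmlParserConfig.
public final class SketchConfig {
    private final boolean reportErrors;
    private final boolean allowSelfClosingTags;
    private final String encoding; // null means auto-detect

    private SketchConfig(Builder b) {
        this.reportErrors = b.reportErrors;
        this.allowSelfClosingTags = b.allowSelfClosingTags;
        this.encoding = b.encoding;
    }

    public static Builder builder() {
        return new Builder();
    }

    // Assumption: the defaults are whatever an untouched builder produces.
    public static SketchConfig defaults() {
        return builder().build();
    }

    public boolean isReportErrors() { return reportErrors; }
    public boolean isAllowSelfClosingTags() { return allowSelfClosingTags; }
    public String getEncoding() { return encoding; }

    public static final class Builder {
        // Initial values mirror the documented defaults.
        private boolean reportErrors = false;
        private boolean allowSelfClosingTags = false;
        private String encoding = null;

        public Builder reportErrors(boolean value) { this.reportErrors = value; return this; }
        public Builder allowSelfClosingTags(boolean value) { this.allowSelfClosingTags = value; return this; }
        public Builder encoding(String value) { this.encoding = value; return this; }

        public SketchConfig build() { return new SketchConfig(this); }
    }

    public static void main(String[] args) {
        SketchConfig custom = SketchConfig.builder().reportErrors(true).build();
        // Only reportErrors was overridden; the other options keep their defaults.
        System.out.println(custom.isReportErrors());         // true
        System.out.println(custom.isAllowSelfClosingTags()); // false
        System.out.println(custom.getEncoding());            // null
    }
}
```

Under that assumption, overriding one option leaves every other option at its default, so callers only need to name the settings they want to change.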
-
builder
Creates a new configuration builder.
- Returns:
- a new Builder instance
-
isReportErrors
public boolean isReportErrors()
Whether to report parsing errors. When enabled, the parser reports syntax errors, malformed markup, and other parsing issues.
- Returns:
- true if error reporting is enabled
-
isAllowSelfClosingTags
public boolean isAllowSelfClosingTags()
Whether to allow XHTML-style self-closing tags for all elements. When enabled, the parser treats tags like <div/> as complete elements rather than requiring separate closing tags.
- Returns:
- true if self-closing tags are allowed
-
isAllowSelfClosingIframe
public boolean isAllowSelfClosingIframe()
Whether to allow self-closing iframe tags. When enabled, the parser treats <iframe/> as a complete element.
- Returns:
- true if self-closing iframe tags are allowed
-
isParseNoScriptContent
public boolean isParseNoScriptContent()
Whether to parse content within <noscript> tags as HTML markup. When disabled, noscript content is treated as plain text.
- Returns:
- true if noscript content should be parsed as markup
-
isScriptStripCommentDelims
public boolean isScriptStripCommentDelims()
Whether to strip HTML comment delimiters from script content. Useful for handling legacy JavaScript wrapped in HTML comments.
- Returns:
- true if script comment delimiters should be stripped
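To make "legacy JavaScript wrapped in HTML comments" concrete, the sketch below shows what stripping the delimiters amounts to for a simple case. It is an illustration of the idea only: stripCommentDelims is a hypothetical helper, and the parser's actual rules (for example around whitespace or partial delimiters) may differ.

```java
public class CommentDelimSketch {

    // Illustration only: removes one leading "<!--" and one trailing "-->"
    // from script (or style) text, then trims surrounding whitespace.
    public static String stripCommentDelims(String content) {
        String text = content.trim();
        if (text.startsWith("<!--")) {
            text = text.substring("<!--".length());
        }
        if (text.endsWith("-->")) {
            text = text.substring(0, text.length() - "-->".length());
        }
        return text.trim();
    }

    public static void main(String[] args) {
        // Older pages hid scripts from browsers that did not understand <script>.
        String legacyScript = "<!--\nalert('hi');\n-->";
        System.out.println(stripCommentDelims(legacyScript)); // alert('hi');

        // Content without delimiters passes through unchanged.
        System.out.println(stripCommentDelims("alert('hi');")); // alert('hi');
    }
}
```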
-
isStyleStripCommentDelims
public boolean isStyleStripCommentDelims()
Whether to strip HTML comment delimiters from style content. Useful for handling CSS wrapped in HTML comments.
- Returns:
- true if style comment delimiters should be stripped
-
getElementNameCase
Get the element name case handling setting.
Possible values:
- "upper" - convert element names to uppercase
- "lower" - convert element names to lowercase
- "default" - preserve original case
- null - use parser default
- Returns:
- the element name case setting, or null for parser default
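The effect of each value can be sketched with a small helper. This is an assumption-laden illustration, not the parser's code: applyNameCase is a hypothetical function, and because this page does not say what the parser default is, the null branch simply leaves the name unchanged as a placeholder.

```java
import java.util.Locale;

public class NameCaseSketch {

    // Illustration only: maps a case setting to its effect on one name.
    public static String applyNameCase(String name, String mode) {
        if (mode == null) {
            // null defers to the parser default, which is not specified here;
            // this sketch leaves the name unchanged as a placeholder.
            return name;
        }
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ROOT);
            case "lower":
                return name.toLowerCase(Locale.ROOT);
            case "default":
                return name; // preserve original case
            default:
                throw new IllegalArgumentException("Unknown case mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyNameCase("Div", "upper"));   // DIV
        System.out.println(applyNameCase("Div", "lower"));   // div
        System.out.println(applyNameCase("Div", "default")); // Div
    }
}
```

Locale.ROOT is used so that the conversion does not vary with the JVM's default locale.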
-
getAttributeNameCase
Get the attribute name case handling setting.
Possible values:
- "upper" - convert attribute names to uppercase
- "lower" - convert attribute names to lowercase
- "default" - preserve original case
- null - use parser default
- Returns:
- the attribute name case setting, or null for parser default
-
getEncoding
Get the default character encoding.
- Returns:
- the encoding name, or null for auto-detection
-