Class HtmlParserConfig

java.lang.Object
org.openpdf.resource.HtmlParserConfig

public final class HtmlParserConfig extends Object
Configuration options for the htmlunit-neko HTML parser.

This class provides a builder-style API for configuring HTML5 parser behavior. Each configuration option corresponds to a feature or property of the htmlunit-neko parser.

Example usage:

 HtmlParserConfig config = HtmlParserConfig.builder()
     .reportErrors(true)
     .allowSelfClosingTags(true)
     .elementNameCase("lower")
     .encoding("UTF-8")
     .build();
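
Since each option corresponds to an htmlunit-neko feature or property, the following sketch lists the identifiers these options would most plausibly map to. The identifiers follow the naming documented for NekoHTML settings. The class name `NekoFeatureMapSketch`, the method `optionIds`, and the pairing of each option with an identifier are assumptions made for illustration and should be verified against the htmlunit-neko version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HtmlParserConfig.
final class NekoFeatureMapSketch {

    private static final String FEATURES = "http://cyberneko.org/html/features/";
    private static final String PROPERTIES = "http://cyberneko.org/html/properties/";

    // Maps each configuration option name to the parser setting it is
    // assumed to control. Insertion order matches the documented defaults list.
    static Map<String, String> optionIds() {
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("reportErrors", FEATURES + "report-errors");
        ids.put("allowSelfClosingTags", FEATURES + "scanner/allow-selfclosing-tags");
        ids.put("allowSelfClosingIframe", FEATURES + "scanner/allow-selfclosing-iframe");
        ids.put("parseNoScriptContent", FEATURES + "parse-noscript-content");
        ids.put("scriptStripCommentDelims", FEATURES + "scanner/script/strip-comment-delims");
        ids.put("styleStripCommentDelims", FEATURES + "scanner/style/strip-comment-delims");
        ids.put("elementNameCase", PROPERTIES + "names/elems");
        ids.put("attributeNameCase", PROPERTIES + "names/attrs");
        ids.put("encoding", PROPERTIES + "default-encoding");
        return ids;
    }

    public static void main(String[] args) {
        // Print one "option -> identifier" line per entry.
        optionIds().forEach((option, id) -> System.out.println(option + " -> " + id));
    }
}
```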
 
  • Method Details

    • defaults

      public static HtmlParserConfig defaults()
      Returns the default configuration.

      Default settings:

      • reportErrors: false
      • allowSelfClosingTags: false
      • allowSelfClosingIframe: false
      • parseNoScriptContent: true
      • scriptStripCommentDelims: false
      • styleStripCommentDelims: false
      • elementNameCase: null (parser default)
      • attributeNameCase: null (parser default)
      • encoding: null (auto-detect)
      Returns:
      the default configuration
    • builder

      public static HtmlParserConfig.Builder builder()
      Creates a new configuration builder.
      Returns:
      a new Builder instance
    • isReportErrors

      public boolean isReportErrors()
      Whether to report parsing errors. When enabled, the parser will report syntax errors, malformed markup, and other parsing issues.
      Returns:
      true if error reporting is enabled
    • isAllowSelfClosingTags

      public boolean isAllowSelfClosingTags()
      Whether to allow XHTML-style self-closing tags for all elements. When enabled, treats tags like <div/> as complete elements rather than requiring separate closing tags.
      Returns:
      true if self-closing tags are allowed
    • isAllowSelfClosingIframe

      public boolean isAllowSelfClosingIframe()
      Whether to allow self-closing iframe tags. When enabled, treats <iframe/> as a complete element.
      Returns:
      true if self-closing iframe tags are allowed
    • isParseNoScriptContent

      public boolean isParseNoScriptContent()
      Whether to parse content within <noscript> tags as HTML markup. When disabled, noscript content is treated as plain text.
      Returns:
      true if noscript content should be parsed as markup
    • isScriptStripCommentDelims

      public boolean isScriptStripCommentDelims()
      Whether to strip HTML comment delimiters from script content. Useful for handling legacy JavaScript wrapped in HTML comments.
      Returns:
      true if script comment delimiters should be stripped
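
To illustrate what stripping comment delimiters means for script or style content, here is a minimal sketch. It shows only the intended effect on the text. The class `CommentDelimSketch` and the method `stripCommentDelims` are hypothetical and do not reflect how the parser implements this feature.

```java
// Hypothetical illustration, not the parser's implementation.
final class CommentDelimSketch {

    // Removes one leading "<!--" and one trailing "-->" if present,
    // then trims surrounding whitespace. Other content is left untouched.
    static String stripCommentDelims(String content) {
        String text = content.trim();
        if (text.startsWith("<!--")) {
            text = text.substring(4);
        }
        if (text.endsWith("-->")) {
            text = text.substring(0, text.length() - 3);
        }
        return text.trim();
    }

    public static void main(String[] args) {
        // Legacy pattern: script body hidden from very old browsers.
        System.out.println(stripCommentDelims("<!-- alert('hi'); -->"));
        // Content without delimiters passes through unchanged.
        System.out.println(stripCommentDelims("body { color: red; }"));
    }
}
```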
    • isStyleStripCommentDelims

      public boolean isStyleStripCommentDelims()
      Whether to strip HTML comment delimiters from style content. Useful for handling CSS wrapped in HTML comments.
      Returns:
      true if style comment delimiters should be stripped
    • getElementNameCase

      public @Nullable String getElementNameCase()
      Returns the element name case handling setting.

      Possible values:

      • "upper" - convert element names to uppercase
      • "lower" - convert element names to lowercase
      • "default" - preserve original case
      • null - use parser default
      Returns:
      the element name case setting, or null for parser default
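
The documented values can be read as a simple name transformation, sketched below. The class `NameCaseSketch` and the method `applyCase` are hypothetical. Returning the name unchanged for null and rejecting unknown values are assumptions of this sketch, since the parser's own default for null is not specified here.

```java
import java.util.Locale;

// Hypothetical illustration of the "upper" / "lower" / "default" values.
final class NameCaseSketch {

    static String applyCase(String name, String mode) {
        if (mode == null) {
            // Assumption: this sketch leaves the name alone; the real
            // parser default may behave differently.
            return name;
        }
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ROOT);
            case "lower":
                return name.toLowerCase(Locale.ROOT);
            case "default":
                return name; // preserve original case
            default:
                throw new IllegalArgumentException("Unknown case setting: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyCase("Div", "upper"));
        System.out.println(applyCase("Div", "lower"));
        System.out.println(applyCase("Div", "default"));
    }
}
```

The same transformation would apply to attribute names under the attribute name case setting.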
    • getAttributeNameCase

      public @Nullable String getAttributeNameCase()
      Returns the attribute name case handling setting.

      Possible values:

      • "upper" - convert attribute names to uppercase
      • "lower" - convert attribute names to lowercase
      • "default" - preserve original case
      • null - use parser default
      Returns:
      the attribute name case setting, or null for parser default
    • getEncoding

      public @Nullable String getEncoding()
      Returns the default character encoding.
      Returns:
      the encoding name, or null for auto-detection
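
Because the encoding is a plain string and null means auto-detection, callers may want to validate a name before passing it to the builder. The following is a minimal sketch using the standard `java.nio.charset.Charset` API. The class `EncodingSketch` and the method `normalizeEncoding` are hypothetical, and treating a blank string like null is an assumption of this sketch.

```java
import java.nio.charset.Charset;

// Hypothetical pre-validation helper, not part of HtmlParserConfig.
final class EncodingSketch {

    // Returns the canonical charset name, or null to keep auto-detection.
    // Charset.forName throws an unchecked exception for illegal or
    // unsupported names, so invalid input fails fast.
    static String normalizeEncoding(String encoding) {
        if (encoding == null || encoding.trim().isEmpty()) {
            return null;
        }
        return Charset.forName(encoding.trim()).name();
    }

    public static void main(String[] args) {
        System.out.println(normalizeEncoding("utf-8"));
        System.out.println(normalizeEncoding(null));
    }
}
```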