Package nl.vpro.util

Class TextUtil

java.lang.Object
nl.vpro.util.TextUtil

public class TextUtil extends Object
See JIRA
Since:
1.5
Author:
Roelof Jan Koekoek
  • Field Details

    • ILLEGAL_PATTERN

      public static final Pattern ILLEGAL_PATTERN
      Reusable pattern for matching text against illegal characters
  • Method Details

    • isValid

      public static boolean isValid(@NonNull String input, boolean aggressive)
      Checks whether the given text input complies with the POMS standard.
    • isValid

      public static boolean isValid(@NonNull String input)
      Checks whether the given text input complies with the POMS standard.
    • normalizeWhiteSpace

      public static @PolyNull String normalizeWhiteSpace(@PolyNull String input)
      Replaces every run of one or more white space characters with a single space.
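      The behaviour described above can be approximated with a single regular expression. The following is a minimal sketch using only the JDK, not the TextUtil implementation itself; the class name WhiteSpaceSketch is hypothetical, and it assumes that "white space" means Java's \s character class and that no trimming takes place.

```java
import java.util.regex.Pattern;

public class WhiteSpaceSketch {

    // Assumption: "white space" is whatever Java's \s class matches.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s+");

    // Collapses each run of white space into one space; null in, null out,
    // which mirrors the @PolyNull contract of the documented method.
    public static String normalizeWhiteSpace(String input) {
        if (input == null) {
            return null;
        }
        return WHITE_SPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(normalizeWhiteSpace("a \t\n  b")); // prints "a b"
    }
}
```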
    • normalizeWhiteSpacePreserveNewlines

      public static @PolyNull String normalizeWhiteSpacePreserveNewlines(@PolyNull String input)
    • replaceLineBreaks

      public static @PolyNull String replaceLineBreaks(@PolyNull String input)
      Replaces all line separators with a single white space character. The line separator character (U+2028) is forbidden in most modern browsers. These browsers won't render any text containing this character.
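      As an illustration, the replacement can be sketched with String.replace. This assumes the character in question is U+2028 LINE SEPARATOR and that only this character is replaced; the class name LineSeparatorSketch is hypothetical and the actual TextUtil implementation may cover more characters.

```java
public class LineSeparatorSketch {

    // Assumption: the "line separator" is Unicode U+2028 LINE SEPARATOR.
    private static final char LINE_SEPARATOR = '\u2028';

    // Replaces each U+2028 with a plain space; null in, null out.
    public static String replaceLineBreaks(String input) {
        if (input == null) {
            return null;
        }
        return input.replace(LINE_SEPARATOR, ' ');
    }

    public static void main(String[] args) {
        System.out.println(replaceLineBreaks("first\u2028second")); // prints "first second"
    }
}
```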
    • replaceNonBreakingSpace

      public static @PolyNull String replaceNonBreakingSpace(@PolyNull String input)
      Replaces all non-breaking space characters (U+00A0) with a normal white space character.
    • replaceOdd

      public static @PolyNull String replaceOdd(@PolyNull String input)
      Replaces 'odd' characters with a normal white space character.
    • replaceHtmlEscapedNonBreakingSpace

      public static @PolyNull String replaceHtmlEscapedNonBreakingSpace(@PolyNull String input)
      Replaces all non-breaking space entities (&nbsp;) with a normal white space character.
    • unescapeHtml

      public static @PolyNull String unescapeHtml(@PolyNull String input)
      Un-escapes all HTML escape entities. For example: replaces "&amp;" with "&".
    • stripHtml

      public static @PolyNull String stripHtml(@PolyNull String input)
      Strips HTML-like tags from the input. All content between tags is removed, even non-HTML content.
      Parameters:
      input - a piece of HTML or text containing some HTML markup
      Returns:
      One line representing only the textual content of the input
    • unhtml

      public static @PolyNull String unhtml(@PolyNull String input)
      Parameters:
      input - A piece of HTML
      Returns:
      A piece of plain text, currently only supporting breaks, paragraphs, and lists. Empty paragraphs and multiple linebreaks are removed.
      Since:
      2.30
    • sanitize

      public static @PolyNull String sanitize(@PolyNull String input)
      Aggressively removes all tags and escaped HTML characters from the given input and replaces some characters that might lead to problems for end users.
      Returns:
      A single line of text
    • getLexico

      public static @PolyNull String getLexico(@PolyNull String title, Locale locale)
      Returns the 'lexicographic' presentation of a title. This means that articles are stripped and moved to the end of the string. Currently only supported for Dutch.
    • select

      @Deprecated public static String select(String... options)
      Deprecated.
      Can easily be achieved with stream filter Objects.nonNull(Object)
      Selects the first non-null parameter.
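      The stream-based alternative mentioned in the deprecation note might look as follows. This is a sketch only; the class name SelectSketch and method name firstNonNull are hypothetical, and returning null when every option is null is an assumption about the desired behaviour.

```java
import java.util.Objects;
import java.util.stream.Stream;

public class SelectSketch {

    // Returns the first non-null option, or null if there is none (assumption).
    public static String firstNonNull(String... options) {
        return Stream.of(options)
            .filter(Objects::nonNull)
            .findFirst()
            .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(firstNonNull(null, "b", "c")); // prints "b"
    }
}
```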
    • truncate

      public static @PolyNull String truncate(@PolyNull String text, int max)
    • truncate

      public static @PolyNull String truncate(@PolyNull String text, int max, boolean ellipses)
    • strikeThrough

      public static @PolyNull String strikeThrough(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'struck through' (using unicode control characters)
      Since:
      2.11
    • underLine

      public static @PolyNull String underLine(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'underlined' (using unicode control characters)
      Since:
      2.11
    • underLineDouble

      public static @PolyNull String underLineDouble(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'double underlined' (using unicode control characters)
      Since:
      2.11
    • overLine

      public static @PolyNull String overLine(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'overlined' (using unicode control characters)
      Since:
      2.11
    • overLineDouble

      public static @PolyNull String overLineDouble(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'double overlined' (using unicode control characters)
      Since:
      2.11
    • underDiaeresis

      public static @PolyNull String underDiaeresis(@PolyNull CharSequence s)
      Gives a representation of the string which is completely 'diaeresised under' (using unicode control characters)
      Since:
      2.11
    • controlEach

      public static @PolyNull String controlEach(@PolyNull CharSequence s, @NonNull Character control)
      Since:
      2.11
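      The decoration methods above (strikeThrough, underLine and so on) can be understood as appending one Unicode combining character after each character of the input. The sketch below illustrates that idea under stated assumptions: the class name CombiningSketch is hypothetical, the control parameter is a plain char rather than Character, and the code points U+0336 (combining long stroke overlay) and U+0332 (combining low line) are assumed choices that may differ from what TextUtil actually uses.

```java
public class CombiningSketch {

    // Appends the given combining character after every code point of s.
    // Iterating over code points keeps surrogate pairs intact.
    public static String controlEach(CharSequence s, char control) {
        if (s == null) {
            return null; // null in, null out
        }
        StringBuilder result = new StringBuilder(s.length() * 2);
        s.codePoints().forEach(cp -> result.appendCodePoint(cp).append(control));
        return result.toString();
    }

    // Assumption: U+0336 COMBINING LONG STROKE OVERLAY renders as strike-through.
    public static String strikeThrough(CharSequence s) {
        return controlEach(s, '\u0336');
    }

    // Assumption: U+0332 COMBINING LOW LINE renders as underline.
    public static String underLine(CharSequence s) {
        return controlEach(s, '\u0332');
    }

    public static void main(String[] args) {
        System.out.println(strikeThrough("deprecated"));
        System.out.println(underLine("important"));
    }
}
```

      How the result is displayed depends on the font and on the terminal or browser rendering the combining characters.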