Package nl.vpro.util

Class TextUtil


  • public class TextUtil
    extends Object
    See JIRA
    Since:
    1.5
    Author:
    Roelof Jan Koekoek
    • Field Detail

      • ILLEGAL_PATTERN

        public static final Pattern ILLEGAL_PATTERN
        Reusable pattern for matching text against illegal characters
    • Method Detail

      • isValid

        public static boolean isValid​(@NonNull String input)
        Checks whether the given text input complies with the POMS standard.
        See Also:
        for a rough check
      • normalizeWhiteSpace

        public static @PolyNull String normalizeWhiteSpace​(@PolyNull String input)
        Replaces every run of one or more white space characters with a single space.
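
The collapsing behaviour described for normalizeWhiteSpace can be illustrated with a small stand-alone sketch. This is not TextUtil's own code: the class name WhitespaceSketch and the method collapse are hypothetical, and it assumes "white space" means whatever the regex class \s matches and that no trimming takes place.

```java
import java.util.regex.Pattern;

class WhitespaceSketch {
    // One or more whitespace characters as matched by \s (space, tab, newline, ...).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical stand-in: collapses each whitespace run to a single space.
    static String collapse(String input) {
        if (input == null) {
            return null; // mirrors the @PolyNull contract: null in, null out
        }
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \t b\n\nc")); // prints "a b c"
    }
}
```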
      • normalizeWhiteSpacePreserveNewlines

        public static @PolyNull String normalizeWhiteSpacePreserveNewlines​(@PolyNull String input)
      • replaceLineBreaks

        public static @PolyNull String replaceLineBreaks​(@PolyNull String input)
        Replaces all line separators with a single white space character. The line separator character (U+2028) is forbidden in most modern browsers. These browsers won't render any text containing this character.
      • replaceNonBreakingSpace

        public static @PolyNull String replaceNonBreakingSpace​(@PolyNull String input)
        Replaces all non-breaking space characters (U+00A0) with a normal white space character.
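
A minimal sketch of the two single-character replacements described for replaceLineBreaks and replaceNonBreakingSpace. It assumes the characters involved are U+2028 (LINE SEPARATOR) and U+00A0 (NO-BREAK SPACE); the class and method names are hypothetical and the code is not taken from TextUtil.

```java
class SpecialSpaceSketch {
    static final char LINE_SEPARATOR = (char) 0x2028; // Unicode LINE SEPARATOR
    static final char NO_BREAK_SPACE = (char) 0x00A0; // Unicode NO-BREAK SPACE

    // Hypothetical stand-in: every U+2028 becomes an ordinary space.
    static String replaceLineSeparators(String input) {
        return input == null ? null : input.replace(LINE_SEPARATOR, ' ');
    }

    // Hypothetical stand-in: every U+00A0 becomes an ordinary space.
    static String replaceNoBreakSpaces(String input) {
        return input == null ? null : input.replace(NO_BREAK_SPACE, ' ');
    }
}
```

Both methods leave the string length unchanged, since each special character is swapped for exactly one space.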
      • replaceOdd

        public static @PolyNull String replaceOdd​(@PolyNull String input)
        Replaces 'odd' characters with a normal white space character.
      • replaceHtmlEscapedNonBreakingSpace

        public static @PolyNull String replaceHtmlEscapedNonBreakingSpace​(@PolyNull String input)
        Replaces all non-breaking space entities (&nbsp;) with a normal white space character.
      • unescapeHtml

        public static @PolyNull String unescapeHtml​(@PolyNull String input)
        Unescapes all HTML escape entities. For example, replaces "&amp;" with "&".
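
To make the idea of unescaping concrete, here is a deliberately small sketch that handles only four named entities in a single pass. It is an illustration under that assumption, not the implementation behind unescapeHtml, which according to the description covers all HTML escape entities. The class name UnescapeSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnescapeSketch {
    // Only a handful of entities, purely for illustration.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
    }
    private static final Pattern ENTITY = Pattern.compile("&(amp|lt|gt|quot);");

    // Hypothetical stand-in: replaces each known entity exactly once, left to right.
    static String unescape(String input) {
        if (input == null) {
            return null;
        }
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' and '\' being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(ENTITIES.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the input is scanned once, a doubly escaped sequence such as "&amp;lt;" is unescaped one level only, to "&lt;".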
      • stripHtml

        public static @PolyNull String stripHtml​(@PolyNull String input)
        Strips HTML-like tags from the input. All content between tag delimiters is removed, even if it is not HTML.
        Parameters:
        input - a piece of HTML or text containing some HTML markup
        Returns:
        One line representing only the textual content of the input
        See Also:
        for multiline interpretation
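
A rough sketch of tag stripping in the spirit of stripHtml. It assumes that a "tag" is anything between '<' and '>' and that the single-line result is obtained by collapsing white space and trimming; neither detail is confirmed by the documentation above, and the class name StripSketch is hypothetical.

```java
import java.util.regex.Pattern;

class StripSketch {
    // Anything between '<' and '>' counts as a tag here, HTML or not.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical stand-in: drops tags, then folds the remainder onto one line.
    static String strip(String input) {
        if (input == null) {
            return null;
        }
        String withoutTags = TAG.matcher(input).replaceAll("");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }
}
```

A regular expression is enough for a sketch, but it does not understand comments, scripts, or attribute values containing '>'; a real HTML parser would be the safer choice for untrusted markup.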
      • unhtml

        public static @PolyNull String unhtml​(@PolyNull String input)
        Parameters:
        input - A piece of HTML
        Returns:
        A piece of plain text, currently only supporting breaks, paragraphs, and lists. Empty paragraphs and multiple linebreaks are removed.
        Since:
        2.30
      • sanitize

        public static @PolyNull String sanitize​(@PolyNull String input)
        Aggressively removes all tags and escaped HTML characters from the given input and replaces some characters that might lead to problems for end users.
        Returns:
        A single line of text
      • getLexico

        public static @PolyNull String getLexico​(@PolyNull String title,
                                                 Locale locale)
        Returns the 'lexicographic' presentation of a title. This means that a leading article is stripped and moved to the end of the string. Currently only supported for Dutch.
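
The article handling can be sketched as follows. The list of Dutch articles (de, het, een), the ", article" suffix format, and the decision to return other locales unchanged are all assumptions made for this illustration; the class name LexicoSketch is hypothetical and the code is not TextUtil's.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexicoSketch {
    // A leading Dutch article followed by white space and the rest of the title.
    private static final Pattern DUTCH_ARTICLE =
        Pattern.compile("^(de|het|een)\\s+(.+)$", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical stand-in: moves a leading Dutch article behind the title.
    static String lexico(String title, Locale locale) {
        if (title == null) {
            return null;
        }
        if (locale == null || !"nl".equals(locale.getLanguage())) {
            return title; // assumption: unsupported locales are left untouched
        }
        Matcher m = DUTCH_ARTICLE.matcher(title);
        return m.matches() ? m.group(2) + ", " + m.group(1) : title;
    }
}
```

Requiring white space after the article keeps words that merely start with the same letters, such as "Denemarken", intact.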
      • truncate

        public static @PolyNull String truncate​(@PolyNull String text,
                                                int max)
      • truncate

        public static @PolyNull String truncate​(@PolyNull String text,
                                                int max,
                                                boolean ellipses)
      • strikeThrough

        public static @PolyNull String strikeThrough​(@PolyNull CharSequence s)
        Gives a representation of the string which is completely 'struck through' (using Unicode combining characters)
        Since:
        2.11
      • underLine

        public static @PolyNull String underLine​(@PolyNull CharSequence s)
        Gives a representation of the string which is completely 'underlined' (using Unicode combining characters)
        Since:
        2.11
      • underLineDouble

        public static @PolyNull String underLineDouble​(@PolyNull CharSequence s)
        Gives a representation of the string which is completely 'double underlined' (using Unicode combining characters)
        Since:
        2.11
      • overLine

        public static @PolyNull String overLine​(@PolyNull CharSequence s)
        Gives a representation of the string which is completely 'overlined' (using Unicode combining characters)
        Since:
        2.11
      • overLineDouble

        public static @PolyNull String overLineDouble​(@PolyNull CharSequence s)
        Gives a representation of the string which is completely 'double overlined' (using Unicode combining characters)
        Since:
        2.11
      • underDiaeresis

        public static @PolyNull String underDiaeresis​(@PolyNull CharSequence s)
        Gives a representation of the string in which every character carries a diaeresis below (using Unicode combining characters)
        Since:
        2.11
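
The six decoration methods above (strikeThrough through underDiaeresis) can plausibly all be built on one technique: appending a Unicode combining mark after every character. The sketch below assumes that approach and assumes the specific code points listed in its constants; the class CombiningSketch and the method decorate are hypothetical and not part of TextUtil.

```java
class CombiningSketch {
    static final char LONG_STROKE_OVERLAY = (char) 0x0336; // strike-through
    static final char LOW_LINE            = (char) 0x0332; // underline
    static final char DOUBLE_LOW_LINE     = (char) 0x0333; // double underline
    static final char OVERLINE            = (char) 0x0305; // overline
    static final char DOUBLE_OVERLINE     = (char) 0x033F; // double overline
    static final char DIAERESIS_BELOW     = (char) 0x0324; // diaeresis below

    // Hypothetical stand-in: follows every code point of s with the given combining mark.
    static String decorate(CharSequence s, char mark) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() * 2);
        // Iterating over code points keeps surrogate pairs together.
        s.codePoints().forEach(cp -> out.appendCodePoint(cp).append(mark));
        return out.toString();
    }

    public static void main(String[] args) {
        // How the result looks depends on the font and the renderer.
        System.out.println(decorate("struck", LONG_STROKE_OVERLAY));
        System.out.println(decorate("underlined", LOW_LINE));
    }
}
```

The decorated string is plain text, so it survives in places where markup is not available, at the cost of doubling the number of code points.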