Interface TextTokenizerRegistry

  • All Known Implementing Classes:
    TextTokenizerRegistryImpl

    @API(EXPERIMENTAL)
    public interface TextTokenizerRegistry
    Registry for TextTokenizers. This registry allows full-text indexes to specify their tokenizer type through the "textTokenizerName" index option. The registry is then queried for the tokenizer with that name at both index time and query time.

    There are two ways to add tokenizers to the registry. The first is to mark a TextTokenizerFactory implementation with the AutoService annotation so that it is loaded into the registry. The second is to call register() on this interface to register a tokenizer factory manually, which is useful, for example, for tokenizers that are built on the fly from configuration parameters.

    • Method Detail

      • register

        void register(@Nonnull TextTokenizerFactory tokenizerFactory)
        Registers a new tokenizer in this registry. The tokenizer should have a name different from those of all tokenizers that are currently registered. This throws an error if a tokenizer with the same name is already registered and is not pointer-equal to the given tokenizerFactory parameter.
        Parameters:
        tokenizerFactory - new tokenizer to register
        Throws:
        RecordCoreArgumentException - if there is a tokenizer of the same name already registered
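
The register() contract described above can be sketched with a small stand-in model. This is not the library's implementation: NamedFactory, SketchRegistry, and lookup() are hypothetical names introduced for illustration, and IllegalArgumentException stands in for RecordCoreArgumentException so that the sketch has no dependencies. It assumes only what the description states: registering the same instance again is accepted, while a different instance with the same name is rejected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for TextTokenizerFactory; only the name matters here.
final class NamedFactory {
    private final String name;

    NamedFactory(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical registry that mimics the documented register() contract.
final class SketchRegistry {
    private final Map<String, NamedFactory> factories = new ConcurrentHashMap<>();

    void register(NamedFactory factory) {
        // putIfAbsent returns the previous mapping, or null if the name was free.
        NamedFactory existing = factories.putIfAbsent(factory.getName(), factory);
        // Reference comparison (!=) is the "pointer-equal" check: registering the
        // same instance twice is a no-op, a different instance is rejected.
        if (existing != null && existing != factory) {
            // The real interface documents RecordCoreArgumentException here.
            throw new IllegalArgumentException(
                    "tokenizer already registered: " + factory.getName());
        }
    }

    NamedFactory lookup(String name) {
        return factories.get(name);
    }
}

class RegisterSketchDemo {
    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        NamedFactory first = new NamedFactory("whitespace");
        registry.register(first);
        registry.register(first); // same instance: accepted
        try {
            registry.register(new NamedFactory("whitespace")); // same name, new instance
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing by reference rather than by equals() means a caller can safely invoke register() repeatedly with one shared factory instance, for example from initialization code that may run more than once.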
      • reset

        void reset()
        Clears the registry and reloads tokenizers from the classpath. This is intended mainly for testing purposes (to avoid having one test add a tokenizer to the registry that another test cannot override).
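
The test-isolation problem that reset() addresses can be illustrated with another stand-in model. Everything here is an assumption made for the sketch: StubFactory and ResettableRegistry are hypothetical, and the list passed to the constructor merely simulates whatever would be discovered on the classpath. The point is only that a manually registered tokenizer blocks a later registration under the same name until the registry is reset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TextTokenizerFactory.
final class StubFactory {
    private final String name;

    StubFactory(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical registry; the "discovered" list simulates classpath loading.
final class ResettableRegistry {
    private final List<StubFactory> discovered;
    private final Map<String, StubFactory> factories = new HashMap<>();

    ResettableRegistry(List<StubFactory> discovered) {
        this.discovered = new ArrayList<>(discovered);
        reset();
    }

    synchronized void register(StubFactory factory) {
        StubFactory existing = factories.putIfAbsent(factory.getName(), factory);
        if (existing != null && existing != factory) {
            throw new IllegalArgumentException(
                    "tokenizer already registered: " + factory.getName());
        }
    }

    // Drops manual registrations and restores only the discovered factories.
    synchronized void reset() {
        factories.clear();
        for (StubFactory factory : discovered) {
            factories.put(factory.getName(), factory);
        }
    }

    synchronized StubFactory lookup(String name) {
        return factories.get(name);
    }
}

class ResetSketchDemo {
    public static void main(String[] args) {
        List<StubFactory> onClasspath = new ArrayList<>();
        onClasspath.add(new StubFactory("default"));
        ResettableRegistry registry = new ResettableRegistry(onClasspath);

        registry.register(new StubFactory("custom")); // first test's tokenizer
        registry.reset();                             // run between tests
        registry.register(new StubFactory("custom")); // second test can now register
        System.out.println(registry.lookup("custom") != null);
    }
}
```

In a test suite, a call like registry.reset() would typically sit in a per-test setup or teardown hook so that each test starts from the discovered tokenizers only.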