eu.medsea.mimeutil
Class TextMimeDetector

java.lang.Object
  extended by eu.medsea.mimeutil.detector.MimeDetector
      extended by eu.medsea.mimeutil.TextMimeDetector

public final class TextMimeDetector
extends MimeDetector

This MimeDetector cannot be registered, unregistered or subclassed. It is a default MimeDetector that is pre-installed into the mime-util utility and is used as the FIRST MimeDetector.

You can influence this MimeDetector in several ways.

The TextMimeDetector.setPreferredEncodings(...) method is used to provide a preferred list of encodings. The final encoding for the MimeType will be the first one in this list that is also contained in the possible encodings returned from the EncodingGuesser class. If none of these match then the first entry in the possible encodings collection is used.
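The selection rule above can be sketched in plain Java. This is only an illustration of the rule as described here, not the library's implementation; the class PreferredEncodingSketch and the method chooseEncoding are hypothetical names that do not exist in mime-util.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, NOT part of mime-util: it only mirrors the rule described above.
final class PreferredEncodingSketch {

    // Returns the first preferred encoding that is also a possible encoding.
    // Falls back to the first possible encoding, or null if there are none.
    static String chooseEncoding(List<String> preferred, Collection<String> possible) {
        for (String candidate : preferred) {
            if (possible.contains(candidate)) {
                return candidate;
            }
        }
        return possible.isEmpty() ? null : possible.iterator().next();
    }

    public static void main(String[] args) {
        List<String> preferred = Arrays.asList("UTF-8", "ISO-8859-1");
        List<String> possible = Arrays.asList("windows-1252", "ISO-8859-1", "UTF-16");
        // UTF-8 is not a possible encoding here, so ISO-8859-1 is chosen.
        System.out.println(chooseEncoding(preferred, possible));
    }
}
```

The order of the preferred list matters: an encoding listed earlier wins over a later one whenever both are possible.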

The EncodingGuesser.setSupportedEncodings(...) method is used to set the list of encodings that will be considered when trying to guess the encoding. If you provide encodings that are not supported by your JVM, an error is logged and the next encoding is tried. If you set this to an empty Collection then you will effectively turn this MimeDetector OFF (the default). This is the recommended way to disable this MimeDetector. The most common usage scenario for this method is when your application is designed to support only a limited set of encodings, such as UTF-8 and UTF-16 encoded text files. You can set the supported encodings list to this subset of encodings and greatly improve the performance of this MimeDetector.
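If you want to check a candidate list against your JVM before handing it to the library, something along these lines may help. It is a minimal sketch using only java.nio.charset.Charset; the class SupportedEncodingFilterSketch and the method keepSupported are hypothetical names, and the sketch does not claim to reproduce how EncodingGuesser performs its own check.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of mime-util.
final class SupportedEncodingFilterSketch {

    // Keeps only the encoding names that this JVM can actually use,
    // reporting and skipping the rest.
    static List<String> keepSupported(List<String> requested) {
        List<String> supported = new ArrayList<>();
        for (String name : requested) {
            try {
                if (Charset.isSupported(name)) {
                    supported.add(name);
                } else {
                    System.err.println("Unsupported encoding skipped: " + name);
                }
            } catch (IllegalCharsetNameException e) {
                // The name is not even syntactically legal as a charset name.
                System.err.println("Illegal encoding name skipped: " + name);
            }
        }
        return supported;
    }
}
```

A limited list such as UTF-8 and UTF-16 passes through this filter unchanged on any JVM, because both are among the standard charsets every Java platform must support.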

The TextMimeDetector.registerTextMimeHandler(...) method can be used to register special TextMimeHandler(s). These handlers are delegated to once valid encodings have been found for the content contained in the File, InputStream or byte []. The handlers can influence both the returned MimeType and the encoding of any matched content. For instance, the default behavior is to return a MimeType of text/plain with the encoding set according to the rules above. The handler(s) allow you to further process the content and decide that it is in fact a text/xml or application/svg-xml or even mytype/mysubtype. You can also change the assigned encoding, as it may be wrong for your new MimeType. For instance, if you decide the content is really an XML file and not just a standard text/plain file, and the detector calculated that the best encoding is UTF-8 but you detect an encoding attribute in the XML content for ISO-8859-1, you can set this as well, thus returning a TextMimeType of application/xml with an encoding of ISO-8859-1 instead of a TextMimeType of text/plain with an encoding of UTF-8.

IMPORTANT: Your handler(s) will only get to see and act on content that this MimeDetector thinks is text in the first place. So if your restrictions on supported encodings mean that a file is no longer detected as text, your handler(s) will never be called for it.
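The XML case described above needs some way to find an encoding attribute in the decoded content. The following is a minimal sketch of that inspection step only. It deliberately does not implement the TextMimeHandler interface, whose methods are not shown on this page; the class XmlEncodingSniffSketch and its methods are hypothetical names, and the regular expression is a lenient approximation rather than a full XML declaration parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of mime-util: the kind of content check
// a custom handler might perform on text that has already been decoded.
final class XmlEncodingSniffSketch {

    // Matches an XML declaration at the start of the content and captures
    // the value of its encoding attribute, if one is present.
    private static final Pattern XML_DECLARATION = Pattern.compile(
            "^\\s*<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._:+-]+)[\"']");

    // True if the content starts with an XML declaration.
    static boolean looksLikeXml(String content) {
        return content.trim().startsWith("<?xml");
    }

    // Returns the declared encoding, or null if none is declared.
    static String declaredEncoding(String content) {
        Matcher matcher = XML_DECLARATION.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

A handler built around such a check could then replace the text/plain type and the guessed encoding with the values it found, as the ISO-8859-1 example above describes.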

The methods will do their best to eliminate any binary files before trying to detect an encoding. However, if a binary file contains only a few bytes of data or you are very unlucky it could be mistakenly recognised as a text file and processed by this MimeDetector.

The Collection(s) returned from the methods in this class will contain either 0 or 1 MimeType entry of type TextMimeType with a mime type of "text/plain" or whatever matching registered TextMimeHandler(s) decide to return. You can test for matches from this MimeDetector by using the instanceof operator on each MimeType in the Collection returned to your code (remember, the Collection returned to you is the accumulated collection from ALL registered MimeDetectors). You can retrieve the encoding using the getEncoding() method of TextMimeType after casting the MimeType to a TextMimeType.
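The instanceof test and cast described above follow the usual Java pattern. So that the sketch compiles on its own, it uses simplified stand-in classes named MimeType and TextMimeType; these stand-ins are assumptions for illustration and are not the library's real classes, which you would import in your own code. The class TextMimeTypeFilterSketch and the method findTextEncoding are hypothetical names.

```java
import java.util.Collection;

// Hypothetical sketch, NOT part of mime-util.
final class TextMimeTypeFilterSketch {

    // Simplified stand-in for the library's MimeType (assumption).
    static class MimeType {
        private final String name;
        MimeType(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Simplified stand-in for the library's TextMimeType (assumption).
    static class TextMimeType extends MimeType {
        private final String encoding;
        TextMimeType(String name, String encoding) {
            super(name);
            this.encoding = encoding;
        }
        String getEncoding() { return encoding; }
    }

    // Returns the encoding of the first TextMimeType entry found in the
    // accumulated collection, or null if no entry is a TextMimeType.
    static String findTextEncoding(Collection<? extends MimeType> mimeTypes) {
        for (MimeType type : mimeTypes) {
            if (type instanceof TextMimeType) {
                return ((TextMimeType) type).getEncoding();
            }
        }
        return null;
    }
}
```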

You should also remember that if this MimeDetector puts a TextMimeType into the eventual Collection of MimeType(s) returned to your code of say "text/plain" and one or more of the other registered MimeDetector(s) also add an instance of "text/plain" in accordance with their detection rules, the type will not be changed from TextMimeType to MimeType. Only the specificity value of the MimeType will be increased thus improving the likelihood that this MimeType will be returned from the MimeUtil.getMostSpecificMimeType(Collection mimeTypes) method.

Author:
Steven McArdle

Method Summary
 String getDescription()
          Abstract method to be implemented by concrete MimeDetector(s).
 Collection getMimeTypesByteArray(byte[] data)
          Abstract method that must be implemented by concrete MimeDetector(s).
 Collection getMimeTypesFile(File file)
          We only want to deal with the stream for the file
 Collection getMimeTypesFileName(String fileName)
          This MimeDetector requires content so defer to the file method
 Collection getMimeTypesInputStream(InputStream in)
          Abstract method that must be implemented by concrete MimeDetector(s).
 Collection getMimeTypesURL(URL url)
          We only want to deal with the stream from the URL
static Collection getRegisteredTextMimeHandlers()
          Get the current Collection of registered TextMimeHandler(s)
static void registerTextMimeHandler(TextMimeHandler handler)
          Register a TextMimeHandler
static void setPreferredEncodings(String[] encodings)
          Change the list of preferred encodings.
static void unregisterTextMimeHandler(TextMimeHandler handler)
          Unregister a TextMimeHandler
 
Methods inherited from class eu.medsea.mimeutil.detector.MimeDetector
closeStream, delete, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getName, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getDescription

public String getDescription()
Description copied from class: MimeDetector
Abstract method to be implemented by concrete MimeDetector(s).

Specified by:
getDescription in class MimeDetector
Returns:
description of this MimeDetector
See Also:
MimeDetector.getDescription()

getMimeTypesFileName

public Collection getMimeTypesFileName(String fileName)
                                throws UnsupportedOperationException
This MimeDetector requires content so defer to the file method

Specified by:
getMimeTypesFileName in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException

getMimeTypesURL

public Collection getMimeTypesURL(URL url)
                           throws UnsupportedOperationException
We only want to deal with the stream from the URL

Specified by:
getMimeTypesURL in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException
See Also:
MimeDetector.getMimeTypesURL(URL url)

getMimeTypesFile

public Collection getMimeTypesFile(File file)
                            throws UnsupportedOperationException
We only want to deal with the stream for the file

Specified by:
getMimeTypesFile in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException
See Also:
MimeDetector.getMimeTypesFile(File file)

getMimeTypesInputStream

public Collection getMimeTypesInputStream(InputStream in)
                                   throws UnsupportedOperationException
Description copied from class: MimeDetector
Abstract method that must be implemented by concrete MimeDetector(s). This takes an InputStream object and is called by the MimeUtil getMimeTypes(URL url), getMimeTypes(File file) and getMimeTypes(InputStream in) methods. If your MimeDetector does not handle InputStream objects then either throw an UnsupportedOperationException or return an empty collection.

If the InputStream passed in does not support the mark() and reset() methods a MimeException will be thrown before reaching this point. The implementation is responsible for the actual use of the mark() and reset() methods as the amount of data to retrieve from the stream is implementation and even call by call dependent. If you do not use the mark() and reset() methods on the Stream then the position in the Stream will have moved on when this method returns and the next MimeDetector that handles the stream will either fail or be incorrect.

To allow the reuse of the Stream in other parts of your code and by further MimeDetector(s) in a way that it is unaware of any data read via this method i.e. the Stream position will be returned to where it was when this method was called, it is IMPORTANT to utilise the mark() and reset() methods within your implementing method.
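The mark() and reset() discipline described above can be sketched with the standard java.io API alone. This is a minimal illustration of the pattern, not the code this MimeDetector uses; the class MarkResetSketch, the method peek and its limit parameter are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, NOT part of mime-util.
final class MarkResetSketch {

    // Reads up to 'limit' bytes from the stream and then returns the
    // stream to the position it had when this method was called.
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("InputStream must support mark() and reset()");
        }
        in.mark(limit); // remember the current position
        try {
            byte[] buffer = new byte[limit];
            int total = 0;
            int read;
            // read() may return fewer bytes than requested, so loop until
            // the limit is reached or the stream ends.
            while (total < limit && (read = in.read(buffer, total, limit - total)) != -1) {
                total += read;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            in.reset(); // rewind so the next consumer sees the same data
        }
    }
}
```

Placing reset() in a finally block means the position is restored even if reading fails part way through. A stream that does not support marking, such as a plain FileInputStream, can be wrapped in a BufferedInputStream first.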

Specified by:
getMimeTypesInputStream in class MimeDetector
Parameters:
in - InputStream.
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException
See Also:
MimeDetector.getMimeTypesInputStream(InputStream in)

getMimeTypesByteArray

public Collection getMimeTypesByteArray(byte[] data)
                                 throws UnsupportedOperationException
Description copied from class: MimeDetector
Abstract method that must be implemented by concrete MimeDetector(s). This takes a byte [] object and is called by the MimeUtil getMimeTypes(byte []) method. If your MimeDetector does not handle byte [] objects then either throw an UnsupportedOperationException or return an empty collection.

Specified by:
getMimeTypesByteArray in class MimeDetector
Parameters:
data - byte []. Is a byte array that you want to parse for matching mime types.
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException
See Also:
MimeDetector.getMimeTypesByteArray(byte [] data)

setPreferredEncodings

public static void setPreferredEncodings(String[] encodings)
Change the list of preferred encodings. This list is used when more than one possible encoding is identified for the content of a byte array passed in or read in from a Stream or File object. The list is iterated over in order and the first match is set as the encoding for the text/plain TextMimeType ONLY if the JVM default encoding is not in the list of possible encodings. If neither the default encoding nor any of these preferred encodings is in the list of possible encodings then the first possible encoding will be used.

Parameters:
encodings - String array of canonical encoding names.

registerTextMimeHandler

public static void registerTextMimeHandler(TextMimeHandler handler)
Register a TextMimeHandler

Parameters:
handler - to register

unregisterTextMimeHandler

public static void unregisterTextMimeHandler(TextMimeHandler handler)
Unregister a TextMimeHandler

Parameters:
handler - to unregister

getRegisteredTextMimeHandlers

public static Collection getRegisteredTextMimeHandlers()
Get the current Collection of registered TextMimeHandler(s)

Returns:
currently registered collection of TextMimeHandler(s)


Copyright © 2007-2009 Medsea Business Solutions S.L. All Rights Reserved.