Class EMFParser

java.lang.Object
org.apache.tika.parser.microsoft.EMFParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class EMFParser extends Object implements org.apache.tika.parser.Parser
Extracts files embedded in an EMF (Enhanced Metafile) and offers a very rough capability to extract any text stored in the EMF.

To improve text extraction, we'd have to implement quite a bit more at the POI level. In particular, we'd want to track font changes and use that information to identify character sets and to insert spaces and new lines.
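
As a rough illustration of the font-tracking idea described above, the following sketch walks a list of simplified records, remembers the character set chosen by the most recent font record, decodes each text run with it, and inserts a space or a new line based on the position of the run. This is not how EMFParser or POI works today. The `Rec` type, its fields, the ISO-8859-1 default, and the two tolerance parameters are all hypothetical and exist only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class EmfTextSketch {

    // Hypothetical, simplified stand-in for an EMF record; not a POI or Tika class.
    static final class Rec {
        final Charset fontCharset; // non-null when the record selects a new font
        final byte[] text;         // non-null when the record draws text
        final int x, y, width;     // start position and advance width of a text run

        private Rec(Charset fontCharset, byte[] text, int x, int y, int width) {
            this.fontCharset = fontCharset;
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }

        static Rec font(Charset charset) {
            return new Rec(charset, null, 0, 0, 0);
        }

        static Rec text(String s, Charset encoding, int x, int y, int width) {
            return new Rec(null, s.getBytes(encoding), x, y, width);
        }
    }

    // Decodes text runs with the charset of the most recently selected font and
    // separates runs with a new line (vertical jump) or a space (horizontal gap).
    static String extract(List<Rec> records, int lineTolerance, int gapTolerance) {
        StringBuilder out = new StringBuilder();
        Charset current = StandardCharsets.ISO_8859_1; // assumed default before any font record
        boolean first = true;
        int lastY = 0;
        int lastEndX = 0;
        for (Rec r : records) {
            if (r.fontCharset != null) {
                current = r.fontCharset; // font change: later runs use this charset
                continue;
            }
            if (!first) {
                if (Math.abs(r.y - lastY) > lineTolerance) {
                    out.append('\n');
                } else if (r.x - lastEndX > gapTolerance) {
                    out.append(' ');
                }
            }
            out.append(new String(r.text, current));
            first = false;
            lastY = r.y;
            lastEndX = r.x + r.width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
            Rec.font(StandardCharsets.ISO_8859_1),
            Rec.text("Hello", StandardCharsets.ISO_8859_1, 0, 10, 50),
            Rec.text("world", StandardCharsets.ISO_8859_1, 60, 10, 50),
            Rec.font(StandardCharsets.UTF_16LE),
            Rec.text("caf\u00e9", StandardCharsets.UTF_16LE, 0, 30, 40));
        System.out.println(extract(records, 2, 3));
    }
}
```

In this sketch the second run starts 10 units past the end of the first, so a space is inserted, and the last run sits on a different baseline, so a new line is inserted before it. A real implementation would need actual font metrics and the font's declared character set rather than these hand-picked values.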

  • Field Details

    • EMF_ICON_ONLY

      public static org.apache.tika.metadata.Property EMF_ICON_ONLY
    • EMF_ICON_STRING

      public static org.apache.tika.metadata.Property EMF_ICON_STRING
  • Constructor Details

    • EMFParser

      public EMFParser()
  • Method Details

    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
      Specified by:
      getSupportedTypes in interface org.apache.tika.parser.Parser
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Specified by:
      parse in interface org.apache.tika.parser.Parser
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException