Package org.apache.tika.parser.microsoft
Class EMFParser
- java.lang.Object
  - org.apache.tika.parser.microsoft.EMFParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
public class EMFParser extends Object implements org.apache.tika.parser.Parser
Extracts files embedded in an EMF and offers a very rough capability to extract any text stored in the EMF. Improving text extraction would require implementing quite a bit more at the POI level: tracking changes in font and using that information to identify character sets and to insert spaces and new lines. The parser also relies on storage order for text order, which isn't great. A better approach would be something like what PDFBox or XPS do: sort the runs and then put the cow back together from the hamburger.
- See Also:
- Serialized Form
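The run-sorting idea mentioned in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Tika, POI, PDFBox, or any XPS parser: the `TextRunSorter` and `TextRun` names, the `lineTolerance` parameter, and the assumption that y grows downward are all hypothetical. It groups runs into lines by vertical position, orders each line left to right, and joins runs with spaces and lines with new lines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Tika or POI. */
final class TextRunSorter {

    /** A hypothetical text run: a string plus the position where it is drawn. */
    static final class TextRun {
        final String text;
        final double x;
        final double y;

        TextRun(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Rebuilds reading order from runs given in arbitrary (storage) order.
     * Assumes y grows downward. A run joins the current line when its y is
     * within lineTolerance of the first run on that line.
     */
    static String toText(List<TextRun> runs, double lineTolerance) {
        // Sort a copy top to bottom so the caller's list is left untouched.
        List<TextRun> sorted = new ArrayList<>(runs);
        sorted.sort(Comparator.comparingDouble((TextRun r) -> r.y));

        // Group vertically adjacent runs into lines.
        List<List<TextRun>> lines = new ArrayList<>();
        for (TextRun run : sorted) {
            boolean startNewLine = lines.isEmpty()
                    || Math.abs(run.y - lines.get(lines.size() - 1).get(0).y) > lineTolerance;
            if (startNewLine) {
                lines.add(new ArrayList<>());
            }
            lines.get(lines.size() - 1).add(run);
        }

        // Order each line left to right; join runs with spaces, lines with '\n'.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            List<TextRun> line = lines.get(i);
            line.sort(Comparator.comparingDouble((TextRun r) -> r.x));
            if (i > 0) {
                out.append('\n');
            }
            for (int j = 0; j < line.size(); j++) {
                if (j > 0) {
                    out.append(' ');
                }
                out.append(line.get(j).text);
            }
        }
        return out.toString();
    }
}
```

Inserting a space between every pair of runs is a deliberate simplification; a more careful version would presumably use font metrics to decide whether two runs are actually separated, which is the kind of font tracking the description says is missing.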
-
-
Field Summary
Fields
Modifier and Type                           Field              Description
static org.apache.tika.metadata.Property    EMF_ICON_ONLY
static org.apache.tika.metadata.Property    EMF_ICON_STRING
-
Constructor Summary
Constructors
Constructor    Description
EMFParser()
-
Method Summary
Modifier and Type                      Method
Set<org.apache.tika.mime.MediaType>    getSupportedTypes(org.apache.tika.parser.ParseContext context)
void                                   parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
-
-
-
Method Detail
-
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
- Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
-
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException