Class RecursiveParserWrapperHandler

    • Constructor Detail

      • RecursiveParserWrapperHandler

        public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
        Creates a handler with no limit on the number of embedded resources that will be parsed.
      • RecursiveParserWrapperHandler

        public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory,
                                             int maxEmbeddedResources)
        Creates a handler that limits the number of embedded resources that will be parsed.
        Parameters:
        maxEmbeddedResources - maximum number of embedded resources that will be parsed
      • RecursiveParserWrapperHandler

        public RecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory,
                                             int maxEmbeddedResources,
                                             int maxWriteLimit,
                                             org.apache.tika.metadata.filter.MetadataFilter metadataFilter)
    • Method Detail

      • startEmbeddedDocument

        public void startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler,
                                          Metadata metadata)
                                   throws org.xml.sax.SAXException
        This is called before parsing an embedded document.
        Overrides:
        startEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
        Parameters:
        contentHandler - local content handler to use on the embedded document
        metadata - metadata to use for the embedded document
        Throws:
        org.xml.sax.SAXException
      • endEmbeddedDocument

        public void endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler,
                                        Metadata metadata)
                                 throws org.xml.sax.SAXException
        This is called after parsing an embedded document.
        Overrides:
        endEmbeddedDocument in class AbstractRecursiveParserWrapperHandler
        Parameters:
        contentHandler - local content handler used on the embedded document
        metadata - metadata from the embedded document
        Throws:
        org.xml.sax.SAXException
      • endDocument

        public void endDocument(org.xml.sax.ContentHandler contentHandler,
                                Metadata metadata)
                         throws org.xml.sax.SAXException
        Description copied from class: AbstractRecursiveParserWrapperHandler
        This is called after the full parse has completed. Override this for custom behavior. Subclasses that override it should call super.endDocument(...), because this method records in the metadata whether the embedded resource maximum was reached.
        Overrides:
        endDocument in class AbstractRecursiveParserWrapperHandler
        Parameters:
        contentHandler - content handler used on the main document
        metadata - metadata from the main document
        Throws:
        org.xml.sax.SAXException
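The advice to call super.endDocument(...) can be illustrated with a small self-contained model. This is a sketch only, not Tika code: BaseHandlerSketch, AuditingHandlerSketch, the Map used as metadata, and both metadata keys are hypothetical stand-ins, and the rule used here for "limit reached" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AbstractRecursiveParserWrapperHandler; not Tika code.
class BaseHandlerSketch {
    // Hypothetical metadata key; the real key name is not given on this page.
    static final String LIMIT_KEY = "sketch:embeddedResourceLimitReached";

    private final int maxEmbeddedResources; // negative means "no limit" in this sketch
    private int embeddedSeen = 0;

    BaseHandlerSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Counts each embedded document that the parser starts.
    void startEmbeddedDocument() {
        embeddedSeen++;
    }

    // Assumption: the limit counts as reached once the count equals the maximum.
    boolean limitReached() {
        return maxEmbeddedResources >= 0 && embeddedSeen >= maxEmbeddedResources;
    }

    // Records the limit flag in the metadata of the main document.
    void endDocument(Map<String, String> metadata) {
        if (limitReached()) {
            metadata.put(LIMIT_KEY, "true");
        }
    }
}

// A subclass with custom end-of-parse behavior that still delegates to super.
class AuditingHandlerSketch extends BaseHandlerSketch {
    AuditingHandlerSketch(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        metadata.put("sketch:audited", "true"); // custom behavior (hypothetical key)
        super.endDocument(metadata);            // without this call the limit flag is lost
    }

    public static void main(String[] args) {
        AuditingHandlerSketch handler = new AuditingHandlerSketch(2);
        handler.startEmbeddedDocument();
        handler.startEmbeddedDocument();
        Map<String, String> metadata = new HashMap<>();
        handler.endDocument(metadata);
        System.out.println(metadata);
    }
}
```

If the override omitted the super call, the custom key would still be written but the limit flag would be missing, which is the failure the description warns about.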
      • getMetadataList

        public java.util.List<Metadata> getMetadataList()
        Returns:
        a list of Metadata objects, one for the main document and one for each embedded document
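The shape of the returned list, together with the effect of maxEmbeddedResources, can be modeled in a few lines. The following is a simplified sketch, not Tika's implementation: MetadataListSketch and its Map-based metadata are hypothetical, and placing the main document first in the list is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a metadata-collecting handler; not Tika's implementation.
class MetadataListSketch {
    private final List<Map<String, String>> metadataList = new ArrayList<>();
    private final int maxEmbeddedResources; // negative means "no limit" in this sketch
    private int embeddedCount = 0;

    MetadataListSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Records one embedded document unless the limit has already been reached.
    // Returns true if the metadata was recorded.
    boolean endEmbeddedDocument(Map<String, String> metadata) {
        if (maxEmbeddedResources >= 0 && embeddedCount >= maxEmbeddedResources) {
            return false;
        }
        embeddedCount++;
        metadataList.add(new LinkedHashMap<>(metadata)); // defensive copy
        return true;
    }

    // Records the main document; this sketch places it first (an assumption).
    void endDocument(Map<String, String> metadata) {
        metadataList.add(0, new LinkedHashMap<>(metadata));
    }

    // One entry for the main document and one per recorded embedded document.
    List<Map<String, String>> getMetadataList() {
        return Collections.unmodifiableList(metadataList);
    }
}
```

With a limit of 2 and three embedded documents offered, the list in this model holds three entries: the main document plus the first two embedded documents.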