Package it.unimi.dsi.parser.callback
Class TextExtractor
java.lang.Object
it.unimi.dsi.parser.callback.DefaultCallback
it.unimi.dsi.parser.callback.TextExtractor
- All Implemented Interfaces:
Callback
public class TextExtractor extends DefaultCallback
-
Field Summary
Fields
MutableString text
    The text resulting from the parsing process.
MutableString title
    The title resulting from the parsing process.
Constructor Summary
Constructors
TextExtractor()
Method Summary
Methods
boolean characters(char[] characters, int offset, int length, boolean flowBroken)
    Receive notification of character data inside an element.
void configure(BulletParser parser)
    Configure the parser to parse text.
boolean endElement(Element element)
    Receive notification of the end of an element.
void startDocument()
    Receive notification of the beginning of the document.
boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
    Receive notification of the start of an element.
Methods inherited from class it.unimi.dsi.parser.callback.DefaultCallback
cdata, endDocument, getInstance
-
Field Details
-
Constructor Details
-
TextExtractor
public TextExtractor()
-
-
Method Details
-
configure
public void configure(BulletParser parser)
Configure the parser to parse text.
- Specified by: configure in interface Callback
- Overrides: configure in class DefaultCallback
-
startDocument
public void startDocument()
Description copied from interface: Callback
Receive notification of the beginning of the document.
The callback must use this method to reset its internal state so that it can be reused. It must be safe to invoke this method several times.
- Specified by: startDocument in interface Callback
- Overrides: startDocument in class DefaultCallback
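The reset contract above can be illustrated with a minimal sketch. `ResettableCollector` is a hypothetical stand-in written only for this example; it does not implement the real `Callback` interface and is not the source of `TextExtractor`. It only shows per-document state being cleared in `startDocument()` in a way that is harmless to repeat.

```java
// Hypothetical stand-in, NOT the it.unimi.dsi.parser API: it models only the
// "reset in startDocument(), safe to call repeatedly" contract.
final class ResettableCollector {
    // Per-document state, loosely analogous to the text and title fields.
    final StringBuilder text = new StringBuilder();
    final StringBuilder title = new StringBuilder();

    /** Clears all per-document state; calling it several times has no further effect. */
    void startDocument() {
        text.setLength(0);
        title.setLength(0);
    }

    /** Appends a slice of the given array without modifying the array itself. */
    boolean characters(char[] chars, int offset, int length) {
        text.append(chars, offset, length);
        return true; // keep parsing
    }
}
```

Because all state is cleared at the start of each document, one instance of such a collector can be reused across many documents without leaking text from the previous one.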
-
characters
public boolean characters(char[] characters, int offset, int length, boolean flowBroken)
Description copied from interface: Callback
Receive notification of character data inside an element.
You must not write into text (the array passed here as characters), as it could be passed around to many callbacks.
flowBroken will be true iff the flow was broken before text. This feature makes it possible to quickly extract the text in a document without looking at the elements.
- Specified by: characters in interface Callback
- Overrides: characters in class DefaultCallback
- Parameters:
  characters - an array containing the character data.
  offset - the start position in the array.
  length - the number of characters to read from the array.
  flowBroken - whether the flow is broken at the start of text.
- Returns: true to keep the parser parsing, false to stop it.
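As a rough illustration of how the flowBroken flag allows text to be collected without inspecting elements, here is a minimal sketch. `FlowAwareText` is a hypothetical class written for this example, and rendering a flow break as a single space is an assumption of the sketch; it is not necessarily how `TextExtractor` itself behaves.

```java
// Hypothetical stand-in, NOT the it.unimi.dsi.parser API.
final class FlowAwareText {
    private final StringBuilder text = new StringBuilder();

    /** Copies the given slice into the accumulated text; the input array is only read. */
    boolean characters(char[] characters, int offset, int length, boolean flowBroken) {
        // Assumption of this sketch: a broken flow becomes one separating space,
        // unless the text is empty or already ends with a space.
        if (flowBroken && text.length() > 0 && text.charAt(text.length() - 1) != ' ') {
            text.append(' ');
        }
        text.append(characters, offset, length);
        return true; // keep parsing
    }

    String text() {
        return text.toString();
    }
}
```

The array is copied from and never written to, in line with the warning that it may be shared with other callbacks.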
-
endElement
public boolean endElement(Element element)
Description copied from interface: Callback
Receive notification of the end of an element.
Warning: unless specific decorators are used, in general a callback will just receive notifications for elements whose closing tag appears explicitly in the document. This method will never be called for elements without closing tags, even if such a tag is found.
- Specified by: endElement in interface Callback
- Overrides: endElement in class DefaultCallback
- Parameters:
  element - the element whose closing tag was found.
- Returns: true to keep the parser parsing, false to stop it.
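The boolean return convention shared by these methods (true to continue, false to stop) can be sketched from the driver's side. `StopOnFalse` and its string events are hypothetical and stand in for a parser only loosely; the real parser's internals are not shown here.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parser loop, NOT the it.unimi.dsi.parser API.
final class StopOnFalse {
    /**
     * Delivers events to the handler in order and stops as soon as the handler
     * returns false. Returns the number of events delivered, including the one
     * that caused the stop.
     */
    static int deliver(List<String> events, Predicate<String> handler) {
        int delivered = 0;
        for (String event : events) {
            delivered++;
            if (!handler.test(event)) {
                break; // false means "stop parsing"
            }
        }
        return delivered;
    }
}
```

A callback that needs only part of a document could use this convention to end parsing early by returning false once it has what it needs.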
-
startElement
public boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
Description copied from interface: Callback
Receive notification of the start of an element.
For simple elements, this is the only notification that the callback will ever receive.
- Specified by: startElement in interface Callback
- Overrides: startElement in class DefaultCallback
- Parameters:
  element - the element whose opening tag was found.
  attrMapUnused - a map from Attributes to MutableStrings.
- Returns: true to keep the parser parsing, false to stop it.
-