Package it.unimi.dsi.parser.callback
Class TextExtractor
java.lang.Object
it.unimi.dsi.parser.callback.DefaultCallback
it.unimi.dsi.parser.callback.TextExtractor
- All Implemented Interfaces:
Callback
public class TextExtractor extends DefaultCallback
-
Field Summary
Fields
Modifier and Type    Field    Description
MutableString    text    The text resulting from the parsing process.
MutableString    title    The title resulting from the parsing process.
Fields inherited from interface it.unimi.dsi.parser.callback.Callback
EMPTY_CALLBACK_ARRAY
-
Constructor Summary
Constructors
Constructor    Description
TextExtractor()
-
Method Summary
Modifier and Type    Method    Description
boolean    characters(char[] characters, int offset, int length, boolean flowBroken)    Receive notification of character data inside an element.
void    configure(BulletParser parser)    Configure the parser to parse text.
boolean    endElement(Element element)    Receive notification of the end of an element.
void    startDocument()    Receive notification of the beginning of the document.
boolean    startElement(Element element, Map<Attribute,MutableString> attrMapUnused)    Receive notification of the start of an element.
Methods inherited from class it.unimi.dsi.parser.callback.DefaultCallback
cdata, endDocument, getInstance
-
Field Details
-
text
The text resulting from the parsing process.
-
title
The title resulting from the parsing process.
-
-
Constructor Details
-
TextExtractor
public TextExtractor()
-
-
Method Details
-
configure
public void configure(BulletParser parser)
Configure the parser to parse text.
- Specified by:
configure in interface Callback
- Overrides:
configure in class DefaultCallback
-
startDocument
public void startDocument()
Description copied from interface: Callback
Receive notification of the beginning of the document.
The callback must use this method to reset its internal state so that it can be reused. It must be safe to invoke this method several times.
- Specified by:
startDocument in interface Callback
- Overrides:
startDocument in class DefaultCallback
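The reset contract above can be illustrated with a minimal, self-contained sketch. The `Collector` class below is a hypothetical stand-in written for this example, not a dsiutils class, and it uses a `StringBuilder` where the real callback exposes `MutableString` fields. It only shows the intended pattern: `startDocument()` clears all accumulated state and is harmless when invoked more than once.

```java
public class ResetSketch {
    /** Hypothetical stand-in for a reusable callback; not part of dsiutils. */
    static class Collector {
        final StringBuilder text = new StringBuilder();

        /** Clears accumulated state; calling it repeatedly has no further effect. */
        void startDocument() {
            text.setLength(0);
        }

        /** Appends a copy of the given slice; always asks the caller to continue. */
        boolean characters(char[] data, int offset, int length) {
            text.append(data, offset, length);
            return true;
        }
    }

    /** Feeds one document to the collector, resetting it first (twice, on purpose). */
    static String collect(Collector collector, String document) {
        collector.startDocument();
        collector.startDocument(); // a second call must be safe
        char[] data = document.toCharArray();
        collector.characters(data, 0, data.length);
        return collector.text.toString();
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        System.out.println(collect(collector, "first"));
        // The same instance is reused; nothing from "first" leaks through.
        System.out.println(collect(collector, "second"));
    }
}
```

Because the reset happens in `startDocument()` rather than in the constructor, one instance can be reused across many documents without reallocating its buffers.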
-
characters
public boolean characters(char[] characters, int offset, int length, boolean flowBroken)
Description copied from interface: Callback
Receive notification of character data inside an element.
You must not write into text (the character array, named characters in this method's signature), as it could be passed around to many callbacks.
flowBroken will be true if and only if the flow was broken before text. This feature makes it possible to quickly extract the text in a document without looking at the elements.
- Specified by:
characters in interface Callback
- Overrides:
characters in class DefaultCallback
- Parameters:
characters - an array containing the character data.
offset - the start position in the array.
length - the number of characters to read from the array.
flowBroken - whether the flow is broken at the start of text.
- Returns:
true to keep the parser parsing, false to stop it.
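As an illustration of the `characters` contract, the sketch below reads only the slice delimited by `offset` and `length`, copies it instead of modifying the shared array, and uses `flowBroken` to separate blocks of text. The `Collector` class is a hypothetical stand-in, and inserting a single space on a flow break is an assumed policy chosen for this example; it is not documented here as what `TextExtractor` does.

```java
public class CharactersSketch {
    /** Hypothetical stand-in for a text-collecting callback; not part of dsiutils. */
    static class Collector {
        final StringBuilder text = new StringBuilder();

        boolean characters(char[] data, int offset, int length, boolean flowBroken) {
            // Assumed policy: when the flow was broken before this chunk,
            // separate it from the preceding text with one space.
            if (flowBroken && text.length() > 0 && text.charAt(text.length() - 1) != ' ') {
                text.append(' ');
            }
            // append(char[], int, int) copies the slice; the shared array is never written.
            text.append(data, offset, length);
            return true; // keep parsing
        }
    }

    /** Delivers the two text runs of a tiny document, as a parser might. */
    static String demo() {
        char[] buffer = "<p>Hello</p><p>world</p>".toCharArray();
        Collector collector = new Collector();
        collector.characters(buffer, 3, 5, true);  // the slice "Hello"
        collector.characters(buffer, 15, 5, true); // the slice "world"
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello world"
    }
}
```

Passing the whole buffer with an offset and a length lets a parser hand out slices without allocating, which is also why the callback must treat the array as read-only.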
-
endElement
public boolean endElement(Element element)
Description copied from interface: Callback
Receive notification of the end of an element.
Warning: unless specific decorators are used, in general a callback will just receive notifications for elements whose closing tag appears explicitly in the document. This method will never be called for elements without closing tags, even if such a tag is found.
- Specified by:
endElement in interface Callback
- Overrides:
endElement in class DefaultCallback
- Parameters:
element - the element whose closing tag was found.
- Returns:
true to keep the parser parsing, false to stop it.
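The boolean return value gives a callback a way to end parsing early. The sketch below is a self-contained illustration under stated assumptions: `TitleGrabber` and `drive` are hypothetical names, elements are represented by plain strings rather than `Element` objects, and the event list stands in for what a real parser would emit. The callback collects the text inside `title` and returns false from `endElement` once the explicit closing tag arrives, so the driver delivers nothing further.

```java
public class StopSketch {
    /** Hypothetical callback: records the title, then asks the driver to stop. */
    static class TitleGrabber {
        final StringBuilder title = new StringBuilder();
        private boolean inTitle;

        void startDocument() {
            title.setLength(0);
            inTitle = false;
        }

        boolean startElement(String name) {
            if (name.equals("title")) inTitle = true;
            return true;
        }

        boolean characters(String data) {
            if (inTitle) title.append(data);
            return true;
        }

        boolean endElement(String name) {
            if (name.equals("title")) {
                inTitle = false;
                return false; // the title is complete: stop parsing
            }
            return true;
        }
    }

    /** Hypothetical event stream: {kind, value} pairs a parser might produce. */
    static final String[][] SAMPLE = {
        {"start", "html"}, {"start", "title"}, {"text", "Hi"},
        {"end", "title"}, {"start", "body"}, {"text", "ignored"},
    };

    /** Hypothetical driver; returns how many events were delivered before stopping. */
    static int drive(String[][] events, TitleGrabber callback) {
        callback.startDocument();
        int delivered = 0;
        for (String[] event : events) {
            delivered++;
            boolean keepGoing;
            if (event[0].equals("start")) keepGoing = callback.startElement(event[1]);
            else if (event[0].equals("end")) keepGoing = callback.endElement(event[1]);
            else keepGoing = callback.characters(event[1]);
            if (!keepGoing) break;
        }
        return delivered;
    }

    public static void main(String[] args) {
        TitleGrabber grabber = new TitleGrabber();
        int delivered = drive(SAMPLE, grabber);
        System.out.println(grabber.title + " after " + delivered + " events");
    }
}
```

Note that this pattern relies on the closing tag being present in the document; per the warning above, a callback cannot count on `endElement` for elements whose closing tag is omitted.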
-
startElement
public boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
Description copied from interface: Callback
Receive notification of the start of an element.
For simple elements, this is the only notification that the callback will ever receive.
- Specified by:
startElement in interface Callback
- Overrides:
startElement in class DefaultCallback
- Parameters:
element - the element whose opening tag was found.
attrMapUnused - a map from Attributes to MutableStrings.
- Returns:
true to keep the parser parsing, false to stop it.
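Since a simple element produces a start notification and nothing else, any reaction to it has to happen in `startElement`. The sketch below is a hypothetical, self-contained illustration: `LineCollector` is not a dsiutils class, element names are plain strings, and treating `br` and `hr` as simple elements that separate words is an assumption made for the example.

```java
public class SimpleElementSketch {
    /** Hypothetical callback that treats br and hr as word separators. */
    static class LineCollector {
        final StringBuilder text = new StringBuilder();
        int endNotifications;

        boolean startElement(String name) {
            // No end notification will follow for these elements,
            // so the separator is added here.
            if (name.equals("br") || name.equals("hr")) text.append(' ');
            return true;
        }

        boolean endElement(String name) {
            endNotifications++;
            return true;
        }

        boolean characters(String data) {
            text.append(data);
            return true;
        }
    }

    /** Replays the notifications a parser might send for "line one<br>line two". */
    static LineCollector demo() {
        LineCollector collector = new LineCollector();
        collector.characters("line one");
        collector.startElement("br"); // the only notification for <br>
        collector.characters("line two");
        return collector;
    }

    public static void main(String[] args) {
        LineCollector collector = demo();
        System.out.println(collector.text);             // prints "line one line two"
        System.out.println(collector.endNotifications); // prints 0
    }
}
```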
-