gate
Interface DocumentContent

All Superinterfaces:
Serializable
All Known Implementing Classes:
DocumentContentImpl

public interface DocumentContent
extends Serializable

The content of Documents.


Method Summary
 DocumentContent getContent(Long start, Long end)
          Return the contents under a particular span.
 Long size()
          The size of this content (e.g. character length for textual content).
 

Method Detail

getContent

DocumentContent getContent(Long start,
                           Long end)
                           throws InvalidOffsetException
Return the contents under a particular span.

Conceptually, annotation offsets fall between characters, with offset 0 pointing before the first character. As a result, the offset at which an annotation ends is the same as the offset at which the text following it (for example, a space) starts.

With the offsets written out explicitly, the string "abcde" looks like this: 0a1b2c3d4e5

Likewise, "ab cd" looks like this: 0a1b2 3c4d5

with the following annotations:
Token "ab" [0,2]
SpaceToken " " [2,3]
Token "cd" [3,5]
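
The offset model above can be illustrated with a minimal sketch on a plain Java String. This does not use the DocumentContent API itself: the class OffsetSketch and the helper span are hypothetical names introduced only for illustration, and the sketch assumes textual content where offsets count characters.

```java
public class OffsetSketch {

    // Hypothetical helper (not part of the documented interface): returns the
    // characters lying between two inter-character offsets. The start offset
    // is inclusive and the end offset is exclusive, as in String.substring.
    static String span(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "ab cd";

        // The three annotations listed above, expressed as offset pairs.
        System.out.println("Token      [0,2] = \"" + span(text, 0, 2) + "\"");
        System.out.println("SpaceToken [2,3] = \"" + span(text, 2, 3) + "\"");
        System.out.println("Token      [3,5] = \"" + span(text, 3, 5) + "\"");
    }
}
```

Because offsets sit between characters, the end offset of one annotation (2) is also the start offset of the next, and the length of a span is simply end minus start.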

Parameters:
start - the beginning index, inclusive.
end - the ending index, exclusive.
Returns:
the content of the document within the specified span.
Throws:
InvalidOffsetException - if start is negative, if end is larger than the size of this DocumentContent object, or if start is larger than end.
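
The three failure conditions listed under Throws can be sketched as a single range check. This is an illustration under stated assumptions, not the library's implementation: OffsetCheckSketch, isValid, checkedSpan and BadOffsetException are hypothetical names, and BadOffsetException merely stands in for InvalidOffsetException so that the example compiles on its own.

```java
public class OffsetCheckSketch {

    // Hypothetical stand-in for InvalidOffsetException, so the sketch is self-contained.
    static class BadOffsetException extends Exception {
        BadOffsetException(String message) {
            super(message);
        }
    }

    // A span is valid when 0 <= start <= end <= size.
    static boolean isValid(long start, long end, long size) {
        return start >= 0 && end <= size && start <= end;
    }

    // Assumed behaviour: reject invalid spans, otherwise return the characters
    // from start (inclusive) to end (exclusive).
    static String checkedSpan(String text, long start, long end) throws BadOffsetException {
        if (!isValid(start, end, text.length())) {
            throw new BadOffsetException(
                "Invalid span [" + start + "," + end + "] for content of size " + text.length());
        }
        return text.substring((int) start, (int) end);
    }

    public static void main(String[] args) throws BadOffsetException {
        System.out.println(checkedSpan("abcde", 1, 4));
        try {
            checkedSpan("abcde", 3, 2); // start is larger than end
        } catch (BadOffsetException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, a span with start equal to end is valid and yields empty content, and an end offset equal to the size is valid because the end offset is exclusive.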

size

Long size()
The size of this content (e.g. character length for textual content).