gate.corpora
Class DocumentContentImpl

java.lang.Object
  extended by gate.corpora.DocumentContentImpl
All Implemented Interfaces:
DocumentContent, Serializable

public class DocumentContentImpl
extends Object
implements DocumentContent

Represents the commonalities between all sorts of document contents.

See Also:
Serialized Form

Constructor Summary
DocumentContentImpl()
          Default construction
DocumentContentImpl(String s)
          Construction from a string; used for ranges.
DocumentContentImpl(URL u, String encoding, Long start, Long end)
          Construction from URL and offsets.
 
Method Summary
 boolean equals(Object other)
          Two documents are the same if their contents are the same.
 DocumentContent getContent(Long start, Long end)
          Return the contents under a particular span.
 String getOriginalContent()
          Return the original content of the document received during the loading phase or on construction from string.
 int hashCode()
          Calculate the hash value for the object.
 Long size()
          The size of this content (e.g. character length for textual content).
 String toString()
          Returns the String representing the content in case of a textual document.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DocumentContentImpl

public DocumentContentImpl()
Default construction


DocumentContentImpl

public DocumentContentImpl(URL u,
                           String encoding,
                           Long start,
                           Long end)
                    throws IOException
Construction from URL and offsets.

Throws:
IOException

DocumentContentImpl

public DocumentContentImpl(String s)
Construction from a string; used for ranges.

Method Detail

getContent

public DocumentContent getContent(Long start,
                                  Long end)
                           throws InvalidOffsetException
Description copied from interface: DocumentContent
Return the contents under a particular span.

Conceptually, the annotation offsets are defined as falling in between characters, with "0" pointing before the first character. Because of that, the offset where an annotation ends and the offset where the space after it starts are the same.

So this is what the "abcde" string looks like with the offsets explicitly included: 0a1b2c3d4e5

"ab cd" would then look like this: 0a1b2 3c4d5

with the following annotations:
Token "ab" [0,2]
SpaceToken " " [2,3]
Token "cd" [3,5]

Specified by:
getContent in interface DocumentContent
Parameters:
start - the beginning index, inclusive.
end - the ending index, exclusive.
Returns:
the specified substring for the document.
Throws:
InvalidOffsetException - if the start is negative, or end is larger than the length of this DocumentContent object, or start is larger than end.
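
The offset convention described above can be illustrated with a minimal sketch. It does not call GATE: OffsetSketch and span are hypothetical names, String.substring stands in for getContent, and IllegalArgumentException stands in for InvalidOffsetException. The checks mirror the three conditions listed under Throws.

```java
// Illustration only: a stand-in for the documented offset contract,
// not the DocumentContentImpl implementation.
class OffsetSketch {

    // Returns the text between two offsets; start is inclusive, end is exclusive.
    // Rejects the same cases the documentation lists for InvalidOffsetException.
    public static String span(String text, long start, long end) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException(
                "invalid offsets: [" + start + "," + end + "]");
        }
        return text.substring((int) start, (int) end);
    }

    public static void main(String[] args) {
        String text = "ab cd"; // offsets: 0a1b2 3c4d5

        System.out.println("Token      [0,2] = '" + span(text, 0, 2) + "'");
        System.out.println("SpaceToken [2,3] = '" + span(text, 2, 3) + "'");
        System.out.println("Token      [3,5] = '" + span(text, 3, 5) + "'");
    }
}
```

Because offsets fall between characters, the end offset of one annotation equals the start offset of the next, so adjacent spans never overlap and never leave a gap.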

toString

public String toString()
Returns the String representing the content in case of a textual document. NOTE: this is a temporary solution until we have a more generic one.

Overrides:
toString in class Object

size

public Long size()
The size of this content (e.g. character length for textual content).

Specified by:
size in interface DocumentContent

equals

public boolean equals(Object other)
Two documents are the same if their contents are the same.

Overrides:
equals in class Object

hashCode

public int hashCode()
Calculate the hash value for the object.

Overrides:
hashCode in class Object
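
The content-based equality described for equals and hashCode can be sketched as follows. TextContentSketch is a hypothetical class written for this illustration; it assumes the content is held as a single String and is not taken from the GATE source.

```java
import java.util.Objects;

// Illustration only: a hypothetical value class whose equality depends
// solely on its textual content, as the equals description states.
final class TextContentSketch {
    private final String content;

    public TextContentSketch(String content) {
        this.content = content;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TextContentSketch)) return false;
        return Objects.equals(content, ((TextContentSketch) other).content);
    }

    // Derived from the same field as equals, so equal objects share a hash.
    @Override
    public int hashCode() {
        return Objects.hashCode(content);
    }

    @Override
    public String toString() {
        return content;
    }

    public static void main(String[] args) {
        TextContentSketch a = new TextContentSketch("ab cd");
        TextContentSketch b = new TextContentSketch("ab cd");
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```

Deriving both methods from the same field keeps the general java.lang.Object contract intact: objects that compare equal also produce equal hash codes, which hash-based collections rely on.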

getOriginalContent

public String getOriginalContent()
Return the original content of the document received during the loading phase or on construction from string.