gate.html
Class HtmlDocumentHandler

java.lang.Object
  extended by javax.swing.text.html.HTMLEditorKit.ParserCallback
      extended by gate.html.HtmlDocumentHandler

public class HtmlDocumentHandler
extends HTMLEditorKit.ParserCallback

Implements the behaviour of the HTML reader. The HTML parser calls the methods of this class as parsing events occur (start tags, end tags, text, comments, errors). The handler turns these events into GATE annotation objects. It also replaces the content of the GATE document with new content that contains only the text of the HTML document.
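The event-driven design above can be illustrated without GATE. The sketch below is a hypothetical, minimal `ParserCallback`, not GATE's implementation, and it uses only JDK classes. It keeps character data as the new plain-text content and turns each start/end tag pair into a (type, start, end) span over that text, which is roughly the shape of an annotation. The class name, `Span`, and the `parse` helper are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical, GATE-free sketch: plain text plus (type, start, end) spans from HTML events. */
class TextAndSpans extends HTMLEditorKit.ParserCallback {
    /** A would-be annotation: element name plus offsets into the extracted text. */
    static final class Span {
        final String type;
        final int start;
        final int end;
        Span(String type, int start, int end) { this.type = type; this.start = start; this.end = end; }
    }

    /** An element whose start tag has been seen but not yet its end tag. */
    private static final class Open {
        final HTML.Tag tag;
        final int start;
        Open(HTML.Tag tag, int start) { this.tag = tag; this.start = start; }
    }

    final StringBuilder text = new StringBuilder();   // plays the role of the new document content
    final List<Span> spans = new ArrayList<>();        // plays the role of the annotation set
    private final Deque<Open> open = new ArrayDeque<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        open.push(new Open(t, text.length()));  // the element starts at the current text length
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (open.stream().noneMatch(o -> o.tag == t)) {
            return;  // stray end tag with no matching start tag: ignore it
        }
        Open top;
        do {  // also close anything left open inside the matching element
            top = open.pop();
            spans.add(new Span(top.tag.toString(), top.start, text.length()));
        } while (top.tag != t);
    }

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);  // only character data ends up in the new content
    }

    static TextAndSpans parse(String html) {
        TextAndSpans callback = new TextAndSpans();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        TextAndSpans result = parse("<html><body><p>Hello<b>world</b></p></body></html>");
        System.out.println(result.text);
        for (Span s : result.spans) {
            System.out.println(s.type + " [" + s.start + ", " + s.end + ")");
        }
    }
}
```

A stack fits here because HTML elements nest, so an end tag normally closes the most recently opened element with the same name.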


Field Summary
protected  long customObjectsId
           
protected  List myStatusListeners
           
 
Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
HtmlDocumentHandler(Document aDocument, Map aMarkupElementsMap)
          Constructor; initialises all the private member data.
HtmlDocumentHandler(Document aDocument, Map aMarkupElementsMap, AnnotationSet anAnnotationSet)
          Constructor; initialises all the private member data.
 
Method Summary
 void addRepositioningInfo(String content, int pos, int extractedPos)
          For the given content, searches the list of shrink positions and calculates and generates the correct repositioning information at the corresponding positions.
 void addStatusListener(StatusListener listener)
           
protected  void customizeAppearanceOfDocumentWithEndTag(HTML.Tag t)
          This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form.
protected  void customizeAppearanceOfDocumentWithSimpleTag(HTML.Tag t)
          This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form.
protected  void customizeAppearanceOfDocumentWithStartTag(HTML.Tag t)
          This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form.
protected  void fireStatusChangedEvent(String text)
           
 void flush()
          This method is called once, when the HTML parser reaches the end of its input stream, to notify the ParserCallback that there is nothing more to parse.
 RepositioningInfo getAmpCodingInfo()
          Returns the current RepositioningInfo object for ampersand coding.
 RepositioningInfo getRepositioningInfo()
          Returns the current RepositioningInfo object.
 void handleComment(char[] text, int pos)
          This method is called when the HTML parser encounters a comment.
 void handleEndTag(HTML.Tag t, int pos)
          This method is called when the HTML parser encounters an end tag, i.e. the closing half of a tag paired with a start tag.
 void handleError(String errorMsg, int pos)
          This method is called when the HTML parser encounters an error; it is up to the implementer whether to handle that error.
 void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos)
          This method is called when the HTML parser encounters an empty tag.
 void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
          This method is called when the HTML parser encounters a start tag that is paired with an end tag, i.e. a tag that is not empty.
 void handleText(char[] text, int pos)
          This method is called when the HTML parser encounters text (PCDATA).
 void removeStatusListener(StatusListener listener)
           
 void setAmpCodingInfo(RepositioningInfo info)
          Sets the reference to the repositioning information structure for ampersand coding.
 void setRepositioningInfo(RepositioningInfo info)
          Sets the reference to the repositioning information structure.
 
Methods inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
handleEndOfLineString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

myStatusListeners

protected List myStatusListeners

customObjectsId

protected long customObjectsId
Constructor Detail

HtmlDocumentHandler

public HtmlDocumentHandler(Document aDocument,
                           Map aMarkupElementsMap)
Constructor; initialises all the private member data. This constructor uses the default annotation set of the GATE document.

Parameters:
aDocument - The GATE document that will be processed
aMarkupElementsMap - The map containing the elements that will be transformed into annotations

HtmlDocumentHandler

public HtmlDocumentHandler(Document aDocument,
                           Map aMarkupElementsMap,
                           AnnotationSet anAnnotationSet)
Constructor; initialises all the private member data.

Parameters:
aDocument - The GATE document that will be processed
aMarkupElementsMap - The map containing the elements that will be transformed into annotations
anAnnotationSet - The annotation set that will contain the annotations resulting from the processing of the GATE document
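This page does not spell out how aMarkupElementsMap is consulted. The sketch below illustrates one reading, and that reading is an assumption. Keys are HTML element names and values are the annotation types to create. An element missing from a non-null map yields no annotation, and a null map keeps every element under its own name. `annotationTypeFor` is a hypothetical helper, not part of GATE.

```java
import java.util.Map;

/** Hypothetical helper: decides which annotation type, if any, an element becomes. */
class MarkupMapping {
    /**
     * Assumption: a null map keeps every element under its own name; otherwise only
     * mapped elements are kept, renamed to the mapped type. Returns null to drop the element.
     */
    static String annotationTypeFor(Map<String, String> markupElementsMap, String elementName) {
        if (markupElementsMap == null) {
            return elementName;
        }
        return markupElementsMap.get(elementName);
    }

    public static void main(String[] args) {
        Map<String, String> map = Map.of("a", "Link", "h1", "Heading");
        System.out.println(annotationTypeFor(map, "a"));   // Link
        System.out.println(annotationTypeFor(map, "b"));   // null (no annotation)
        System.out.println(annotationTypeFor(null, "b"));  // b
    }
}
```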
Method Detail

setRepositioningInfo

public void setRepositioningInfo(RepositioningInfo info)
Sets the reference to the repositioning information structure. If this reference is set to null, no repositioning information is collected.
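Repositioning information relates offsets in the extracted text to offsets in the original markup. As a hedged illustration, the hypothetical OffsetMap below stores one {originalPos, extractedPos, length} entry per verbatim chunk and maps an extracted offset back to the original. It is not GATE's RepositioningInfo API. The null check in `record` mirrors the documented behaviour that a null reference means nothing is collected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a repositioning structure; not GATE's RepositioningInfo API. */
class OffsetMap {
    // Each entry: {originalPos, extractedPos, length} for a chunk copied verbatim.
    private final List<int[]> entries = new ArrayList<>();

    void add(int originalPos, int extractedPos, int length) {
        entries.add(new int[] {originalPos, extractedPos, length});
    }

    /** Maps an offset in the extracted text back to the original markup, or -1 if unknown. */
    int toOriginal(int extractedOffset) {
        for (int[] e : entries) {
            if (extractedOffset >= e[1] && extractedOffset < e[1] + e[2]) {
                return e[0] + (extractedOffset - e[1]);
            }
        }
        return -1;
    }

    /** A null map disables collection, so callers guard on null before recording. */
    static void record(OffsetMap info, int originalPos, int extractedPos, int length) {
        if (info != null) {
            info.add(originalPos, extractedPos, length);
        }
    }

    public static void main(String[] args) {
        // In "<p>Hello</p>", "Hello" starts at original offset 3 and extracted offset 0.
        OffsetMap info = new OffsetMap();
        record(info, 3, 0, 5);
        System.out.println(info.toOriginal(1));  // 4
        record(null, 3, 0, 5);                   // no-op: collection disabled
    }
}
```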


getRepositioningInfo

public RepositioningInfo getRepositioningInfo()
Returns the current RepositioningInfo object.


setAmpCodingInfo

public void setAmpCodingInfo(RepositioningInfo info)
Sets the reference to the repositioning information structure for ampersand coding. If this reference is set to null, the information is not used.
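Ampersand coding matters because an entity reference such as &amp; spans several characters in the source but only one in the extracted text, so every later offset drifts. The sketch below decodes four common entities and records each shrink as {originalPos, originalLength, extractedPos}. The entity table and the class name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: decode a few entities and record where the text shrank. */
class EntityShrink {
    static final Map<String, String> ENTITIES =
        Map.of("&amp;", "&", "&lt;", "<", "&gt;", ">", "&quot;", "\"");

    final StringBuilder decoded = new StringBuilder();
    // Each entry: {originalPos, originalLength, extractedPos}; the extracted length is always 1.
    final List<int[]> shrinks = new ArrayList<>();

    EntityShrink(String source) {
        int i = 0;
        while (i < source.length()) {
            String hit = null;
            if (source.charAt(i) == '&') {
                for (String e : ENTITIES.keySet()) {
                    if (source.startsWith(e, i)) {
                        hit = e;
                        break;
                    }
                }
            }
            if (hit == null) {
                decoded.append(source.charAt(i++));  // ordinary character: copied one for one
            } else {
                shrinks.add(new int[] {i, hit.length(), decoded.length()});
                decoded.append(ENTITIES.get(hit));
                i += hit.length();
            }
        }
    }

    public static void main(String[] args) {
        EntityShrink s = new EntityShrink("a &amp; b");
        System.out.println(s.decoded);                        // a & b
        int[] e = s.shrinks.get(0);
        System.out.println(e[0] + " " + e[1] + " " + e[2]);  // 2 5 2
    }
}
```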


getAmpCodingInfo

public RepositioningInfo getAmpCodingInfo()
Returns the current RepositioningInfo object for ampersand coding.


handleStartTag

public void handleStartTag(HTML.Tag t,
                           MutableAttributeSet a,
                           int pos)
This method is called when the HTML parser encounters a start tag that is paired with an end tag, i.e. a tag that is not empty.

Overrides:
handleStartTag in class HTMLEditorKit.ParserCallback

handleEndTag

public void handleEndTag(HTML.Tag t,
                         int pos)
This method is called when the HTML parser encounters an end tag, i.e. the closing half of a tag paired with a start tag.

Overrides:
handleEndTag in class HTMLEditorKit.ParserCallback
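Not every start tag passed to handleStartTag appears in the source. The Swing parser also reports tags it infers, such as a missing html element, and marks them with the IMPLIED attribute inherited from HTMLEditorKit.ParserCallback. This page does not say whether GATE's handler treats implied tags specially. The JDK-only sketch below simply separates the two kinds, and its class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** JDK-only sketch: separate start tags read from the source from tags the parser implied. */
class ImpliedTagFilter extends HTMLEditorKit.ParserCallback {
    final List<String> explicit = new ArrayList<>();
    final List<String> implied = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // IMPLIED is the constant inherited from ParserCallback.
        if (a.isDefined(IMPLIED)) {
            implied.add(t.toString());
        } else {
            explicit.add(t.toString());
        }
    }

    static ImpliedTagFilter parse(String html) {
        ImpliedTagFilter callback = new ImpliedTagFilter();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        ImpliedTagFilter f = parse("<b>bold</b>");
        System.out.println("explicit: " + f.explicit + ", implied: " + f.implied);
    }
}
```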

handleSimpleTag

public void handleSimpleTag(HTML.Tag t,
                            MutableAttributeSet a,
                            int pos)
This method is called when the HTML parser encounters an empty tag.

Overrides:
handleSimpleTag in class HTMLEditorKit.ParserCallback
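An empty element such as br or img encloses no text. One reasonable mapping is a zero-length span at the current end of the extracted text, with attribute values such as src kept alongside. This mapping is assumed here rather than taken from GATE's source. The sketch below uses JDK classes only, and its names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** JDK-only sketch: record empty elements as zero-length spans over the extracted text. */
class EmptyTagCollector extends HTMLEditorKit.ParserCallback {
    final StringBuilder text = new StringBuilder();
    final List<String> names = new ArrayList<>();     // element name of each empty tag
    final List<Integer> offsets = new ArrayList<>();  // start == end for a zero-length span
    final List<String> srcs = new ArrayList<>();      // src attribute, or null if absent

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        names.add(t.toString());
        offsets.add(text.length());
        Object src = a.getAttribute(HTML.Attribute.SRC);  // known attributes are keyed by HTML.Attribute
        srcs.add(src == null ? null : src.toString());
    }

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);
    }

    static EmptyTagCollector parse(String html) {
        EmptyTagCollector callback = new EmptyTagCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        EmptyTagCollector c = parse("<p>a<br>b<img src=\"x.png\"></p>");
        for (int i = 0; i < c.names.size(); i++) {
            System.out.println(c.names.get(i) + " at " + c.offsets.get(i) + " src=" + c.srcs.get(i));
        }
    }
}
```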

handleText

public void handleText(char[] text,
                       int pos)
This method is called when the HTML parser encounters text (PCDATA).

Overrides:
handleText in class HTMLEditorKit.ParserCallback
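Since handleText receives only character data, the text of adjacent block elements is concatenated with no separator. This is why the customizeAppearanceOfDocumentWith*Tag methods add newlines and spaces. The hypothetical, JDK-only text collector below shows the effect.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** JDK-only sketch: keep character data and nothing else. */
class TextOnly extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);  // no separator is added between chunks
    }

    static String extract(String html) {
        TextOnly callback = new TextOnly();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>one</p><p>two</p>"));
    }
}
```

For the input in main, the two paragraphs should run together as "onetwo".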

addRepositioningInfo

public void addRepositioningInfo(String content,
                                 int pos,
                                 int extractedPos)
For the given content, searches the list of shrink positions and calculates and generates the correct repositioning information at the corresponding positions.
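This page does not confirm the following assumptions, but under them the repositioning step can be sketched. A chunk of extracted content starts at original offset pos and at extracted offset extractedPos. The shrink list holds sorted {originalPos, originalLength} pairs, and each pair collapsed to one extracted character. The method emits {originalPos, originalLength, extractedPos, extractedLength} entries for the verbatim stretches and for each shrink. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of generating repositioning entries for one chunk of text. */
class Reposition {
    /**
     * content: extracted text of the chunk; pos: its offset in the original markup;
     * extractedPos: its offset in the extracted text; shrinks: sorted {originalPos,
     * originalLength} pairs, each collapsed to one character.
     * Returns {originalPos, originalLength, extractedPos, extractedLength} entries.
     */
    static List<int[]> reposition(String content, int pos, int extractedPos, List<int[]> shrinks) {
        List<int[]> out = new ArrayList<>();
        int orig = pos;
        int extr = extractedPos;
        int end = extractedPos + content.length();
        for (int[] s : shrinks) {
            if (s[0] < orig) continue;        // belongs to an earlier chunk
            int run = s[0] - orig;            // characters copied verbatim before the shrink
            if (extr + run >= end) break;     // the shrink lies after this chunk
            if (run > 0) out.add(new int[] {orig, run, extr, run});
            out.add(new int[] {s[0], s[1], extr + run, 1});
            orig = s[0] + s[1];
            extr += run + 1;
        }
        if (extr < end) out.add(new int[] {orig, end - extr, extr, end - extr});  // verbatim tail
        return out;
    }

    public static void main(String[] args) {
        // Original "a &amp; b" starts at offset 10; extracted "a & b" starts at offset 0.
        List<int[]> entries = reposition("a & b", 10, 0, List.of(new int[] {12, 5}));
        for (int[] e : entries) {
            System.out.println(Arrays.toString(e));
        }
        // [10, 2, 0, 2]
        // [12, 5, 2, 1]
        // [17, 2, 3, 2]
    }
}
```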


customizeAppearanceOfDocumentWithSimpleTag

protected void customizeAppearanceOfDocumentWithSimpleTag(HTML.Tag t)
This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form. This method modifies the content of tmpDocContent.

Parameters:
t - the HTML tag encountered by the HTML parser

customizeAppearanceOfDocumentWithStartTag

protected void customizeAppearanceOfDocumentWithStartTag(HTML.Tag t)
This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form. This method modifies the content of tmpDocContent.

Parameters:
t - the HTML tag encountered by the HTML parser

customizeAppearanceOfDocumentWithEndTag

protected void customizeAppearanceOfDocumentWithEndTag(HTML.Tag t)
This method analyses the tag t and adds \n characters and spaces to tmpDocContent, so that the final document has a readable form. This method modifies the content of tmpDocContent.

Parameters:
t - the HTML tag encountered by the HTML parser
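One plausible policy for the customizeAppearanceOfDocumentWith*Tag methods is to start a new line whenever a tag breaks the text flow, as reported by the JDK's HTML.Tag.breaksFlow(). The sketch below uses only JDK calls. This page does not state which characters GATE inserts for each tag, so the policy and the class name are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** JDK-only sketch: insert newlines around flow-breaking tags to keep the text readable. */
class ReadableText extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    /** Assumed policy: a flow-breaking tag starts a new line, never two in a row. */
    private void newLineIfNeeded(HTML.Tag t) {
        boolean atLineStart = text.length() == 0 || text.charAt(text.length() - 1) == '\n';
        if (t.breaksFlow() && !atLineStart) {
            text.append('\n');
        }
    }

    @Override public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) { newLineIfNeeded(t); }
    @Override public void handleEndTag(HTML.Tag t, int pos) { newLineIfNeeded(t); }
    @Override public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) { newLineIfNeeded(t); }
    @Override public void handleText(char[] data, int pos) { text.append(data); }

    static String extract(String html) {
        ReadableText callback = new ReadableText();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>one</p><p>two<br>three</p>"));
    }
}
```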

handleError

public void handleError(String errorMsg,
                        int pos)
This method is called when the HTML parser encounters an error; it is up to the implementer whether to handle that error.

Overrides:
handleError in class HTMLEditorKit.ParserCallback
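The base ParserCallback.handleError does nothing, so parse errors are silently ignored unless a subclass overrides it. The sketch below logs each error with its position and lets parsing continue. Its names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Sketch: collect parser errors instead of ignoring them. */
class ErrorLog extends HTMLEditorKit.ParserCallback {
    final List<String> errors = new ArrayList<>();

    @Override
    public void handleError(String errorMsg, int pos) {
        // Log and return: nothing is thrown, so parsing carries on.
        errors.add(pos + ": " + errorMsg);
    }

    static ErrorLog parse(String html) {
        ErrorLog callback = new ErrorLog();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        // <foo> is not an HTML element, which the parser may report as an error.
        for (String e : parse("<p>a<foo>b</p>").errors) {
            System.out.println(e);
        }
    }
}
```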

flush

public void flush()
           throws BadLocationException
This method is called once, when the HTML parser reaches the end of its input stream, to notify the ParserCallback that there is nothing more to parse.

Overrides:
flush in class HTMLEditorKit.ParserCallback
Throws:
BadLocationException
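Elements whose end tag never arrives are a natural thing to finish in flush(). The sketch below closes every element that is still open, ending each one at the current end of the text. It reflects one sensible design and is an assumption, not GATE's code. It overrides flush() without the throws clause, which Java permits because an override may declare fewer checked exceptions. The method is also safe to call more than once, because depending on how parsing is driven, either the parser or the caller may invoke flush(). Names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

/** Sketch: flush() closes any elements left open when the input ends. */
class FlushingHandler extends HTMLEditorKit.ParserCallback {
    final StringBuilder text = new StringBuilder();
    final List<String> spans = new ArrayList<>();  // each entry is "type[start,end)"
    private final Deque<HTML.Tag> openTags = new ArrayDeque<>();
    private final Deque<Integer> openStarts = new ArrayDeque<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        openTags.push(t);
        openStarts.push(text.length());
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (!openTags.isEmpty() && openTags.peek() == t) {
            close();  // only the innermost matching element is closed here
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);
    }

    /** Closes everything still open; calling it again is a no-op. */
    @Override
    public void flush() {
        while (!openTags.isEmpty()) {
            close();
        }
    }

    private void close() {
        spans.add(openTags.pop() + "[" + openStarts.pop() + "," + text.length() + ")");
    }

    public static void main(String[] args) {
        FlushingHandler h = new FlushingHandler();
        h.handleStartTag(HTML.Tag.B, new SimpleAttributeSet(), 0);
        h.handleText("xy".toCharArray(), 3);
        h.flush();                     // no end tag for b arrived, so flush() closes it
        System.out.println(h.spans);   // [b[0,2)]
    }
}
```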

handleComment

public void handleComment(char[] text,
                          int pos)
This method is called when the HTML parser encounters a comment.

Overrides:
handleComment in class HTMLEditorKit.ParserCallback
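Comments are reported to handleComment and never to handleText, so they do not reach the extracted text. A handler could still keep them, for instance as a hypothetical comment annotation. The JDK-only sketch below simply collects them, and its names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** JDK-only sketch: keep comments aside while extracting text. */
class CommentAware extends HTMLEditorKit.ParserCallback {
    final StringBuilder text = new StringBuilder();
    final List<String> comments = new ArrayList<>();

    @Override
    public void handleText(char[] data, int pos) {
        text.append(data);
    }

    @Override
    public void handleComment(char[] data, int pos) {
        comments.add(new String(data));  // the body between <!-- and -->
    }

    static CommentAware parse(String html) {
        CommentAware callback = new CommentAware();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        CommentAware c = parse("<p>a<!-- hidden -->b</p>");
        System.out.println("text: " + c.text + ", comments: " + c.comments);
    }
}
```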

addStatusListener

public void addStatusListener(StatusListener listener)

removeStatusListener

public void removeStatusListener(StatusListener listener)

fireStatusChangedEvent

protected void fireStatusChangedEvent(String text)
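The myStatusListeners field and the add, remove and fire methods follow the usual listener pattern. In this sketch StatusListener is a local stand-in, not imported from GATE. It is assumed to mirror gate.event.StatusListener with a single statusChanged(String) callback. CopyOnWriteArrayList is a design choice here: firing iterates over a snapshot, so a listener may remove itself during notification without a ConcurrentModificationException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for gate.event.StatusListener, assumed to have one callback. */
interface StatusListener {
    void statusChanged(String text);
}

/** Sketch of the add/remove/fire pattern behind myStatusListeners. */
class StatusSource {
    protected final List<StatusListener> myStatusListeners = new CopyOnWriteArrayList<>();

    public void addStatusListener(StatusListener listener) {
        myStatusListeners.add(listener);
    }

    public void removeStatusListener(StatusListener listener) {
        myStatusListeners.remove(listener);
    }

    protected void fireStatusChangedEvent(String text) {
        for (StatusListener l : myStatusListeners) {
            l.statusChanged(text);  // iterates over a snapshot of the listener list
        }
    }

    public static void main(String[] args) {
        StatusSource source = new StatusSource();
        source.addStatusListener(text -> System.out.println("status: " + text));
        source.fireStatusChangedEvent("done");  // prints "status: done"
    }
}
```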