gate.util
Class AnnotationDiffer

java.lang.Object
  extended by gate.util.AnnotationDiffer

public class AnnotationDiffer
extends Object

This class provides the logic used by the Annotation Diff tool. It starts with two collections of annotation objects: key annotations (representing the gold standard) and response annotations (representing the system's responses). It then pairs the keys and responses in a way that maximises the score. Each key-response pair gets a score of CORRECT_VALUE (2), PARTIALLY_CORRECT_VALUE (1) or WRONG_VALUE (0), depending on whether the two annotations match, overlap, or are completely unmatched. Each pairing also has a type of CORRECT_TYPE, PARTIALLY_CORRECT_TYPE, SPURIOUS_TYPE or MISSING_TYPE, which further details the type of error for the wrong matches (missing pairings are the keys that weren't matched to any response, while spurious pairings are the responses that were over-generated and don't match any key). Precision, recall and F-measure are also calculated.
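To make the scoring rule above concrete, here is a minimal Java sketch that scores a single key/response pair. It is not GATE's implementation: the PairScoreSketch class and its score method are hypothetical, spans are assumed to be half-open character offsets, and features are ignored for brevity; only the 2/1/0 score values come from the description above.

```java
// Hypothetical sketch, not GATE code: scores one key/response pair with the
// CORRECT_VALUE / PARTIALLY_CORRECT_VALUE / WRONG_VALUE scheme described above.
// Spans are half-open [start, end) offsets; features are ignored here.
final class PairScoreSketch {
    static final int CORRECT_VALUE = 2;
    static final int PARTIALLY_CORRECT_VALUE = 1;
    static final int WRONG_VALUE = 0;

    static int score(String keyType, long keyStart, long keyEnd,
                     String respType, long respStart, long respEnd) {
        if (!keyType.equals(respType)) {
            return WRONG_VALUE;                 // different types: no match
        }
        if (keyStart == respStart && keyEnd == respEnd) {
            return CORRECT_VALUE;               // identical spans
        }
        boolean overlap = keyStart < respEnd && respStart < keyEnd;
        return overlap ? PARTIALLY_CORRECT_VALUE : WRONG_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(score("Person", 0, 5, "Person", 0, 5)); // 2
        System.out.println(score("Person", 0, 5, "Person", 3, 9)); // 1
        System.out.println(score("Person", 0, 5, "Person", 6, 9)); // 0
    }
}
```

In the real class, a partial match also requires the significant features to agree, which this sketch deliberately leaves out.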


Nested Class Summary
static interface AnnotationDiffer.Pairing
          Interface representing a pairing between a key annotation and a response one.
 class AnnotationDiffer.PairingImpl
          Represents a pairing of a key annotation with a response annotation and the associated score for that pairing.
static class AnnotationDiffer.PairingOffsetComparator
          Compares two choices based on start offset of key (or response if key not present) and type if offsets are equal.
protected static class AnnotationDiffer.PairingScoreComparator
          Compares two pairings: the better score is preferred; for the same score the better type is preferred (exact matches are preferred to partial ones).
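The two comparators summarised above can be sketched with java.util.Comparator. The Candidate class, its fields and the numeric typeRank (0 for an exact match, 1 for a partial one, 2 for anything else; lower is better) are hypothetical stand-ins for AnnotationDiffer.Pairing and its type constants, whose actual values are not given on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for AnnotationDiffer.Pairing with only the fields the
// comparators need. typeRank: 0 = exact, 1 = partial, 2 = other (lower is better).
final class Candidate {
    final String name;
    final int score;
    final int typeRank;
    final Long keyStart;        // null when there is no key (e.g. a spurious response)
    final long responseStart;

    Candidate(String name, int score, int typeRank, Long keyStart, long responseStart) {
        this.name = name;
        this.score = score;
        this.typeRank = typeRank;
        this.keyStart = keyStart;
        this.responseStart = responseStart;
    }
}

final class ComparatorSketch {
    // Best first: higher score, then the better (lower-ranked) type.
    static final Comparator<Candidate> BY_SCORE =
            Comparator.comparingInt((Candidate c) -> c.score).reversed()
                      .thenComparingInt(c -> c.typeRank);

    // Document order: key start, or response start when there is no key;
    // equal offsets are ordered by type.
    static final Comparator<Candidate> BY_OFFSET =
            Comparator.comparingLong((Candidate c) -> c.keyStart != null ? c.keyStart : c.responseStart)
                      .thenComparingInt(c -> c.typeRank);

    public static void main(String[] args) {
        List<Candidate> cs = new ArrayList<>(List.of(
                new Candidate("partial", 1, 1, 10L, 12),
                new Candidate("exact", 2, 0, 20L, 20),
                new Candidate("spurious", 0, 2, null, 5)));
        cs.sort(BY_SCORE);
        System.out.println(cs.get(0).name);  // exact
        cs.sort(BY_OFFSET);
        System.out.println(cs.get(0).name);  // spurious
    }
}
```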
 
Field Summary
static int CORRECT_TYPE
          Type for correct pairings (when the key and response match completely).
 Set<Annotation> correctAnnotations
           
protected  int correctMatches
          The number of correct matches.
protected  List<AnnotationDiffer.Pairing> finalChoices
          A list with the choices selected for the best result.
protected  List<List<AnnotationDiffer.Pairing>> keyChoices
          A list of lists representing all possible choices for each key
protected  List<Annotation> keyList
          A list with all the key annotations
static int MISMATCH_TYPE
          Type for mismatched pairings (where the key and response are co-extensive but they don't match).
protected  int missing
          The number of missing matches.
static int MISSING_TYPE
          Type for missing pairings (where the key was not matched to a response).
 Set<Annotation> missingAnnotations
           
static int PARTIALLY_CORRECT_TYPE
          Type for partially correct pairings (when the key and response match in type and significant features, but their spans only overlap rather than being identical).
 Set<Annotation> partiallyCorrectAnnotations
           
protected  int partiallyCorrectMatches
          The number of partially correct matches.
protected  List<AnnotationDiffer.Pairing> possibleChoices
          All the possible choices are added to this list for easy iteration.
protected  List<List<AnnotationDiffer.Pairing>> responseChoices
          A list of lists representing all possible choices for each response
protected  List<Annotation> responseList
          A list with all the response annotations
protected  int spurious
          The number of spurious matches.
static int SPURIOUS_TYPE
          Type for spurious pairings (where the response is not matching any key).
 Set<Annotation> spuriousAnnotations
           
 
Constructor Summary
AnnotationDiffer()
           
AnnotationDiffer(Collection<AnnotationDiffer> differs)
          Constructor to be used when you have a collection of AnnotationDiffer objects and want to treat them as a single AnnotationDiffer.
 
Method Summary
protected  void addPairing(AnnotationDiffer.Pairing pairing, int index, List<List<AnnotationDiffer.Pairing>> listOfPairings)
          Adds a new pairing to the internal data structures.
 List<AnnotationDiffer.Pairing> calculateDiff(Collection<Annotation> key, Collection<Annotation> response)
          Computes a diff between two collections of annotations.
 Set<Annotation> getAnnotationsOfType(int type)
          Returns the set of annotations of a specific pairing type.
 String getAnnotationType()
           
 int getCorrectMatches()
          Gets the number of correct matches.
 int getFalsePositivesLenient()
          Gets the number of responses that are neither correct nor partially correct.
 int getFalsePositivesStrict()
          Gets the number of pairings of type SPURIOUS_TYPE.
 double getFMeasureAverage(double beta)
          Gets the average of strict and lenient F-Measure values.
 double getFMeasureLenient(double beta)
          Gets the lenient F-Measure (F-Measure where the lenient precision and recall values are used) using the provided parameter as relative weight.
 double getFMeasureStrict(double beta)
          Gets the strict F-Measure (the weighted harmonic mean of the strict precision and the strict recall), using the provided parameter as the relative weight.
 int getKeysCount()
          Gets the number of keys provided.
 List<String> getMeasuresRow(Object[] measures, String title)
           
 int getMissing()
          Gets the number of pairings of type MISSING_TYPE.
 int getPartiallyCorrectMatches()
          Gets the number of partially correct matches.
 double getPrecisionAverage()
          Gets the average of the strict and lenient precision values.
 double getPrecisionLenient()
          Gets the lenient precision (where the partial matches are considered as correct).
 double getPrecisionStrict()
          Gets the strict precision (the ratio of correct responses out of all the provided responses).
 double getRecallAverage()
          Gets the average of the strict and lenient recall values.
 double getRecallLenient()
          Gets the lenient recall (where the partial matches are considered as correct).
 double getRecallStrict()
          Gets the strict recall (the ratio of keys matched to a response out of all the keys).
 int getResponsesCount()
          Gets the number of responses provided.
 Set<?> getSignificantFeaturesSet()
          Gets the set of features considered significant for the matching algorithm.
 int getSpurious()
          Gets the number of pairings of type SPURIOUS_TYPE.
 void printMissmatches()
          Prints to System.out the pairings that are not correct.
 void setSignificantFeaturesSet(Set<?> significantFeaturesSet)
          Sets the set of features considered significant for the matching algorithm.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

correctAnnotations

public Set<Annotation> correctAnnotations

partiallyCorrectAnnotations

public Set<Annotation> partiallyCorrectAnnotations

missingAnnotations

public Set<Annotation> missingAnnotations

spuriousAnnotations

public Set<Annotation> spuriousAnnotations

CORRECT_TYPE

public static final int CORRECT_TYPE
Type for correct pairings (when the key and response match completely).

See Also:
Constant Field Values

PARTIALLY_CORRECT_TYPE

public static final int PARTIALLY_CORRECT_TYPE
Type for partially correct pairings (when the key and response match in type and significant features, but their spans only overlap rather than being identical).

See Also:
Constant Field Values

MISSING_TYPE

public static final int MISSING_TYPE
Type for missing pairings (where the key was not matched to a response).

See Also:
Constant Field Values

SPURIOUS_TYPE

public static final int SPURIOUS_TYPE
Type for spurious pairings (where the response is not matching any key).

See Also:
Constant Field Values

MISMATCH_TYPE

public static final int MISMATCH_TYPE
Type for mismatched pairings (where the key and response are co-extensive but they don't match).

See Also:
Constant Field Values

correctMatches

protected int correctMatches
The number of correct matches.


partiallyCorrectMatches

protected int partiallyCorrectMatches
The number of partially correct matches.


missing

protected int missing
The number of missing matches.


spurious

protected int spurious
The number of spurious matches.


keyList

protected List<Annotation> keyList
A list with all the key annotations


responseList

protected List<Annotation> responseList
A list with all the response annotations


keyChoices

protected List<List<AnnotationDiffer.Pairing>> keyChoices
A list of lists representing all possible choices for each key


responseChoices

protected List<List<AnnotationDiffer.Pairing>> responseChoices
A list of lists representing all possible choices for each response


possibleChoices

protected List<AnnotationDiffer.Pairing> possibleChoices
All the possible choices are added to this list for easy iteration.


finalChoices

protected List<AnnotationDiffer.Pairing> finalChoices
A list with the choices selected for the best result.

Constructor Detail

AnnotationDiffer

public AnnotationDiffer(Collection<AnnotationDiffer> differs)
Constructor to be used when you have a collection of AnnotationDiffer objects and want to treat them as a single AnnotationDiffer. In that case only the precision, recall and F-measure methods (getPrecision*(), getRecall*() and getFMeasure*()) can be used.

Parameters:
differs - collection to be regrouped in one AnnotationDiffer
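One plausible way to merge several differs, which would explain why only the ratio-based getters stay meaningful afterwards, is to add up their raw counts (micro-averaging): the individual pairings are lost, but the totals are enough to compute precision, recall and F-measure. The sketch below assumes that approach; the DiffCounts class and its fields are hypothetical, and GATE's constructor may differ in detail.

```java
import java.util.List;

// Hypothetical per-diff totals; the class and field names are illustrative only.
final class DiffCounts {
    final int correct;
    final int partial;
    final int keys;
    final int responses;

    DiffCounts(int correct, int partial, int keys, int responses) {
        this.correct = correct;
        this.partial = partial;
        this.keys = keys;
        this.responses = responses;
    }

    /** Micro-averaging: sum the raw counts of all diffs into one set of totals. */
    static DiffCounts combine(List<DiffCounts> diffs) {
        int correct = 0, partial = 0, keys = 0, responses = 0;
        for (DiffCounts d : diffs) {
            correct += d.correct;
            partial += d.partial;
            keys += d.keys;
            responses += d.responses;
        }
        return new DiffCounts(correct, partial, keys, responses);
    }

    public static void main(String[] args) {
        DiffCounts total = combine(List.of(new DiffCounts(3, 0, 4, 5), new DiffCounts(1, 1, 6, 3)));
        // Strict precision over the merged totals: 4 correct out of 8 responses.
        System.out.println((double) total.correct / total.responses); // 0.5
    }
}
```

Summing counts weights every annotation equally; averaging the per-document strict precision values instead (macro-averaging) would give (3/5 + 1/3) / 2, roughly 0.467, for the same two diffs.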

AnnotationDiffer

public AnnotationDiffer()
Method Detail

calculateDiff

public List<AnnotationDiffer.Pairing> calculateDiff(Collection<Annotation> key,
                                                    Collection<Annotation> response)
Computes a diff between two collections of annotations.

Parameters:
key - the collection of key annotations.
response - the collection of response annotations.
Returns:
a list of AnnotationDiffer.Pairing objects representing the pairing set that results in the best score.
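The class description says the pairing set is chosen to maximise the score. A simple way to approximate that is a greedy pass: sort candidate pairs by score and accept each one whose key and response are still free. This is a swapped-in greedy heuristic, not necessarily what calculateDiff does, and it is not guaranteed to find the optimal assignment (an exact answer needs an assignment algorithm such as the Hungarian method). ScoredPair and GreedyPairingSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical candidate pairing: key index, response index and its 2/1/0 score.
final class ScoredPair {
    final int key;
    final int response;
    final int score;

    ScoredPair(int key, int response, int score) {
        this.key = key;
        this.response = response;
        this.score = score;
    }
}

final class GreedyPairingSketch {
    /** Greedy heuristic: take the best-scoring pairs, using each key and response at most once. */
    static List<ScoredPair> choose(List<ScoredPair> candidates) {
        List<ScoredPair> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((ScoredPair p) -> p.score).reversed());
        Set<Integer> usedKeys = new HashSet<>();
        Set<Integer> usedResponses = new HashSet<>();
        List<ScoredPair> chosen = new ArrayList<>();
        for (ScoredPair p : sorted) {
            boolean free = !usedKeys.contains(p.key) && !usedResponses.contains(p.response);
            if (p.score > 0 && free) {          // score 0 means the pair does not match
                chosen.add(p);
                usedKeys.add(p.key);
                usedResponses.add(p.response);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<ScoredPair> chosen = choose(List.of(
                new ScoredPair(0, 0, 2), new ScoredPair(0, 1, 1), new ScoredPair(1, 1, 1)));
        for (ScoredPair p : chosen) {
            System.out.println("key " + p.key + " -> response " + p.response);
        }
        // key 0 -> response 0
        // key 1 -> response 1
    }
}
```

Keys left unchosen would then become MISSING_TYPE pairings, and unchosen responses SPURIOUS_TYPE ones.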

getPrecisionStrict

public double getPrecisionStrict()
Gets the strict precision (the ratio of correct responses out of all the provided responses).

Returns:
a double value.

getRecallStrict

public double getRecallStrict()
Gets the strict recall (the ratio of keys matched to a response out of all the keys).

Returns:
a double value.
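The two strict measures reduce to simple ratios over the correct matches. The helper below is a hypothetical sketch, not GATE code; returning 0.0 when there are no responses or no keys is an assumption, and AnnotationDiffer may treat that edge case differently.

```java
// Hypothetical helper mirroring the strict definitions above; not GATE code.
final class StrictMeasures {
    /** Correct responses out of all responses. */
    static double precisionStrict(int correct, int responses) {
        return responses == 0 ? 0.0 : (double) correct / responses;
    }

    /** Keys correctly matched out of all keys. */
    static double recallStrict(int correct, int keys) {
        return keys == 0 ? 0.0 : (double) correct / keys;
    }

    public static void main(String[] args) {
        System.out.println(precisionStrict(5, 8)); // 0.625
        System.out.println(recallStrict(5, 10));   // 0.5
    }
}
```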

getPrecisionLenient

public double getPrecisionLenient()
Gets the lenient precision (where the partial matches are considered as correct).

Returns:
a double value.

getPrecisionAverage

public double getPrecisionAverage()
Gets the average of the strict and lenient precision values.

Returns:
a double value.

getRecallLenient

public double getRecallLenient()
Gets the lenient recall (where the partial matches are considered as correct).

Returns:
a double value.

getRecallAverage

public double getRecallAverage()
Gets the average of the strict and lenient recall values.

Returns:
a double value.
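The lenient measures count partial matches as correct, and the average measures take the mean of the strict and lenient values. As before, this is a hypothetical sketch with an assumed 0.0 result for empty denominators.

```java
// Hypothetical helper for the lenient and average measures; not GATE code.
final class LenientMeasures {
    /** Correct plus partially correct responses out of all responses. */
    static double precisionLenient(int correct, int partial, int responses) {
        return responses == 0 ? 0.0 : (double) (correct + partial) / responses;
    }

    /** Correctly or partially matched keys out of all keys. */
    static double recallLenient(int correct, int partial, int keys) {
        return keys == 0 ? 0.0 : (double) (correct + partial) / keys;
    }

    /** Mean of a strict value and the corresponding lenient value. */
    static double average(double strict, double lenient) {
        return (strict + lenient) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(precisionLenient(5, 2, 8)); // 0.875
        System.out.println(recallLenient(5, 2, 10));   // 0.7
        System.out.println(average(0.625, 0.875));     // 0.75
    }
}
```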

getFMeasureStrict

public double getFMeasureStrict(double beta)
Gets the strict F-Measure (the weighted harmonic mean of the strict precision and the strict recall), using the provided parameter as the relative weight.

Parameters:
beta - The relative weight of precision and recall. A value of 1 gives equal weights to precision and recall. A value of 0 takes the recall value completely out of the equation.
Returns:
a double value.
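The beta parameter behaves as in the standard F-beta measure, F = (1 + beta^2) P R / (beta^2 P + R): beta = 1 gives the ordinary harmonic mean, and beta = 0 leaves only precision, which matches the parameter description above. The sketch below assumes that standard formula; the class name and the 0.0 result for a zero denominator are assumptions, not GATE's code.

```java
// Hypothetical sketch of the standard F-beta measure; not GATE code.
final class FMeasureSketch {
    /** Weighted harmonic mean: (1 + beta^2) * p * r / (beta^2 * p + r). */
    static double fMeasure(double p, double r, double beta) {
        double b2 = beta * beta;
        double denom = b2 * p + r;
        return denom == 0.0 ? 0.0 : (1 + b2) * p * r / denom;
    }

    public static void main(String[] args) {
        System.out.println(fMeasure(0.625, 0.5, 1.0)); // about 0.556 (5/9)
        System.out.println(fMeasure(0.8, 0.4, 0.0));   // beta = 0: precision only
    }
}
```

The lenient F-Measure applies the same formula to the lenient precision and recall.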

getFMeasureLenient

public double getFMeasureLenient(double beta)
Gets the lenient F-Measure (F-Measure where the lenient precision and recall values are used) using the provided parameter as relative weight.

Parameters:
beta - The relative weight of precision and recall. A value of 1 gives equal weights to precision and recall. A value of 0 takes the recall value completely out of the equation.
Returns:
a double value.

getFMeasureAverage

public double getFMeasureAverage(double beta)
Gets the average of strict and lenient F-Measure values.

Parameters:
beta - The relative weight of precision and recall. A value of 1 gives equal weights to precision and recall. A value of 0 takes the recall value completely out of the equation.
Returns:
a double value.
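A worked example with hypothetical counts (10 keys, 8 responses, 5 correct and 2 partially correct matches) and beta = 1, using the strict and lenient definitions given on this page:

```latex
\begin{aligned}
P_{\text{strict}} &= \tfrac{5}{8} = 0.625, &
R_{\text{strict}} &= \tfrac{5}{10} = 0.5, &
F_{\text{strict}} &= \frac{2 \cdot 0.625 \cdot 0.5}{0.625 + 0.5} = \tfrac{5}{9} \approx 0.556,\\
P_{\text{lenient}} &= \tfrac{5+2}{8} = 0.875, &
R_{\text{lenient}} &= \tfrac{5+2}{10} = 0.7, &
F_{\text{lenient}} &= \frac{2 \cdot 0.875 \cdot 0.7}{0.875 + 0.7} = \tfrac{7}{9} \approx 0.778,\\
F_{\text{average}} &= \tfrac{1}{2}\left(\tfrac{5}{9} + \tfrac{7}{9}\right) = \tfrac{2}{3} \approx 0.667.
\end{aligned}
```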

getCorrectMatches

public int getCorrectMatches()
Gets the number of correct matches.

Returns:
an int value.

getPartiallyCorrectMatches

public int getPartiallyCorrectMatches()
Gets the number of partially correct matches.

Returns:
an int value.

getMissing

public int getMissing()
Gets the number of pairings of type MISSING_TYPE.

Returns:
an int value.

getSpurious

public int getSpurious()
Gets the number of pairings of type SPURIOUS_TYPE.

Returns:
an int value.

getFalsePositivesStrict

public int getFalsePositivesStrict()
Gets the number of pairings of type SPURIOUS_TYPE.

Returns:
an int value.

getFalsePositivesLenient

public int getFalsePositivesLenient()
Gets the number of responses that are neither correct nor partially correct.

Returns:
an int value.
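The count getters above report how many pairings fall into each category, and the lenient false positives are the responses left over once correct and partially correct ones are removed. The sketch below illustrates both; the PairType enum is a hypothetical replacement for AnnotationDiffer's int constants.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical pairing categories; AnnotationDiffer itself uses int constants.
enum PairType { CORRECT, PARTIALLY_CORRECT, MISSING, SPURIOUS }

final class CountSketch {
    /** Number of pairings in each category, analogous to the count getters. */
    static Map<PairType, Integer> tally(List<PairType> pairings) {
        Map<PairType, Integer> counts = new EnumMap<>(PairType.class);
        for (PairType t : PairType.values()) {
            counts.put(t, 0);
        }
        for (PairType t : pairings) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Responses that are neither correct nor partially correct. */
    static int falsePositivesLenient(int responses, int correct, int partial) {
        return responses - correct - partial;
    }

    public static void main(String[] args) {
        System.out.println(tally(List.of(PairType.CORRECT, PairType.CORRECT,
                PairType.MISSING, PairType.SPURIOUS)));
        // {CORRECT=2, PARTIALLY_CORRECT=0, MISSING=1, SPURIOUS=1}
        System.out.println(falsePositivesLenient(8, 5, 2)); // 1
    }
}
```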

getKeysCount

public int getKeysCount()
Gets the number of keys provided.

Returns:
an int value.

getResponsesCount

public int getResponsesCount()
Gets the number of responses provided.

Returns:
an int value.

printMissmatches

public void printMissmatches()
Prints to System.out the pairings that are not correct.


addPairing

protected void addPairing(AnnotationDiffer.Pairing pairing,
                          int index,
                          List<List<AnnotationDiffer.Pairing>> listOfPairings)
Adds a new pairing to the internal data structures.

Parameters:
pairing - the pairing to be added
index - the index in the list of pairings
listOfPairings - the list of AnnotationDiffer.Pairings where the pairing should be added
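A minimal sketch of what adding a pairing at an index could look like, assuming the outer list is indexed by key (or response) position, as keyChoices and responseChoices are described, and is grown on demand. The generic ChoiceLists class is a hypothetical simplification; GATE's own method may handle missing slots differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; T stands in for AnnotationDiffer.Pairing.
final class ChoiceLists {
    /** Appends the pairing to the inner list at the given index, creating empty slots as needed. */
    static <T> void addPairing(T pairing, int index, List<List<T>> listOfPairings) {
        while (listOfPairings.size() <= index) {
            listOfPairings.add(new ArrayList<>());   // grow until position `index` exists
        }
        listOfPairings.get(index).add(pairing);
    }

    public static void main(String[] args) {
        List<List<String>> choices = new ArrayList<>();
        addPairing("a", 2, choices);
        System.out.println(choices); // [[], [], [a]]
    }
}
```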

getSignificantFeaturesSet

public Set<?> getSignificantFeaturesSet()
Gets the set of features considered significant for the matching algorithm.

Returns:
a Set.

setSignificantFeaturesSet

public void setSignificantFeaturesSet(Set<?> significantFeaturesSet)
Sets the set of features considered significant for the matching algorithm. A null value means that all features are significant, an empty set means that no features are significant, and a set of String values specifies that only features whose names are included in the set are significant.

Parameters:
significantFeaturesSet - a Set of String values or null.
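The three cases for the significant-features set can be illustrated with a small feature comparison. This sketch assumes that a response matches when it has an equal value for every significant feature of the key; FeatureMatchSketch is hypothetical, and GATE's exact comparison may differ.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the significant-features rule described above:
// null = every key feature is significant, empty set = none is,
// otherwise only the named features are compared.
final class FeatureMatchSketch {
    static boolean featuresMatch(Map<String, ?> keyFeatures,
                                 Map<String, ?> responseFeatures,
                                 Set<?> significant) {
        for (Map.Entry<String, ?> e : keyFeatures.entrySet()) {
            if (significant != null && !significant.contains(e.getKey())) {
                continue;                       // not significant: ignore this feature
            }
            if (!Objects.equals(e.getValue(), responseFeatures.get(e.getKey()))) {
                return false;                   // significant feature differs or is absent
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> key = Map.of("gender", "male", "kind", "fullName");
        Map<String, String> response = Map.of("gender", "male", "kind", "firstName");
        System.out.println(featuresMatch(key, response, null));              // false
        System.out.println(featuresMatch(key, response, Set.of()));          // true
        System.out.println(featuresMatch(key, response, Set.of("gender")));  // true
    }
}
```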

getAnnotationsOfType

public Set<Annotation> getAnnotationsOfType(int type)
Returns the set of annotations of a specific pairing type.

Parameters:
type - the pairing type, e.g. CORRECT_TYPE, PARTIALLY_CORRECT_TYPE, MISSING_TYPE or SPURIOUS_TYPE.
Returns:
a Set of Annotations.

getAnnotationType

public String getAnnotationType()
Returns:
annotation type for all the annotations

getMeasuresRow

public List<String> getMeasuresRow(Object[] measures,
                                   String title)