weka.associations
Class Apriori

java.lang.Object
  extended by weka.associations.AbstractAssociator
      extended by weka.associations.Apriori
All Implemented Interfaces:
Serializable, Cloneable, AssociationRulesProducer, Associator, CARuleMiner, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class Apriori
extends AbstractAssociator
implements OptionHandler, AssociationRulesProducer, CARuleMiner, TechnicalInformationHandler

Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence.
The algorithm has an option to mine class association rules. It is adapted as explained in the second reference.
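
The iterative lowering of the minimum support described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not Weka's actual implementation: the class SupportSchedule, the method findMinSupport, and the ruleCounter callback are hypothetical names, and the callback merely stands in for a full mining pass that reports how many rules meet the minimum metric at a given support. The exact starting point and stopping rule used by Weka may differ.

```java
import java.util.function.DoubleToIntFunction;

class SupportSchedule {

    /**
     * Lowers the minimum support from upperBound in steps of delta until the
     * (hypothetical) ruleCounter reports at least numRules rules, or until the
     * next step would fall below lowerBound. Returns the last support tried.
     */
    static double findMinSupport(double upperBound, double lowerBound, double delta,
                                 int numRules, DoubleToIntFunction ruleCounter) {
        for (int step = 0; ; step++) {
            // Recompute from the step count to avoid accumulating rounding error.
            double minSupport = upperBound - step * delta;
            if (ruleCounter.applyAsInt(minSupport) >= numRules) {
                return minSupport; // enough rules found at this support level
            }
            double next = upperBound - (step + 1) * delta;
            if (next < lowerBound - 1e-9) {
                return minSupport; // lower bound reached, give up
            }
        }
    }

    public static void main(String[] args) {
        // Toy stand-in for a mining pass: pretend one additional rule appears
        // for every 0.05 of support given up below 1.0.
        DoubleToIntFunction toyCounter = supp -> (int) Math.round((1.0 - supp) / 0.05);
        double stoppedAt = findMinSupport(1.0, 0.1, 0.05, 10, toyCounter);
        System.out.printf("stopped at minimum support %.2f%n", stoppedAt);
    }
}
```

The arguments in main mirror the documented defaults of the -U, -M, -D, and -N options.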

For more information see:

R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 478-499, 1994.

Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In: Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998.

BibTeX:

 @inproceedings{Agrawal1994,
    author = {R. Agrawal and R. Srikant},
    booktitle = {20th International Conference on Very Large Data Bases},
    pages = {478-499},
    publisher = {Morgan Kaufmann, Los Altos, CA},
    title = {Fast Algorithms for Mining Association Rules in Large Databases},
    year = {1994}
 }
 
 @inproceedings{Liu1998,
    author = {Bing Liu and Wynne Hsu and Yiming Ma},
    booktitle = {Fourth International Conference on Knowledge Discovery and Data Mining},
    pages = {80-86},
    publisher = {AAAI Press},
    title = {Integrating Classification and Association Rule Mining},
    year = {1998}
 }
 

Valid options are:

 -N <required number of rules output>
  The required number of rules. (default = 10)
 -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
  The metric type by which to rank rules. (default = confidence)
 -C <minimum metric score of a rule>
  The minimum confidence of a rule. (default = 0.9)
 -D <delta for minimum support>
  The delta by which the minimum support is decreased in
  each iteration. (default = 0.05)
 -U <upper bound for minimum support>
  Upper bound for minimum support. (default = 1.0)
 -M <lower bound for minimum support>
  The lower bound for the minimum support. (default = 0.1)
 -S <significance level>
  If used, rules are tested for significance at
  the given level. Slower. (default = no significance testing)
 -I
  If set the itemsets found are also output. (default = no)
 -R
  Remove columns that contain all missing values (default = no)
 -V
  Report progress iteratively. (default = no)
 -A
  If set class association rules are mined. (default = no)
 -Z
  Treat zero (i.e. first value of nominal attributes) as missing
 -B <toString delimiters>
  If used, two characters to use as rule delimiters
  in the result of toString: the first to delimit fields,
  the second to delimit items within fields.
  (default = traditional toString result)
 -c <the class index>
  The class index. (default = last)
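
As an illustration of how the flags above are laid out when passed programmatically, the following sketch assembles an option array with one element per token (flag and value are separate elements). The class AprioriOptionsExample and the helper buildOptions are hypothetical names introduced only for this example; only the flag letters come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

class AprioriOptionsExample {

    /** Builds an option array in the "one token per element" form used by setOptions. */
    static String[] buildOptions(int numRules, int metricType, double minMetric,
                                 double delta, double upperBound, double lowerBound,
                                 boolean outputItemSets) {
        List<String> opts = new ArrayList<>();
        opts.add("-N"); opts.add(Integer.toString(numRules));
        opts.add("-T"); opts.add(Integer.toString(metricType)); // 0=confidence, 1=lift, ...
        opts.add("-C"); opts.add(Double.toString(minMetric));
        opts.add("-D"); opts.add(Double.toString(delta));
        opts.add("-U"); opts.add(Double.toString(upperBound));
        opts.add("-M"); opts.add(Double.toString(lowerBound));
        if (outputItemSets) {
            opts.add("-I"); // boolean flags take no value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Ask for 20 rules ranked by lift (metric type 1) with a minimum lift of 1.5.
        String[] options = buildOptions(20, 1, 1.5, 0.05, 1.0, 0.1, true);
        System.out.println(String.join(" ", options));
    }
}
```

An array of this shape is what setOptions(String[]) expects and what getOptions() returns.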

Version:
$Revision: 9469 $
Author:
Eibe Frank, Mark Hall, Stefan Mutter
See Also:
Serialized Form

Field Summary
static Tag[] TAGS_SELECTION
          Metric types.
 
Constructor Summary
Apriori()
          Constructor that sets default values for the minimum confidence and the maximum number of rules.
 
Method Summary
 void buildAssociations(Instances instances)
          Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.
 boolean canProduceRules()
          Returns true if this AssociationRulesProducer can actually produce rules.
 String carTipText()
          Returns the tip text for this property
 String classIndexTipText()
          Returns the tip text for this property
 String deltaTipText()
          Returns the tip text for this property
 FastVector[] getAllTheRules()
          Returns all the rules.
 AssociationRules getAssociationRules()
          Gets the list of mined association rules.
 Capabilities getCapabilities()
          Returns default capabilities of the associator.
 boolean getCar()
          Gets whether class association rules are mined
 int getClassIndex()
          Gets the class index
 double getDelta()
          Get the value of delta.
 Instances getInstancesNoClass()
          Gets the instances without the class attribute.
 Instances getInstancesOnlyClass()
          Gets only the class attribute of the instances.
 double getLowerBoundMinSupport()
          Get the value of lowerBoundMinSupport.
 SelectedTag getMetricType()
          Get the metric type
 double getMinMetric()
          Get the value of minConfidence.
 int getNumRules()
          Get the value of numRules.
 String[] getOptions()
          Gets the current settings of the Apriori object.
 boolean getOutputItemSets()
          Gets whether itemsets are output as well
 boolean getRemoveAllMissingCols()
          Returns whether columns containing all missing values are to be removed
 String getRevision()
          Returns the revision string.
 String[] getRuleMetricNames()
          Gets a list of the names of the metrics output for each rule.
 double getSignificanceLevel()
          Get the value of significanceLevel.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 boolean getTreatZeroAsMissing()
          Gets whether zeros (i.e. the first value of a nominal attribute) are to be treated in the same way as missing values.
 double getUpperBoundMinSupport()
          Get the value of upperBoundMinSupport.
 boolean getVerbose()
          Gets whether algorithm is run in verbose mode
 String globalInfo()
          Returns a string describing this associator
 Enumeration listOptions()
          Returns an enumeration describing the available options.
 String lowerBoundMinSupportTipText()
          Returns the tip text for this property
static void main(String[] args)
          Main method.
 String metricString()
          Returns the metric string for the chosen metric type
 String metricTypeTipText()
          Returns the tip text for this property
 FastVector[] mineCARs(Instances data)
          Method that mines all class association rules with minimum support and with a minimum confidence.
 String minMetricTipText()
          Returns the tip text for this property
 String numRulesTipText()
          Returns the tip text for this property
 String outputItemSetsTipText()
          Returns the tip text for this property
 String removeAllMissingColsTipText()
          Returns the tip text for this property
 void resetOptions()
          Resets the options to the default values.
 void setCar(boolean flag)
          Sets class association rule mining
 void setClassIndex(int index)
          Sets the class index
 void setDelta(double v)
          Set the value of delta.
 void setLowerBoundMinSupport(double v)
          Set the value of lowerBoundMinSupport.
 void setMetricType(SelectedTag d)
          Set the metric type for ranking rules
 void setMinMetric(double v)
          Set the value of minConfidence.
 void setNumRules(int v)
          Set the value of numRules.
 void setOptions(String[] options)
          Parses a given list of options.
 void setOutputItemSets(boolean flag)
          Sets whether itemsets are output as well
 void setRemoveAllMissingCols(boolean r)
          Remove columns containing all missing values.
 void setSignificanceLevel(double v)
          Set the value of significanceLevel.
 void setTreatZeroAsMissing(boolean z)
          Sets whether zeros (i.e. the first value of a nominal attribute) should be treated as missing values.
 void setUpperBoundMinSupport(double v)
          Set the value of upperBoundMinSupport.
 void setVerbose(boolean flag)
          Sets verbose mode
 String significanceLevelTipText()
          Returns the tip text for this property
 String toString()
          Outputs the size of all the generated sets of itemsets and the rules.
 String treatZeroAsMissingTipText()
          Returns the tip text for this property
 String upperBoundMinSupportTipText()
          Returns the tip text for this property
 String verboseTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.associations.AbstractAssociator
forName, makeCopies, makeCopy, runAssociator
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TAGS_SELECTION

public static final Tag[] TAGS_SELECTION
Metric types.
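
For orientation, the four metric types selectable via -T (confidence, lift, leverage, conviction) can be computed from itemset supports as in the sketch below. It uses the standard textbook definitions for a rule A => B; that Weka computes them in exactly this form is an assumption, and the class RuleMetrics with its method names is hypothetical.

```java
import java.util.Locale;

class RuleMetrics {

    // All arguments are fractions of transactions:
    // suppA = support of the premise, suppB = support of the consequent,
    // suppAB = support of premise and consequent together.

    static double confidence(double suppA, double suppAB) {
        return suppAB / suppA;
    }

    static double lift(double suppA, double suppB, double suppAB) {
        return suppAB / (suppA * suppB);
    }

    static double leverage(double suppA, double suppB, double suppAB) {
        return suppAB - suppA * suppB;
    }

    // Non-finite when the rule never fails (suppA == suppAB), because the
    // denominator is then zero.
    static double conviction(double suppA, double suppB, double suppAB) {
        return suppA * (1.0 - suppB) / (suppA - suppAB);
    }

    public static void main(String[] args) {
        double suppA = 0.4, suppB = 0.5, suppAB = 0.3;
        System.out.printf(Locale.ROOT,
            "confidence=%.2f lift=%.2f leverage=%.2f conviction=%.2f%n",
            confidence(suppA, suppAB),
            lift(suppA, suppB, suppAB),
            leverage(suppA, suppB, suppAB),
            conviction(suppA, suppB, suppAB));
    }
}
```

Lift above 1, positive leverage, and conviction above 1 all indicate that premise and consequent occur together more often than independence would predict.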

Constructor Detail

Apriori

public Apriori()
Constructor that sets default values for the minimum confidence and the maximum number of rules.

Method Detail

globalInfo

public String globalInfo()
Returns a string describing this associator

Returns:
a description of the associator suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

resetOptions

public void resetOptions()
Resets the options to the default values.


getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the associator.

Specified by:
getCapabilities in interface Associator
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class AbstractAssociator
Returns:
the capabilities of this associator
See Also:
Capabilities

buildAssociations

public void buildAssociations(Instances instances)
                       throws Exception
Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.

Specified by:
buildAssociations in interface Associator
Parameters:
instances - the instances to be used for generating the associations
Throws:
Exception - if rules can't be built successfully
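
The two phases named above (large itemsets first, then rules) can be illustrated with a minimal level-wise sketch over plain Java collections. It is not Weka's implementation: it works on sets of strings rather than Instances, omits the candidate pruning step, and only generates rules with a single-item consequent. AprioriSketch and all of its method names are hypothetical.

```java
import java.util.*;

class AprioriSketch {

    static Set<String> tx(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    /** Toy market-basket data used by main. */
    static List<Set<String>> demoTransactions() {
        return Arrays.asList(tx("bread", "milk"), tx("bread", "butter"),
            tx("bread", "milk", "butter"), tx("milk"), tx("bread", "milk"));
    }

    /** Fraction of transactions that contain every item of the itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    /** Level-wise search: frequent k-itemsets are joined into (k+1)-item candidates. */
    static List<Set<String>> frequentItemsets(List<Set<String>> transactions, double minSupport) {
        Set<String> items = new TreeSet<>();
        transactions.forEach(items::addAll);
        List<Set<String>> level = new ArrayList<>();
        for (String item : items) { // level 1: frequent single items
            Set<String> single = tx(item);
            if (support(transactions, single) >= minSupport) level.add(single);
        }
        List<Set<String>> result = new ArrayList<>();
        while (!level.isEmpty()) {
            result.addAll(level);
            // Join step: unions of two frequent k-itemsets that have exactly k+1 items.
            Set<Set<String>> candidates = new LinkedHashSet<>();
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    Set<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1) candidates.add(union);
                }
            }
            // Count step: keep only candidates that meet the minimum support.
            List<Set<String>> next = new ArrayList<>();
            for (Set<String> c : candidates) {
                if (support(transactions, c) >= minSupport) next.add(c);
            }
            level = next;
        }
        return result;
    }

    /** Rules "premise => [item]" from each frequent itemset (single-item consequents only). */
    static List<String> rules(List<Set<String>> transactions, List<Set<String>> frequent,
                              double minConfidence) {
        List<String> out = new ArrayList<>();
        for (Set<String> itemset : frequent) {
            if (itemset.size() < 2) continue;
            for (String consequent : itemset) {
                Set<String> premise = new TreeSet<>(itemset);
                premise.remove(consequent);
                double conf = support(transactions, itemset) / support(transactions, premise);
                if (conf >= minConfidence) {
                    out.add(String.format(Locale.ROOT, "%s => [%s] conf=%.2f", premise, consequent, conf));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<String>> data = demoTransactions();
        List<Set<String>> frequent = frequentItemsets(data, 0.5);
        System.out.println(frequent);
        rules(data, frequent, 0.7).forEach(System.out::println);
    }
}
```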

mineCARs

public FastVector[] mineCARs(Instances data)
                      throws Exception
Method that mines all class association rules with minimum support and with a minimum confidence.

Specified by:
mineCARs in interface CARuleMiner
Parameters:
data - the instances for which class association rules should be mined
Returns:
a sorted array of FastVector (ordered by confidence) containing the rules and metric information
Throws:
Exception - if rules can't be built successfully
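
A class association rule is a rule whose consequent is restricted to a value of the class attribute. The sketch below illustrates the idea on a small table of nominal values, under strong simplifying assumptions: premises are limited to a single attribute-value pair, and the data is held in plain string arrays rather than Instances. CarSketch and its mine method are hypothetical and do not reflect how mineCARs is implemented.

```java
import java.util.*;

class CarSketch {

    /** Mines rules of the form "attribute=value => class=label" with single-item premises. */
    static List<String> mine(String[] names, String[][] rows, int classIndex,
                             double minSupport, double minConfidence) {
        Map<String, Integer> premiseCount = new TreeMap<>();
        Map<String, Integer> ruleCount = new TreeMap<>();
        for (String[] row : rows) {
            String label = names[classIndex] + "=" + row[classIndex];
            for (int a = 0; a < row.length; a++) {
                if (a == classIndex) continue; // the class never appears in the premise
                String premise = names[a] + "=" + row[a];
                premiseCount.merge(premise, 1, Integer::sum);
                ruleCount.merge(premise + " => " + label, 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ruleCount.entrySet()) {
            String rule = e.getKey();
            String premise = rule.substring(0, rule.indexOf(" => "));
            double support = (double) e.getValue() / rows.length;
            double confidence = (double) e.getValue() / premiseCount.get(premise);
            if (support >= minSupport && confidence >= minConfidence) out.add(rule);
        }
        return out;
    }

    /** Toy data: the last column is the class attribute. */
    static String[][] demoRows() {
        return new String[][] {
            {"sunny", "false", "no"}, {"sunny", "true", "no"},
            {"overcast", "false", "yes"}, {"rainy", "false", "yes"},
            {"rainy", "true", "no"}, {"overcast", "true", "yes"}
        };
    }

    public static void main(String[] args) {
        String[] names = {"outlook", "windy", "play"};
        mine(names, demoRows(), 2, 0.3, 0.9).forEach(System.out::println);
    }
}
```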

getInstancesNoClass

public Instances getInstancesNoClass()
Gets the instances without the class attribute.

Specified by:
getInstancesNoClass in interface CARuleMiner
Returns:
the instances without the class attribute.

getInstancesOnlyClass

public Instances getInstancesOnlyClass()
Gets only the class attribute of the instances.

Specified by:
getInstancesOnlyClass in interface CARuleMiner
Returns:
the class attribute of all instances.

listOptions

public Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(String[] options)
                throws Exception
Parses a given list of options.

Valid options are:

 -N <required number of rules output>
  The required number of rules. (default = 10)
 -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
  The metric type by which to rank rules. (default = confidence)
 -C <minimum metric score of a rule>
  The minimum confidence of a rule. (default = 0.9)
 -D <delta for minimum support>
  The delta by which the minimum support is decreased in
  each iteration. (default = 0.05)
 -U <upper bound for minimum support>
  Upper bound for minimum support. (default = 1.0)
 -M <lower bound for minimum support>
  The lower bound for the minimum support. (default = 0.1)
 -S <significance level>
  If used, rules are tested for significance at
  the given level. Slower. (default = no significance testing)
 -I
  If set the itemsets found are also output. (default = no)
 -R
  Remove columns that contain all missing values (default = no)
 -V
  Report progress iteratively. (default = no)
 -A
  If set class association rules are mined. (default = no)
 -Z
  Treat zero (i.e. first value of nominal attributes) as missing
 -B <toString delimiters>
  If used, two characters to use as rule delimiters
  in the result of toString: the first to delimit fields,
  the second to delimit items within fields.
  (default = traditional toString result)
 -c <the class index>
  The class index. (default = last)

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
Exception - if an option is not supported

getOptions

public String[] getOptions()
Gets the current settings of the Apriori object.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

toString

public String toString()
Outputs the size of all the generated sets of itemsets and the rules.

Overrides:
toString in class Object
Returns:
a string representation of the model

metricString

public String metricString()
Returns the metric string for the chosen metric type

Specified by:
metricString in interface CARuleMiner
Returns:
a string describing the used metric for the interestingness of a class association rule

removeAllMissingColsTipText

public String removeAllMissingColsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setRemoveAllMissingCols

public void setRemoveAllMissingCols(boolean r)
Remove columns containing all missing values.

Parameters:
r - true if columns are to be removed.

getRemoveAllMissingCols

public boolean getRemoveAllMissingCols()
Returns whether columns containing all missing values are to be removed

Returns:
true if columns are to be removed.

upperBoundMinSupportTipText

public String upperBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUpperBoundMinSupport

public double getUpperBoundMinSupport()
Get the value of upperBoundMinSupport.

Returns:
Value of upperBoundMinSupport.

setUpperBoundMinSupport

public void setUpperBoundMinSupport(double v)
Set the value of upperBoundMinSupport.

Parameters:
v - Value to assign to upperBoundMinSupport.

setClassIndex

public void setClassIndex(int index)
Sets the class index

Specified by:
setClassIndex in interface CARuleMiner
Parameters:
index - the class index

getClassIndex

public int getClassIndex()
Gets the class index

Returns:
the index of the class attribute

classIndexTipText

public String classIndexTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCar

public void setCar(boolean flag)
Sets class association rule mining

Parameters:
flag - true if class association rules are mined, false otherwise

getCar

public boolean getCar()
Gets whether class association rules are mined

Returns:
true if class association rules are mined, false otherwise

carTipText

public String carTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

lowerBoundMinSupportTipText

public String lowerBoundMinSupportTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getLowerBoundMinSupport

public double getLowerBoundMinSupport()
Get the value of lowerBoundMinSupport.

Returns:
Value of lowerBoundMinSupport.

setLowerBoundMinSupport

public void setLowerBoundMinSupport(double v)
Set the value of lowerBoundMinSupport.

Parameters:
v - Value to assign to lowerBoundMinSupport.

getMetricType

public SelectedTag getMetricType()
Get the metric type

Returns:
the type of metric to use for ranking rules

metricTypeTipText

public String metricTypeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMetricType

public void setMetricType(SelectedTag d)
Set the metric type for ranking rules

Parameters:
d - the type of metric

minMetricTipText

public String minMetricTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMinMetric

public double getMinMetric()
Get the value of minConfidence.

Returns:
Value of minConfidence.

setMinMetric

public void setMinMetric(double v)
Set the value of minConfidence.

Parameters:
v - Value to assign to minConfidence.

numRulesTipText

public String numRulesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumRules

public int getNumRules()
Get the value of numRules.

Returns:
Value of numRules.

setNumRules

public void setNumRules(int v)
Set the value of numRules.

Parameters:
v - Value to assign to numRules.

deltaTipText

public String deltaTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getDelta

public double getDelta()
Get the value of delta.

Returns:
Value of delta.

setDelta

public void setDelta(double v)
Set the value of delta.

Parameters:
v - Value to assign to delta.

significanceLevelTipText

public String significanceLevelTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSignificanceLevel

public double getSignificanceLevel()
Get the value of significanceLevel.

Returns:
Value of significanceLevel.

setSignificanceLevel

public void setSignificanceLevel(double v)
Set the value of significanceLevel.

Parameters:
v - Value to assign to significanceLevel.

setOutputItemSets

public void setOutputItemSets(boolean flag)
Sets whether itemsets are output as well

Parameters:
flag - true if itemsets are to be output as well

getOutputItemSets

public boolean getOutputItemSets()
Gets whether itemsets are output as well

Returns:
true if itemsets are output as well

outputItemSetsTipText

public String outputItemSetsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setVerbose

public void setVerbose(boolean flag)
Sets verbose mode

Parameters:
flag - true if algorithm should be run in verbose mode

getVerbose

public boolean getVerbose()
Gets whether algorithm is run in verbose mode

Returns:
true if algorithm is run in verbose mode

verboseTipText

public String verboseTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

treatZeroAsMissingTipText

public String treatZeroAsMissingTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setTreatZeroAsMissing

public void setTreatZeroAsMissing(boolean z)
Sets whether zeros (i.e. the first value of a nominal attribute) should be treated as missing values.

Parameters:
z - true if zeros should be treated as missing values.

getTreatZeroAsMissing

public boolean getTreatZeroAsMissing()
Gets whether zeros (i.e. the first value of a nominal attribute) are to be treated in the same way as missing values.

Returns:
true if zeros are to be treated like missing values.

getAllTheRules

public FastVector[] getAllTheRules()
Returns all the rules.

Returns:
all the rules
See Also:
m_allTheRules

getAssociationRules

public AssociationRules getAssociationRules()
Description copied from interface: AssociationRulesProducer
Gets the list of mined association rules.

Specified by:
getAssociationRules in interface AssociationRulesProducer
Returns:
the list of association rules discovered during mining. Returns null if mining hasn't been performed yet.

getRuleMetricNames

public String[] getRuleMetricNames()
Gets a list of the names of the metrics output for each rule. This list should be the same (in terms of the names and order thereof) as that produced by AssociationRule.getMetricNamesForRule().

Specified by:
getRuleMetricNames in interface AssociationRulesProducer
Returns:
an array of the names of the metrics available for each rule learned by this producer.

canProduceRules

public boolean canProduceRules()
Returns true if this AssociationRulesProducer can actually produce rules. Most implementing classes will always return true from this method (obviously :-)). However, an implementing class that actually acts as a wrapper around things that may or may not implement AssociationRulesProducer will want to return false if the thing they wrap can't produce rules.

Specified by:
canProduceRules in interface AssociationRulesProducer
Returns:
true if this producer can produce rules in its current configuration

getRevision

public String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractAssociator
Returns:
the revision

main

public static void main(String[] args)
Main method.

Parameters:
args - the commandline options


Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.