Interface Algorithm.CostBenefit

Enclosing interface:
Algorithm

public static interface Algorithm.CostBenefit
Collection of metrics describing the cost and benefit of instantiating a particular Aggregate.

The cost metrics relate to the space and time investment required to add the aggregate to the system. The current cost metrics are getRowCount(), getSpace() and getLoadTime(); their interpretation is straightforward.

The benefit metrics describe the incremental benefit to the system of this aggregate existing. Generally, benefits are all about reduced query time. Thus for aggregate #n, the benefit is the benefit of having aggregates {1, ..., n - 1, n} compared to the benefit of having aggregates {1, ..., n - 1}. Consequently, the order of aggregates matters.
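The order dependence can be illustrated with a small sketch. It assumes a hypothetical model in which each query reads from the cheapest source available (the base data, or any aggregate that can answer it) and all queries are equally likely. The class IncrementalBenefit and its array layout are illustrative assumptions, not part of this API.

```java
import java.util.Arrays;

// Hypothetical model: rowsIfUsed[a][q] is the number of rows query q reads when
// answered from aggregate a, or Double.POSITIVE_INFINITY if a cannot answer q.
final class IncrementalBenefit {
  private IncrementalBenefit() {}

  // Returns the average incremental benefit of each aggregate, in the given order.
  static double[] incrementalBenefits(double[] baseRows, double[][] rowsIfUsed) {
    int queryCount = baseRows.length;
    double[] current = baseRows.clone(); // rows read with aggregates {1, ..., n - 1}
    double[] benefits = new double[rowsIfUsed.length];
    for (int n = 0; n < rowsIfUsed.length; n++) {
      double saved = 0;
      for (int q = 0; q < queryCount; q++) {
        // Rows read with aggregates {1, ..., n}: cheapest source now available.
        double withN = Math.min(current[q], rowsIfUsed[n][q]);
        saved += current[q] - withN;
        current[q] = withN;
      }
      benefits[n] = saved / queryCount; // all queries assumed equally likely
    }
    return benefits;
  }

  public static void main(String[] args) {
    double inf = Double.POSITIVE_INFINITY;
    double[] base = {1000, 500}; // two queries, no aggregates (hypothetical data)
    double[] aggA = {200, 25};   // aggregate A answers both queries
    double[] aggB = {10, inf};   // aggregate B answers only the first query
    System.out.println(Arrays.toString(incrementalBenefits(base, new double[][] {aggA, aggB})));
    System.out.println(Arrays.toString(incrementalBenefits(base, new double[][] {aggB, aggA})));
  }
}
```

Running main prints [637.5, 95.0] and then [495.0, 237.5]: aggregate B is worth 95 rows per query when it follows A, but 495 rows per query when it comes first.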

Benefit metrics also assume a particular query load. The query load may be drawn from past observed queries; or a theoretical load assuming, say, that all queries with N attributes are equally likely; or a mixture of the two. In any case, the query load is a theoretical model, because the actual queries cannot be known in advance, and it tends to be implicit in the choice of algorithm.
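One way to represent a query load is as a probability for each query. The sketch below assumes, purely for illustration, that a query is identified by the set of attributes it groups by. It builds a theoretical load in which all queries with N attributes are equally likely, an observed load from past queries, and a weighted mixture of the two. The class QueryLoad, its methods and the mixing weight lambda are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical query-load model: each query (a set of attributes) maps to its probability.
final class QueryLoad {
  private QueryLoad() {}

  // Theoretical load: every query with exactly n of the (distinct) attributes is equally likely.
  static Map<Set<String>, Double> uniformOfSize(List<String> attributes, int n) {
    List<Set<String>> queries = new ArrayList<>();
    collect(attributes, n, 0, new TreeSet<>(), queries);
    Map<Set<String>, Double> load = new HashMap<>();
    for (Set<String> q : queries) {
      load.put(q, 1.0 / queries.size());
    }
    return load;
  }

  // Recursively enumerates all n-element subsets of attributes[start..].
  private static void collect(List<String> attributes, int n, int start,
      TreeSet<String> chosen, List<Set<String>> out) {
    if (chosen.size() == n) {
      out.add(new TreeSet<>(chosen));
      return;
    }
    for (int i = start; i < attributes.size(); i++) {
      chosen.add(attributes.get(i));
      collect(attributes, n, i + 1, chosen, out);
      chosen.remove(attributes.get(i));
    }
  }

  // Observed load: each distinct past query weighted by how often it was seen.
  static Map<Set<String>, Double> observed(List<Set<String>> pastQueries) {
    Map<Set<String>, Double> load = new HashMap<>();
    for (Set<String> q : pastQueries) {
      load.merge(new TreeSet<>(q), 1.0 / pastQueries.size(), Double::sum);
    }
    return load;
  }

  // Mixture: lambda * a + (1 - lambda) * b, for 0 <= lambda <= 1.
  static Map<Set<String>, Double> mix(Map<Set<String>, Double> a,
      Map<Set<String>, Double> b, double lambda) {
    Map<Set<String>, Double> load = new HashMap<>();
    a.forEach((q, p) -> load.merge(q, lambda * p, Double::sum));
    b.forEach((q, p) -> load.merge(q, (1 - lambda) * p, Double::sum));
    return load;
  }

  // Expected benefit: probability-weighted sum of per-query benefits (missing = 0).
  static double expected(Map<Set<String>, Double> load, Map<Set<String>, Double> benefit) {
    double total = 0;
    for (Map.Entry<Set<String>, Double> e : load.entrySet()) {
      total += e.getValue() * benefit.getOrDefault(e.getKey(), 0.0);
    }
    return total;
  }

  public static void main(String[] args) {
    Map<Set<String>, Double> theory = uniformOfSize(List.of("year", "region", "product"), 2);
    Map<Set<String>, Double> seen =
        observed(List.of(Set.of("year"), Set.of("year"), Set.of("year", "region")));
    System.out.println(mix(seen, theory, 0.5));
  }
}
```

Because the load is just a map of probabilities, the choice between observed, theoretical and mixed loads stays separate from the code that computes per-query benefits.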

The sole benefit metric at this time is getSavedQueryRowCount().
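To make the shape of the interface concrete, here is a minimal sketch of an implementation that simply holds precomputed estimates; it assumes nothing about how those estimates are produced. The interface below is a local stand-in that mirrors the five documented methods so the example compiles on its own; the class FixedCostBenefit, the sample values and the output format of describe are hypothetical, not part of the library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Local stand-in for Algorithm.CostBenefit, declaring the five documented methods.
interface CostBenefit {
  double getRowCount();           // estimated rows in the aggregate
  double getSpace();              // estimated bytes on disk, including indexes
  double getLoadTime();           // estimated seconds for a full load from empty
  double getSavedQueryRowCount(); // expected rows not read per query
  void describe(PrintWriter pw);
}

// Hypothetical implementation that just returns precomputed estimates.
final class FixedCostBenefit implements CostBenefit {
  private final double rowCount;
  private final double space;
  private final double loadTime;
  private final double savedQueryRowCount;

  FixedCostBenefit(double rowCount, double space, double loadTime, double savedQueryRowCount) {
    this.rowCount = rowCount;
    this.space = space;
    this.loadTime = loadTime;
    this.savedQueryRowCount = savedQueryRowCount;
  }

  @Override public double getRowCount() { return rowCount; }
  @Override public double getSpace() { return space; }
  @Override public double getLoadTime() { return loadTime; }
  @Override public double getSavedQueryRowCount() { return savedQueryRowCount; }

  // Output format is an illustrative choice, not the library's.
  @Override public void describe(PrintWriter pw) {
    pw.println("rowCount=" + rowCount + ", space=" + space + " bytes, loadTime="
        + loadTime + " s, savedQueryRowCount=" + savedQueryRowCount);
  }

  public static void main(String[] args) {
    // Hypothetical sample estimates.
    CostBenefit cb = new FixedCostBenefit(1000, 64_000, 2.5, 70);
    StringWriter out = new StringWriter();
    try (PrintWriter pw = new PrintWriter(out)) {
      cb.describe(pw);
    }
    System.out.print(out);
  }
}
```

Keeping the estimates immutable means a caller can compare several candidate aggregates without the metrics changing underneath it.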

  • Method Summary

    Modifier and Type  Method                    Description
    void               describe(PrintWriter pw)  Describes this cost/benefit metric.
    double             getLoadTime()             Returns an estimate of the number of seconds required to load this aggregate.
    double             getRowCount()             Returns an estimate of the number of rows in this aggregate.
    double             getSavedQueryRowCount()   Returns the number of rows that do not need to be read in a typical query because this aggregate exists.
    double             getSpace()                Returns an estimate of the number of bytes required to store this aggregate on disk.
  • Method Details

    • getRowCount

      double getRowCount()
      Returns an estimate of the number of rows in this aggregate.
      Returns:
      estimated number of rows
    • getSpace

      double getSpace()
      Returns an estimate of the number of bytes required to store this aggregate on disk. This includes space for secondary structures such as indexes.
      Returns:
      estimated number of bytes
    • getLoadTime

      double getLoadTime()
      Returns an estimate of the number of seconds required to load this aggregate.

      This estimate is for a full load of an aggregate from empty; a related metric, not currently supported, would describe the effort required to incrementally maintain the aggregate during typical operation.

      Returns:
      estimated load time
    • getSavedQueryRowCount

      double getSavedQueryRowCount()
      Returns the number of rows that do not need to be read in a typical query because this aggregate exists.

      Suppose that there are 5 possible queries, and only 2 of them could use this aggregate.

      Cost/benefit for various queries
      Query  Rows (no aggregates)  Rows read without this aggregate  Rows read with this aggregate  Incremental benefit
      Q1     100                   100                               100                            0
      Q2     200                   200                               40                             160
      Q3     300                   300                               300                            0
      Q4     1000                  200                               10                             190
      Q5     500                   25                                25                             0

      Queries Q1 and Q3 are not helped by this or any aggregate; their benefit is 0. Query Q5 is helped by a previous aggregate, but not further helped by this one; its benefit is 0. This aggregate reduces Q2 from 200 rows to 40 rows, giving a benefit of 160 rows. A previous aggregate has already reduced Q4 from 1000 rows to 200, and this aggregate further reduces the row count to 10, giving a benefit of 190.

      The expected benefit of this aggregate is the average benefit over all queries. For this example we assume that all queries are equally likely, so the expected benefit is (0 + 160 + 0 + 190 + 0) / 5 = 70 rows per query.

      Returns:
      number of row reads saved by this aggregate
    • describe

      void describe(PrintWriter pw)
      Describes this cost/benefit metric.
      Parameters:
      pw - Print writer
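The arithmetic of the getSavedQueryRowCount example can be checked with a short sketch. It takes, for each query, the rows read without and with this aggregate and averages the difference, assuming all queries are equally likely as in the example; a weighted variant accepts an explicit probability per query for a non-uniform query load. The class SavedRowsExample is hypothetical, not part of the library.

```java
// Hypothetical helper reproducing the getSavedQueryRowCount worked example.
final class SavedRowsExample {
  private SavedRowsExample() {}

  private static void check(double[] a, double[] b) {
    if (a.length != b.length || a.length == 0) {
      throw new IllegalArgumentException("need one entry per query in each array");
    }
  }

  // Average of (rowsWithout[i] - rowsWith[i]), all queries equally likely.
  static double savedQueryRowCount(double[] rowsWithout, double[] rowsWith) {
    check(rowsWithout, rowsWith);
    double total = 0;
    for (int i = 0; i < rowsWithout.length; i++) {
      total += rowsWithout[i] - rowsWith[i]; // incremental benefit of query i
    }
    return total / rowsWithout.length;
  }

  // Weighted variant: probability[i] is the assumed likelihood of query i (summing to 1).
  static double savedQueryRowCount(double[] rowsWithout, double[] rowsWith, double[] probability) {
    check(rowsWithout, rowsWith);
    check(rowsWithout, probability);
    double total = 0;
    for (int i = 0; i < rowsWithout.length; i++) {
      total += probability[i] * (rowsWithout[i] - rowsWith[i]);
    }
    return total;
  }

  public static void main(String[] args) {
    double[] without = {100, 200, 300, 200, 25}; // Q1..Q5, previous aggregates only
    double[] with = {100, 40, 300, 10, 25};      // Q1..Q5, with this aggregate added
    System.out.println(savedQueryRowCount(without, with)); // 70.0
  }
}
```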