Enclosing interface: Algorithm

public static interface Algorithm.CostBenefit

Collection of metrics describing the cost and benefit of instantiating a particular Aggregate.

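The cost metrics relate to the space and time investment required to add the aggregate to the system. The current cost metrics are getRowCount(), getSpace() and getLoadTime(): respectively, the estimated number of rows in the aggregate, the number of bytes needed to store it, and the number of seconds needed to load it.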

The benefit metrics describe the incremental benefit to the system of this aggregate existing. Benefits are generally about reduced query time. Thus for aggregate #n, the benefit is the benefit of having aggregates {1, ..., n - 1, n} compared to the benefit of having aggregates {1, ..., n - 1}. The order of aggregates is therefore important: the same aggregate can show a different benefit depending on which aggregates precede it.
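The incremental definition above can be sketched in a few lines of Java. This is only an illustration: the class `IncrementalBenefitSketch`, the `totalBenefit` function and the toy `rowsSaved` model (a 1000-row fact table, aggregate "A" reading 200 rows, aggregate "B" reading 10) are assumptions made for the example and are not part of this interface.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper, not part of the documented API. */
class IncrementalBenefitSketch {

    /**
     * Benefit of the last aggregate in {@code ordered}: the benefit of
     * having aggregates {1, ..., n} minus the benefit of {1, ..., n - 1}.
     * {@code totalBenefit} is an assumed function that scores a whole list.
     */
    static <A> double incrementalBenefit(
            List<A> ordered, ToDoubleFunction<List<A>> totalBenefit) {
        if (ordered.isEmpty()) {
            throw new IllegalArgumentException("need at least one aggregate");
        }
        int n = ordered.size();
        double withN = totalBenefit.applyAsDouble(ordered);
        double withoutN = totalBenefit.applyAsDouble(ordered.subList(0, n - 1));
        return withN - withoutN;
    }

    /** Toy model (assumption): a query reads the smallest available source. */
    static double rowsSaved(List<String> aggregates) {
        double rowsRead = 1000; // rows read from the fact table alone
        for (String name : aggregates) {
            rowsRead = Math.min(rowsRead, name.equals("A") ? 200 : 10);
        }
        return 1000 - rowsRead;
    }
}
```

Under this toy model, "B" evaluated after "A" has a benefit of 190 rows, while "A" evaluated after "B" has a benefit of 0, which is one way to see why the order matters.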

Benefit metrics also assume a particular query load. The query load may be drawn from past observed queries; or a theoretical load assuming, say, that all queries with N attributes are equally likely; or a mixture of the two. In any case, the query load is a theoretical model, because the actual queries cannot be known in advance, and it tends to be implicit in the choice of algorithm.
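As a rough illustration of a mixed query load, the sketch below blends observed query frequencies with a uniform model over all candidate queries. Everything here is an assumption made for the example: the class `QueryLoadSketch`, the `blend` method, the `alpha` mixing parameter and the linear blending rule are hypothetical, and an actual algorithm may model the load quite differently (or only implicitly).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical query-load model, not part of the documented API. */
class QueryLoadSketch {

    /**
     * Returns a probability for each candidate query.
     * alpha = 1 uses only observed frequencies; alpha = 0 treats all
     * candidate queries as equally likely.
     */
    static Map<String, Double> blend(
            Map<String, Integer> observedCounts,
            List<String> allQueries,
            double alpha) {
        if (allQueries.isEmpty()) {
            throw new IllegalArgumentException("no candidate queries");
        }
        // Count only observations of candidate queries so weights sum to 1.
        long total = 0;
        for (String q : allQueries) {
            total += observedCounts.getOrDefault(q, 0);
        }
        // With no observations, fall back to the purely uniform model.
        double a = (total == 0) ? 0.0 : alpha;
        double uniform = 1.0 / allQueries.size();

        Map<String, Double> weights = new LinkedHashMap<>();
        for (String q : allQueries) {
            double observed = (total == 0)
                    ? 0.0
                    : observedCounts.getOrDefault(q, 0) / (double) total;
            weights.put(q, a * observed + (1 - a) * uniform);
        }
        return weights;
    }
}
```

The resulting weights could then be used in place of the "all queries equally likely" assumption when averaging per-query benefits.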

The sole benefit metric at this time is getSavedQueryRowCount().

Method Summary

Modifier and Type	Method	Description
void	describe(PrintWriter pw)	Describes this cost/benefit metric.
double	getLoadTime()	Returns an estimate of the number of seconds required to load this aggregate.
double	getRowCount()	Returns an estimate of the number of rows in this aggregate.
double	getSavedQueryRowCount()	Returns the number of rows that do not need to be read in a typical query because this aggregate exists.
double	getSpace()	Returns an estimate of the number of bytes required to store this aggregate on disk.

Method Details

getRowCount

double getRowCount()

Returns an estimate of the number of rows in this aggregate.

Returns:

estimated number of rows

getSpace

double getSpace()

Returns an estimate of the number of bytes required to store this aggregate on disk. This includes space for secondary structures such as indexes.

Returns:

estimated number of bytes

getLoadTime

double getLoadTime()

Returns an estimate of the number of seconds required to load this aggregate.
This estimate is for a full load of an aggregate from empty; a related metric, not currently supported, would describe the effort required to incrementally maintain the aggregate during typical operation.

Returns:

estimated load time

getSavedQueryRowCount

double getSavedQueryRowCount()

Returns the number of rows that do not need to be read in a typical query because this aggregate exists.

Suppose that there are 5 possible queries, and only 2 of them could use this aggregate.

Cost/benefit for various queries
Query	Rows	Rows read without this aggregate	Rows read with this aggregate	Benefit (rows saved)
Q1	100	100	100	0
Q2	200	200	40	160
Q3	300	300	300	0
Q4	1000	200	10	190
Q5	500	25	25	0

Queries Q1 and Q3 are not helped by this or any aggregate; their benefit is 0. Query Q5 is helped by a previous aggregate, but not further helped by this one; its benefit is 0. This aggregate helps reduce Q2 from 200 rows to 40 rows, giving a benefit of 160 rows. A previous aggregate has improved Q4 from 1000 rows to 200, and this aggregate further improves the row count to 10, giving a benefit of 190 rows.

The expected benefit of this aggregate is the average benefit over all queries. For this example we assume that all queries are equally likely, so the expected benefit is (0 + 160 + 0 + 190 + 0) / 5 = 70 rows per query.
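The arithmetic above can be reproduced with a short Java sketch. The class and method names (`ExpectedBenefitSketch`, `expectedBenefit`) are hypothetical and exist only for this example; the input arrays are the "rows read without" and "rows read with" columns of the table, and the equal-likelihood assumption is the one stated in the text.

```java
/** Hypothetical helper, not part of the documented API. */
class ExpectedBenefitSketch {

    /**
     * Average number of rows saved per query, assuming every query is
     * equally likely. rowsBefore[i] and rowsAfter[i] are the rows read by
     * query i without and with the aggregate under consideration.
     */
    static double expectedBenefit(double[] rowsBefore, double[] rowsAfter) {
        if (rowsBefore.length == 0 || rowsBefore.length != rowsAfter.length) {
            throw new IllegalArgumentException("arrays must be non-empty and equal in length");
        }
        double totalSaved = 0;
        for (int i = 0; i < rowsBefore.length; i++) {
            totalSaved += rowsBefore[i] - rowsAfter[i]; // per-query benefit
        }
        return totalSaved / rowsBefore.length;
    }

    public static void main(String[] args) {
        double[] before = {100, 200, 300, 200, 25}; // Q1..Q5 without this aggregate
        double[] after = {100, 40, 300, 10, 25};    // Q1..Q5 with this aggregate
        System.out.println(expectedBenefit(before, after)); // prints 70.0
    }
}
```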

Returns:

number of row reads saved by this aggregate

describe

void describe(PrintWriter pw)

Describes this cost/benefit metric.

Parameters:

pw - Print writer
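For orientation, a minimal implementation might look like the sketch below. To keep the example self-contained it declares a local stand-in interface, `CostBenefitLike`, that mirrors the five methods documented on this page; that interface, the `FixedCostBenefit` class, the `describeToString` helper and the text format written by `describe` are all assumptions made for illustration, since this page does not specify what `describe` should print.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Local stand-in mirroring the five documented methods (assumption). */
interface CostBenefitLike {
    double getRowCount();
    double getSpace();
    double getLoadTime();
    double getSavedQueryRowCount();
    void describe(PrintWriter pw);
}

/** Hypothetical immutable implementation holding precomputed estimates. */
class FixedCostBenefit implements CostBenefitLike {
    private final double rowCount;
    private final double space;
    private final double loadTime;
    private final double savedQueryRowCount;

    FixedCostBenefit(double rowCount, double space,
                     double loadTime, double savedQueryRowCount) {
        this.rowCount = rowCount;
        this.space = space;
        this.loadTime = loadTime;
        this.savedQueryRowCount = savedQueryRowCount;
    }

    @Override public double getRowCount() { return rowCount; }
    @Override public double getSpace() { return space; }
    @Override public double getLoadTime() { return loadTime; }
    @Override public double getSavedQueryRowCount() { return savedQueryRowCount; }

    /** Writes all metrics on one line; the format is an arbitrary choice. */
    @Override public void describe(PrintWriter pw) {
        pw.print("rowCount=" + rowCount
                + ", space=" + space
                + ", loadTime=" + loadTime
                + ", savedQueryRowCount=" + savedQueryRowCount);
        pw.flush();
    }

    /** Convenience for callers that want the description as a String. */
    static String describeToString(CostBenefitLike cb) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        cb.describe(pw);
        pw.flush();
        return sw.toString();
    }
}
```

Passing a `PrintWriter` lets the caller decide where the description goes (a log, the console, or an in-memory buffer as in `describeToString`).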
