public class TimeseriesTable extends AbstractDataset implements BatchReadable<byte[],TimeseriesTable.Entry>, BatchWritable<byte[],TimeseriesTable.Entry>
This Dataset works by partitioning time into bins that represent time intervals. Each entry added to the Dataset is assigned to a bin based on its timestamp and row key. Hence, every row in the underlying table contains entries that share the same time interval and row key. Data for each entry is stored in a separate column.
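The actual storage format is internal to the Dataset, but the binning idea can be sketched in plain Java: the row an entry lands in is determined by rounding its timestamp down to the start of its time interval. The class and method names below are illustrative, not part of the API.

```java
public class TimeBinning {
    // Hypothetical interval: one hour in milliseconds (the default per this javadoc).
    static final long INTERVAL_MS = 60 * 60 * 1000L;

    // Round a timestamp down to the start of its time interval (bin).
    static long binStart(long timestampMs) {
        return timestampMs - (timestampMs % INTERVAL_MS);
    }

    public static void main(String[] args) {
        long t1 = 3_600_000L + 15_000L;  // 15 s into the second hour
        long t2 = 3_600_000L + 42_000L;  // 42 s into the second hour
        // Both entries share a bin, so they land in the same underlying row.
        System.out.println(binStart(t1) == binStart(t2)); // true
        System.out.println(binStart(t1));                 // 3600000
    }
}
```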
A user can set the time interval length for partitioning data into rows, as defined by the timeIntervalToStorePerRow property in the DatasetSpecification.
This interval should be chosen according to the use case at hand. In general, a larger time interval means faster reads of small-to-medium time ranges (range sizes up to several time intervals) and faster batched writes, but slower reads of very small time ranges (a small fraction of one interval). Conversely, a smaller time interval gives faster reads of very small time ranges but slower batched writes.
As expected, a larger time interval also means that more data is stored per row. A user should generally avoid storing more than 50 megabytes of data per row, since larger rows degrade performance.
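Whether a given interval stays within the 50-megabyte budget can be estimated from the write rate and average entry size. The sketch below is back-of-the-envelope arithmetic only; the method name and figures are illustrative.

```java
public class RowSizeEstimate {
    // Estimate bytes stored per row for a given write rate and interval,
    // to check against the suggested 50 MB-per-row budget.
    static long bytesPerRow(long entriesPerSecond, long avgEntryBytes, long intervalSeconds) {
        return entriesPerSecond * avgEntryBytes * intervalSeconds;
    }

    public static void main(String[] args) {
        // 100 entries/s of ~64 bytes each over a 1-hour interval:
        long perRow = bytesPerRow(100, 64, 3600);
        System.out.println(perRow);                     // 23040000 (~23 MB, within budget)
        System.out.println(perRow < 50L * 1024 * 1024); // true
    }
}
```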
The default time interval length is one hour; users are generally advised to choose a value between one minute and several hours. In cases where the amount of written entries is small, the rule of thumb is:
row partition interval size = 5 * (average size of the time range to be read)
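The rule of thumb above is plain arithmetic; as a quick sketch (the method name is illustrative, not part of the API):

```java
public class IntervalRuleOfThumb {
    // Rule of thumb from the docs: interval = 5 * average read range.
    static long suggestedIntervalMs(long avgReadRangeMs) {
        return 5 * avgReadRangeMs;
    }

    public static void main(String[] args) {
        // If reads typically cover 10 minutes, suggest a 50-minute row interval.
        long tenMinutes = 10 * 60 * 1000L;
        System.out.println(suggestedIntervalMs(tenMinutes)); // 3000000
    }
}
```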
TimeseriesTable supports tagging: each entry can optionally be labeled with a set of tags used to filter items during data retrieval. For an entry to be retrievable by a given tag, that tag must have been provided when the entry was written. If multiple tags are provided during reading, an entry must contain every one of them to qualify for return.
Due to the data format used for storing, filtering by tags during reading is done on the client side (not on the cluster). Filtering by entry key, by contrast, happens on the server side, which is much more efficient. Depending on the use case, you may want to push some of the tags you would use into the entry key for faster reading.
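The AND semantics of tag filtering can be sketched with plain Java sets. TimeseriesTable itself stores tags as byte[] and performs this check internally on the client side; the class below is only an illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TagFilterSketch {
    // Client-side AND-filtering: an entry qualifies only if it carries
    // every tag requested by the read.
    static boolean matches(Set<String> entryTags, Set<String> queryTags) {
        return entryTags.containsAll(queryTags);
    }

    public static void main(String[] args) {
        Set<String> entryTags = new HashSet<>(Arrays.asList("host1", "cpu"));
        Set<String> wantBoth = new HashSet<>(Arrays.asList("host1", "cpu"));
        Set<String> wantMore = new HashSet<>(Arrays.asList("host1", "cpu", "idle"));
        System.out.println(matches(entryTags, wantBoth)); // true
        System.out.println(matches(entryTags, wantMore)); // false
    }
}
```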
Notes on implementation: The implementation is constrained by the Table API used under the hood. In particular, the Table API lacks a readHigherOrEq() method, which would return the next row with a key greater than or equal to the given one.
Direct known subclass: CounterTimeseriesTable
Modifier and Type | Class and Description
---|---
static class | TimeseriesTable.Entry: Time series table entry.
static class | TimeseriesTable.InputSplit: A method for using a Dataset as input for a MapReduce job.
class | TimeseriesTable.TimeseriesTableRecordsReader: A record reader for time series.
Modifier and Type | Field and Description
---|---
static String | ATTR_TIME_INTERVAL_TO_STORE_PER_ROW
static long | DEFAULT_TIME_INTERVAL_PER_ROW: See the TimeseriesTable javadoc for description.
static int | MAX_ROWS_TO_SCAN_PER_READ: Limit on the number of rows to scan per read.
protected Table | table
static String | TYPE: Type name.
Constructor and Description
---
TimeseriesTable(DatasetSpecification spec, Table table): Creates an instance of the table.
Modifier and Type | Method and Description
---|---
SplitReader&lt;byte[],TimeseriesTable.Entry&gt; | createSplitReader(Split split): Creates a reader for the split of a dataset.
List&lt;Split&gt; | getInputSplits(int splitsCount, byte[] key, long startTime, long endTime, byte[]... tags): Defines input selection for batch jobs.
List&lt;Split&gt; | getSplits(): Returns all splits of the dataset.
Iterator&lt;TimeseriesTable.Entry&gt; | read(byte[] key, long startTime, long endTime, byte[]... tags): Reads entries for a given time range and returns an Iterator.
Iterator&lt;TimeseriesTable.Entry&gt; | read(byte[] key, long startTime, long endTime, int offset, int limit, byte[]... tags): Reads entries for a given time range and returns an Iterator.
void | write(byte[] key, TimeseriesTable.Entry value): Writes an entry to the Dataset.
void | write(TimeseriesTable.Entry entry): Writes an entry to the Dataset.
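The read overload that takes offset and limit pages through results by skipping the first offset matching entries and then returning at most limit of the rest. A minimal standalone sketch of that skip-and-take behavior (the helper name is illustrative, not part of the API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PaginationSketch {
    // Skip `offset` entries, then collect at most `limit` of the remainder:
    // the behavior documented for the paginated read(...) overload.
    static <T> List<T> page(Iterator<T> it, int offset, int limit) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < offset && it.hasNext(); i++) {
            it.next(); // discard the skipped entry
        }
        while (it.hasNext() && out.size() < limit) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> entries = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(page(entries.iterator(), 1, 2)); // [2, 3]
    }
}
```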
Methods inherited from class AbstractDataset: close, commitTx, getName, getTransactionAwareName, getTxChanges, postTxCommit, rollbackTx, setMetricsCollector, startTx, toString, updateTx
public static final String TYPE
Type name.

public static final String ATTR_TIME_INTERVAL_TO_STORE_PER_ROW

public static final long DEFAULT_TIME_INTERVAL_PER_ROW
See the TimeseriesTable javadoc for description.

public static final int MAX_ROWS_TO_SCAN_PER_READ
Limit on the number of rows to scan per read.

protected final Table table

public TimeseriesTable(DatasetSpecification spec, Table table)
Creates an instance of the table.
public final void write(TimeseriesTable.Entry entry)
Writes an entry to the Dataset.
Parameters:
entry - entry to write

public final Iterator&lt;TimeseriesTable.Entry&gt; read(byte[] key, long startTime, long endTime, int offset, int limit, byte[]... tags)
Reads entries for a given time range and returns an Iterator.
Provides the same functionality as read(byte[], long, long, byte[]...) but accepts additional parameters for pagination purposes.
NOTE: A limit is placed on the maximum number of time intervals to be scanned during a read, as defined by MAX_ROWS_TO_SCAN_PER_READ.
Parameters:
key - key of the entries to read
startTime - defines the start of the time range to read, inclusive
endTime - defines the end of the time range to read, inclusive
offset - the number of initial entries to ignore and not add to the results
limit - upper limit on the number of results returned; if the limit is exceeded, the first limit results are returned
tags - a set of tags which returned entries must contain; tags are defined at write time, and an entry is returned only if it contains all of these tags
Throws:
IllegalArgumentException - when a provided condition is incorrect

public Iterator&lt;TimeseriesTable.Entry&gt; read(byte[] key, long startTime, long endTime, byte[]... tags)
Reads entries for a given time range and returns an Iterator.
NOTE: A limit is placed on the maximum number of time intervals to be scanned during a read, as defined by MAX_ROWS_TO_SCAN_PER_READ.
Parameters:
key - key of the entries to read
startTime - defines the start of the time range to read, inclusive
endTime - defines the end of the time range to read, inclusive
tags - a set of tags which returned entries must contain; tags are defined at write time, and an entry is returned only if it contains all of these tags

public List&lt;Split&gt; getInputSplits(int splitsCount, byte[] key, long startTime, long endTime, byte[]... tags)
Defines input selection for batch jobs.
Parameters:
splitsCount - number of parts to split the data selection into
key - key of the entries to read
startTime - defines the start of the time range to read, inclusive
endTime - defines the end of the time range to read, inclusive
tags - a set of tags which returned entries must contain; tags are defined at write time, and an entry is returned only if it contains all of these tags

public List&lt;Split&gt; getSplits()
Description copied from interface: BatchReadable
For feeding the whole dataset into a batch job.
Specified by:
getSplits in interface BatchReadable&lt;byte[],TimeseriesTable.Entry&gt;
Returns:
a list of Splits.

public SplitReader&lt;byte[],TimeseriesTable.Entry&gt; createSplitReader(Split split)
Description copied from interface: BatchReadable
Creates a reader for the split of a dataset.
Specified by:
createSplitReader in interface BatchReadable&lt;byte[],TimeseriesTable.Entry&gt;
Parameters:
split - the split to create a reader for
Returns:
a SplitReader.

public void write(byte[] key, TimeseriesTable.Entry value)
Implements write(key, value) in BatchWritable.
The key passed to this method is ignored; the key provided in the Entry object is used instead.
Specified by:
write in interface BatchWritable&lt;byte[],TimeseriesTable.Entry&gt;
Parameters:
key - row key to write to; this value is ignored
value - entry to write; the key used to write to the table is extracted from this object

Copyright © 2021 Cask Data, Inc. Licensed under the Apache License, Version 2.0.