public final class KernelSegmenter<DATA>
extends java.lang.Object
Segments data (i.e., finds multiple changepoints) using a method based on the kernel-segmentation algorithm described in https://hal.inria.fr/hal-01413230/document, which gives a framework to quickly calculate the cost of a segment given a low-rank approximation to a specified kernel. However, unlike the algorithm described there, which seeks to minimize a global segmentation cost, the method implemented here instead finds candidate changepoints based on the local costs within windows of various sizes.
Given N sequential data points of type DATA to segment, the basic steps of the method are:
(DATA, DATA) -> double.
Note that we break with the camelCase naming convention in places to match some notation in the paper.
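The idea of scoring candidate changepoints by local costs within a window can be illustrated with a minimal sketch. It is not this class's implementation: it assumes the usual kernel least-squares segment cost, computes it exactly rather than through a low-rank approximation, uses a single window size, and treats a changepoint index as the first point of the right-hand flanking segment. The class and method names (`LocalCostSketch`, `segmentCost`, `localGain`, `bestSplit`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical illustration; not part of KernelSegmenter.
final class LocalCostSketch {

    // Exact kernel cost of the segment [start, end):
    // sum_i k(x_i, x_i) - (1 / n) * sum_{i,j} k(x_i, x_j), with n = end - start.
    static <T> double segmentCost(List<T> data, int start, int end,
                                  BiFunction<T, T, Double> kernel) {
        double diagonal = 0.0;
        double total = 0.0;
        for (int i = start; i < end; i++) {
            diagonal += kernel.apply(data.get(i), data.get(i));
            for (int j = start; j < end; j++) {
                total += kernel.apply(data.get(i), data.get(j));
            }
        }
        return diagonal - total / (end - start);
    }

    // Cost reduction obtained by splitting the window [index - w, index + w)
    // into two flanking segments of size w that meet at index.
    static <T> double localGain(List<T> data, int index, int windowSize,
                                BiFunction<T, T, Double> kernel) {
        return segmentCost(data, index - windowSize, index + windowSize, kernel)
                - segmentCost(data, index - windowSize, index, kernel)
                - segmentCost(data, index, index + windowSize, kernel);
    }

    // Index with the largest local gain, or -1 if the data are shorter than one window.
    static <T> int bestSplit(List<T> data, int windowSize,
                             BiFunction<T, T, Double> kernel) {
        int best = -1;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int index = windowSize; index <= data.size() - windowSize; index++) {
            double gain = localGain(data, index, windowSize, kernel);
            if (gain > bestGain) {
                bestGain = gain;
                best = index;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0);
        // Linear kernel: the segment cost reduces to the sum of squared deviations from the mean.
        System.out.println(bestSplit(data, 2, (a, b) -> a * b));
    }
}
```

With the linear kernel and a window size of 2, the largest gain in this toy series falls at index 4, where the values jump from 0 to 5.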
Modifier and Type | Class and Description |
---|---|
static class | KernelSegmenter.ChangepointSortOrder |
Constructor and Description |
---|
KernelSegmenter(java.util.List<DATA> data) |
Modifier and Type | Method and Description |
---|---|
java.util.List<java.lang.Integer> | findChangepoints(int maxNumChangepoints, java.util.function.BiFunction<DATA,DATA,java.lang.Double> kernel, int kernelApproximationDimension, java.util.List<java.lang.Integer> windowSizes, double numChangepointsPenaltyLinearFactor, double numChangepointsPenaltyLogLinearFactor, KernelSegmenter.ChangepointSortOrder changepointSortOrder) Returns a list of the indices of the changepoints, either sorted by decreasing change to the global segmentation cost or by increasing index order. |
public KernelSegmenter(java.util.List<DATA> data)
public java.util.List<java.lang.Integer> findChangepoints(int maxNumChangepoints, java.util.function.BiFunction<DATA,DATA,java.lang.Double> kernel, int kernelApproximationDimension, java.util.List<java.lang.Integer> windowSizes, double numChangepointsPenaltyLinearFactor, double numChangepointsPenaltyLogLinearFactor, KernelSegmenter.ChangepointSortOrder changepointSortOrder)
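The `kernel` argument is a `java.util.function.BiFunction<DATA,DATA,java.lang.Double>`. As a hedged illustration for `DATA = Double`, the sketch below defines a linear kernel and a Gaussian kernel of that type. The holder class `ExampleKernels` and its members are hypothetical names, and the choice of these two kernels is an assumption rather than something this page prescribes.

```java
import java.util.function.BiFunction;

// Hypothetical helper; not part of KernelSegmenter.
final class ExampleKernels {

    // Linear kernel k(x, y) = x * y.
    static final BiFunction<Double, Double, Double> LINEAR = (x, y) -> x * y;

    // Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * variance)); variance must be positive.
    static BiFunction<Double, Double, Double> gaussian(double variance) {
        if (variance <= 0.0) {
            throw new IllegalArgumentException("variance must be positive");
        }
        return (x, y) -> Math.exp(-(x - y) * (x - y) / (2.0 * variance));
    }

    public static void main(String[] args) {
        System.out.println(LINEAR.apply(2.0, 3.0));
        System.out.println(gaussian(1.0).apply(1.0, 1.0));
    }
}
```

Either function has the type required by the `kernel` parameter when the segmenter is constructed over a `java.util.List<Double>`.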
maxNumChangepoints - maximum number of changepoints to return (first and last points do not count towards this number)
kernel - kernel function used to calculate segment costs
kernelApproximationDimension - dimension of low-rank approximation to the kernel
windowSizes - list of sizes to use for the flanking segments used to calculate local changepoint costs
numChangepointsPenaltyLinearFactor - factor A for penalty of the form A * C, where C is the number of changepoints
numChangepointsPenaltyLogLinearFactor - factor B for penalty of the form B * C * log(N / C), where C is the number of changepoints and N is the number of data points
changepointSortOrder - sort by decreasing change to the global segmentation cost or by increasing index order
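To make the two penalty forms concrete, the following sketch evaluates A * C and B * C * log(N / C) for given values. It rests on several assumptions not stated on this page: the two terms are summed into a single penalty, the logarithm is natural, and C = 0 contributes no penalty. The class and method names are hypothetical.

```java
// Hypothetical illustration; not part of KernelSegmenter.
final class ChangepointPenaltySketch {

    // Returns A * C + B * C * log(N / C), where C = numChangepoints and N = numDataPoints.
    static double penalty(int numChangepoints, int numDataPoints,
                          double linearFactor, double logLinearFactor) {
        if (numChangepoints == 0) {
            return 0.0; // assumption: avoid log(N / 0) by treating C = 0 as unpenalized
        }
        double c = numChangepoints;
        double n = numDataPoints;
        // Assumption: natural logarithm, and the two penalty terms are added together.
        return linearFactor * c + logLinearFactor * c * Math.log(n / c);
    }

    public static void main(String[] args) {
        System.out.println(penalty(10, 100, 1.0, 1.0));
    }
}
```

Under these assumptions the linear term grows in proportion to the number of changepoints, while the log-linear term also depends on how many data points there are per changepoint.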