A spline with the property that the delta between consecutive domain values is a fixed constant. Because of this, only the min and max of the domain and the values in the image of the function need to be specified.
NOTE: This class is exposed outside the package for use in aloha-conversions only. This class SHOULD NOT be used outside the aloha libraries.
the minimum domain value
the maximum domain value (strictly greater than min iff the spline has at least two knots; equal to min iff the spline has exactly one knot)
Required to have a positive number of knots (size > 0).
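Because the knots are evenly spaced, finding the segment that contains an input takes constant time rather than a search. The sketch below illustrates this under stated assumptions: piecewise-linear interpolation between knots and clamping to the end knots outside [min, max]. The class name ConstantDeltaSplineSketch is hypothetical, and the real class may interpolate or extrapolate differently.

```scala
/** Hypothetical sketch of an evenly spaced spline; not aloha's spline class. */
final case class ConstantDeltaSplineSketch(min: Double, max: Double, knots: IndexedSeq[Double]) {
  require(knots.nonEmpty, "a positive number of knots is required")
  require(if (knots.size == 1) min == max else min < max,
    "max must equal min for one knot and exceed min for two or more knots")

  // Constant spacing between consecutive domain values (unused when there is one knot).
  private val delta: Double =
    if (knots.size == 1) 0d else (max - min) / (knots.size - 1)

  /** Evaluates at x, clamping to the end knots outside [min, max] (an assumption). */
  def apply(x: Double): Double =
    if (knots.size == 1 || x <= min) knots.head
    else if (x >= max) knots.last
    else {
      // Even spacing lets the segment index be computed directly instead of searched for.
      val pos = (x - min) / delta
      val i = math.min(pos.toInt, knots.size - 2)
      val frac = pos - i
      // Linear interpolation between knot i and knot i + 1.
      knots(i) + frac * (knots(i + 1) - knots(i))
    }
}
```

For example, with min = 0, max = 1 and knots (0, 0.5, 2), the spacing is 0.5, so evaluating at 0.75 lands halfway through the second segment and returns 1.25.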
A tree with a map structure for the descendants data structure. Note: Map keys are invariant. We could make K contravariant and use existential types in the type lambda so that a MapTree could be constructed without having to specify the type in the root or leaf instances.
key type of the map structure
type of the descendant data structure.
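As a rough illustration of a tree whose descendants are stored in a map, the sketch below defines a hypothetical MapTreeSketch with a path lookup; it is not aloha's MapTree. Because K is invariant, as the note above explains, the key type is written out explicitly on every node, including the leaves.

```scala
/** Hypothetical tree whose children are kept in a Map keyed by K. */
final case class MapTreeSketch[K, A](
    value: A,
    descendants: Map[K, MapTreeSketch[K, A]] = Map.empty[K, MapTreeSketch[K, A]]) {

  /** Follows the keys in `path` from this node; returns the value at the end if every key exists. */
  def find(path: List[K]): Option[A] = path match {
    case Nil       => Some(value)
    case k :: rest => descendants.get(k).flatMap(_.find(rest))
  }
}
```

For example, `MapTreeSketch[String, Int](0, Map("a" -> MapTreeSketch[String, Int](1)))` returns `Some(1)` for the path `List("a")`.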
Provides an extension method toKv to convert Options to Seq[(String, Double)]. This is used to coerce the value to the type that is used in regression models. No implicit conversion from Option[A] to Iterable[(String, Double)] is provided because it can negatively impact type inference, so users must convert explicitly:
val option: Option[Int] = Option(1)
val iterable = option.toKv
require(iterable == List(("", 1d)))
the type of the value contained in the Option.
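One way such an extension method could be written is as an implicit value class. This is a sketch only: the names ToKvSketch and OptionToKv are hypothetical, and requiring a Numeric instance for A is an assumption about how values are coerced to Double. A non-empty Option yields a single pair with an empty key; an empty Option yields an empty sequence.

```scala
object ToKvSketch {
  // Hypothetical extension: Option[A] becomes zero or one ("", value) pairs.
  implicit class OptionToKv[A](val opt: Option[A]) extends AnyVal {
    def toKv(implicit num: Numeric[A]): Seq[(String, Double)] =
      opt.map(a => ("", num.toDouble(a))).toList
  }
}
```

With `import ToKvSketch._` in scope, the `option.toKv` example above evaluates to `List(("", 1.0))`.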
An algorithm for efficiently evaluating polynomials at a given point. Evaluating first-order polynomials is a special case, and an important one: first-order polynomial evaluation is equivalent to linear regression, which is likely the most common use case.
As an example, imagine that we wanted to evaluate $Z(u,v,x,y) = w_{u,v}\,uv + w_{u,v,x}\,uvx + w_{u,v,y}\,uvy$ for coefficients $W = [w_{u,v}, w_{u,v,x}, w_{u,v,y}]^T$.
This factors as $Z(u,v,x,y) = uv\,(w_{u,v} + w_{u,v,x}\,x + w_{u,v,y}\,y)$.
That Z can be factored indicates that the algorithm can be structured to reuse computations efficiently. This is achieved by structuring the possible polynomial terms as a tree and aggregating over it, accumulating the computation as the tree is traversed. As a concrete example, the example above is expressed in real code below, followed by the motivation for its structure.
The computation tree works as follows: the edge values along the path to a node are multiplied together, and that product is multiplied by the node's coefficient (0 if none exists) to get the node's value. Node values are added together to get the inner product. So every time we descend farther into the tree, we multiply the running product by the value extracted from the input vector X, and every time a weight is found, it is multiplied by the current product and added to the running sum. The process recurses until the tree can no longer be traversed, and the sum is then returned.
//   u              u v                                  u v x
//   (1)*1.00       (1*1.00)*1.000  u v w1               (1*1.00*1.000)*0.75   u v x w2
// -----------> 0 ----------------> 1*1.00*1.000 * 0.5 ----------------------> 1*1.00*1.000*0.75 * 0.111
//                                    \
//                                     \    u v y
//                                      \   (1*1.00*1.000)*0.25   u v y w3
//                                       -----------------------> 1*1.00*1.000*0.25 * 0.4545
//
// u * v * w1 + u * v * x * w2 + u * v * y * w3

val Z = 1.00 * 1.000 * 0.5 + 1.00 * 1.000 * 0.75 * 0.111 + 1.00 * 1.000 * 0.25 * 0.4545

val X = IndexedSeq(
  Seq(("a=1", 1.00)),                 // u
  Seq(("b=1", 1.000)),                // v
  Seq(("c=1", 0.75), ("c=2", 0.25)))  // x and y, respectively

val W1 =
  PolynomialEvaluator(Coefficient(0, IndexedSeq(0)), Map(
    "a=1" -> PolynomialEvaluator(Coefficient(0, IndexedSeq(1)), Map(
      "b=1" -> PolynomialEvaluator(Coefficient(0.5, IndexedSeq(2)), Map(  // w1
        "c=1" -> PolynomialEvaluator(Coefficient(0.111)),                 // w2
        "c=2" -> PolynomialEvaluator(Coefficient(0.4545))))))))           // w3

assert(Z == (W1 dot X))
While constructing a PolynomialEvaluator directly is entirely possible, it is less straightforward than using a builder. Below is a better way to construct PolynomialEvaluator instances, in which only the terms in the polynomial and their associated coefficient values are specified. Note that linear regression is the special case in which every inner map contains exactly one element.
val W2 = (PolynomialEvaluator.builder ++= Map(
  Map("a=1" -> 0, "b=1" -> 1)             -> 0.5,
  Map("a=1" -> 0, "b=1" -> 1, "c=1" -> 2) -> 0.111,
  Map("a=1" -> 0, "b=1" -> 1, "c=2" -> 2) -> 0.4545
)).result

assert(W2 == W1)
The values in the inner maps may look a little odd. They are the indices into the input vector X from which each key comes. They exist for efficiency: they allow the algorithm to dramatically prune the search space while accumulating over the tree.
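The traversal and the role of the indices can be sketched in a few lines. Everything here is hypothetical and illustrative: PolyNodeSketch, its fields, and fromTerms are not aloha's PolynomialEvaluator or builder API. Each node holds the coefficient of the term ending at it, the indices of X in which its children's keys are looked up, and a map from key to child. fromTerms mimics the builder input by sorting each term's features by (index, key), so every term follows one canonical path through the tree.

```scala
/** Hypothetical polynomial tree node; not aloha's PolynomialEvaluator. */
final case class PolyNodeSketch(
    weight: Double,                                    // coefficient of the term ending here (0 if none)
    childIndices: IndexedSeq[Int] = IndexedSeq.empty,  // groups of X in which child keys are looked up
    children: Map[String, PolyNodeSketch] = Map.empty) {

  /** Inner product of the coefficients with the monomials formed from the input x. */
  def dot(x: IndexedSeq[Seq[(String, Double)]]): Double = eval(x, 1d)

  private def eval(x: IndexedSeq[Seq[(String, Double)]], product: Double): Double = {
    // Descend only through keys found in the listed groups, extending the running product.
    val below = for {
      i <- childIndices
      (key, value) <- x(i)
      child <- children.get(key).toList
    } yield child.eval(x, product * value)
    // This node contributes its weight times the product of the edge values leading to it.
    weight * product + below.sum
  }

  private def insert(path: List[(String, Int)], w: Double): PolyNodeSketch = path match {
    case Nil => copy(weight = weight + w)
    case (key, idx) :: rest =>
      val child = children.getOrElse(key, PolyNodeSketch(0d)).insert(rest, w)
      copy(childIndices = (childIndices :+ idx).distinct, children = children + (key -> child))
  }
}

object PolyNodeSketch {
  /** Builds a tree from terms given as (feature -> index into X) maps, like the builder input. */
  def fromTerms(terms: Map[Map[String, Int], Double]): PolyNodeSketch =
    terms.foldLeft(PolyNodeSketch(0d)) { case (tree, (term, w)) =>
      // Sorting by (index, key) gives each term a single canonical path through the tree.
      tree.insert(term.toList.sortBy { case (k, i) => (i, k) }, w)
    }
}
```

Only the groups of X listed at a node are scanned, so features that cannot extend any term at that node are never examined. The root's weight plays the role of an intercept, and a tree of depth one is plain linear regression. Because floating-point addition is not associative, the result can differ in the last bits from a left-to-right sum of the terms.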
Provides a method to evaluate polynomials given an input. Default implementation of com.eharmony.aloha.models.reg.PolynomialEvaluationAlgo.
Created by deak on 11/1/15.
A helper trait for sparse regression models with String keys. This trait exposes the constructFeatures method which applies the featureFunctions to the input data and keeps track of missing features.
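A rough sketch of what constructFeatures does, under stated assumptions: ConstructFeaturesSketch and FeatureFn are hypothetical names, a feature function is modeled as returning None when its input data is missing, and the feature name is assumed to be prefixed to each emitted key by plain string concatenation (consistent with the "intercept" example elsewhere in this documentation).

```scala
object ConstructFeaturesSketch {
  /** A named feature function; None signals that the data needed for the feature is missing. */
  final case class FeatureFn[A](name: String, f: A => Option[Iterable[(String, Double)]])

  /** Applies every feature function to `a`; returns prefixed pairs and the names of missing features. */
  def constructFeatures[A](fns: Seq[FeatureFn[A]], a: A): (Seq[(String, Double)], Seq[String]) = {
    val results = fns.map(fn => (fn.name, fn.f(a)))
    // Features whose function produced no value are recorded as missing.
    val missing = results.collect { case (name, None) => name }
    // Assumed prefixing rule: the feature name followed by the emitted key.
    val kvs = results.collect { case (name, Some(kv)) => kv.map { case (k, v) => (name + k, v) } }.flatten
    (kvs, missing)
  }
}
```

Keeping the missing names alongside the features lets a caller decide whether too much data was absent to produce a trustworthy score.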
A regression model capable of doing not only linear regression but polynomial regression in general.
When constructing the semantics for this model, add the RegressionModelValueToTupleConversions import to the list of imports:
val regImp = "com.eharmony.aloha.models.reg.RegressionModelValueToTupleConversions._"
val compiler = ...
val plugin = ...
val imports: Seq[String] = ...
val s = CompiledSemantics(compiler, plugin, imports :+ regImp)
These conversions provide implicit conversions from some of the AnyVal types, and Options of AnyVal types, to Iterable[(String, Double)]. This is useful because it allows features specified in the JSON spec like:
{
  ...
  "features": {
    "intercept": "-3",
    "income": "${user.profile.income}"
  }
}
to be converted into sequences like:
val interceptFeature = Iterable(("intercept", -3.0))
// AND
val incomeFeature = Iterable(("income", [the income value converted to a double]))
For more information, see com.eharmony.aloha.models.reg.RegressionModelValueToTupleConversions.
model input type
model output type. A conversion from B to com.eharmony.aloha.score.Scores.Score is required.
An identifier for the model. Used in score and error reporting.
feature names (parallel to featureFunctions)
feature extracting functions.
representation of the regression model parameters.
a function applied to the inner product of the input vector and weight vector.
an optional calibration spline; see Zadrozny and Elkan, "Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers" (ICML, 2001). This is applied prior to invLinkFunction.
if provided, the number of missing features is checked against this threshold. If the threshold is exceeded, an error is returned instead of the computed score. This handles missing-data situations.
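How these parameters fit together can be sketched as follows. RegressionScoreSketch and its parameter names are hypothetical; the sketch assumes "exceeded" means strictly greater than the threshold and represents the error as a Left. It follows the ordering stated above: the calibration spline, if present, is applied to the inner product before invLinkFunction.

```scala
object RegressionScoreSketch {
  def score(
      innerProduct: Double,
      invLinkFunction: Double => Double,
      spline: Option[Double => Double],
      numMissing: Int,
      numMissingThreshold: Option[Int]): Either[String, Double] =
    numMissingThreshold match {
      // Assumed semantics: "exceeded" means strictly greater than the threshold.
      case Some(t) if numMissing > t =>
        Left(s"too many missing features: $numMissing > $t")
      case _ =>
        // The calibration spline (if any) is applied before the inverse link function.
        val calibrated = spline.fold(innerProduct)(s => s(innerProduct))
        Right(invLinkFunction(calibrated))
    }
}
```

With a logistic inverse link and no spline, an inner product of 0 yields a score of 0.5.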
Provides a series of implicit conversions to make the specification of regression models cleaner.
Each feature in the regression model constructs an Iterable[(String, Double)]. Once each feature constructs its iterable, the regression model maps it to a new one whose keys are prefixed by the feature name. For instance, in the example that follows, "intercept" would emit a value of type Long, which would become a function of type com.eharmony.aloha.semantics.func.GenAggFunc[A, Long]. This, however, doesn't match the expected output type com.eharmony.aloha.semantics.func.GenAggFunc[A, Iterable[(String, Double)]]. Conversions are provided for {Byte, Short, Int, Long, Float, Double} and their Option equivalents so that the JSON key-value pair "intercept": "1234L" is translated to Iterable(("", 1234.0)), which, when prefixed, yields Iterable(("intercept", 1234.0)).
{
  "modelType": "Regression",
  "modelId": {"id": 0, "name": ""},
  "features": {
    "intercept": "1234L",
    "some_option": "Option(5678L).toKv"
    ...
  },
  ...
}
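A minimal sketch of the convert-then-prefix idea, covering only Long and Double: ValueToTupleSketch, longToKv, doubleToKv and prefixed are hypothetical names, not the contents of RegressionModelValueToTupleConversions. Each conversion emits a single pair with an empty key, and prefixing the feature name produces the final key.

```scala
import scala.language.implicitConversions

object ValueToTupleSketch {
  // Each conversion emits one pair with an empty key; the feature name is prefixed later.
  implicit def longToKv(x: Long): Iterable[(String, Double)] = Iterable(("", x.toDouble))
  implicit def doubleToKv(x: Double): Iterable[(String, Double)] = Iterable(("", x))

  /** Prefixes the feature name onto every emitted key, e.g. "" becomes "intercept". */
  def prefixed(name: String, kvs: Iterable[(String, Double)]): Iterable[(String, Double)] =
    kvs.map { case (k, v) => (name + k, v) }
}
```

With `import ValueToTupleSketch._` in scope, `1234L` can be used wherever an `Iterable[(String, Double)]` is expected, and `prefixed("intercept", ...)` then yields `("intercept", 1234.0)`.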
A polynomial evaluator.