Changes numeric values to booleans so that expressions like true = 1 can be evaluated.
Coerces the type of different branches of a CASE WHEN statement to a common type.
Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType to TimeAdd/TimeSub
Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL: https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf https://msdn.microsoft.com/en-us/library/ms190476.aspx
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following result precision and scale:
Operation    Result Precision                        Result Scale
------------------------------------------------------------------------
e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
e1 * e2      p1 + p2 + 1                             s1 + s2
e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)
sum(e1)      p1 + 10                                 s1
avg(e1)      p1 + 4                                  s1 + 4
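The table above can be transcribed directly into a small function. This is only an illustrative Python sketch: the name decimal_result_type and the operation labels are hypothetical and are not part of Catalyst.

```python
def decimal_result_type(op, p1, s1, p2=None, s2=None):
    """Return (precision, scale) for an operation, following the table above.

    Hypothetical helper for illustration; `op` labels are made up here.
    Unary operations ("sum", "avg") ignore p2 and s2.
    """
    if op in ("+", "-"):
        return max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2)
    if op == "*":
        return p1 + p2 + 1, s1 + s2
    if op == "/":
        scale = max(6, s1 + p2 + 1)
        return p1 - s1 + s2 + scale, scale
    if op == "%":
        return min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2)
    if op == "union":
        return max(s1, s2) + max(p1 - s1, p2 - s2), max(s1, s2)
    if op == "sum":
        return p1 + 10, s1
    if op == "avg":
        return p1 + 4, s1 + 4
    raise ValueError("unsupported operation: " + op)
```

For example, adding a DECIMAL(10, 2) to a DECIMAL(5, 3) gives decimal_result_type("+", 10, 2, 5, 3), which evaluates to (12, 3).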
Catalyst also has unlimited-precision decimals. For those, all ops return unlimited precision.
To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.
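A rough illustration of this cast, compute, cast-back approach, using Python's decimal module as a stand-in for unlimited-precision decimals. The helpers to_fixed and add_fixed are hypothetical, and both the half-up rounding mode and returning None on overflow are assumptions made for the sketch, not behaviour stated in this documentation.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

def to_fixed(value, precision, scale):
    """Hypothetical cast to DECIMAL(precision, scale).

    Rounds half-up to `scale` fractional digits (assumed rounding mode) and
    returns None when the result needs more than `precision` digits
    (assumed overflow behaviour).
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 gives 0.01
    with localcontext() as ctx:
        ctx.prec = 100  # wide enough that the only rounding is the quantize
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    if len(rounded.as_tuple().digits) > precision:
        return None
    return rounded

def add_fixed(a, b, precision, scale):
    """Add at a high working precision, then cast back exactly once."""
    with localcontext() as ctx:
        ctx.prec = 100  # stand-in for "unlimited" precision
        exact = a + b
    return to_fixed(exact, precision, scale)
```

All rounding and overflow decisions live in to_fixed, so the arithmetic itself never has to reason about precision or scale.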
In addition, when mixing non-decimal types with decimals, we use the following rules:
- BYTE gets turned into DECIMAL(3, 0)
- SHORT gets turned into DECIMAL(5, 0)
- INT gets turned into DECIMAL(10, 0)
- LONG gets turned into DECIMAL(20, 0)
- FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
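The mixing rules above amount to a lookup table plus one special case. The sketch below is hypothetical: the tuple encoding of types and the name coerce_with_decimal are assumptions for illustration, and the FLOAT/DOUBLE operand is left unchanged because the rules only state what happens to the decimal side.

```python
# Integral types and the DECIMAL(precision, scale) they are turned into.
INTEGRAL_AS_DECIMAL = {
    "BYTE": ("DECIMAL", 3, 0),
    "SHORT": ("DECIMAL", 5, 0),
    "INT": ("DECIMAL", 10, 0),
    "LONG": ("DECIMAL", 20, 0),
}

def coerce_with_decimal(other_type, decimal_type):
    """Return (other_type, decimal_type) after applying the mixing rules.

    Hypothetical helper: `other_type` is a type name string and
    `decimal_type` is a ("DECIMAL", precision, scale) tuple.
    """
    if other_type in ("FLOAT", "DOUBLE"):
        # The fixed-length decimal side turns into DOUBLE.
        return other_type, "DOUBLE"
    # Integral side turns into a decimal; the decimal side is unchanged.
    return INTEGRAL_AS_DECIMAL[other_type], decimal_type
```

For example, coerce_with_decimal("INT", ("DECIMAL", 5, 2)) yields a DECIMAL(10, 0) operand next to the original DECIMAL(5, 2), after which the precision table applies as usual.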
Note: Union/Except/Intersect are handled by WidenTypes.
Hive only performs integral division with the DIV operator. The arguments to / are always converted to fractional types.
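The distinction can be mimicked in Python, whose / and // operators happen to behave similarly for non-negative integers. The function names slash and div are hypothetical, and the sketch deliberately says nothing about negative operands, which this documentation does not cover.

```python
def slash(a, b):
    """Model `/`: both arguments are converted to a fractional type first."""
    return float(a) / float(b)

def div(a, b):
    """Model `DIV` for non-negative integers: keep only the integral quotient."""
    return a // b
```

With these definitions slash(7, 2) is 3.5 while div(7, 2) is 3, and slash(6, 3) is the fractional value 2.0 rather than the integer 2.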
This ensures that the types for various functions are as expected.
Coerces the type of different branches of an If statement to a common type.
Casts types according to the expected input types for Expressions.
Converts all expressions in an IN() list to the type of the left operand.
Promotes strings that appear in arithmetic expressions.
Applies any changes to AttributeReference data types that are made by other rules to instances higher in the query tree.
When encountering a cast from a string representing a valid fractional number to an integral type, the JVM will throw a java.lang.NumberFormatException. Hive, in contrast, returns the truncated version of this number.
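The same contrast can be reproduced in Python, where int() on a fractional string raises ValueError, much as the JVM raises NumberFormatException. Both function names below are hypothetical, and parsing through Decimal is just one way to obtain the truncating behaviour described for Hive.

```python
from decimal import Decimal

def strict_cast_to_int(text):
    """Fails on fractional input, analogous to the JVM's string parsing."""
    return int(text)  # raises ValueError for strings such as "1.5"

def truncating_cast_to_int(text):
    """Parses the fractional number first, then drops the fractional part."""
    return int(Decimal(text))  # int() on a Decimal truncates toward zero
```

Here truncating_cast_to_int("1.9") returns 1, while strict_cast_to_int("1.9") raises ValueError.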
Widens numeric types and converts strings to numbers when appropriate.
Loosely based on rules from "Hadoop: The Definitive Guide" 2nd edition, by Tom White
The implicit conversion rules can be summarized as follows:
Additionally, all types when UNION-ed with strings will be promoted to strings. Other string conversions are handled by PromoteStrings.
Widening types might result in loss of precision in the following cases:
- IntegerType to FloatType
- LongType to FloatType
- LongType to DoubleType
- DecimalType to Double
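These losses are easy to observe with concrete values. The sketch below uses Python, whose float is an IEEE 754 double; the helper to_float32 is hypothetical and emulates a 32-bit float by round-tripping through struct.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round a number to the nearest 32-bit float and return it as a float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

# Integer to 32-bit float: 2**24 + 1 is not representable and collapses.
int_to_float_lossy = to_float32(2**24 + 1) == to_float32(2**24)

# Long to double: 2**53 + 1 is not representable and collapses.
long_to_double_lossy = float(2**53 + 1) == float(2**53)

# Decimal to double: 0.1 has no exact binary representation.
decimal_to_double_lossy = Decimal(float(Decimal("0.1"))) != Decimal("0.1")
```

All three flags evaluate to True, showing that distinct inputs can become indistinguishable, or shift slightly, after widening.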
This rule is only applied to Union/Except/Intersect
Find the tightest common type of two types that might be used in a binary expression. This handles all numeric types except fixed-precision decimals interacting with each other or with primitive types, because in that case the precision and scale of the result depends on the operation. Those rules are implemented in HiveTypeCoercion.DecimalPrecision.
A collection of Rules that can be used to coerce differing types that participate in operations into compatible ones. Most of these rules are based on Hive semantics, but they do not introduce any dependencies on the Hive codebase. For this reason they remain in Catalyst until we have a more standard set of coercions.