Simple case class wrapper around the components of a URI.
Helper to convert a Boolean value to a Byte. Does not require any validation.
The Boolean to convert into a Byte
0 if false, 1 if true
Helper to convert a Byte value (1 or 0) into a Boolean.
The Byte to turn into a Boolean
the Boolean value of b, or an error message if b is not 0 or 1 - all boxed in a Scalaz Validation
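A minimal sketch of the two conversions described above. It uses the standard library's `Either` as a stand-in for the Scalaz Validation the documentation mentions; the object name, method names and error wording are assumptions made for illustration.

```scala
object BoolByteSketch {
  // Boolean -> Byte is total, so no validation is needed.
  def booleanToByte(bool: Boolean): Byte =
    if (bool) 1.toByte else 0.toByte

  // Byte -> Boolean is partial: only 0 and 1 are accepted.
  // Left carries the error message, Right the converted value.
  def byteToBoolean(b: Byte): Either[String, Boolean] =
    if (b == 0) Right(false)
    else if (b == 1) Right(true)
    else Left(s"Cannot convert byte [$b] to boolean, only 1 or 0.")
}
```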
Decodes a URL-safe Base64 string.
For details on the Base 64 Encoding with URL and Filename Safe Alphabet see:
http://tools.ietf.org/html/rfc4648#page-7
The name of the field
The encoded string to be decoded
a Scalaz Validation, wrapping either an error String or the decoded String
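As an illustration of URL-safe Base64 decoding with a field-aware error, the sketch below uses the JDK's `java.util.Base64` URL decoder (Java 8+) and `Either` in place of the Scalaz Validation. The decoder choice, the UTF-8 charset, and the names `Base64DecodeSketch` and `decodeBase64Url` are assumptions, not necessarily what the documented code does.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object Base64DecodeSketch {
  // field is only used to make the error message traceable.
  def decodeBase64Url(field: String, str: String): Either[String, String] =
    try {
      Right(new String(Base64.getUrlDecoder.decode(str), UTF_8))
    } catch {
      // Thrown when str is not valid URL-safe Base64.
      case e: IllegalArgumentException =>
        Left(s"Field [$field]: could not Base64-decode [$str]: ${e.getMessage}")
    }
}
```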
Decodes a String in the specified encoding, also removing:
* Newlines - because they will break Hive
* Tabs - because they will break non-Hive targets (e.g. Infobright)
IMPLDIFF: note that this version, unlike the Hive serde version, does not call cleanUri. This is because we cannot assume that str is a URI which needs 'cleaning'.
TODO: simplify this when we move to a more robust output format (e.g. Avro) - as then no need to remove line breaks, tabs etc
a Scalaz Validation, wrapping either an error String or the decoded String
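One possible shape for this decode-and-clean step, assuming percent-decoding via `java.net.URLDecoder` and `Either` standing in for the Scalaz Validation. The parameter names `enc`, `field` and `str` are hypothetical, and this sketch strips tabs outright, as the description says; the real code may treat them differently.

```scala
import java.net.URLDecoder
import scala.util.control.NonFatal

object DecodeStringSketch {
  def decodeString(enc: String, field: String, str: String): Either[String, String] =
    try {
      val decoded = URLDecoder.decode(str, enc)
      // Strip carriage returns, newlines and tabs after decoding.
      Right(decoded.replaceAll("[\\r\\n\\t]", ""))
    } catch {
      // Covers malformed escapes (e.g. "%zz") and unsupported encodings.
      case NonFatal(e) =>
        Left(s"Field [$field]: could not decode [$str] using [$enc]: ${e.getMessage}")
    }
}
```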
Decode double-encoded percents, then percent decode
The name of the field
The String to decode
a Scalaz Validation, wrapping either an error String or the decoded String
Encodes a URL-safe Base64 string.
For details on the Base 64 Encoding with URL and Filename Safe Alphabet see:
http://tools.ietf.org/html/rfc4648#page-7
The string to be encoded
the string encoded in URL-safe Base64
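For comparison with the standard alphabet, here is a small sketch of URL-safe encoding using the JDK's `java.util.Base64` URL encoder. The encoder choice, the UTF-8 charset and the name `encodeBase64Url` are assumptions for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object Base64EncodeSketch {
  // The URL-safe alphabet swaps '+' for '-' and '/' for '_' (RFC 4648).
  def encodeBase64Url(str: String): String =
    Base64.getUrlEncoder.encodeToString(str.getBytes(UTF_8))
}
```

Note that this encoder keeps `=` padding; `Base64.getUrlEncoder.withoutPadding` drops it if the target cannot carry `=`.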
Encodes a string in the specified encoding
The encoding to be used
The string which needs to be URLEncoded
a URL encoded string
Explodes a URI into its 6 component pieces. Simple code but we use it in multiple places.
The URI to explode into its constituent pieces
The 6 components in a UriComponents case class
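A rough sketch of what exploding a `java.net.URI` could look like. The field names and types of this `UriComponents` are guesses (the documentation only says there are six components), and representing a missing port, path, query or fragment as `None` is an assumption of this example.

```scala
import java.net.URI

object ExplodeUriSketch {
  // Hypothetical layout of the six components.
  final case class UriComponents(
    scheme: String,
    host: String,
    port: Option[Int],
    path: Option[String],
    query: Option[String],
    fragment: Option[String]
  )

  def explodeUri(uri: URI): UriComponents =
    UriComponents(
      scheme   = uri.getScheme,
      host     = uri.getHost,
      // java.net.URI reports -1 when no port is present.
      port     = Some(uri.getPort).filter(_ != -1),
      // Option(...) turns null into None; empty strings are dropped too.
      path     = Option(uri.getPath).filter(_.nonEmpty),
      query    = Option(uri.getQuery).filter(_.nonEmpty),
      fragment = Option(uri.getFragment).filter(_.nonEmpty)
    )
}
```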
Attempt to extract the querystring from a URI as a map
URI containing the querystring
Encoding of the URI
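To make the "querystring as a map" idea concrete, the following sketch splits the raw query by hand and percent-decodes each key and value with `java.net.URLDecoder`. It is an assumption-laden illustration: the real helper may rely on a URL parsing library, and the `Either` result type and the name `extractQuerystring` are chosen for this example only.

```scala
import java.net.{URI, URLDecoder}
import scala.util.control.NonFatal

object QuerystringSketch {
  def extractQuerystring(uri: URI, encoding: String): Either[String, Map[String, String]] =
    try {
      val raw = Option(uri.getRawQuery).getOrElse("")
      val pairs = raw.split("&").toList.filter(_.nonEmpty).map { kv =>
        // Split on the first '=' only, so values may contain '='.
        val parts = kv.split("=", 2)
        val key   = URLDecoder.decode(parts(0), encoding)
        val value = if (parts.length > 1) URLDecoder.decode(parts(1), encoding) else ""
        key -> value
      }
      // With repeated keys, the last occurrence wins.
      Right(pairs.toMap)
    } catch {
      case NonFatal(e) =>
        Left(s"Could not parse querystring of [$uri]: ${e.getMessage}")
    }
}
```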
Replaces tabs with four spaces and removes newlines altogether.
Useful to prepare user-created strings for fragile storage formats like TSV.
The String to fix
The String with tabs and newlines fixed.
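The transformation is small enough to sketch directly. The name `fixTabsNewlines` and the decision to treat carriage returns as newlines are assumptions of this example.

```scala
object TabsNewlinesSketch {
  def fixTabsNewlines(str: String): String =
    str
      .replaceAll("\\t", "    ")   // each tab becomes four spaces
      .replaceAll("\\r|\\n", "")   // line breaks are dropped entirely
}
```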
Quick helper to make sure our Strings are TSV-safe, i.e. don't include tabs, special characters, newlines etc.
The string we want to make safe
a safe String
On 17th August 2013, Amazon made an unannounced change to their CloudFront log format - they went from always encoding % characters, to only encoding % characters which were not previously encoded. For a full discussion of this see:
https://forums.aws.amazon.com/thread.jspa?threadID=134017&tstart=0#
On 14th September 2013, Amazon rolled out a further fix, from which point onwards all fields, including the referer and useragent, would have %s double-encoded.
This causes issues, because the ETL process expects referers and useragents to be only single-encoded.
This function turns a double-encoded percent (%) into a single-encoded one.
Examples:
1. "page=Celestial%25Tarot" - no change (only single encoded)
2. "page=Dreaming%2520Way%2520Tarot" -> "page=Dreaming%20Way%20Tarot"
3. "loading 30%2525 complete" -> "loading 30%25 complete"
Limitation of this approach: %2588 is ambiguous. It could be either:
a) A double-escaped caret "ˆ" (%2588 -> %88 -> ˆ), or
b) A single-escaped "%88" (%2588 -> %88)
This code assumes it's a).
The String which potentially has double-encoded %s
the String with %s now single-encoded
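One regular expression that reproduces the three examples above is sketched here: it rewrites `%25` to `%` only when two hex digits follow. Whether the actual implementation uses exactly this pattern is an assumption; the name `singleEncodePcts` is likewise illustrative.

```scala
object SingleEncodeSketch {
  // "%25" followed by two hex digits is treated as a double-encoded escape.
  // "%25Ta" is left alone because "Ta" is not a hex pair.
  def singleEncodePcts(str: String): String =
    str.replaceAll("%25([0-9a-fA-F][0-9a-fA-F])", "%$1")
}
```

Because the hex-pair test cannot tell a double-escaped byte from a literal percent sign followed by two hex digits, this sketch also resolves `%2588` to `%88`, matching assumption a).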
Converts a String of value "1" or "0" to true or false respectively.
The String to convert
True for "1", false for "0", or an error message for any other value, all boxed in a Scalaz Validation
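A pattern match covers this conversion in a few lines. As elsewhere, `Either` is used here instead of a Scalaz Validation, and the error wording is made up for the example.

```scala
object StringToBooleanSketch {
  def stringToBoolean(str: String): Either[String, Boolean] =
    str match {
      case "1" => Right(true)
      case "0" => Right(false)
      // Any other value, including null, falls through to the error case.
      case _   => Left(s"Cannot convert [$str] to boolean, only 1 or 0.")
    }
}
```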
Extract a Java Byte representing 1 or 0 only from a String, or error.
a Scalaz Validation, being either a Failure String or a Success Byte
Convert a String to a String containing a Redshift-compatible Double.
Necessary because Redshift does not support all Java Double syntaxes e.g. "3.4028235E38"
Note that this code does NOT check that the value will fit within a Redshift Double - meaning Redshift may silently round this number on load.
a Scalaz Validation, being either a Failure String or a Success String
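One way to avoid exponent notation is to go through `java.math.BigDecimal` and print the plain form, sketched below. This is an assumed approach rather than a confirmed description of the real helper; the `field` parameter and the error text are hypothetical, and, as noted above, nothing here checks that the value fits a Redshift Double.

```scala
object DoublelikeSketch {
  def stringToDoublelike(field: String, str: String): Either[String, String] =
    try {
      // toPlainString never uses scientific notation, e.g. "1E3" becomes "1000".
      Right(new java.math.BigDecimal(str).toPlainString)
    } catch {
      case _: NumberFormatException =>
        Left(s"Field [$field]: cannot convert [$str] to Double-like String")
    }
}
```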
Extract a Java Integer (JInt) from a String, or error.
a Scalaz Validation, being either a Failure String or a Success JInt
Convert a String to a Double
The name of the field we are validating. To use in our error message
The String which we hope contains a Double
a Scalaz Validation, being either a Failure String or a Success Double
A wrapper around Java's URI.create().
Exceptions thrown by URI.create():
1. NullPointerException if uri is null
2. IllegalArgumentException if uri violates RFC 2396
The URI string to convert
Whether to use the com.netaporter.uri library
an Option-boxed URI object, or an error message, all wrapped in a Validation
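The two exceptions map naturally onto the "Option-boxed URI or error" return shape. The sketch below assumes that a null input is treated as "no URI" (`None`) rather than as an error, which the documentation does not state explicitly, and it leaves out the flag for the com.netaporter.uri library. `Either` again replaces the Scalaz Validation.

```scala
import java.net.URI

object StringToUriSketch {
  def stringToUri(uri: String): Either[String, Option[URI]] =
    try {
      Right(Some(URI.create(uri)))
    } catch {
      // Assumption: a null input means the URI is simply absent.
      case _: NullPointerException => Right(None)
      // Malformed input is reported as an error message.
      case e: IllegalArgumentException =>
        Left(s"Provided URI string [$uri] could not be parsed: ${e.getMessage}")
    }
}
```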
Truncates a String - useful for making sure Strings can't overflow a database field.
The String to truncate
The maximum length of the String to keep
the truncated String
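Truncation can be sketched with `take`, which is safe for Strings shorter than the limit. Passing null through unchanged is an assumption of this example.

```scala
object TruncateSketch {
  def truncate(str: String, length: Int): String =
    // take never throws when the String is shorter than length.
    if (str == null) null else str.take(length)
}
```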
a Scalaz ValidatedString containing either the original String on Success, or an error String on Failure.
Validates that the given field contains a valid UUID.
a Scalaz ValidatedString containing either the original String on Success, or an error String on Failure.
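A hedged sketch of UUID validation built on `java.util.UUID.fromString`. Because `fromString` is lenient about group lengths, the example adds a round-trip comparison; that extra check, the case-insensitive match, and the use of `Either` in place of the Scalaz ValidatedString are all assumptions of this illustration.

```scala
import java.util.UUID
import scala.util.control.NonFatal

object ValidateUuidSketch {
  def validateUuid(field: String, str: String): Either[String, String] = {
    val error = Left(s"Field [$field]: [$str] is not a valid UUID")
    try {
      val parsed = UUID.fromString(str)
      // Reject inputs such as "1-1-1-1-1" that parse but are not canonical.
      if (parsed.toString.equalsIgnoreCase(str)) Right(str) else error
    } catch {
      // Covers malformed input and null.
      case NonFatal(_) => error
    }
  }
}
```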
General-purpose utils to help the ETL process along.