public final class UnicodeDecodeWithOffsets<T extends TNumber> extends RawOp

Type parameters:
T - data type for the row_splits output

Decodes each string in `input` into a sequence of Unicode code points.

The character code points for all strings are returned in a single vector, `char_values`, with strings expanded to characters in row-major order. Similarly, the start byte offset of each character is returned in a single vector, `char_to_byte_starts`, with strings expanded in row-major order.

The `row_splits` tensor indicates where the code points and start offsets for each input string begin and end within the `char_values` and `char_to_byte_starts` tensors. In particular, the values for the i-th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus:
- `char_values[row_splits[i]+j]` is the Unicode code point for the j-th character in the i-th string (in row-major order).
- `char_to_byte_starts[row_splits[i]+j]` is the start byte offset for the j-th character in the i-th string (in row-major order).
- `row_splits[i+1] - row_splits[i]` is the number of characters in the i-th string (in row-major order).

Modifier and Type | Class and Description |
---|---|
static class | UnicodeDecodeWithOffsets.Inputs |
static class | UnicodeDecodeWithOffsets.Options - Optional attributes for UnicodeDecodeWithOffsets |
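To make the layout of the three outputs concrete, the following minimal sketch computes `char_values`, `char_to_byte_starts` and `row_splits` in plain Java for a batch of strings. It assumes well-formed input supplied as Java strings and measured in UTF-8 bytes. `DecodeWithOffsetsModel` is a hypothetical helper, not part of the TensorFlow Java API, and it ignores the `errors`, `replacementChar` and `replaceControlCharacters` options and every encoding other than UTF-8.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java model of the op's three outputs (not the TensorFlow kernel).
 * Assumes well-formed input given as Java strings, measured in UTF-8 bytes.
 */
final class DecodeWithOffsetsModel {
    final int[] charValues;        // models char_values: all code points, row-major
    final long[] charToByteStarts; // models char_to_byte_starts: byte offsets within each string
    final long[] rowSplits;        // models row_splits with the default TInt64 type

    private DecodeWithOffsetsModel(int[] charValues, long[] charToByteStarts, long[] rowSplits) {
        this.charValues = charValues;
        this.charToByteStarts = charToByteStarts;
        this.rowSplits = rowSplits;
    }

    /** Number of bytes UTF-8 uses to encode one (non-surrogate) code point. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    static DecodeWithOffsetsModel decode(String... inputs) {
        List<Integer> values = new ArrayList<>();
        List<Long> starts = new ArrayList<>();
        long[] rowSplits = new long[inputs.length + 1]; // rowSplits[0] stays 0
        for (int i = 0; i < inputs.length; i++) {
            String s = inputs[i];
            long byteOffset = 0; // offsets restart at 0 for each string
            for (int k = 0; k < s.length(); ) {
                int cp = s.codePointAt(k);
                values.add(cp);
                starts.add(byteOffset);
                byteOffset += utf8Length(cp);
                k += Character.charCount(cp); // step over both halves of a surrogate pair
            }
            rowSplits[i + 1] = values.size(); // end of string i == start of string i+1
        }
        int[] cv = values.stream().mapToInt(Integer::intValue).toArray();
        long[] st = starts.stream().mapToLong(Long::longValue).toArray();
        return new DecodeWithOffsetsModel(cv, st, rowSplits);
    }
}
```

For the strings "héllo", "" and "a€😀", this model yields `row_splits` [0, 5, 5, 8]: the empty string contributes an empty slice, and the byte offsets [0, 1, 3, 4, 5, 0, 1, 4] restart at 0 for each string.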
Modifier and Type | Field and Description |
---|---|
static String | OP_NAME - The name of this op, as known by the TensorFlow core engine |
Constructor and Description |
---|
UnicodeDecodeWithOffsets(Operation operation) |
Modifier and Type | Method and Description |
---|---|
Output<TInt64> | charToByteStarts() - Gets charToByteStarts: the start byte offset of each decoded character. |
Output<TInt32> | charValues() - Gets charValues: the decoded code points for all strings. |
static <T extends TNumber> UnicodeDecodeWithOffsets<T> | create(Scope scope, Operand<TString> input, String inputEncoding, Class<T> Tsplits, UnicodeDecodeWithOffsets.Options... options) - Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation. |
static UnicodeDecodeWithOffsets<TInt64> | create(Scope scope, Operand<TString> input, String inputEncoding, UnicodeDecodeWithOffsets.Options[] options) - Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation, with the default output types. |
static UnicodeDecodeWithOffsets.Options | errors(String errors) - Sets the errors option. |
static UnicodeDecodeWithOffsets.Options | replaceControlCharacters(Boolean replaceControlCharacters) - Sets the replaceControlCharacters option. |
static UnicodeDecodeWithOffsets.Options | replacementChar(Long replacementChar) - Sets the replacementChar option. |
Output<T> | rowSplits() - Gets rowSplits: the boundaries of each string within charValues and charToByteStarts. |
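`charToByteStarts()` reports only where each character starts. A caller who needs each character's full byte span can derive it from consecutive starts, provided it also knows each input string's total length in bytes, which the op does not return. The following is a minimal sketch, assuming the outputs have already been copied into Java arrays; `ByteSpans.charByteLengths` and its `inputByteLengths` parameter are hypothetical.

```java
/** Hypothetical helper: per-character byte lengths derived from char_to_byte_starts. */
final class ByteSpans {
    /**
     * @param charToByteStarts flat start offsets, laid out like char_to_byte_starts
     * @param rowSplits        string boundaries, laid out like row_splits
     * @param inputByteLengths byte length of each input string (supplied by the caller)
     */
    static long[] charByteLengths(long[] charToByteStarts, long[] rowSplits, long[] inputByteLengths) {
        long[] lengths = new long[charToByteStarts.length];
        for (int i = 0; i + 1 < rowSplits.length; i++) {
            int begin = (int) rowSplits[i];
            int end = (int) rowSplits[i + 1];
            for (int k = begin; k < end; k++) {
                // A character ends where the next one in the same string starts,
                // or at the end of the string for its last character.
                long next = (k + 1 < end) ? charToByteStarts[k + 1] : inputByteLengths[i];
                lengths[k] = next - charToByteStarts[k];
            }
        }
        return lengths;
    }
}
```

For strings whose UTF-8 encodings are 6, 0 and 8 bytes long ("héllo", "" and "a€😀"), starts [0, 1, 3, 4, 5, 0, 1, 4] and row splits [0, 5, 5, 8] give per-character lengths [1, 2, 1, 1, 1, 1, 3, 4]. Because the helper attributes every byte between two consecutive starts to the earlier character, any bytes that produce no output character would be counted toward the preceding one.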
public static final String OP_NAME
public UnicodeDecodeWithOffsets(Operation operation)
@Endpoint(describeByClass=true) public static <T extends TNumber> UnicodeDecodeWithOffsets<T> create(Scope scope, Operand<TString> input, String inputEncoding, Class<T> Tsplits, UnicodeDecodeWithOffsets.Options... options)

Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation.

Type parameters:
T - data type for the UnicodeDecodeWithOffsets output and operands

Parameters:
scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: "UTF-16", "US ASCII", "UTF-8".
Tsplits - The value of the Tsplits attribute, i.e. the data type T of the row_splits output.
options - carries optional attribute values

@Endpoint(describeByClass=true) public static UnicodeDecodeWithOffsets<TInt64> create(Scope scope, Operand<TString> input, String inputEncoding, UnicodeDecodeWithOffsets.Options[] options)

Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation, with the default output types (row_splits is TInt64).

Parameters:
scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: "UTF-16", "US ASCII", "UTF-8".
options - carries optional attribute values

public static UnicodeDecodeWithOffsets.Options errors(String errors)
Sets the errors option.

errors - Error handling policy when invalid formatting is found in the input. A value of 'strict' causes the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` code point. A value of 'ignore' causes the operation to skip any invalid formatting in the input and produce no corresponding output character.

public static UnicodeDecodeWithOffsets.Options replacementChar(Long replacementChar)

Sets the replacementChar option.

replacementChar - The replacement character code point used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode code point may be used. The default value is the Unicode replacement character, 0xFFFD (U+FFFD, decimal 65533).

public static UnicodeDecodeWithOffsets.Options replaceControlCharacters(Boolean replaceControlCharacters)

Sets the replaceControlCharacters option.

replaceControlCharacters - Whether to replace the C0 control characters (U+0000 to U+001F) with the `replacement_char`. Default is false.

public Output<T> rowSplits()

Gets rowSplits.
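The indexing rules from the class overview translate directly into array arithmetic. This sketch assumes the `char_values` and `row_splits` outputs have already been copied into Java arrays (reading them out of the tensors is not shown); `RowSplitsSlicer` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helpers implementing the row_splits indexing rules on plain arrays. */
final class RowSplitsSlicer {
    /** Number of characters in string i: row_splits[i+1] - row_splits[i]. */
    static long lengthOf(long[] rowSplits, int i) {
        return rowSplits[i + 1] - rowSplits[i];
    }

    /** Code point of character j in string i: char_values[row_splits[i] + j]. */
    static int charAt(int[] charValues, long[] rowSplits, int i, int j) {
        if (j < 0 || j >= lengthOf(rowSplits, i)) {
            throw new IndexOutOfBoundsException("string " + i + " has no character " + j);
        }
        return charValues[(int) (rowSplits[i] + j)];
    }

    /** All code points of string i: the slice [row_splits[i]:row_splits[i+1]]. */
    static int[] stringAt(int[] charValues, long[] rowSplits, int i) {
        return Arrays.copyOfRange(charValues, (int) rowSplits[i], (int) rowSplits[i + 1]);
    }

    /** Rebuilds a Java string from the decoded code points of string i. */
    static String toJavaString(int[] charValues, long[] rowSplits, int i) {
        int[] cps = stringAt(charValues, rowSplits, i);
        return new String(cps, 0, cps.length);
    }
}
```

With `row_splits` = [0, 5, 5, 8], string 1 is empty because `row_splits[1] == row_splits[2]`, and string 2 owns the last three entries of `char_values`.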
public Output<TInt32> charValues()
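The `errors`, `replacementChar` and `replaceControlCharacters` options have close analogues in `java.nio.charset`. The sketch below maps 'strict', 'replace' and 'ignore' onto `CodingErrorAction.REPORT`, `REPLACE` and `IGNORE`. It is an analogy only, not TensorFlow's ICU-based kernel: Java's decoder decides how many replacement characters a malformed sequence produces, which may differ from the op, and this sketch accepts only a single BMP `char` as the replacement, whereas the op accepts any valid code point. `ErrorPolicyAnalogy` is hypothetical, and applying the control-character replacement after decoding is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical analogy of the op's options using java.nio (not TensorFlow's ICU decoder). */
final class ErrorPolicyAnalogy {
    /** Maps an errors value of the op onto the closest java.nio action. */
    static CodingErrorAction actionFor(String errors) {
        switch (errors) {
            case "strict":  return CodingErrorAction.REPORT;  // fail on invalid input
            case "replace": return CodingErrorAction.REPLACE; // substitute the replacement char
            case "ignore":  return CodingErrorAction.IGNORE;  // drop invalid input silently
            default: throw new IllegalArgumentException("unknown errors policy: " + errors);
        }
    }

    /**
     * Decodes UTF-8 bytes into code points. The replacement must be a single BMP char
     * (a limitation of this sketch). When replaceControlCharacters is true, C0 controls
     * (U+0000 to U+001F) are replaced after decoding.
     */
    static int[] decode(byte[] utf8, String errors, char replacement, boolean replaceControlCharacters)
            throws CharacterCodingException {
        CodingErrorAction action = actionFor(errors);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .replaceWith(String.valueOf(replacement));
        // Throws CharacterCodingException on invalid input when the action is REPORT.
        int[] codePoints = decoder.decode(ByteBuffer.wrap(utf8)).toString().codePoints().toArray();
        if (replaceControlCharacters) {
            for (int k = 0; k < codePoints.length; k++) {
                if (codePoints[k] <= 0x1F) {
                    codePoints[k] = replacement;
                }
            }
        }
        return codePoints;
    }
}
```

With the bytes 61 FF 62 (0xFF is never valid in UTF-8), 'replace' yields [0x61, 0xFFFD, 0x62], 'ignore' yields [0x61, 0x62], and 'strict' throws a `CharacterCodingException`, playing the role of the op's InvalidArgument error.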
Copyright © 2015–2022. All rights reserved.