Object

za.co.absa.cobrix.cobol.parser

CopybookParser

Related Doc: package parser

object CopybookParser

This object contains generic functions for the copybook parser.

Linear Supertypes
AnyRef, Any

Type Members

  1. type CopybookAST = Group

  2. case class CopybookLine(level: Int, name: String, lineNumber: Int, modifiers: Map[String, String]) extends Product with Serializable

  3. case class RecordBoundary(name: String, begin: Int, end: Int) extends Product with Serializable

  4. case class StatementLine(lineNumber: Int, text: String) extends Product with Serializable

  5. case class StatementTokens(lineNumber: Int, tokens: Array[String]) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def calculateBinaryProperties(ast: CopybookAST): CopybookAST

    Calculate binary properties based on the whole AST

    ast

    An AST as a set of copybook records

    returns

    The same AST with binary properties set for every field

  6. def calculateSchemaSizes(ast: CopybookAST): CopybookAST

    Calculates binary properties for a mutable copybook schema, which is just an array of AST objects.

    ast

    An array of AST objects

    returns

    The same AST with binary properties set for every field

    Annotations
    @throws( classOf[SyntaxErrorException] )
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. def findCycleInAMap(m: Map[String, String]): List[String]

    Finds a cycle in a parent-child relation map.

    m

    A mapping from field name to its parent field name.

    returns

    A list of fields in a cycle if there is one, an empty list otherwise
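The cycle search described above can be sketched as a walk up the parent links. This is a minimal illustration, not the library's implementation: `CycleSketch` and `findCycle` are hypothetical names, and the order of the returned fields is an assumption, since the documentation only says that the fields in the cycle are returned.

```scala
import scala.annotation.tailrec

object CycleSketch {
  /** Follows field -> parent links from every key and returns the first cycle found, or Nil. */
  def findCycle(m: Map[String, String]): List[String] = {
    // `path` holds the fields visited so far, most recent first.
    @tailrec
    def walk(current: String, path: List[String]): List[String] = {
      if (path.contains(current)) {
        // Keep only the fields visited since `current` was first seen: these form the cycle.
        current :: path.takeWhile(_ != current).reverse
      } else {
        m.get(current) match {
          case Some(parent) => walk(parent, current :: path)
          case None         => Nil // reached a field with no parent, so no cycle on this path
        }
      }
    }

    // Keys are sorted only to make the result deterministic.
    m.keys.toList.sorted.iterator.map(start => walk(start, Nil)).find(_.nonEmpty).getOrElse(Nil)
  }
}
```

For a map such as `Map("A" -> "B", "B" -> "C", "C" -> "A")` the sketch reports the three fields that reference each other, while a plain chain without a loop yields an empty list.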

  12. def getAllSegmentRedefines(schema: CopybookAST): List[Group]

    Given an AST of a copybook, returns the list of all segment redefine GROUPs.

    schema

    An AST as a set of copybook records

    returns

    A list of segment redefine GROUPs

  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def getParentToChildrenMap(schema: CopybookAST): Map[String, Seq[Group]]

    Given an AST of a copybook, returns a map from segment redefines to their children.

    schema

    An AST as a set of copybook records

    returns

    A map from segment redefines to their children
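The shape of the parent-to-children map can be illustrated with a simplified model. In this sketch `Segment` is a hypothetical stand-in for a segment redefine GROUP (the real `Group` class is not reproduced here), and including childless segments with an empty sequence is an assumption, not documented behavior.

```scala
object ParentChildSketch {
  // Hypothetical stand-in for a segment redefine GROUP: a name plus an optional parent name.
  final case class Segment(name: String, parentName: Option[String])

  /** Maps every segment name to the segments that declare it as their parent. */
  def parentToChildren(segments: Seq[Segment]): Map[String, Seq[Segment]] =
    segments.map { s =>
      // Children keep the order in which they appear in the input.
      s.name -> segments.filter(_.parentName.contains(s.name))
    }.toMap
}
```

Given a root `COMPANY` segment and two segments that name it as parent, the entry for `COMPANY` lists both children and each child maps to an empty sequence.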

  15. def getRootSegmentAST(schema: CopybookAST): CopybookAST

    Given an AST of a copybook, returns a new AST that does not contain child segments.

    schema

    An AST as a set of copybook records

    returns

    A new AST that does not contain child segments

  16. def getRootSegmentIds(segmentIdRedefineMap: Map[String, String], fieldParentMap: Map[String, String]): List[String]

    Returns a list of segment id values for the root segment.
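One way to read the two parameters is sketched below. The interpretation is an assumption: `segmentIdRedefineMap` is taken to map a segment id value to the name of its segment redefine, `fieldParentMap` to map a field name to its parent, and the root segment to be the redefine that has no parent entry. `RootSegmentSketch` and `rootSegmentIds` are hypothetical names.

```scala
object RootSegmentSketch {
  /** Collects the segment ids whose redefine has no parent, i.e. the ids of the root segment. */
  def rootSegmentIds(segmentIdRedefineMap: Map[String, String],
                     fieldParentMap: Map[String, String]): List[String] =
    segmentIdRedefineMap.toList.collect {
      // A redefine without an entry in the parent map is treated as the root segment.
      case (segmentId, redefine) if !fieldParentMap.contains(redefine) => segmentId
    }.sorted // sorted only to make the result deterministic
}
```

With ids `C` and `1` both pointing at `COMPANY`, id `P` pointing at `PERSON`, and `PERSON` declared as a child of `COMPANY`, the sketch returns the two ids that belong to `COMPANY`.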

  17. def getSchemaWithOffsets(bitOffset: Int, ast: CopybookAST): CopybookAST

    Calculates binary offsets for a mutable copybook schema, which is just an array of AST objects.

    ast

    An array of AST objects

    returns

    The same AST with all offsets set for every field
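The idea of assigning offsets from a starting bit offset can be sketched on a flat list of fields. This is a simplified illustration under stated assumptions: `Field`, `sizeBits` and `offsetBits` are hypothetical, and the sketch ignores nested groups, REDEFINES and OCCURS, which the real AST has to handle.

```scala
object OffsetSketch {
  // Hypothetical flat field model: a name, a size in bits and the computed start offset.
  final case class Field(name: String, sizeBits: Int, offsetBits: Int = 0)

  /** Assigns each field the running total of the sizes of the fields before it, starting at bitOffset. */
  def withOffsets(bitOffset: Int, fields: Seq[Field]): Seq[Field] = {
    // scanLeft yields the start offset of every field, followed by the end offset of the last one.
    val starts = fields.scanLeft(bitOffset)((offset, f) => offset + f.sizeBits)
    // zip drops the trailing end offset because `fields` is one element shorter.
    fields.zip(starts).map { case (f, start) => f.copy(offsetBits = start) }
  }
}
```

Fields of 16, 8 and 32 bits laid out from offset 0 start at bits 0, 16 and 24; a non-zero `bitOffset` shifts every start by the same amount.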

  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def markDependeeFields(ast: CopybookAST, occursHandlers: Map[String, Map[String, Int]]): CopybookAST

    Sets the isDependee attribute for fields in the schema that are used by other fields in a DEPENDING ON clause.

    ast

    An AST as a set of copybook records

    returns

    The same AST with the isDependee attribute set for the fields referenced in DEPENDING ON clauses

    Annotations
    @throws( classOf[IllegalStateException] )
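Marking dependee fields can be illustrated on a flat list of fields. This is a sketch under stated assumptions: `Field` and `dependingOn` are hypothetical, and throwing `IllegalStateException` for a DEPENDING ON clause that names an unknown field is a guess at when the documented exception is raised.

```scala
object DependeeSketch {
  // Hypothetical field model: `dependingOn` holds the field named in an OCCURS ... DEPENDING ON clause.
  final case class Field(name: String, dependingOn: Option[String] = None, isDependee: Boolean = false)

  /** Sets isDependee on every field that some other field names in its DEPENDING ON clause. */
  def markDependees(fields: Seq[Field]): Seq[Field] = {
    val referenced = fields.flatMap(_.dependingOn).toSet
    // Assumption: a reference to a field that does not exist is an error.
    val missing = referenced -- fields.map(_.name)
    if (missing.nonEmpty) {
      val names = missing.toList.sorted.mkString(", ")
      throw new IllegalStateException("DEPENDING ON refers to unknown fields: " + names)
    }
    fields.map(f => f.copy(isDependee = referenced.contains(f.name)))
  }
}
```

For a counter field `CNT` and an array `ITEMS` that depends on it, only `CNT` ends up with `isDependee` set.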
  21. def markSegmentRedefines(ast: CopybookAST, segmentRedefines: Seq[String]): CopybookAST

    Sets the isSegmentRedefine property of redefined groups so that the row extractor can skip parsing segment groups that do not belong to a particular segment id.

    * Each field should appear in the list only once.
    * Any such field should either redefine another field or be redefined by one.
    * All segment fields should belong to the same redefine group, i.e. they should redefine each other.
    * All segment fields should be at level 1 (one level below the record root level).
    * A segment redefine cannot be inside an array.

    ast

    An AST as a set of copybook records

    segmentRedefines

    The list of field names that correspond to segment GROUPs.

    returns

    The same AST with the isSegmentRedefine property set for the segment GROUPs

    Annotations
    @throws( classOf[IllegalStateException] )
  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. def parse(copyBookContents: String, dataEncoding: Encoding = EBCDIC, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), improvedNullDetection: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug): Copybook

    Tokenizes the contents of a Cobol copybook and returns the AST.

    copyBookContents

    A string containing all lines of a copybook

    dataEncoding

    Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    Seq[Group] where a group is a record inside the copybook

  26. def parseSimple(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, commentPolicy: CommentPolicy = CommentPolicy()): Copybook

    Tokenizes the contents of a Cobol copybook and returns the AST.

    This method accepts only arguments that affect the structure of the output AST.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    returns

    Seq[Group] where a group is a record inside the copybook

  27. def parseTree(enc: Encoding, copyBookContents: String, dropGroupFillers: Boolean, dropValueFillers: Boolean, segmentRedefines: Seq[String], fieldParentMap: Map[String, String], stringTrimmingPolicy: StringTrimmingPolicy, commentPolicy: CommentPolicy, improvedNullDetection: Boolean, ebcdicCodePage: CodePage, asciiCharset: Charset, isUtf16BigEndian: Boolean, floatingPointFormat: FloatingPointFormat, nonTerminals: Seq[String], occursHandlers: Map[String, Map[String, Int]], debugFieldsPolicy: DebugFieldsPolicy): Copybook

    Tokenizes the contents of a Cobol copybook and returns the AST.

    enc

    Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    Seq[Group] where a group is a record inside the copybook

    Annotations
    @throws( classOf[SyntaxErrorException] )
  28. def parseTree(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), improvedNullDetection: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug): Copybook

    Tokenizes the contents of a Cobol copybook and returns the AST.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    Seq[Group] where a group is a record inside the copybook

  29. def setSegmentParents(originalSchema: CopybookAST, fieldParentMap: Map[String, String]): CopybookAST

    Sets parent groups for child segment redefines. This relies on the segment id to redefines map. The assumptions are:

    * Only one segment redefine field has an empty parent: the root segment.
    * All other segment redefines should have a parent segment.
    * isSegmentRedefine should already be set for all segment redefines.
    * The parent of a segment redefine should be a segment redefine as well.

    originalSchema

    An AST as a set of copybook records

    fieldParentMap

    A mapping between field names and their parents

    returns

    The same AST with parent groups set for child segment redefines

    Annotations
    @throws( classOf[IllegalStateException] )
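The assumptions listed for setSegmentParents can be checked up front, as in the sketch below. It is an illustration only: `SegmentParentSketch` and `rootSegment` are hypothetical names, the segment redefines are represented by their names, and raising `IllegalStateException` on a violated assumption is a guess based on the annotation.

```scala
object SegmentParentSketch {
  /** Validates the parent mapping and returns the name of the single root segment. */
  def rootSegment(segmentRedefines: Seq[String], fieldParentMap: Map[String, String]): String = {
    // Every parent must itself be a segment redefine.
    val badParents = fieldParentMap.values.toSet -- segmentRedefines
    if (badParents.nonEmpty) {
      val names = badParents.toList.sorted.mkString(", ")
      throw new IllegalStateException("Parents that are not segment redefines: " + names)
    }
    // Exactly one segment redefine may lack a parent: the root segment.
    val roots = segmentRedefines.filterNot(name => fieldParentMap.contains(name))
    if (roots.size != 1) {
      throw new IllegalStateException("Expected exactly one root segment, found " + roots.size)
    }
    roots.head
  }
}
```

A chain such as `EMPLOYEE -> DEPT -> COMPANY` passes the checks with `COMPANY` as the root, while two parentless segments or a parent outside the segment list are rejected.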
  30. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  31. def toString(): String

    Definition Classes
    AnyRef → Any
  32. def transformIdentifier(identifier: String): String

    Transforms Cobol identifiers so that they can be used in a Spark context. Removes characters that an identifier cannot contain.

  33. def transformIdentifierMap(identifierMap: Map[String, String]): Map[String, String]

    Transforms all identifiers in a map so that they can be used in a Spark context. Removes characters that an identifier cannot contain.
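The two identifier transformations above might look like the following sketch. The exact character rules are an assumption: the documentation only says that characters an identifier cannot contain are removed, so replacing hyphens with underscores and dropping colons is illustrative, as is transforming both the keys and the values of the map.

```scala
object IdentifierSketch {
  /** Assumed rules: drop colons and turn hyphens into underscores. */
  def transformIdentifier(identifier: String): String =
    identifier.replace(":", "").replace("-", "_")

  /** Applies the same transformation to both the keys and the values of the map. */
  def transformIdentifierMap(identifierMap: Map[String, String]): Map[String, String] =
    identifierMap.map { case (k, v) => transformIdentifier(k) -> transformIdentifier(v) }
}
```

Under these assumed rules `COMPANY-ID` becomes `COMPANY_ID`, which is a name that Spark column references can use without quoting.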

  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
