Characters used to explicitly mark sentence bounds (Default: None)
Whether to take lists into consideration at sentence detection (Default: true)
Whether to explode each sentence into a different row, for better parallelization (Default: false)
Custom sentence separator text
Whether to take lists into consideration at sentence detection. Defaults to true.
Whether to split sentences into different Dataset rows. Useful for higher parallelism in fat rows. Defaults to false.
Get the maximum allowed length for each sentence
Get the minimum allowed length for each sentence
Length at which sentences will be forcibly split
Whether to consider abbreviation strategies for better accuracy but slower performance. Defaults to true.
Use only custom bounds without considering those of Pragmatic Segmenter. Defaults to false. Needs customBounds.
Set the maximum allowed length for each sentence (Ignored if not set)
Set the minimum allowed length for each sentence (Default: 0)
Custom sentence separator text
Whether to take lists into consideration at sentence detection. Defaults to true.
Whether to split sentences into different Dataset rows. Useful for higher parallelism in fat rows. Defaults to false.
Set the maximum allowed length for each sentence
Set the minimum allowed length for each sentence
Length at which sentences will be forcibly split
Whether to consider abbreviation strategies for better accuracy but slower performance. Defaults to true.
Use only custom bounds without considering those of Pragmatic Segmenter. Defaults to false. Needs customBounds.
Length at which sentences will be forcibly split (Ignored if not set)
Whether to apply abbreviations at sentence detection (Default: true)
Whether to only utilize custom bounds for sentence detection (Default: false)
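The interplay between the custom bounds and the three length parameters described above can be illustrated with a small plain-Python sketch. This is not Spark NLP code and does not call the library. The function names are hypothetical, and the behavior rests on assumptions that the descriptions above do not confirm: custom bounds are treated as literal text, forced splitting happens at whitespace, splitting runs before length filtering, and sentences outside the minimum/maximum range are dropped.

```python
import re
from typing import List, Optional


def split_on_custom_bounds(text: str, custom_bounds: List[str]) -> List[str]:
    """Split text only at the given separators (custom-bounds-only mode).

    Assumption: each bound is literal text, so it is escaped before use.
    """
    stripped = text.strip()
    if not custom_bounds:
        return [stripped] if stripped else []
    pattern = "|".join(re.escape(bound) for bound in custom_bounds)
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def force_split(sentence: str, split_length: int) -> List[str]:
    """Break a sentence at spaces into chunks of at most split_length characters.

    A single word longer than split_length is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > split_length:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def apply_length_rules(
    sentences: List[str],
    min_length: int = 0,
    max_length: Optional[int] = None,
    split_length: Optional[int] = None,
) -> List[str]:
    """Force-split long sentences, then drop pieces outside [min_length, max_length].

    Assumption: a value of None means the parameter is not set and is ignored.
    """
    result: List[str] = []
    for sentence in sentences:
        if split_length is not None and len(sentence) > split_length:
            pieces = force_split(sentence, split_length)
        else:
            pieces = [sentence]
        for piece in pieces:
            if len(piece) < min_length:
                continue
            if max_length is not None and len(piece) > max_length:
                continue
            result.append(piece)
    return result


text = "First part||Second part is quite a bit longer||ok"
parts = split_on_custom_bounds(text, ["||"])
print(apply_length_rules(parts, min_length=3, split_length=15))
```

In this sketch, leaving `max_length` and `split_length` as `None` mirrors the "Ignored if not set" wording above, while `min_length=0` mirrors the documented default.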
See https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/test/scala/com/johnsnowlabs/nlp/annotators/sbd/pragmatic for further reference on how to use this API