public class AnnotatedNYTDocument extends Object

A wrapper around an NYTCorpusDocument object, providing some extra utility.

This class provides Optional wrappers. Methods that return an Optional object represent fields that can be null in the Annotated NYT corpus. Methods that return an object directly (e.g., String, Integer, etc.) are guaranteed to be non-null across the corpus.

This class also returns List objects, for convenience, for many fields. The semantics of these lists are as follows: the list is guaranteed to be non-null, but it might be empty if the underlying corpus document has no content for the specified field.

Constructor and Description |
---|
AnnotatedNYTDocument(NYTCorpusDocument nytdoc): Wrap an NYTCorpusDocument object. |
Modifier and Type | Method and Description |
---|---|
Optional<URL> | getAlternateURL() |
Optional<String> | getArticleAbstract() |
Optional<String> | getAuthorBiography() |
Optional<String> | getBanner() |
List<String> | getBiographicalCategories() |
List<String> | getBodyAsList() |
Optional<String> | getByline() |
Optional<String> | getColumnName() |
Optional<Integer> | getColumnNumber(): Accessor for the columnNumber property. |
Optional<Date> | getCorrectionDate(): Accessor for the correctionDate property. |
Optional<String> | getCorrectionText() |
String | getCredit(): Accessor for the credit property. |
Optional<String> | getDateline() |
Optional<String> | getDayOfWeek(): Accessor for the dayOfWeek property. |
List<String> | getDescriptors() |
Optional<String> | getFeaturePage(): Accessor for the featurePage property. |
List<String> | getGeneralOnlineDescriptors(): Accessor for the generalOnlineDescriptors property. |
Integer | getGuid() |
Optional<String> | getHeadline() |
Optional<String> | getKicker() |
Optional<String> | getLeadParagraph() |
List<String> | getLeadParagraphAsList() |
List<String> | getLocations(): Accessor for the locations property. |
List<String> | getNames(): Accessor for the names property. |
Optional<String> | getNewsDesk(): Accessor for the newsDesk property. |
Optional<String> | getNormalizedByline(): Accessor for the normalizedByline property. |
List<String> | getOnlineDescriptors(): Accessor for the onlineDescriptors property. |
Optional<String> | getOnlineHeadline() |
Optional<String> | getOnlineLeadParagraph() |
List<String> | getOnlineLeadParagraphAsList() |
List<String> | getOnlineLocations(): Accessor for the onlineLocations property. |
List<String> | getOnlineOrganizations(): Accessor for the onlineOrganizations property. |
List<String> | getOnlinePeople(): Accessor for the onlinePeople property. |
Optional<String> | getOnlineSection(): Accessor for the onlineSection property. |
List<String> | getOnlineSectionAsList() |
List<String> | getOnlineTitles(): Accessor for the onlineTitles property. |
List<String> | getOrganizations(): Accessor for the organizations property. |
Optional<Integer> | getPage(): Accessor for the page property. |
List<String> | getPeople(): Accessor for the people property. |
Optional<Date> | getPublicationDate(): Accessor for the publicationDate property. |
Optional<Integer> | getPublicationDayOfMonth(): Accessor for the publicationDayOfMonth property. |
Optional<Integer> | getPublicationMonth(): Accessor for the publicationMonth property. |
Optional<Integer> | getPublicationYear(): Accessor for the publicationYear property. |
Optional<String> | getSection(): Accessor for the section property. |
Optional<String> | getSeriesName(): Accessor for the seriesName property. |
Optional<String> | getSlug(): Accessor for the slug property. |
Optional<Path> | getSourcePath(): Accessor for the sourceFile property. |
List<String> | getTaxonomicClassifiers(): Accessor for the taxonomicClassifiers property. |
List<String> | getTitles(): Accessor for the titles property. |
List<String> | getTypesOfMaterial(): Accessor for the typesOfMaterial property. |
Optional<URL> | getUrl(): Accessor for the url property. |
Optional<Integer> | getWordCount(): Accessor for the wordCount property. |
String | toString() |
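
The three return-type conventions in the method table (a direct object, an Optional, a List) can be consumed without null checks. The sketch below is illustrative only: `DocView`, `fixed`, and `summarize` are hypothetical names that copy three accessor signatures from the table, since the real AnnotatedNYTDocument needs the corpus library on the classpath. It also assumes that Optional here means `java.util.Optional`, which this page does not confirm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class AccessorConventions {

    // Hypothetical stand-in that copies three accessor signatures from the
    // method table; it is NOT the real AnnotatedNYTDocument class.
    public interface DocView {
        Integer getGuid();                 // returned directly: never null
        Optional<String> getHeadline();    // Optional: the field may be absent
        List<String> getPeople();          // List: never null, possibly empty
    }

    // Hypothetical factory for a fixed-value DocView, used only for illustration.
    public static DocView fixed(Integer guid, Optional<String> headline, List<String> people) {
        return new DocView() {
            @Override public Integer getGuid() { return guid; }
            @Override public Optional<String> getHeadline() { return headline; }
            @Override public List<String> getPeople() { return people; }
        };
    }

    // Builds a one-line summary without a single null check.
    public static String summarize(DocView doc) {
        String headline = doc.getHeadline().orElse("(no headline)");
        String people = doc.getPeople().isEmpty()
                ? "(none)"
                : String.join(", ", doc.getPeople());
        return doc.getGuid() + " | " + headline + " | " + people;
    }

    public static void main(String[] args) {
        DocView full = fixed(7, Optional.of("Headline"), Arrays.asList("A", "B"));
        DocView sparse = fixed(42, Optional.empty(), Collections.emptyList());
        System.out.println(summarize(full));    // prints: 7 | Headline | A, B
        System.out.println(summarize(sparse));  // prints: 42 | (no headline) | (none)
    }
}
```

With the real class, the same calls (`getGuid()`, `getHeadline()`, `getPeople()`) would be made on an AnnotatedNYTDocument instance instead of the stand-in.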
public AnnotatedNYTDocument(NYTCorpusDocument nytdoc)
Wrap an NYTCorpusDocument object.
nytdoc - the NYTCorpusDocument to wrap

public Integer getGuid()
Returns the GUID as an Integer. Guaranteed non-null.

public List<String> getOnlineSectionAsList()
Returns the online section, split on the ; delimiter.

public List<String> getLeadParagraphAsList()
public List<String> getOnlineLeadParagraphAsList()
public List<String> getBodyAsList()
public Optional<String> getOnlineLeadParagraph()
public List<String> getBiographicalCategories()
public Optional<Integer> getColumnNumber()
public Optional<Date> getCorrectionDate()
public String getCredit()
public Optional<String> getDayOfWeek()
public Optional<String> getFeaturePage()
public List<String> getGeneralOnlineDescriptors()
public List<String> getLocations()
public Optional<String> getNewsDesk()
public Optional<String> getNormalizedByline()
public List<String> getOnlineDescriptors()
public List<String> getOnlineLocations()
public List<String> getOnlineOrganizations()
public List<String> getOnlinePeople()
public Optional<String> getOnlineSection()
public List<String> getOnlineTitles()
public List<String> getOrganizations()
public Optional<Date> getPublicationDate()
public Optional<Integer> getPublicationDayOfMonth()
public Optional<Integer> getPublicationMonth()
public Optional<Integer> getPublicationYear()
public Optional<String> getSection()
public Optional<String> getSeriesName()
public Optional<Path> getSourcePath()
public List<String> getTaxonomicClassifiers()
public List<String> getTypesOfMaterial()
public Optional<Integer> getWordCount()
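
The description of getOnlineSectionAsList() mentions a `;` delimiter. One way such a split could behave is sketched below. This is not the library's implementation: `splitOnSemicolon` is a hypothetical helper, and the trimming of whitespace and dropping of empty pieces are assumptions this page does not confirm. It also assumes `java.util.Optional`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SectionSplitSketch {

    // Hypothetical helper, not the library's code: splits a section string on ';',
    // trims each piece, and drops empty pieces (both are assumptions).
    // An absent value yields an empty, non-null list.
    public static List<String> splitOnSemicolon(Optional<String> section) {
        List<String> parts = new ArrayList<>();
        if (!section.isPresent()) {
            return parts;
        }
        for (String piece : section.get().split(";")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnSemicolon(Optional.of("Arts; Books")));  // prints [Arts, Books]
        System.out.println(splitOnSemicolon(Optional.empty()));            // prints []
    }
}
```

Returning an empty list for an absent section matches the List convention stated in the class description: non-null, but possibly empty.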
Copyright © 2015 Johns Hopkins University HLTCOE. All rights reserved.