@TriggerSerially
@InputRequirement(value=INPUT_FORBIDDEN)
@Tags(value={"tail","file","log","text","source"})
@CapabilityDescription(value="\"Tails\" a file, or a list of files, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). If the file to tail is periodically \"rolled over\", as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. At this time, this Processor does not support ingesting files that have been compressed when \'rolled over\'.")
@Stateful(scopes={LOCAL,CLUSTER}, description="Stores state about where in the Tailed File it left off so that on restart it does not have to duplicate data. State is stored either local or clustered depend on the <File Location> property.")
@WritesAttribute(attribute="tailfile.original.path", description="Path of the original file the flow file comes from.")
@Restricted(restrictions=@Restriction(requiredPermission=READ_FILESYSTEM,explanation="Provides operator the ability to read from any file that NiFi has access to."))
public class TailFile extends AbstractProcessor
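The capability description states that data is ingested only once a new line is encountered. The following is a minimal sketch of that idea, not the TailFile implementation: the class `CompleteLineTailer` and its methods are hypothetical names, and for brevity only `\n` is treated as a line terminator (a bare carriage return, which the processor description also accepts, is not handled here).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
class CompleteLineTailer {
    // Byte offset just past the last complete line that was returned.
    private long position;

    CompleteLineTailer(long startPosition) {
        this.position = startPosition;
    }

    long getPosition() {
        return position;
    }

    // Returns the bytes of every complete line written since the last call.
    // A trailing partial line is left in the file and re-read next time.
    byte[] poll(Path file) throws IOException {
        ByteArrayOutputStream complete = new ByteArrayOutputStream();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            channel.position(position);
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (channel.read(buffer) > 0) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    pending.write(b);
                    if (b == '\n') {
                        // A full line is available: move it to the output.
                        pending.writeTo(complete);
                        pending.reset();
                    }
                }
                buffer.clear();
            }
        }
        // Advance only past what was actually emitted.
        position += complete.size();
        return complete.toByteArray();
    }
}
```

Because the stored position only moves past complete lines, a line that is still being written when `poll` runs is picked up in full on a later call.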
Modifier and Type | Class and Description |
---|---|
(package private) static class | TailFile.NulCharacterEncounteredException |
(package private) static class | TailFile.TailFileObject |
(package private) static class | TailFile.TailFileState: A simple Java class to hold information about our state so that we can maintain this state across multiple invocations of the Processor |
Constructor and Description |
---|
TailFile() |
Modifier and Type | Method and Description |
---|---|
private void | cleanReader(TailFile.TailFileObject tfo) |
void | cleanup(ProcessContext context) |
void | compileRegex(ProcessContext context) |
private TailFile.TailFileState | consumeFileFully(File file, ProcessContext context, ProcessSession session, TailFile.TailFileObject tfo): Creates a new FlowFile that contains the entire contents of the given file and transfers that FlowFile to success. |
private FileChannel | createReader(File file, long position) |
protected Collection<ValidationResult> | customValidate(ValidationContext context) |
private void | flushByteArrayOutputStream(ByteArrayOutputStream baos, OutputStream out, Checksum checksum, boolean ignoreRegex) |
private void | flushLinesBuffer(OutputStream out, Checksum checksum) |
long | getCurrentTimeMs() |
private List<String> | getFilesToTail(String baseDir, String fileRegex, boolean isRecursive, long maxAge): Method to list the files to tail according to the given base directory and using the user-provided regular expression |
Set<Relationship> | getRelationships() |
private List<File> | getRolledOffFiles(ProcessContext context, long minTimestamp, String tailFilePath): Returns a list of all Files that match the following criteria: Filename matches the Rolling Filename Pattern; Filename does not match the actual file being tailed; The Last Modified Time on the file is equal to or later than the given minimum timestamp |
(package private) Map<String,TailFile.TailFileObject> | getState() |
private Scope | getStateScope(ProcessContext context) |
protected List<PropertyDescriptor> | getSupportedPropertyDescriptors() |
private void | initStates(List<String> filesToTail, Map<String,String> statesMap, boolean isCleared, String startPosition) |
private List<String> | lookup(ProcessContext context) |
void | onPrimaryNodeChange() |
void | onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) |
void | onTrigger(ProcessContext context, ProcessSession session) |
private void | persistState(Map<String,String> state, ProcessSession session, ProcessContext context) |
private void | persistState(TailFile.TailFileObject tfo, ProcessSession session, ProcessContext context) |
private void | processTailFile(ProcessContext context, ProcessSession session, String tailFile) |
private long | readLines(FileChannel reader, ByteBuffer buffer, OutputStream out, Checksum checksum, Boolean reReadOnNul) |
private long | readLines(FileChannel reader, ByteBuffer buffer, OutputStream out, Checksum checksum, Boolean reReadOnNul, boolean readFully): Read new lines from the given FileChannel, copying it to the given Output Stream. |
private boolean | recoverRolledFiles(ProcessContext context, ProcessSession session, String tailFile, List<File> rolledOffFiles, Long expectedChecksum, long position): Finds any files that have rolled over and have not yet been ingested by this Processor. |
private boolean | recoverRolledFiles(ProcessContext context, ProcessSession session, String tailFile, Long expectedChecksum, long timestamp, long position): Finds any files that have rolled over and have not yet been ingested by this Processor. |
void | recoverState(ProcessContext context) |
private void | recoverState(ProcessContext context, List<String> filesToTail, Map<String,String> map) |
private void | recoverState(ProcessContext context, Map<String,String> stateValues, String filePath): Updates member variables to reflect the "expected recovery checksum" and seek to the appropriate location in the tailed file, updating our checksum, so that we are ready to proceed with the onTrigger(ProcessContext, ProcessSession) call. |
private void | resetState(String filePath) |
private boolean | tailRolledFile(ProcessContext context, ProcessSession session, String tailFile, Long expectedChecksum, long position, TailFile.TailFileObject tfo, File fileToTail, boolean readFully, boolean tailingPostRollover) |
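Methods inherited from class AbstractProcessor: onTrigger
Methods inherited from class AbstractSessionFactoryProcessor: getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, init, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
Methods inherited from class AbstractConfigurableComponent: equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, hashCode, validate
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface ConfigurableComponent: getPropertyDescriptor, getPropertyDescriptors, validate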
static final String MAP_PREFIX
private static final byte[] NEW_LINE_BYTES
static final AllowableValue LOCATION_LOCAL
static final AllowableValue LOCATION_REMOTE
static final AllowableValue MODE_SINGLEFILE
static final AllowableValue MODE_MULTIFILE
static final AllowableValue START_BEGINNING_OF_TIME
static final AllowableValue START_CURRENT_FILE
static final AllowableValue START_CURRENT_TIME
static final PropertyDescriptor BASE_DIRECTORY
static final PropertyDescriptor MODE
static final PropertyDescriptor FILENAME
static final PropertyDescriptor ROLLING_FILENAME_PATTERN
static final PropertyDescriptor POST_ROLLOVER_TAIL_PERIOD
static final PropertyDescriptor STATE_LOCATION
static final PropertyDescriptor START_POSITION
static final PropertyDescriptor RECURSIVE
static final PropertyDescriptor LOOKUP_FREQUENCY
static final PropertyDescriptor MAXIMUM_AGE
static final PropertyDescriptor REREAD_ON_NUL
static final PropertyDescriptor LINE_START_PATTERN
static final PropertyDescriptor MAX_BUFFER_LENGTH
static final Relationship REL_SUCCESS
private volatile Map<String,TailFile.TailFileObject> states
private volatile AtomicLong lastLookup
private volatile AtomicBoolean isMultiChanging
private volatile boolean requireStateLookup
private volatile ByteArrayOutputStream linesBuffer
private volatile Pattern lineStartPattern
private volatile long maxBufferBytes
protected List<PropertyDescriptor> getSupportedPropertyDescriptors()
Overrides: getSupportedPropertyDescriptors in class AbstractConfigurableComponent
public Set<Relationship> getRelationships()
Specified by: getRelationships in interface Processor
Overrides: getRelationships in class AbstractSessionFactoryProcessor
public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue)
Specified by: onPropertyModified in interface ConfigurableComponent
Overrides: onPropertyModified in class AbstractConfigurableComponent
protected Collection<ValidationResult> customValidate(ValidationContext context)
Overrides: customValidate in class AbstractConfigurableComponent
@OnPrimaryNodeStateChange public void onPrimaryNodeChange()
private List<String> lookup(ProcessContext context)
@OnScheduled public void compileRegex(ProcessContext context)
@OnScheduled public void recoverState(ProcessContext context) throws IOException
Throws: IOException
private void initStates(List<String> filesToTail, Map<String,String> statesMap, boolean isCleared, String startPosition)
private void recoverState(ProcessContext context, List<String> filesToTail, Map<String,String> map) throws IOException
Throws: IOException
private List<String> getFilesToTail(String baseDir, String fileRegex, boolean isRecursive, long maxAge)
Lists the files to tail according to the given base directory and the user-provided regular expression.
baseDir - base directory to recursively look into
fileRegex - regular expression used to match files to tail
isRecursive - true if looking for files recursively, false otherwise
private void recoverState(ProcessContext context, Map<String,String> stateValues, String filePath) throws IOException
Updates member variables to reflect the "expected recovery checksum" and seeks to the appropriate location in the tailed file, updating our checksum, so that we are ready to proceed with the onTrigger(ProcessContext, ProcessSession) call.
context - the ProcessContext
stateValues - the values that were recovered from state that was previously stored. This Map should be populated with the keys defined in TailFile.TailFileState.StateKeys.
filePath - the path of the file for which state must be recovered
Throws: IOException - if unable to seek to the appropriate location in the tailed file.
private void resetState(String filePath)
@OnStopped public void cleanup(ProcessContext context)
private void cleanReader(TailFile.TailFileObject tfo)
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException
Specified by: onTrigger in class AbstractProcessor
Throws: ProcessException
private void processTailFile(ProcessContext context, ProcessSession session, String tailFile)
private long readLines(FileChannel reader, ByteBuffer buffer, OutputStream out, Checksum checksum, Boolean reReadOnNul) throws IOException
Throws: IOException
private long readLines(FileChannel reader, ByteBuffer buffer, OutputStream out, Checksum checksum, Boolean reReadOnNul, boolean readFully) throws IOException
Reads new lines from the given FileChannel, copying them to the given OutputStream.
reader - the FileChannel to read data from
buffer - the buffer to use for copying data
out - the OutputStream to copy the data to
checksum - the Checksum object to use in order to calculate the checksum for recovery purposes
reReadOnNul - if set to 'true', ASCII NUL characters will be treated as temporary values and a NulCharacterEncounteredException is thrown. This allows the caller to re-attempt a read from the same position. If set to 'false', these characters will be treated as regular content.
readFully - if set to 'true', the last chunk of bytes after the last whole line will also be written to the OutputStream
Throws: IOException - if an I/O error occurs.
private void flushByteArrayOutputStream(ByteArrayOutputStream baos, OutputStream out, Checksum checksum, boolean ignoreRegex) throws IOException
Throws: IOException
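The reReadOnNul behaviour documented for readLines above can be illustrated with a small sketch. It is not the TailFile code: `NulAwareCopier`, `NulEncounteredException` and the `rePosition` field are hypothetical stand-ins, and the sketch works on an in-memory chunk rather than a FileChannel. It only shows the contract that a NUL byte either aborts the copy with the offset to retry from, or is passed through as regular content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper, for illustration only.
class NulAwareCopier {

    // Stand-in for an exception that tells the caller where to resume reading.
    static class NulEncounteredException extends IOException {
        final long rePosition;

        NulEncounteredException(long rePosition) {
            super("NUL character encountered at offset " + rePosition);
            this.rePosition = rePosition;
        }
    }

    // Copies chunk to out. chunkOffset is the file offset of chunk[0].
    // Returns the file offset just past the copied bytes.
    static long copy(byte[] chunk, long chunkOffset, ByteArrayOutputStream out,
                     boolean reReadOnNul) throws NulEncounteredException {
        for (int i = 0; i < chunk.length; i++) {
            if (chunk[i] == 0 && reReadOnNul) {
                // Bytes before the NUL were already copied; the caller can
                // retry later starting at the offset of the NUL itself.
                throw new NulEncounteredException(chunkOffset + i);
            }
            out.write(chunk[i]);
        }
        return chunkOffset + chunk.length;
    }
}
```

A caller that catches the exception would typically wait and then read again from `rePosition`, on the assumption that the writer has since replaced the NUL bytes with real data.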
private void flushLinesBuffer(OutputStream out, Checksum checksum) throws IOException
Throws: IOException
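private List<File> getRolledOffFiles(ProcessContext context, long minTimestamp, String tailFilePath) throws IOException
Returns a list of all Files that match the following criteria: the filename matches the Rolling Filename Pattern; the filename does not match the actual file being tailed; and the Last Modified Time on the file is equal to or later than the given minimum timestamp.
The List that is returned will be ordered by file timestamp, providing the oldest file first.
context - the ProcessContext to use in order to determine Processor configuration
minTimestamp - any file with a Last Modified Time before this timestamp will not be returned
Throws: IOException - if unable to perform the listing of files
private Scope getStateScope(ProcessContext context)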
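The selection rules documented for getRolledOffFiles (pattern match, not the tailed file itself, Last Modified Time at or after the minimum, oldest first) could be sketched as follows. This is an illustration under assumptions, not the processor's code: `RolledFileLister` is a hypothetical name, and the rolling pattern is assumed to arrive as an already compiled `java.util.regex.Pattern`, which may differ from how the Rolling Filename Pattern property is actually interpreted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RolledFileLister {

    static List<File> list(File directory, Pattern rollingPattern,
                           File tailedFile, long minTimestamp) {
        List<File> matches = new ArrayList<>();
        File[] children = directory.listFiles();
        if (children == null) {
            // Not a directory, or an I/O error occurred while listing.
            return matches;
        }
        for (File child : children) {
            if (!child.isFile()) {
                continue;
            }
            // Criterion 1: filename matches the rolling pattern.
            if (!rollingPattern.matcher(child.getName()).matches()) {
                continue;
            }
            // Criterion 2: skip the file that is currently being tailed.
            if (child.getName().equals(tailedFile.getName())) {
                continue;
            }
            // Criterion 3: last modified at or after the minimum timestamp.
            if (child.lastModified() < minTimestamp) {
                continue;
            }
            matches.add(child);
        }
        // Oldest file first, as the documentation describes.
        matches.sort(Comparator.comparingLong(File::lastModified));
        return matches;
    }
}
```

Returning the oldest file first lets a caller replay rolled-over data in the order it was originally written.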
private void persistState(TailFile.TailFileObject tfo, ProcessSession session, ProcessContext context)
private void persistState(Map<String,String> state, ProcessSession session, ProcessContext context)
private FileChannel createReader(File file, long position)
Map<String,TailFile.TailFileObject> getState()
private boolean recoverRolledFiles(ProcessContext context, ProcessSession session, String tailFile, Long expectedChecksum, long timestamp, long position)
Finds any files that have rolled over and have not yet been ingested by this Processor.
context - the ProcessContext to use in order to obtain Processor configuration.
session - the ProcessSession to use in order to interact with FlowFile creation and content.
expectedChecksum - the checksum value that is expected for the oldest file from offset 0 through <position>.
timestamp - the latest Last Modified Timestamp that has been consumed. Any data that was written before this timestamp will not be ingested.
position - the byte offset in the file being tailed, where tailing last left off.
Returns: true if the file being tailed has rolled over, false otherwise
private boolean recoverRolledFiles(ProcessContext context, ProcessSession session, String tailFile, List<File> rolledOffFiles, Long expectedChecksum, long position)
Finds any files that have rolled over and have not yet been ingested by this Processor.
context - the ProcessContext to use in order to obtain Processor configuration.
session - the ProcessSession to use in order to interact with FlowFile creation and content.
expectedChecksum - the checksum value that is expected for the oldest file from offset 0 through <position>.
position - the byte offset in the file being tailed, where tailing last left off.
Returns: true if the file being tailed has rolled over, false otherwise
private boolean tailRolledFile(ProcessContext context, ProcessSession session, String tailFile, Long expectedChecksum, long position, TailFile.TailFileObject tfo, File fileToTail, boolean readFully, boolean tailingPostRollover) throws IOException
Throws: IOException
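The expectedChecksum and position parameters of recoverRolledFiles describe a checksum over the first <position> bytes of the oldest rolled file. One way such a check could work is sketched below. The class `PrefixChecksum` and both of its methods are hypothetical, and CRC32 is an assumption made for the example; the documentation above only refers to a `Checksum`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only.
class PrefixChecksum {

    // Computes a CRC32 over bytes [0, position) of the file.
    static long of(Path file, long position) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        long remaining = position;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int toRead = (int) Math.min(buffer.length, remaining);
                int read = in.read(buffer, 0, toRead);
                if (read < 0) {
                    break; // file is shorter than position
                }
                crc.update(buffer, 0, read);
                remaining -= read;
            }
        }
        return crc.getValue();
    }

    // True if the candidate begins with the same bytes that were already
    // consumed from the tailed file, i.e. it is that file after a rollover.
    static boolean isContinuationOf(Path candidate, long position,
                                    long expectedChecksum) throws IOException {
        return Files.size(candidate) >= position
                && of(candidate, position) == expectedChecksum;
    }
}
```

If the check succeeds, only the bytes after `position` in the rolled file are new; if it fails, the rolled file is a different file and would have to be read from the start.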
private TailFile.TailFileState consumeFileFully(File file, ProcessContext context, ProcessSession session, TailFile.TailFileObject tfo) throws IOException
Creates a new FlowFile that contains the entire contents of the given file and transfers that FlowFile to success.
file - the file to ingest
context - the ProcessContext
session - the ProcessSession
tfo - the current state
Throws: IOException
public long getCurrentTimeMs()
Copyright © 2021 Apache NiFi Project. All rights reserved.