T - type of data that will be streamed to the Python process
public class StreamingPythonScriptExecutor<T> extends PythonExecutorBase
tool module in the gatktool Python package. These include functions
for managing an acknowledgement FIFO that is used to signal completion of Python commands, and a data FIFO that
can be used to stream data to Python.
- construct the executor
- start the remote process (start(java.util.List<java.lang.String>)).
- optionally call initStreamWriter to initialize and create a data transfer fifo.
- send one or more synchronous or asynchronous commands to be executed in Python
- optionally send data of type T one or more times through the async writer
- execute python code to close the data fifo
- terminate the executor (terminate())
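The call order in the list above can be sketched as follows. This is a minimal illustration only: RecordingExecutorSketch is a hypothetical stand-in that mirrors the documented method names but starts no process and simplifies the return types, and the Python command strings (placeholder_module and its functions) are invented placeholders, not names from the gatktool package.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in that only mirrors the documented method names.
// It is NOT the GATK class: no process is started and return types are simplified.
class RecordingExecutorSketch {
    final List<String> calls = new ArrayList<>();

    boolean start(List<String> pythonProcessArgs) { calls.add("start"); return true; }
    void initStreamWriter() { calls.add("initStreamWriter"); }
    void sendSynchronousCommand(String line) { calls.add("sync:" + line.trim()); }
    void startBatchWrite(String pythonCommand, List<String> batchList) {
        calls.add("startBatchWrite:" + batchList.size());
    }
    void waitForPreviousBatchCompletion() { calls.add("waitForPreviousBatchCompletion"); }
    void terminate() { calls.add("terminate"); }
}

class LifecycleSketch {
    // Runs the documented steps in order and returns the recorded call sequence.
    static List<String> run(RecordingExecutorSketch executor, List<String> batch) {
        executor.start(Collections.emptyList());
        try {
            executor.initStreamWriter();
            // Placeholder Python commands; each one ends with a newline.
            executor.sendSynchronousCommand("import placeholder_module\n");
            executor.startBatchWrite("placeholder_module.consume_batch()\n", batch);
            executor.waitForPreviousBatchCompletion();
            executor.sendSynchronousCommand("placeholder_module.close_data_fifo()\n");
        } finally {
            // Terminate even if one of the commands fails.
            executor.terminate();
        }
        return executor.calls;
    }
}
```

Placing terminate() in a finally block is a design choice of this sketch, so that the remote process is not left running when a command throws.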
Guidelines for writing GATK tools that use Python interactively:
- Program correctness should not rely on consumption of anything written by Python to stdout/stderr. All
data should be transferred through the stream writer or a file.
- Python code should write errors to stderr.
- Prefer single line commands that run a script, vs. multi-line Python code embedded in Java
- Terminate commands with a newline.
- Try not to be chatty (maximize use of the fifo buffer by writing to it in batches before reading from Python)
PythonExecutorBase.PythonExecutableName
PYTHON_EXTENSION
externalScriptExecutableName, ignoreExceptions
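Two of the guidelines for interactive Python use (prefer single-line commands, terminate commands with a newline) can be enforced with a small helper before a command is sent. The sketch below is an assumption-laden illustration: PythonCommandSketch and singleLineCommand are hypothetical names and are not part of GATK.

```java
// Hypothetical helper, not part of GATK: normalizes a command string so that it
// is a single line terminated by exactly one newline.
class PythonCommandSketch {
    static String singleLineCommand(String pythonStatement) {
        // Accept a statement that already carries its trailing newline.
        String body = pythonStatement.endsWith("\n")
                ? pythonStatement.substring(0, pythonStatement.length() - 1)
                : pythonStatement;
        // Embedded line breaks mean multi-line Python; that belongs in a script.
        if (body.contains("\n") || body.contains("\r")) {
            throw new IllegalArgumentException(
                    "Expected a single-line command; move multi-line code into a script");
        }
        return body + "\n";
    }
}
```

For example, singleLineCommand("import my_script") yields the same text with a newline appended, while a string containing two statements on separate lines is rejected.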
Constructor and Description |
---|
StreamingPythonScriptExecutor(boolean ensureExecutableExists) The start method must be called to actually start the remote executable. |
StreamingPythonScriptExecutor(PythonExecutorBase.PythonExecutableName pythonExecutableName, boolean ensureExecutableExists) The start method must be called to actually start the remote executable. |
Modifier and Type | Method and Description |
---|---|
ProcessOutput | getAccumulatedOutput() Return all data accumulated since the last call to getAccumulatedOutput() (either directly, or indirectly through sendSynchronousCommand(java.lang.String)). |
java.lang.String | getApproximateCommandLine() Return a (not necessarily executable) string representing the current command line for this executor for error reporting purposes. |
protected java.lang.Process | getProcess() Get the Process object associated with this executor. |
void | initStreamWriter(java.util.function.Function<T,java.io.ByteArrayOutputStream> itemSerializer) Obtain a stream writer that serializes and writes batches of items of type T on a background thread. |
void | sendAsynchronousCommand(java.lang.String line) Send a command to the remote process without waiting for a response. |
ProcessOutput | sendSynchronousCommand(java.lang.String line) Send a command to Python, and wait for an ack, returning all accumulated output since the last call to either sendSynchronousCommand(java.lang.String) or getAccumulatedOutput(). |
boolean | start(java.util.List<java.lang.String> pythonProcessArgs) Start the Python process. |
boolean | start(java.util.List<java.lang.String> pythonProcessArgs, boolean enableJournaling, java.io.File profileResults) Start the Python process. |
void | startBatchWrite(java.lang.String pythonCommand, java.util.List<T> batchList) Request that a batch of items be written to the stream on a background thread. |
void | terminate() Terminate the remote process, closing the fifo if any. |
ProcessOutput | waitForAck() Wait for an acknowledgement (which must have been previously requested). |
java.util.concurrent.Future<java.lang.Integer> | waitForPreviousBatchCompletion() Waits for a batch that was previously initiated via startBatchWrite(String, List) to complete, flushes the target stream and returns the corresponding completed Future. |
getScriptException
executableMissing, executeCuratedArgs, externalExecutableExists, setIgnoreExceptions
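The itemSerializer accepted by initStreamWriter is a Function<T, ByteArrayOutputStream>. A minimal sketch of such a function for T = String is shown below. The one-item-per-line UTF-8 framing is an assumption made for illustration; the actual framing depends on what the Python side expects to read from the data FIFO. ItemSerializerSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical example serializer, not part of GATK.
class ItemSerializerSketch {
    // Serializes each item as one UTF-8 encoded text line (assumed framing).
    static final Function<String, ByteArrayOutputStream> LINE_SERIALIZER = item -> {
        byte[] bytes = (item + "\n").getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        // The three-argument write on ByteArrayOutputStream declares no checked exception.
        out.write(bytes, 0, bytes.length);
        return out;
    };
}
```

A function of this shape could then be passed as the itemSerializer argument, with the Python command given to startBatchWrite reading the same number of lines from the data FIFO.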
public StreamingPythonScriptExecutor(boolean ensureExecutableExists)
Parameters:
ensureExecutableExists - throw if the python executable cannot be located
public StreamingPythonScriptExecutor(PythonExecutorBase.PythonExecutableName pythonExecutableName, boolean ensureExecutableExists)
Parameters:
pythonExecutableName - name of the python executable to start
ensureExecutableExists - throw if the python executable cannot be found
public boolean start(java.util.List<java.lang.String> pythonProcessArgs)
Parameters:
pythonProcessArgs - args to be passed to the python process
public boolean start(java.util.List<java.lang.String> pythonProcessArgs, boolean enableJournaling, java.io.File profileResults)
Parameters:
pythonProcessArgs - args to be passed to the python process
enableJournaling - true to enable Journaling, which records all interprocess IO to a file. This is expensive and should only be used for debugging purposes.
public ProcessOutput sendSynchronousCommand(java.lang.String line)
Parameters:
line - data to be sent to the remote process
Throws:
UserException - if a timeout occurs
public void sendAsynchronousCommand(java.lang.String line)
Parameters:
line - data to send to the remote process
public ProcessOutput waitForAck()
Returns:
ProcessOutput when a positive acknowledgement (ack) has been received; otherwise throws
Throws:
PythonScriptExecutorException - if a negative acknowledgement (nck) was received
public java.lang.String getApproximateCommandLine()
getApproximateCommandLine in class PythonExecutorBase
public void initStreamWriter(java.util.function.Function<T,java.io.ByteArrayOutputStream> itemSerializer)
Obtain a stream writer that serializes and writes batches of items of type T on a background thread.
Parameters:
itemSerializer - Function that accepts items of type T and converts them to a ByteArrayOutputStream that is subsequently written to the stream
public void startBatchWrite(java.lang.String pythonCommand, java.util.List<T> batchList)
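Request that a batch of items be written to the stream on a background thread. Any previously requested batch must have already been completed and retrieved via waitForPreviousBatchCompletion().
Parameters:
pythonCommand - command that will be executed asynchronously to consume the data written to the stream
batchList - a list of items to be written
public java.util.concurrent.Future<java.lang.Integer> waitForPreviousBatchCompletion()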
Waits for a batch that was previously initiated via startBatchWrite(String, List) to complete, flushes the target stream and returns the corresponding completed Future. The Future representing a given batch can only be obtained via this method once. If no work is outstanding, and/or the previous batch has already been retrieved, null is returned.
protected java.lang.Process getProcess()
public void terminate()
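The batching contract described for startBatchWrite and waitForPreviousBatchCompletion (write on a background thread, retrieve each batch's Future only once, get null when nothing is outstanding) can be sketched with the standard java.util.concurrent classes. This is not the GATK implementation: BatchWriterSketch is a hypothetical class, the Integer result is assumed here to be the number of items written, and rejecting a new batch while one is still unretrieved is an assumption of the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the batching contract; not the GATK implementation.
class BatchWriterSketch<T> {
    private final OutputStream target;
    private final Function<T, ByteArrayOutputStream> itemSerializer;
    // Daemon worker thread so an unfinished sketch never keeps the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "batch-writer-sketch");
        t.setDaemon(true);
        return t;
    });
    private Future<Integer> previousBatch; // null when no batch is outstanding

    BatchWriterSketch(OutputStream target, Function<T, ByteArrayOutputStream> itemSerializer) {
        this.target = target;
        this.itemSerializer = itemSerializer;
    }

    // Example serializer: one UTF-8 line per item (assumed framing).
    static ByteArrayOutputStream asLine(String item) {
        byte[] bytes = (item + "\n").getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        out.write(bytes, 0, bytes.length);
        return out;
    }

    // Queues the batch on the background thread and returns immediately.
    void startBatchWrite(List<T> batchList) {
        if (previousBatch != null) {
            throw new IllegalStateException("Previous batch has not been retrieved");
        }
        previousBatch = worker.submit(() -> {
            for (T item : batchList) {
                itemSerializer.apply(item).writeTo(target);
            }
            return batchList.size(); // assumed meaning of the Integer result
        });
    }

    // Blocks until the outstanding batch is written, flushes, and hands the
    // completed Future out exactly once; returns null if nothing is outstanding.
    Future<Integer> waitForPreviousBatchCompletion() {
        if (previousBatch == null) {
            return null;
        }
        Future<Integer> completed = previousBatch;
        previousBatch = null;
        try {
            completed.get();
            target.flush();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for batch", e);
        } catch (ExecutionException | IOException e) {
            throw new IllegalStateException("Batch write failed", e);
        }
        return completed;
    }
}
```

Clearing previousBatch before returning is what makes a second call return null, matching the statement that the Future for a given batch can only be obtained once.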
public ProcessOutput getAccumulatedOutput()
Return all data accumulated since the last call to getAccumulatedOutput() (either directly, or indirectly through sendSynchronousCommand(java.lang.String)).
Note that the output returned is somewhat non-deterministic, in that there is no guarantee that all of the output from the previous command has been flushed at the time this call is made.
Throws:
UserException - if a timeout occurs waiting for output
PythonScriptExecutorException - if a traceback is detected in the output
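As an illustration of what detecting a traceback in captured output might look like, the sketch below scans text for the header line that CPython prints at the start of an uncaught-exception traceback. Whether the executor matches this exact string is an assumption; OutputCheckSketch and its methods are hypothetical names, and the sketch throws IllegalStateException rather than the GATK exception type.

```java
// Hypothetical helper, not part of GATK.
class OutputCheckSketch {
    // Header line CPython prints before the stack frames of an uncaught exception.
    static final String TRACEBACK_HEADER = "Traceback (most recent call last):";

    static boolean containsTraceback(String capturedOutput) {
        return capturedOutput != null && capturedOutput.contains(TRACEBACK_HEADER);
    }

    // Returns the output unchanged, or throws if it contains a traceback header.
    static String requireNoTraceback(String capturedOutput) {
        if (containsTraceback(capturedOutput)) {
            throw new IllegalStateException("Python traceback detected:\n" + capturedOutput);
        }
        return capturedOutput;
    }
}
```

Because output may not be fully flushed when it is read, a check like this can miss a traceback that arrives later, which is one reason program correctness should not depend on stdout/stderr content.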