Class StreamingPythonScriptExecutor<T>
java.lang.Object
org.broadinstitute.hellbender.utils.runtime.ScriptExecutor
org.broadinstitute.hellbender.utils.python.PythonExecutorBase
org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor<T>
- Type Parameters:
T - type of data that will be streamed to the Python process
Python executor used to interact with a cooperative, keep-alive Python process. The executor issues commands to call Python functions in the tool module of the gatktool Python package. These include functions for managing an acknowledgement FIFO that is used to signal completion of Python commands, and a data FIFO that can be used to stream data to Python.
Typical usage:
- construct the executor
- start the remote process (start(java.util.List<java.lang.String>))
- optionally call initStreamWriter to initialize and create a data transfer fifo
- send one or more synchronous or asynchronous commands to be executed in Python
- optionally send data of type T through the async writer, one or more times
- execute Python code to close the data fifo
- terminate the executor (terminate())
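The command/acknowledgement protocol behind this lifecycle can be modeled without Python or FIFOs. The sketch below is a toy, in-process illustration, not GATK code: a background thread stands in for the keep-alive Python process, one BlockingQueue stands in for the command stream and another for the acknowledgement FIFO. The class name, the "ack"/"nck" tokens, the rule that any command containing "raise" fails, and the use of IllegalStateException in place of PythonScriptExecutorException are all hypothetical simplifications. Unlike the documented executor, every toy command is acknowledged, and the toy sendSynchronousCommand returns nothing rather than a ProcessOutput.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy, in-process model of a keep-alive executor; not the GATK implementation. */
public class ToyKeepAliveExecutor {
    private static final String STOP = "<stop>";                                  // hypothetical shutdown sentinel
    private final BlockingQueue<String> commands = new LinkedBlockingQueue<>();  // stands in for the command stream
    private final BlockingQueue<String> acks = new LinkedBlockingQueue<>();      // stands in for the ack FIFO
    private Thread remote;

    /** Start the simulated "Python" process on a background thread. */
    public boolean start() {
        remote = new Thread(() -> {
            try {
                while (true) {
                    String line = commands.take();
                    if (line.equals(STOP)) {
                        return;
                    }
                    // Simplification: every command is acknowledged, and any
                    // command containing "raise" is treated as a failure (nck).
                    acks.put(line.contains("raise") ? "nck" : "ack");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        remote.setDaemon(true);  // the toy thread never keeps the JVM alive on its own
        remote.start();
        return true;
    }

    /** Send a command without waiting; its acknowledgement stays queued until waitForAck(). */
    public void sendAsynchronousCommand(String line) {
        commands.add(line);
    }

    /** Block until the next acknowledgement arrives; throw on a negative acknowledgement. */
    public void waitForAck() {
        final String reply;
        try {
            reply = acks.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for ack", e);
        }
        if (!reply.equals("ack")) {
            throw new IllegalStateException("negative acknowledgement (nck) received");
        }
    }

    /** Send a command and block until it is acknowledged. */
    public void sendSynchronousCommand(String line) {
        sendAsynchronousCommand(line);
        waitForAck();
    }

    /** Stop the simulated process and wait for its thread to exit. */
    public void terminate() {
        commands.add(STOP);
        try {
            remote.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ToyKeepAliveExecutor executor = new ToyKeepAliveExecutor();
        executor.start();
        executor.sendSynchronousCommand("x = 1\n");       // blocks until acknowledged
        executor.sendAsynchronousCommand("y = x + 1\n");  // returns immediately
        executor.waitForAck();                            // explicit synchronization point
        try {
            executor.sendSynchronousCommand("raise ValueError('boom')\n");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // caught: negative acknowledgement (nck) received
        }
        executor.terminate();
    }
}
```

In this toy, omitting the waitForAck() after the asynchronous command would leave a stale acknowledgement queued, and the next synchronous command would consume it instead of its own; that is one concrete way a missing synchronization point can go wrong in a model like this.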
Guidelines for writing GATK tools that use Python interactively:
- Program correctness should not rely on consumption of anything written by Python to stdout/stderr. All
data should be transferred through the stream writer or a file.
- Python code should write errors to stderr.
- Prefer single-line commands that run a script over multi-line Python code embedded in Java.
- Terminate commands with a newline.
- Try not to be chatty: maximize use of the fifo buffer by writing to it in batches before reading from Python.
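One way to apply the last guideline is to group items into fixed-size batches before handing them to the stream writer, so that each write carries many items instead of one. This is a generic Java sketch, not part of the executor's API; the class name and the batch size of 3 are illustrative. Each resulting sublist has the shape of the batchList argument that startBatchWrite accepts.

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    /**
     * Split items into consecutive batches of at most batchSize elements, so each
     * batch can be written in one call instead of one call per item.
     */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));  // copy, so batches are independent
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            items.add(i);
        }
        System.out.println(partition(items, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```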
Nested Class Summary
Nested classes/interfaces inherited from class org.broadinstitute.hellbender.utils.python.PythonExecutorBase
PythonExecutorBase.PythonExecutableName
Field Summary
Fields inherited from class org.broadinstitute.hellbender.utils.python.PythonExecutorBase
PYTHON_EXTENSION
Fields inherited from class org.broadinstitute.hellbender.utils.runtime.ScriptExecutor
externalScriptExecutableName, ignoreExceptions
Constructor Summary
Constructors
- StreamingPythonScriptExecutor(boolean ensureExecutableExists): The start method must be called to actually start the remote executable.
- StreamingPythonScriptExecutor(PythonExecutorBase.PythonExecutableName pythonExecutableName, boolean ensureExecutableExists): The start method must be called to actually start the remote executable.
Method Summary
- ProcessOutput getAccumulatedOutput(): Return all data accumulated since the last call to getAccumulatedOutput() (either directly, or indirectly through sendSynchronousCommand(java.lang.String)).
- String getApproximateCommandLine(): Return a (not necessarily executable) string representing the current command line for this executor, for error reporting purposes.
- protected Process getProcess(): Get the Process object associated with this executor.
- void initStreamWriter(Function<T, ByteArrayOutputStream> itemSerializer): Obtain a stream writer that serializes and writes batches of items of type T on a background thread.
- void sendAsynchronousCommand(String line): Send a command to the remote process without waiting for a response.
- ProcessOutput sendSynchronousCommand(String line): Send a command to Python, and wait for an ack, returning all accumulated output since the last call to either sendSynchronousCommand or getAccumulatedOutput. This is a blocking call: if no acknowledgment is received from the remote process, it will block indefinitely.
- boolean start(List<String> pythonProcessArgs): Start the Python process.
- boolean start(List<String> pythonProcessArgs, boolean enableJournaling): Start the Python process.
- void startBatchWrite(String pythonCommand, List<T> batchList): Request that a batch of items be written to the stream on a background thread.
- void terminate(): Terminate the remote process, closing the fifo if any.
- ProcessOutput waitForAck(): Wait for an acknowledgement (which must have been previously requested).
- Future<Integer> waitForPreviousBatchCompletion(): Waits for a batch that was previously initiated via startBatchWrite(String, List) to complete, flushes the target stream and returns the corresponding completed Future.
Methods inherited from class org.broadinstitute.hellbender.utils.python.PythonExecutorBase
getScriptException
Methods inherited from class org.broadinstitute.hellbender.utils.runtime.ScriptExecutor
executableMissing, executeCuratedArgs, executeCuratedArgsAndGetOutput, externalExecutableExists, getExceptionMessageFromScriptError, setIgnoreExceptions
Constructor Details
StreamingPythonScriptExecutor
public StreamingPythonScriptExecutor(boolean ensureExecutableExists)
The start method must be called to actually start the remote executable.
- Parameters:
ensureExecutableExists - throw if the python executable cannot be located
StreamingPythonScriptExecutor
public StreamingPythonScriptExecutor(PythonExecutorBase.PythonExecutableName pythonExecutableName, boolean ensureExecutableExists)
The start method must be called to actually start the remote executable.
- Parameters:
pythonExecutableName - name of the python executable to start
ensureExecutableExists - throw if the python executable cannot be found
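Both constructors take ensureExecutableExists. One common way to "locate" an executable is to search the directories listed in the PATH environment variable, which the sketch below does. It is an assumption about the general technique, not a description of how this executor or its inherited externalExecutableExists method performs the check, and it ignores Windows extensions such as .exe. The names ExecutableLookup, findOnPath, and requireOnPath are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ExecutableLookup {
    /**
     * Returns the first PATH directory entry holding an executable regular file
     * named executableName, or Optional.empty() if none does.
     * Does not try Windows extensions such as ".exe".
     */
    public static Optional<Path> findOnPath(String executableName) {
        String pathVar = System.getenv("PATH");
        if (pathVar == null || pathVar.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;  // POSIX treats an empty segment as the current directory; skipped here for simplicity
            }
            Path candidate = Paths.get(dir, executableName);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Fail fast when the executable is missing, in the spirit of ensureExecutableExists = true. */
    public static Path requireOnPath(String executableName) {
        return findOnPath(executableName).orElseThrow(
                () -> new IllegalStateException(executableName + " cannot be located on the PATH"));
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "python3";  // illustrative default name
        Optional<Path> found = findOnPath(name);
        System.out.println(found.isPresent() ? "found " + found.get() : name + " not found on PATH");
    }
}
```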
Method Details
start
Start the Python process.
- Parameters:
pythonProcessArgs - args to be passed to the python process
- Returns:
true if the process is successfully started
start
Start the Python process.
- Parameters:
pythonProcessArgs - args to be passed to the python process
enableJournaling - true to enable journaling, which records all interprocess IO to a file. This is expensive and should only be used for debugging purposes.
- Returns:
true if the process is successfully started
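Journaling, as described for enableJournaling, records interprocess IO to a file. A minimal sketch of capturing the outgoing half of that traffic is a "tee" stream that copies every byte sent to the process into a journal stream as well. This is a generic illustration, not GATK's journaling code: it does not capture what the process writes back, and the class name is hypothetical. Both destinations are in-memory buffers here so the example runs without a process or a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Copies everything written to the target stream into a journal stream as well. */
public class JournalingOutputStream extends FilterOutputStream {
    private final OutputStream journal;

    public JournalingOutputStream(OutputStream target, OutputStream journal) {
        super(target);
        this.journal = journal;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);      // the real destination (e.g. the process's stdin)
        journal.write(b);  // the journal copy
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);  // bulk write instead of FilterOutputStream's byte-by-byte default
        journal.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
        journal.flush();
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();  // flushes both streams, then closes the target
        } finally {
            journal.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream toProcess = new ByteArrayOutputStream();  // stands in for the process stdin
        ByteArrayOutputStream journal = new ByteArrayOutputStream();    // stands in for the journal file
        try (OutputStream out = new JournalingOutputStream(toProcess, journal)) {
            out.write("x = 1\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.print(journal.toString(StandardCharsets.UTF_8.name())); // x = 1
    }
}
```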
sendSynchronousCommand
Send a command to Python, and wait for an ack, returning all accumulated output since the last call to either sendSynchronousCommand or getAccumulatedOutput. This is a blocking call: if no acknowledgment is received from the remote process, it will block indefinitely. If an exception is raised in the Python code, or a negative acknowledgment is received, a PythonScriptExecutorException will be thrown. The caller is required to terminate commands with the correct number of newline(s) as appropriate for the command being issued. Since white space is significant in Python, failure to do so properly can leave the Python parser blocked waiting for more newlines to terminate indented code blocks.
- Parameters:
line - data to be sent to the remote process
- Returns:
ProcessOutput
- Throws:
UserException - if a timeout occurs
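Because the caller must supply the right number of trailing newlines, a small helper can normalize commands before they are sent. The sketch below is a heuristic, not part of the executor: it appends one newline to a simple statement, and a blank line (two newlines) when any line of the command ends with a colon, which for an interactive Python parser opens an indented block that only a blank line closes. It can be fooled, for example by a trailing colon inside a comment or string, or by a one-line compound statement such as "for i in x: f(i)" that also needs the blank line. The name PythonCommandTerminator is hypothetical.

```java
public class PythonCommandTerminator {
    /**
     * Heuristically terminate a command for an interactive Python parser: one newline
     * for a simple statement, plus a blank line if any line ends with ':' (which opens
     * an indented block that only a blank line closes).
     */
    public static String terminate(String command) {
        String body = command.replaceAll("\\n+$", "");  // drop any existing trailing newlines
        boolean opensBlock = false;
        for (String line : body.split("\n")) {
            if (line.trim().endsWith(":")) {
                opensBlock = true;
                break;
            }
        }
        return body + (opensBlock ? "\n\n" : "\n");
    }

    public static void main(String[] args) {
        System.out.print(terminate("x = 1"));                               // prints "x = 1" and one newline
        System.out.print(terminate("for i in range(3):\n    total += i"));  // output ends with a blank line
    }
}
```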
sendAsynchronousCommand
Send a command to the remote process without waiting for a response. This method should only be used for commands that will block the remote process. NOTE: Before executing further synchronous statements after calling this method, getAccumulatedOutput should be called to enforce a synchronization point. The caller is required to terminate commands with the correct number of newline(s) as appropriate for the command being issued. Since white space is significant in Python, failure to do so properly can leave the Python parser blocked waiting for more newlines to terminate indented code blocks.
- Parameters:
line - data to send to the remote process
waitForAck
Wait for an acknowledgement (which must have been previously requested).
- Returns:
ProcessOutput when a positive acknowledgement (ack) has been received; otherwise throws
- Throws:
PythonScriptExecutorException - if a negative acknowledgement (nck) was received
getApproximateCommandLine
Return a (not necessarily executable) string representing the current command line for this executor, for error reporting purposes.
- Specified by:
getApproximateCommandLine in class PythonExecutorBase
- Returns:
a string representing the command line used for this executor
initStreamWriter
Obtain a stream writer that serializes and writes batches of items of type T on a background thread.
- Parameters:
itemSerializer - Function that accepts items of type T and converts them to a ByteArrayOutputStream that is subsequently written to the stream
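The itemSerializer decides the wire format that the Python side must parse. A minimal sketch, assuming a hypothetical Variant item type and a tab-separated text format (neither is defined by GATK):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class SerializerExample {
    /** Hypothetical item type standing in for T. */
    public static final class Variant {
        final String contig;
        final int position;
        final double score;

        public Variant(String contig, int position, double score) {
            this.contig = contig;
            this.position = position;
            this.score = score;
        }
    }

    /** Encodes each item as one tab-separated, newline-terminated UTF-8 line. */
    public static final Function<Variant, ByteArrayOutputStream> TSV_SERIALIZER = v -> {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] encoded = (v.contig + "\t" + v.position + "\t" + v.score + "\n")
                .getBytes(StandardCharsets.UTF_8);
        bytes.write(encoded, 0, encoded.length);  // this overload declares no IOException
        return bytes;
    };

    public static void main(String[] args) {
        ByteArrayOutputStream out = TSV_SERIALIZER.apply(new Variant("chr1", 12345, 0.5));
        // prints chr1, 12345 and 0.5 separated by tabs
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With the executor documented here, a function like TSV_SERIALIZER would be passed to initStreamWriter on a StreamingPythonScriptExecutor<Variant>, and the Python code named in each startBatchWrite command would need to read one tab-separated line per item. A text format keeps the example readable; a binary encoding would be more compact but harder to debug.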
startBatchWrite
Request that a batch of items be written to the stream on a background thread. Any previously requested batch must have already been completed and retrieved via waitForPreviousBatchCompletion().
- Parameters:
pythonCommand - command that will be executed asynchronously to consume the data written to the stream
batchList - a list of items to be written
waitForPreviousBatchCompletion
Waits for a batch that was previously initiated via startBatchWrite(String, List) to complete, flushes the target stream and returns the corresponding completed Future. The Future representing a given batch can only be obtained via this method once. If no work is outstanding, and/or the previous batch has already been retrieved, null is returned.
- Returns:
null if there is no previous work to complete, otherwise a completed Future
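The contract described for startBatchWrite and waitForPreviousBatchCompletion (one outstanding batch, a completed Future handed out once, null when nothing is pending) can be modeled with a single-thread ExecutorService. The sketch below is a toy under those assumptions, not the GATK implementation: it omits the pythonCommand argument that the real startBatchWrite sends to Python, writes into an in-memory buffer instead of a fifo, and reports failures as IllegalStateException. The Future<Integer> result counts the items written, which is a choice of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Toy model of the startBatchWrite / waitForPreviousBatchCompletion contract (not GATK code). */
public class ToyBatchWriter<T> implements AutoCloseable {
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final OutputStream target;
    private final Function<T, ByteArrayOutputStream> serializer;
    private Future<Integer> pending;  // at most one outstanding, unretrieved batch

    public ToyBatchWriter(OutputStream target, Function<T, ByteArrayOutputStream> serializer) {
        this.target = target;
        this.serializer = serializer;
    }

    /** Serialize and write the batch on a background thread; returns immediately. */
    public void startBatchWrite(List<T> batch) {
        if (pending != null) {
            throw new IllegalStateException("previous batch has not been retrieved yet");
        }
        pending = background.submit(() -> {
            for (T item : batch) {
                serializer.apply(item).writeTo(target);
            }
            return batch.size();  // number of items written
        });
    }

    /** Wait for the outstanding batch, flush, and return its Future once; null if none. */
    public Future<Integer> waitForPreviousBatchCompletion() {
        if (pending == null) {
            return null;
        }
        Future<Integer> completed = pending;
        pending = null;  // a given batch's Future is handed out only once
        try {
            completed.get();  // blocks until done; rethrows a failed write
            target.flush();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batch", e);
        } catch (ExecutionException | IOException e) {
            throw new IllegalStateException("batch write failed", e);
        }
        return completed;
    }

    @Override
    public void close() {
        background.shutdown();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();  // stands in for the data fifo
        Function<String, ByteArrayOutputStream> lines = s -> {
            ByteArrayOutputStream b = new ByteArrayOutputStream();
            byte[] encoded = (s + "\n").getBytes(StandardCharsets.UTF_8);
            b.write(encoded, 0, encoded.length);
            return b;
        };
        try (ToyBatchWriter<String> writer = new ToyBatchWriter<>(sink, lines)) {
            writer.startBatchWrite(Arrays.asList("a", "b", "c"));
            // the caller could do other work here while the batch is written
            System.out.println(writer.waitForPreviousBatchCompletion() != null); // true
            System.out.println(writer.waitForPreviousBatchCompletion() == null); // true
        }
        System.out.print(sink);  // a, b and c, one per line
    }
}
```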
getProcess
Get the Process object associated with this executor. For testing only.
- Returns:
the Process object associated with this executor
terminate
public void terminate()
Terminate the remote process, closing the fifo if any.
getAccumulatedOutput
Return all data accumulated since the last call to getAccumulatedOutput() (either directly, or indirectly through sendSynchronousCommand(java.lang.String)). Note that the output returned is somewhat non-deterministic, in that there is no guarantee that all of the output from the previous command has been flushed at the time this call is made.
- Returns:
ProcessOutput containing all accumulated output from stdout/stderr
- Throws:
UserException - if a timeout occurs waiting for output
PythonScriptExecutorException - if a traceback is detected in the output
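getAccumulatedOutput is documented to throw when a traceback is detected in the output. A minimal sketch of such a check scans captured stderr for the header line that CPython prints before an uncaught exception's traceback. The exact rule the executor applies is not stated here, so treat the matching logic as an assumption. IllegalStateException stands in for PythonScriptExecutorException so the example has no GATK dependency, and the class and method names are hypothetical.

```java
public class TracebackDetector {
    /** Header line CPython prints at the start of an uncaught-exception traceback. */
    private static final String TRACEBACK_HEADER = "Traceback (most recent call last):";

    /** Returns true if the captured stderr text appears to contain a Python traceback. */
    public static boolean containsTraceback(String stderr) {
        return stderr != null && stderr.contains(TRACEBACK_HEADER);
    }

    /** Return stdout, or throw if stderr contains a traceback. */
    public static String checkedOutput(String stdout, String stderr) {
        if (containsTraceback(stderr)) {
            throw new IllegalStateException("Python traceback detected:\n" + stderr);
        }
        return stdout;
    }

    public static void main(String[] args) {
        String stderr = "Traceback (most recent call last):\n"
                + "  File \"<stdin>\", line 1, in <module>\n"
                + "ZeroDivisionError: division by zero\n";
        System.out.println(containsTraceback(stderr));       // true
        System.out.println(containsTraceback("warning\n"));  // false
    }
}
```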