CLUSTER_OUTPUT_RECORD - The type of record that this converter will convert to.
public abstract class BasecallsConverter<CLUSTER_OUTPUT_RECORD>
extends java.lang.Object
The underlying IlluminaDataProvider applies several optional transformations, which can include EAMSS filtering, non-PF read filtering, and quality score recoding using a BclQualityEvaluationStrategy.
The converter can also limit the scope of data that is converted from the data provider by setting the tile to start on (firstTile) and the total number of tiles to process (tileLimit).
Additionally, BasecallsConverter can optionally demultiplex reads by outputting barcode-specific reads to their associated writers.
Modifier and Type | Class and Description |
---|---|
protected static interface | BasecallsConverter.ClusterDataConverter<OUTPUT_RECORD>: Interface that defines a converter that takes ClusterData and returns OUTPUT_RECORD type objects. |
protected class | BasecallsConverter.CompletedWorkChecker: CompletedWorkChecker is notified by the TileProcessor threads as work on a tile is complete and the records are ready for writing. |
protected static interface | BasecallsConverter.ConvertedClusterDataWriter<OUTPUT_RECORD>: Interface that defines a writer that will write out OUTPUT_RECORD type objects. |
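The interplay between the converter interface, the writer interface, and the barcode-keyed writer map can be sketched as follows. This is a minimal illustration, not the Picard implementation: `DemuxRoutingSketch`, `Cluster`, `RecordWriter`, `ListWriter`, and `route` are hypothetical stand-ins, and the converter is modelled as a plain `java.util.function.Function`. The sketch assumes that, when demultiplexing is off, every record goes to the single writer stored under the null key, and that an unmatched barcode is either skipped or treated as an error depending on `ignoreUnexpectedBarcodes`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-ins for illustration; none of these names are Picard API. */
final class DemuxRoutingSketch {

    /** Simplified cluster: raw bases plus the matched barcode (null if none). */
    static final class Cluster {
        final String bases;
        final String barcode;

        Cluster(String bases, String barcode) {
            this.bases = bases;
            this.barcode = barcode;
        }
    }

    /** Plays the role of a converted-record writer; the method name is assumed. */
    interface RecordWriter<T> {
        void write(T rec);
    }

    /** Collects records in memory so the routing is easy to inspect. */
    static final class ListWriter<T> implements RecordWriter<T> {
        final List<T> written = new ArrayList<>();

        @Override
        public void write(T rec) {
            written.add(rec);
        }
    }

    /**
     * Converts each cluster and hands it to the writer registered for its barcode.
     * When demultiplex is false, every record goes to the writer stored under the null key.
     */
    static <T> void route(List<Cluster> clusters,
                          Function<Cluster, T> converter,
                          Map<String, ? extends RecordWriter<T>> writers,
                          boolean demultiplex,
                          boolean ignoreUnexpectedBarcodes) {
        for (Cluster cluster : clusters) {
            String key = demultiplex ? cluster.barcode : null;
            RecordWriter<T> writer = writers.get(key);
            if (writer == null) {
                if (ignoreUnexpectedBarcodes) {
                    continue; // assumed behaviour: silently drop reads with unknown barcodes
                }
                throw new IllegalStateException("No writer registered for barcode " + key);
            }
            writer.write(converter.apply(cluster));
        }
    }
}
```

Note that the writer map in this sketch must be a `java.util.HashMap` (or another map that permits null keys) for the non-demultiplexed case, since `Map.of` rejects null keys.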
Modifier and Type | Field and Description |
---|---|
protected java.util.Map<java.lang.String,? extends BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> | barcodeRecordWriterMap |
protected java.util.Map<java.lang.Integer,java.util.List<? extends java.lang.Runnable>> | completedWork |
protected ThreadPoolExecutorWithExceptions | completedWorkExecutor |
protected BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> | converter |
static java.util.Set<IlluminaDataType> | DATA_TYPES_WITH_BARCODE |
static java.util.Set<IlluminaDataType> | DATA_TYPES_WITHOUT_BARCODE |
protected boolean | demultiplex |
protected IlluminaDataProviderFactory | factory |
protected boolean | ignoreUnexpectedBarcodes |
protected boolean | includeNonPfReads |
protected static htsjdk.samtools.util.Log | log |
protected int | numThreads |
protected htsjdk.samtools.util.ProgressLogger | readProgressLogger |
static java.util.Comparator<java.lang.Integer> | TILE_NUMBER_COMPARATOR: A comparator used to sort Illumina tiles in their proper order. |
protected boolean | tileProcessingComplete |
protected ThreadPoolExecutorWithExceptions | tileReadExecutor |
protected java.util.List<java.lang.Integer> | tiles |
protected ThreadPoolExecutorWithExceptions | tileWriteExecutor |
protected htsjdk.samtools.util.ProgressLogger | writeProgressLogger |
Constructor and Description |
---|
BasecallsConverter(java.io.File basecallsDir, java.io.File barcodesDir, int lane, ReadStructure readStructure, java.util.Map<java.lang.String,? extends BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, int numThreads, java.lang.Integer firstTile, java.lang.Integer tileLimit, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, int numWriteThreads): Constructs a new BasecallsConverter object. |
Modifier and Type | Method and Description |
---|---|
protected void | awaitTileProcessingCompletion() |
protected static java.util.Set<IlluminaDataType> | getDataTypesFromReadStructure(ReadStructure readStructure, boolean demultiplex): Given a read structure, returns the data types that need to be parsed for this run. |
protected IlluminaDataProviderFactory | getFactory(): Gets the data provider factory used to create the underlying data provider. |
static java.io.File[] | getTiledFiles(java.io.File baseDirectory, java.util.regex.Pattern pattern): Applies a lane- and tile-based regex to return all files matching that regex for each tile. |
protected void | interruptAndShutdownExecutors(ThreadPoolExecutorWithExceptions... executors) |
protected void | notifyWorkComplete(int tileNum, java.util.List<? extends java.lang.Runnable> pumpList) |
abstract void | processTilesAndWritePerSampleOutputs(java.util.Set<java.lang.String> barcodes): Abstract method for processing tiles of data and outputting records for each barcode. |
protected void | setConverter(BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> converter): Must be called before doTileProcessing. |
protected void | setTileLimits(java.lang.Integer firstTile, java.lang.Integer tileLimit): Uses the firstTile and tileLimit parameters to set which tiles will be processed. |
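The tile scoping that setTileLimits describes can be sketched as a simple windowing step over an ordered tile list. This is an illustration under stated assumptions rather than the Picard source: `TileLimitSketch` and `limitTiles` are hypothetical names, the input list is assumed to be already in processing order, and the sketch assumes that an unknown firstTile is an error and that a null parameter means "no restriction".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Picard. */
final class TileLimitSketch {

    /**
     * Returns the tiles starting at firstTile (if non-null) and containing
     * at most tileLimit entries (if non-null). The input list is not modified.
     */
    static List<Integer> limitTiles(List<Integer> sortedTiles, Integer firstTile, Integer tileLimit) {
        List<Integer> result = new ArrayList<>(sortedTiles);

        if (firstTile != null) {
            int start = result.indexOf(firstTile);
            if (start < 0) {
                // Assumption: asking to start at a tile that does not exist is a caller error.
                throw new IllegalArgumentException("firstTile " + firstTile + " is not in the tile list");
            }
            result = new ArrayList<>(result.subList(start, result.size()));
        }

        if (tileLimit != null) {
            if (tileLimit < 0) {
                throw new IllegalArgumentException("tileLimit must not be negative");
            }
            if (result.size() > tileLimit) {
                result = new ArrayList<>(result.subList(0, tileLimit));
            }
        }
        return result;
    }
}
```

With tiles 1101 through 1104, a firstTile of 1102 and a tileLimit of 2 would leave tiles 1102 and 1103 in this sketch, which matches the "(For debugging)" use described for the constructor parameters.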
public static final java.util.Set<IlluminaDataType> DATA_TYPES_WITH_BARCODE
public static final java.util.Set<IlluminaDataType> DATA_TYPES_WITHOUT_BARCODE
protected static final htsjdk.samtools.util.Log log
protected final IlluminaDataProviderFactory factory
protected final boolean demultiplex
protected final boolean ignoreUnexpectedBarcodes
protected final java.util.Map<java.lang.String,? extends BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap
protected final boolean includeNonPfReads
protected final int numThreads
protected final htsjdk.samtools.util.ProgressLogger readProgressLogger
protected final htsjdk.samtools.util.ProgressLogger writeProgressLogger
protected final java.util.Map<java.lang.Integer,java.util.List<? extends java.lang.Runnable>> completedWork
protected final ThreadPoolExecutorWithExceptions tileWriteExecutor
protected final ThreadPoolExecutorWithExceptions tileReadExecutor
protected final ThreadPoolExecutorWithExceptions completedWorkExecutor
protected BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> converter
protected java.util.List<java.lang.Integer> tiles
protected boolean tileProcessingComplete
public static final java.util.Comparator<java.lang.Integer> TILE_NUMBER_COMPARATOR
public BasecallsConverter(java.io.File basecallsDir, java.io.File barcodesDir, int lane, ReadStructure readStructure, java.util.Map<java.lang.String,? extends BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, int numThreads, java.lang.Integer firstTile, java.lang.Integer tileLimit, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, int numWriteThreads)
basecallsDir - Where to read basecalls from.
barcodesDir - Where to read barcodes from (optional; use basecallsDir if not specified).
lane - What lane to process.
readStructure - How to interpret each cluster.
barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
demultiplex - If true, output is split by barcode; otherwise all records are written to the same output stream.
numThreads - Controls number of threads.
firstTile - (For debugging) If non-null, start processing at this tile.
tileLimit - (For debugging) If non-null, process no more than this many tiles.
bclQualityEvaluationStrategy - The basecall quality evaluation strategy that is applied to decoded base calls.
ignoreUnexpectedBarcodes - If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap.
applyEamssFiltering - If true, apply EAMSS filtering if parsing BCLs for bases and quality scores.
includeNonPfReads - If true, will include ALL reads (including those which do not have PF set). This option does nothing for instruments that output cbcls (NovaSeqs).
public abstract void processTilesAndWritePerSampleOutputs(java.util.Set<java.lang.String> barcodes)
barcodes - The barcodes used optionally for demultiplexing. Must contain at least a single null value if no demultiplexing is being done.
protected void awaitTileProcessingCompletion()
protected void notifyWorkComplete(int tileNum, java.util.List<? extends java.lang.Runnable> pumpList)
public static java.io.File[] getTiledFiles(java.io.File baseDirectory, java.util.regex.Pattern pattern)
baseDirectory - The directory to search for tiled files.
pattern - The pattern used to match files.
protected static java.util.Set<IlluminaDataType> getDataTypesFromReadStructure(ReadStructure readStructure, boolean demultiplex)
readStructure - The read structure that defines how the read is set up.
demultiplex - If true, output is split by barcode; otherwise all records are written to the same output stream.
protected IlluminaDataProviderFactory getFactory()
protected void setConverter(BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> converter)
converter - Converts ClusterData to CLUSTER_OUTPUT_RECORD.
protected void setTileLimits(java.lang.Integer firstTile, java.lang.Integer tileLimit)
firstTile - The tile to begin processing at.
tileLimit - The maximum number of tiles to process.
protected void interruptAndShutdownExecutors(ThreadPoolExecutorWithExceptions... executors)
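The page gives no description for interruptAndShutdownExecutors, so the following is only a guess at the general pattern its name suggests, written against the standard `java.util.concurrent.ExecutorService` interface rather than Picard's ThreadPoolExecutorWithExceptions. `ExecutorShutdownSketch` and `interruptAndShutdown` are hypothetical names. The sketch relies on the documented behaviour of `shutdownNow()`, which attempts to stop actively executing tasks (typically by interrupting them) and returns the tasks that were queued but never started.

```java
import java.util.concurrent.ExecutorService;

/** Hypothetical helper for illustration; not part of Picard. */
final class ExecutorShutdownSketch {

    /**
     * Calls shutdownNow() on every executor and returns the total number of
     * queued tasks that never began execution.
     */
    static int interruptAndShutdown(ExecutorService... executors) {
        int neverStarted = 0;
        for (ExecutorService executor : executors) {
            // shutdownNow() returns the list of tasks that were awaiting execution.
            neverStarted += executor.shutdownNow().size();
        }
        return neverStarted;
    }
}
```

A caller in the style of this class might pass its read, write, and completed-work executors together when a tile-processing task fails, then report how much work was abandoned.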