Class FastaReferenceWriter

java.lang.Object
htsjdk.samtools.reference.FastaReferenceWriter
All Implemented Interfaces:
AutoCloseable

public final class FastaReferenceWriter extends Object implements AutoCloseable
Writes a FASTA formatted reference file. In addition it can also compose the index and dictionary files for the newly written reference file.

Example:

 String[] seqNames = ...;
 byte[][] seqBases = ...;
 ...
 try (final FastaReferenceWriter writer = new FastaReferenceWriterBuilder().setFastaFile(outputFile).build()) {
      for (int i = 0; i < seqNames.length; i++) {
          writer.startSequence(seqNames[i]).appendBases(seqBases[i]);
      }
 }
 

The two main operations that can be invoked on an open writer are startSequence(java.lang.String) and appendBases(java.lang.String). The former indicates that a new sequence is about to be appended to the output and is invoked once per sequence. The latter adds bases to the current sequence and can be called as many times as needed.

The writer will make sure that the output adheres to the FASTA reference sequence file format restrictions:

  • Sequence names are valid (non-empty, with no whitespace or control characters),
  • Sequence descriptions are valid (with no control characters),
  • Bases are valid nucleotides, IUPAC redundancy codes, or X [ACGTNX...] (lowercase and uppercase are both accepted),
  • Sequences cannot have zero length,
  • Each sequence name can appear only once in the output.
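
The base check in the list above can be pictured with a small, self-contained sketch. The alphabet used here (ACGTN, X and the IUPAC ambiguity codes) is an assumption for illustration only; the writer itself relies on SequenceUtil.isIUPAC(byte), whose accepted set may differ. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class BaseCheckSketch {
    // Assumed alphabet for illustration: nucleotides, N, X and IUPAC ambiguity codes.
    private static final String ALLOWED = "ACGTNXRYSWKMBDHV";

    /** Returns true if the byte is in the assumed alphabet, ignoring case. */
    static boolean isAllowedBase(byte b) {
        return ALLOWED.indexOf(Character.toUpperCase((char) (b & 0xFF))) >= 0;
    }

    /** Returns the index of the first invalid base, or -1 if every base is valid. */
    static int firstInvalid(byte[] bases) {
        for (int i = 0; i < bases.length; i++) {
            if (!isAllowedBase(bases[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalid("acgtNNRY".getBytes(StandardCharsets.US_ASCII))); // prints -1
        System.out.println(firstInvalid("ACG?T".getBytes(StandardCharsets.US_ASCII)));    // prints 3
    }
}
```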

  • Field Details

    • DEFAULT_BASES_PER_LINE

      public static final int DEFAULT_BASES_PER_LINE
      Default number of bases per line.
    • HEADER_START_CHAR

      public static final char HEADER_START_CHAR
      Sequence header start character.
    • HEADER_NAME_AND_DESCRIPTION_SEPARATOR

      public static final char HEADER_NAME_AND_DESCRIPTION_SEPARATOR
      Character used to separate the sequence name and the description if any.
  • Method Details

    • startSequence

      public FastaReferenceWriter startSequence(String sequenceName) throws IOException
      Starts the input of the bases of a new sequence.

      This operation automatically closes the previous sequence base input if any.

      The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character '>'. It cannot be the empty string ("") either.

      No description is included in the output.

      The input bases-per-line is set to the default provided at construction or DEFAULT_BASES_PER_LINE if none was provided.

      This method cannot be called after the writer has been closed.

      It will also fail if a previous sequence was started but no bases were added to it.

      Parameters:
      sequenceName - the name of the new sequence.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if any argument does not comply with requirements listed above or if a sequence with the same name has already been added to the writer.
      IllegalStateException - if no base was added to the previous sequence or the writer is already closed.
      IOException - if such exception is thrown when writing into the output resources.
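
The name rules described for startSequence can be expressed as a short check. This is a minimal sketch built only from the rules stated above (non-empty, no Character.isWhitespace, no Character.isISOControl, no '>'); it is not the library's own code, and the class and method names are hypothetical.

```java
final class SequenceNameCheck {
    static final char HEADER_START = '>';

    /** Returns true if the name satisfies the rules listed for startSequence. */
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c) || c == HEADER_START) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("chr1"));   // prints true
        System.out.println(isValidName("chr 1"));  // prints false
        System.out.println(isValidName(">chr1"));  // prints false
    }
}
```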
    • startSequence

      public FastaReferenceWriter startSequence(String sequenceName, int basesPerLine) throws IOException
      Starts the input of the bases of a new sequence.

      This operation automatically closes the previous sequence base input if any.

      The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character '>'. It cannot be the empty string ("") either.

      The input bases-per-line must be 1 or greater.

      This method cannot be called after the writer has been closed.

      It will also fail if a previous sequence was started but no bases were added to it.

      Parameters:
      sequenceName - the name of the new sequence.
      basesPerLine - number of bases per line for this sequence.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if any argument does not comply with requirements listed above or if a sequence with the same name has already been added to the writer.
      IllegalStateException - if no base was added to the previous sequence or the writer is already closed.
      IOException - if such exception is thrown when writing into the output resources.
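
To make the effect of basesPerLine concrete, the following sketch wraps a sequence into lines of a fixed width, with the last line holding any remainder. It illustrates the formatting idea only and assumes '\n' as the line terminator; the class and method names are hypothetical and do not come from the library.

```java
final class LineWrapSketch {
    /** Splits bases into lines of at most basesPerLine characters, each ending in '\n'. */
    static String wrap(String bases, int basesPerLine) {
        if (basesPerLine < 1) {
            throw new IllegalArgumentException("basesPerLine must be 1 or greater");
        }
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < bases.length(); start += basesPerLine) {
            int end = Math.min(start + basesPerLine, bases.length());
            out.append(bases, start, end).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Ten bases at four per line: two full lines and one line with the remaining two bases.
        System.out.print(wrap("ACGTACGTAC", 4));
    }
}
```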
    • startSequence

      public FastaReferenceWriter startSequence(String sequenceName, String description) throws IOException
      Starts the input of the bases of a new sequence.

      This operation automatically closes the previous sequence base input if any.

      The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character '>'. It cannot be the empty string ("") either.

      The description cannot contain control characters (as determined by Character.isISOControl(char)). If set to null or the empty string (""), no description will be output.

      The input bases-per-line is set to the default provided at construction or DEFAULT_BASES_PER_LINE if none was provided.

      This method cannot be called after the writer has been closed.

      It will also fail if a previous sequence was started but no bases were added to it.

      Parameters:
      sequenceName - the name of the new sequence.
      description - optional description for that sequence.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if any argument does not comply with requirements listed above or if a sequence with the same name has already been added to the writer.
      IllegalStateException - if no base was added to the previous sequence or the writer is already closed.
      IOException - if such exception is thrown when writing into the output resources.
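
How the name and optional description combine into a header line can be sketched as follows. The start character '>' is taken from the text above; the separator value (a single space) is an assumption, since this page does not state it. The class and method names are hypothetical.

```java
final class HeaderSketch {
    static final char HEADER_START_CHAR = '>';
    // Assumed value for illustration; the page above does not state the separator character.
    static final char NAME_AND_DESCRIPTION_SEPARATOR = ' ';

    /** Builds a header line; a null or empty description is omitted. */
    static String headerLine(String name, String description) {
        StringBuilder line = new StringBuilder().append(HEADER_START_CHAR).append(name);
        if (description != null && !description.isEmpty()) {
            for (int i = 0; i < description.length(); i++) {
                if (Character.isISOControl(description.charAt(i))) {
                    throw new IllegalArgumentException("control character in description at index " + i);
                }
            }
            line.append(NAME_AND_DESCRIPTION_SEPARATOR).append(description);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(headerLine("chr1", null));               // prints >chr1
        System.out.println(headerLine("chr1", "assembled contig")); // prints >chr1 assembled contig
    }
}
```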
    • startSequence

      public FastaReferenceWriter startSequence(String sequenceName, String description, int basesPerLine) throws IOException
      Starts the input of the bases of a new sequence.

      This operation automatically closes the previous sequence base input if any.

      The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character '>'. It cannot be the empty string ("") either.

      The description cannot contain control characters (as determined by Character.isISOControl(char)). If set to null or the empty string (""), no description will be output.

      The input bases-per-line must be 1 or greater.

      This method cannot be called after the writer has been closed.

      It will also fail if a previous sequence was started but no bases were added to it.

      Parameters:
      sequenceName - the name of the new sequence.
      description - optional description for that sequence.
      basesPerLine - number of bases per line for this sequence.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if any argument does not comply with requirements listed above.
      IllegalStateException - if no base was added to the previous sequence, the writer is already closed, or the sequence has already been added.
      IOException - if such exception is thrown when writing into the output resources.
    • appendBases

      public FastaReferenceWriter appendBases(String basesBases) throws IOException
      Adds bases to the current sequence from a String.
      Parameters:
      basesBases - String containing the bases to be added. The string is interpreted as ASCII, and an exception is thrown if any character is >= 127.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if basesBases is null or contains invalid bases (as assessed by: SequenceUtil.isIUPAC(byte)).
      IllegalStateException - if no sequence was started or the writer is already closed.
      IOException - if such exception is thrown when writing in any of the outputs.
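
The ASCII interpretation of a String of bases can be sketched as a character-by-character conversion that rejects anything at or above 127, following the limit stated for the parameter above. This is an illustration of that rule rather than the library's code; the class and method names are hypothetical.

```java
final class AsciiBasesSketch {
    /** Converts a String of bases to bytes, rejecting null input and any character >= 127. */
    static byte[] toAsciiBytes(String bases) {
        if (bases == null) {
            throw new IllegalArgumentException("bases cannot be null");
        }
        byte[] out = new byte[bases.length()];
        for (int i = 0; i < bases.length(); i++) {
            char c = bases.charAt(i);
            if (c >= 127) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toAsciiBytes("ACGT").length); // prints 4
    }
}
```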
    • appendBases

      public FastaReferenceWriter appendBases(byte[] bases) throws IOException
      Adds bases to the current sequence from a byte array. An exception is thrown if any character is >= 127.
      Parameters:
      bases - array containing the bases to be added.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if bases is null or the input array contains invalid bases (as assessed by: SequenceUtil.isIUPAC(byte)).
      IllegalStateException - if no sequence was started or the writer is already closed.
      IOException - if such exception is thrown when writing in any of the outputs.
    • appendBases

      public FastaReferenceWriter appendBases(byte[] bases, int offset, int length) throws IOException
      Adds bases to the current sequence from a range in a byte array. An exception is thrown if any character is >= 127.
      Parameters:
      bases - array containing the bases to be added.
      offset - the position of the first base to add.
      length - how many bases to be added starting from position offset.
      Returns:
      this instance.
      Throws:
      IllegalArgumentException - if bases is null, if offset and length do not define a valid range in bases, or if that range contains invalid bases (as assessed by: SequenceUtil.isIUPAC(byte)).
      IllegalStateException - if no sequence was started or the writer is already closed.
      IOException - if such exception is thrown when writing in any of the outputs.
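
What "a valid range in bases" means for offset and length can be sketched with an overflow-safe bounds check. The exact checks performed by the library are not shown on this page, so this is one reasonable reading; the class and method names are hypothetical.

```java
final class RangeCheckSketch {
    /** Throws IllegalArgumentException unless [offset, offset + length) lies inside bases. */
    static void checkRange(byte[] bases, int offset, int length) {
        if (bases == null) {
            throw new IllegalArgumentException("bases cannot be null");
        }
        // Written as a subtraction so that offset + length cannot overflow.
        if (offset < 0 || length < 0 || offset > bases.length - length) {
            throw new IllegalArgumentException(
                    "invalid range: offset=" + offset + ", length=" + length + ", array length=" + bases.length);
        }
    }

    public static void main(String[] args) {
        checkRange(new byte[10], 2, 8); // valid: covers indices 2..9
        System.out.println("range accepted");
    }
}
```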
    • addSequence

      public FastaReferenceWriter addSequence(ReferenceSequence sequence) throws IOException
      Appends a new sequence to the output.

      This is a convenient shorthand for startSequence(name).appendBases(bases).

      The new sequence remains open, meaning that further bases can be added to it with additional calls to appendBases(java.lang.String).

      Parameters:
      sequence - a ReferenceSequence to add.
      Returns:
      a reference to this very same writer.
      Throws:
      IOException - if such an exception is thrown when actually writing into the output streams/channels.
      IllegalArgumentException - if either name or bases is null or contains an invalid value (e.g. unsupported bases or sequence names).
      IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no base appended to it or a sequence with such name was already appended to this writer.
    • appendSequence

      public FastaReferenceWriter appendSequence(String name, String description, byte[] bases) throws IOException
      Appends a new sequence to the output with or without a description.

      This is a convenient shorthand for startSequence(name, description).appendBases(bases).

      A null or empty ("") description will be ignored (no description will be output).

      The new sequence remains open, meaning that further bases can be added to it with additional calls to appendBases(java.lang.String).

      Parameters:
      name - the name of the new sequence.
      bases - the (first) bases of the sequence.
      description - the description for the new sequence.
      Returns:
      a reference to this very same writer.
      Throws:
      IOException - if such an exception is thrown when actually writing into the output streams/channels.
      IllegalArgumentException - if either name or bases is null or contains an invalid value (e.g. unsupported bases or sequence names). Also when the description contains unsupported characters.
      IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no base appended to it or a sequence with such name was already appended to this writer.
    • appendSequence

      public FastaReferenceWriter appendSequence(String name, String description, int basesPerLine, byte[] bases) throws IOException
      Appends a new sequence to the output with or without a description and an alternative number of bases-per-line.

      This is a convenient shorthand for startSequence(name, description, basesPerLine).appendBases(bases).

      A null or empty ("") description will be ignored (no description will be output).

      The new sequence remains open, meaning that further bases can be added to it with additional calls to appendBases(java.lang.String).

      Parameters:
      name - the name of the new sequence.
      bases - the (first) bases of the sequence.
      description - the description for the sequence.
      basesPerLine - alternative number of bases per line to be used for the sequence.
      Returns:
      a reference to this very same writer.
      Throws:
      IOException - if such an exception is thrown when actually writing into the output streams/channels.
      IllegalArgumentException - if either name or bases is null or contains an invalid value (e.g. unsupported bases or sequence names). Also when the description contains unsupported characters or basesPerLine is 0 or negative.
      IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no base appended to it or a sequence with such name was already appended to this writer.
    • close

      public void close() throws IOException
      Closes this writer, flushing all remaining write operations into the output resources.

      Further calls to appendBases(java.lang.String) or startSequence(java.lang.String) will result in an exception.

      Specified by:
      close in interface AutoCloseable
      Throws:
      IOException - if such exception is thrown when closing output writers and output streams.
      IllegalStateException - if closing without writing any sequences or closing when writing a sequence is in progress.
    • writeSingleSequenceReference

      public static void writeSingleSequenceReference(Path whereTo, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases) throws IOException
      Convenience method to write a FASTA file with a single sequence.
      Parameters:
      whereTo - the path to write to; must not be null.
      makeIndex - whether the index file should be written at its standard location.
      makeDict - whether the dictionary file should be written at its standard location.
      name - the sequence name; cannot contain whitespace, control characters or the header start character.
      description - the sequence description, can be null or "" if no description.
      bases - the sequence bases, cannot be null.
      Throws:
      IOException - if such exception is thrown when writing in the output resources.
    • writeSingleSequenceReference

      public static void writeSingleSequenceReference(Path whereTo, int basesPerLine, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases) throws IOException
      Convenience method to write a FASTA file with a single sequence.
      Parameters:
      whereTo - the path to write to; must not be null.
      basesPerLine - number of bases per line; must be 1 or greater.
      makeIndex - whether the index file should be written at its standard location.
      makeDict - whether the dictionary file should be written at its standard location.
      name - the sequence name; cannot contain whitespace, control characters or the header start character.
      description - the sequence description, can be null or "" if no description.
      bases - the sequence bases, cannot be null.
      Throws:
      IOException - if such exception is thrown when writing in the output resources.
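
The "standard location" of the index and dictionary files is not spelled out on this page. The sketch below assumes the widely used convention in which the index appends ".fai" to the FASTA file name and the dictionary replaces the final extension with ".dict"; compressed inputs such as ".fasta.gz" are not handled. The class and method names are hypothetical, and the library's own resolution logic may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CompanionPathsSketch {
    /** Assumed convention: index path is the FASTA file name with ".fai" appended. */
    static Path indexPath(Path fasta) {
        return fasta.resolveSibling(fasta.getFileName().toString() + ".fai");
    }

    /** Assumed convention: dictionary path replaces the last extension with ".dict". */
    static Path dictPath(Path fasta) {
        String fileName = fasta.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;
        return fasta.resolveSibling(stem + ".dict");
    }

    public static void main(String[] args) {
        Path fasta = Paths.get("data", "reference.fasta");
        System.out.println(indexPath(fasta));
        System.out.println(dictPath(fasta));
    }
}
```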