Class FastaReferenceWriter
java.lang.Object
  htsjdk.samtools.reference.FastaReferenceWriter
All Implemented Interfaces: AutoCloseable
public final class FastaReferenceWriter extends Object implements AutoCloseable
Writes a FASTA-formatted reference file. It can also write the index and dictionary files for the newly written reference file.
Example:
    String[] seqNames = ...;
    byte[][] seqBases = ...;
    ...
    try (final FastaReferenceWriter writer = new FastaReferenceWriter(outputFile)) {
        for (int i = 0; i < seqNames.length; i++) {
            writer.startSequence(seqNames[i]).appendBases(seqBases[i]);
        }
    }
The two main operations that one can invoke on an open writer are startSequence(java.lang.String) and appendBases(java.lang.String). The former indicates that a new sequence is about to be appended to the output and is invoked once per sequence. The latter adds bases to the current sequence and can be called as many times as needed.
The writer makes sure that the output adheres to the FASTA reference sequence file format restrictions:
- Sequence names are valid (non-empty, without spaces/blanks, control characters or the header start character '>'),
- Sequence descriptions are valid (without control characters),
- Bases are valid nucleotides, IUPAC ambiguity codes or X [ACGTNX...] (lowercase and uppercase are accepted),
- Sequences cannot have zero length,
- Each sequence name can appear only once in the output.
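As an illustration only, the stateless rules in the list above (names, descriptions, individual bases) can be expressed with plain JDK calls. FastaRulesSketch is a hypothetical class, not part of htsjdk, and its base alphabet (IUPAC nucleotide codes plus X, in either case) is an assumption standing in for SequenceUtil.isIUPAC(byte). The zero-length and unique-name rules need writer state and are not covered here.

```java
/**
 * Hypothetical helper, not part of htsjdk: stateless versions of the name, description
 * and base restrictions. The base alphabet is an assumption standing in for
 * SequenceUtil.isIUPAC(byte).
 */
final class FastaRulesSketch {
    // Assumed alphabet: IUPAC nucleotide codes plus X, compared case-insensitively.
    private static final String ACCEPTED_BASES = "ACGTURYSWKMBDHVNX";

    private FastaRulesSketch() {
    }

    /** Non-empty, with no whitespace, no ISO control characters and no '>' header start character. */
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c) || c == '>') {
                return false;
            }
        }
        return true;
    }

    /** null or "" means "no description"; otherwise ISO control characters are rejected. */
    static boolean isValidDescription(String description) {
        if (description == null) {
            return true;
        }
        for (int i = 0; i < description.length(); i++) {
            if (Character.isISOControl(description.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Accepts upper- or lowercase letters from the assumed alphabet; rejects everything else. */
    static boolean isValidBase(byte base) {
        if (base <= 0) { // NUL and bytes >= 128 (negative in Java) are never bases
            return false;
        }
        return ACCEPTED_BASES.indexOf(Character.toUpperCase((char) base)) >= 0;
    }
}
```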
Field Summary
- static int DEFAULT_BASES_PER_LINE: Default number of bases per line.
- static char HEADER_NAME_AND_DESCRIPTION_SEPARATOR: Character used to separate the sequence name and the description, if any.
- static char HEADER_START_CHAR: Sequence header start character.
Method Summary
- FastaReferenceWriter addSequence(ReferenceSequence sequence): Appends a new sequence to the output.
- FastaReferenceWriter appendBases(byte[] bases): Adds bases to the current sequence from a byte array.
- FastaReferenceWriter appendBases(byte[] bases, int offset, int length): Adds bases to the current sequence from a range in a byte array.
- FastaReferenceWriter appendBases(String basesBases): Adds bases to the current sequence from a String.
- FastaReferenceWriter appendSequence(String name, String description, byte[] bases): Appends a new sequence to the output, with or without a description.
- FastaReferenceWriter appendSequence(String name, String description, int basesPerLine, byte[] bases): Appends a new sequence to the output, with or without a description, using an alternative number of bases per line.
- void close(): Closes this writer, flushing all remaining write operations to the output resources.
- FastaReferenceWriter startSequence(String sequenceName): Starts the input of the bases of a new sequence.
- FastaReferenceWriter startSequence(String sequenceName, int basesPerLine): Starts the input of the bases of a new sequence.
- FastaReferenceWriter startSequence(String sequenceName, String description): Starts the input of the bases of a new sequence.
- FastaReferenceWriter startSequence(String sequenceName, String description, int basesPerLine): Starts the input of the bases of a new sequence.
- static void writeSingleSequenceReference(Path whereTo, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases): Convenience method to write a FASTA file with a single sequence.
- static void writeSingleSequenceReference(Path whereTo, int basesPerLine, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases): Convenience method to write a FASTA file with a single sequence.
Field Detail

DEFAULT_BASES_PER_LINE
public static final int DEFAULT_BASES_PER_LINE
Default number of bases per line.
- See Also: Constant Field Values

HEADER_START_CHAR
public static final char HEADER_START_CHAR
Sequence header start character.
- See Also: Constant Field Values

HEADER_NAME_AND_DESCRIPTION_SEPARATOR
public static final char HEADER_NAME_AND_DESCRIPTION_SEPARATOR
Character used to separate the sequence name and the description, if any.
- See Also: Constant Field Values
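The two character constants combine into the header line of each record. The sketch below assumes HEADER_START_CHAR is '>' (character code 62, as the method descriptions on this page state) and that HEADER_NAME_AND_DESCRIPTION_SEPARATOR is a single space; the latter is an assumption to check against the Constant Field Values page. FastaHeaderSketch is a hypothetical name, not htsjdk code.

```java
/**
 * Hypothetical sketch of a FASTA header line. Assumed constant values:
 * HEADER_START_CHAR = '>' (code 62) and HEADER_NAME_AND_DESCRIPTION_SEPARATOR = ' '.
 */
final class FastaHeaderSketch {
    static final char HEADER_START_CHAR = '>';
    static final char HEADER_NAME_AND_DESCRIPTION_SEPARATOR = ' '; // assumption

    private FastaHeaderSketch() {
    }

    /** Header line without the trailing newline; a null or "" description is omitted. */
    static String headerLine(String name, String description) {
        StringBuilder sb = new StringBuilder().append(HEADER_START_CHAR).append(name);
        if (description != null && !description.isEmpty()) {
            sb.append(HEADER_NAME_AND_DESCRIPTION_SEPARATOR).append(description);
        }
        return sb.toString();
    }
}
```

With these assumptions, headerLine("chr1", "human chr1") yields >chr1 human chr1, while a null or empty description yields just >chr1.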

Method Detail
startSequence
public FastaReferenceWriter startSequence(String sequenceName) throws IOException
Starts the input of the bases of a new sequence. This operation automatically closes the base input of the previous sequence, if any.
The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character ('>', character code 62). It cannot be the empty string ("") either. No description is included in the output.
The bases-per-line is set to the default provided at construction, or to DEFAULT_BASES_PER_LINE if none was provided.
This method cannot be called after the writer has been closed. It also fails if the previous sequence, if any, has no bases.
- Parameters:
  sequenceName - the name of the new sequence.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if any argument does not comply with the requirements listed above, or if a sequence with the same name has already been added to the writer.
  IllegalStateException - if no base was added to the previous sequence, or the writer is already closed.
  IOException - if such an exception is thrown when writing into the output resources.
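The preconditions shared by the startSequence overloads (open writer, non-empty previous sequence, unique names) depend on writer state. The hypothetical SequenceStateSketch below models them with an insertion-ordered name-to-length map and also includes the matching close() checks. It only counts bases instead of writing them, is not htsjdk's implementation, and uses the exception types this page gives for startSequence(String).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model (not htsjdk code) of the writer state behind startSequence's
 * preconditions and close()'s checks. Bases are only counted, never written.
 */
final class SequenceStateSketch {
    private final Map<String, Long> lengths = new LinkedHashMap<>(); // name -> bases so far, in order
    private String current; // name of the open sequence, or null if none is open
    private boolean closed;

    void startSequence(String name) {
        if (closed) {
            throw new IllegalStateException("the writer is already closed");
        }
        finishCurrent(); // fails if the previous sequence has no bases
        if (lengths.containsKey(name)) {
            throw new IllegalArgumentException("sequence name already used: " + name);
        }
        lengths.put(name, 0L);
        current = name;
    }

    void appendBases(int count) {
        if (closed || current == null) {
            throw new IllegalStateException("no sequence was started or the writer is closed");
        }
        lengths.merge(current, (long) count, Long::sum);
    }

    void close() {
        if (closed) {
            return;
        }
        finishCurrent();
        closed = true;
        if (lengths.isEmpty()) {
            throw new IllegalStateException("no sequences were written");
        }
    }

    Map<String, Long> lengths() {
        return lengths;
    }

    /** Rejects a zero-length open sequence, otherwise marks it as finished. */
    private void finishCurrent() {
        if (current != null && lengths.get(current) == 0L) {
            throw new IllegalStateException("sequence has no bases: " + current);
        }
        current = null;
    }
}
```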

startSequence
public FastaReferenceWriter startSequence(String sequenceName, int basesPerLine) throws IOException
Starts the input of the bases of a new sequence. This operation automatically closes the base input of the previous sequence, if any.
The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character ('>', character code 62). It cannot be the empty string ("") either.
The bases-per-line must be 1 or greater.
This method cannot be called after the writer has been closed. It also fails if the previous sequence, if any, has no bases.
- Parameters:
  sequenceName - the name of the new sequence.
  basesPerLine - number of bases per line for this sequence.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if any argument does not comply with the requirements listed above, or if a sequence with the same name has already been added to the writer.
  IllegalStateException - if no base was added to the previous sequence, or the writer is already closed.
  IOException - if such an exception is thrown when writing into the output resources.
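With basesPerLine fixed for a sequence, its body is the bases cut into lines of that width, the last line possibly shorter. A minimal stdlib sketch of that layout, assuming '\n' line terminators; LineWrapSketch is a hypothetical name and not htsjdk code.

```java
/** Hypothetical sketch: lays out bases in lines of basesPerLine, the last one possibly shorter. */
final class LineWrapSketch {
    private LineWrapSketch() {
    }

    static String wrap(String bases, int basesPerLine) {
        if (basesPerLine < 1) {
            throw new IllegalArgumentException("basesPerLine must be 1 or greater: " + basesPerLine);
        }
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < bases.length(); start += basesPerLine) {
            int end = Math.min(start + basesPerLine, bases.length());
            out.append(bases, start, end).append('\n'); // one full (or final partial) line
        }
        return out.toString();
    }
}
```

For example, wrap("ACGTACGTAC", 4) produces the lines ACGT, ACGT and AC.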

startSequence
public FastaReferenceWriter startSequence(String sequenceName, String description) throws IOException
Starts the input of the bases of a new sequence. This operation automatically closes the base input of the previous sequence, if any.
The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character ('>', character code 62). It cannot be the empty string ("") either.
The description cannot contain control characters (as determined by Character.isISOControl(char)). If it is null or the empty string (""), no description is output.
The bases-per-line is set to the default provided at construction, or to DEFAULT_BASES_PER_LINE if none was provided.
This method cannot be called after the writer has been closed. It also fails if the previous sequence, if any, has no bases.
- Parameters:
  sequenceName - the name of the new sequence.
  description - optional description for that sequence.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if any argument does not comply with the requirements listed above, or if a sequence with the same name has already been added to the writer.
  IllegalStateException - if no base was added to the previous sequence, or the writer is already closed.
  IOException - if such an exception is thrown when writing into the output resources.

startSequence
public FastaReferenceWriter startSequence(String sequenceName, String description, int basesPerLine) throws IOException
Starts the input of the bases of a new sequence. This operation automatically closes the base input of the previous sequence, if any.
The sequence name cannot contain any blank characters (as determined by Character.isWhitespace(char)), control characters (as determined by Character.isISOControl(char)) or the FASTA header start character ('>', character code 62). It cannot be the empty string ("") either.
The description cannot contain control characters (as determined by Character.isISOControl(char)). If it is null or the empty string (""), no description is output.
The bases-per-line must be 1 or greater.
This method cannot be called after the writer has been closed. It also fails if the previous sequence, if any, has no bases.
- Parameters:
  sequenceName - the name of the new sequence.
  description - optional description for that sequence.
  basesPerLine - number of bases per line for this sequence.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if any argument does not comply with the requirements listed above.
  IllegalStateException - if no base was added to the previous sequence, the writer is already closed, or the sequence has already been added.
  IOException - if such an exception is thrown when writing into the output resources.

appendBases
public FastaReferenceWriter appendBases(String basesBases) throws IOException
Adds bases to the current sequence from a String.
- Parameters:
  basesBases - String containing the bases to be added. The string is interpreted as ASCII; an exception is thrown if any character is >= 127.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if basesBases is null or contains invalid bases (as assessed by SequenceUtil.isIUPAC(byte)).
  IllegalStateException - if no sequence was started or the writer is already closed.
  IOException - if such an exception is thrown when writing to any of the outputs.
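The String overload accepts only 7-bit text. A hedged sketch of the conversion it describes: reject any char >= 127, then encode as US-ASCII so that each char becomes exactly one byte. AsciiBasesSketch is a hypothetical helper, not htsjdk code, and it leaves out the IUPAC validity check that appendBases also applies.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not htsjdk code): converts a String of bases to one byte per char,
 * rejecting any char >= 127. The IUPAC validity check is deliberately left out.
 */
final class AsciiBasesSketch {
    private AsciiBasesSketch() {
    }

    static byte[] toAsciiBases(String bases) {
        if (bases == null) {
            throw new IllegalArgumentException("bases cannot be null");
        }
        for (int i = 0; i < bases.length(); i++) {
            if (bases.charAt(i) >= 127) {
                throw new IllegalArgumentException("character >= 127 at index " + i);
            }
        }
        // Every char is now below 127, so US-ASCII encodes each one as exactly one byte.
        return bases.getBytes(StandardCharsets.US_ASCII);
    }
}
```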

appendBases
public FastaReferenceWriter appendBases(byte[] bases) throws IOException
Adds bases to the current sequence from a byte array. Throws if any character is >= 127.
- Parameters:
  bases - array containing the bases to be added.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if bases is null or the input array contains invalid bases (as assessed by SequenceUtil.isIUPAC(byte)).
  IllegalStateException - if no sequence was started or the writer is already closed.
  IOException - if such an exception is thrown when writing to any of the outputs.
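Because appendBases can be called repeatedly on the same sequence, a writer has to remember how many bases are already on the current line so that a new call continues that line rather than starting a fresh one. The hypothetical StreamingWrapSketch below tracks that column counter; it assumes '\n' terminators and is an illustration, not htsjdk's implementation.

```java
/**
 * Hypothetical sketch (not htsjdk code): successive appendBases calls keep filling the
 * current line, so the layout depends only on the total number of bases.
 */
final class StreamingWrapSketch {
    private final int basesPerLine;
    private final StringBuilder out = new StringBuilder();
    private int column; // bases already written on the current line

    StreamingWrapSketch(int basesPerLine) {
        if (basesPerLine < 1) {
            throw new IllegalArgumentException("basesPerLine must be 1 or greater");
        }
        this.basesPerLine = basesPerLine;
    }

    StreamingWrapSketch appendBases(byte[] bases) {
        for (byte b : bases) {
            if (column == basesPerLine) { // line is full: break before writing this base
                out.append('\n');
                column = 0;
            }
            out.append((char) b);
            column++;
        }
        return this;
    }

    /** Terminates the last (possibly partial) line, as when the sequence is closed. */
    String finish() {
        if (column > 0) {
            out.append('\n');
        }
        column = 0;
        return out.toString();
    }
}
```

Splitting ACGTACGTAC into the calls AC, GTACG and TAC at 4 bases per line gives the same three lines (ACGT, ACGT, AC) as writing it in one call.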

appendBases
public FastaReferenceWriter appendBases(byte[] bases, int offset, int length) throws IOException
Adds bases to the current sequence from a range in a byte array. Throws if any character is >= 127.
- Parameters:
  bases - array containing the bases to be added.
  offset - the position of the first base to add.
  length - how many bases to add, starting from position offset.
- Returns:
  this instance.
- Throws:
  IllegalArgumentException - if bases is null, if offset and length do not define a valid range in bases, or if that range in bases contains invalid bases (as assessed by SequenceUtil.isIUPAC(byte)).
  IllegalStateException - if no sequence was started or the writer is already closed.
  IOException - if such an exception is thrown when writing to any of the outputs.
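The offset/length contract above follows the usual Java convention for array ranges. One way to check it without int overflow is sketched below; RangeCheckSketch is hypothetical, and this page does not say whether a length of 0 is accepted (the sketch accepts it).

```java
import java.util.Arrays;

/** Hypothetical sketch (not htsjdk code) of validating and copying an (offset, length) range. */
final class RangeCheckSketch {
    private RangeCheckSketch() {
    }

    /** True if [offset, offset + length) lies within an array of size arrayLength. */
    static boolean isValidRange(int arrayLength, int offset, int length) {
        // Compares against arrayLength - offset instead of computing offset + length,
        // which could overflow int for large arguments.
        return offset >= 0 && length >= 0 && offset <= arrayLength && length <= arrayLength - offset;
    }

    /** Copies the requested range, failing with IllegalArgumentException on a bad range. */
    static byte[] slice(byte[] bases, int offset, int length) {
        if (bases == null) {
            throw new IllegalArgumentException("bases cannot be null");
        }
        if (!isValidRange(bases.length, offset, length)) {
            throw new IllegalArgumentException("invalid range: offset=" + offset + ", length=" + length);
        }
        return Arrays.copyOfRange(bases, offset, offset + length);
    }
}
```

On Java 9 and later, java.util.Objects.checkFromIndexSize(offset, length, bases.length) performs the same range check, but it throws IndexOutOfBoundsException rather than IllegalArgumentException.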

addSequence
public FastaReferenceWriter addSequence(ReferenceSequence sequence) throws IOException
Appends a new sequence to the output. This is a convenient shorthand for startSequence(name).appendBases(bases). The new sequence remains open, meaning that additional bases for that sequence can be added with further calls to appendBases(java.lang.String).
- Parameters:
  sequence - a ReferenceSequence to add.
- Returns:
  a reference to this very same writer.
- Throws:
  IOException - if such an exception is thrown when actually writing into the output streams/channels.
  IllegalArgumentException - if either the name or the bases of sequence are null or contain an invalid value (e.g. unsupported bases or sequence names).
  IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no bases appended to it, or a sequence with the same name was already appended to this writer.

appendSequence
public FastaReferenceWriter appendSequence(String name, String description, byte[] bases) throws IOException
Appends a new sequence to the output, with or without a description. This is a convenient shorthand for startSequence(name, description).appendBases(bases).
A null or empty ("") description is ignored (no description is output). The new sequence remains open, meaning that additional bases for that sequence can be added with further calls to appendBases(java.lang.String).
- Parameters:
  name - the name of the new sequence.
  description - the description for the new sequence.
  bases - the (first) bases of the sequence.
- Returns:
  a reference to this very same writer.
- Throws:
  IOException - if such an exception is thrown when actually writing into the output streams/channels.
  IllegalArgumentException - if either name or bases is null or contains an invalid value (e.g. unsupported bases or sequence names), or if description contains unsupported characters.
  IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no bases appended to it, or a sequence with the same name was already appended to this writer.

appendSequence
public FastaReferenceWriter appendSequence(String name, String description, int basesPerLine, byte[] bases) throws IOException
Appends a new sequence to the output, with or without a description, using an alternative number of bases per line. This is a convenient shorthand for startSequence(name, description, basesPerLine).appendBases(bases).
A null or empty ("") description is ignored (no description is output). The new sequence remains open, meaning that additional bases for that sequence can be added with further calls to appendBases(java.lang.String).
- Parameters:
  name - the name of the new sequence.
  description - the description for the sequence.
  basesPerLine - alternative number of bases per line to be used for the sequence.
  bases - the (first) bases of the sequence.
- Returns:
  a reference to this very same writer.
- Throws:
  IOException - if such an exception is thrown when actually writing into the output streams/channels.
  IllegalArgumentException - if either name or bases is null or contains an invalid value (e.g. unsupported bases or sequence names), if description contains unsupported characters, or if basesPerLine is 0 or negative.
  IllegalStateException - if the writer is already closed, a previous sequence (if any was opened) has no bases appended to it, or a sequence with the same name was already appended to this writer.

close
public void close() throws IOException
Closes this writer, flushing all remaining write operations to the output resources. Further calls to appendBases(java.lang.String) or startSequence(java.lang.String) will result in an exception.
- Specified by:
  close in interface AutoCloseable
- Throws:
  IOException - if such an exception is thrown when closing the output writers and output streams.
  IllegalStateException - if no sequence has been written, or if the current sequence was started but no bases were appended to it.

writeSingleSequenceReference
public static void writeSingleSequenceReference(Path whereTo, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases) throws IOException
Convenience method to write a FASTA file with a single sequence.
- Parameters:
  whereTo - the output path; must not be null.
  makeIndex - whether the index file should be written at its standard location.
  makeDict - whether the dictionary file should be written at its standard location.
  name - the sequence name; cannot contain whitespace, control characters or the header start character.
  description - the sequence description; can be null or "" if there is no description.
  bases - the sequence bases; cannot be null.
- Throws:
  IOException - if such an exception is thrown when writing to the output resources.
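Assuming the index requested by makeIndex uses the standard samtools .fai layout, each line holds five tab-separated columns: name, sequence length, byte offset of the first base, bases per line and bytes per line. The sketch below computes them for the single sequence of such a file, assuming the header is the first line, the header is ASCII and lines end with a single '\n'. FaiRecordSketch is hypothetical and does not reproduce htsjdk's own index writer.

```java
/**
 * Hypothetical helper (not htsjdk code): the five .fai columns for a FASTA file holding a
 * single sequence. Assumes an ASCII header on the first line and '\n' line terminators.
 */
final class FaiRecordSketch {
    private FaiRecordSketch() {
    }

    static String faiLine(String name, String headerLine, long length, int basesPerLine) {
        long firstBaseOffset = headerLine.length() + 1L; // header bytes plus its '\n'
        int bytesPerLine = basesPerLine + 1;            // bases plus the '\n' terminator
        return name + '\t' + length + '\t' + firstBaseOffset + '\t' + basesPerLine + '\t' + bytesPerLine;
    }
}
```

For example, a header >chr1 test (10 bytes) followed by 130 bases at 60 per line gives the columns chr1, 130, 11, 60, 61.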

writeSingleSequenceReference
public static void writeSingleSequenceReference(Path whereTo, int basesPerLine, boolean makeIndex, boolean makeDict, String name, String description, byte[] bases) throws IOException
Convenience method to write a FASTA file with a single sequence.
- Parameters:
  whereTo - the output path; must not be null.
  basesPerLine - number of bases per line; must be 1 or greater.
  makeIndex - whether the index file should be written at its standard location.
  makeDict - whether the dictionary file should be written at its standard location.
  name - the sequence name; cannot contain whitespace, control characters or the header start character.
  description - the sequence description; can be null or "" if there is no description.
  bases - the sequence bases; cannot be null.
- Throws:
  IOException - if such an exception is thrown when writing to the output resources.
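Assuming the dictionary requested by makeDict uses the usual SAM-header .dict layout, each sequence gets an @SQ record. The sketch below builds one with SN, LN and an M5 checksum, computing M5 as the SAM specification defines it: the MD5 of the sequence after dropping bytes outside 33 to 126 and uppercasing lowercase letters. This page does not say which tags htsjdk actually writes (or whether it adds @HD or UR entries); DictLineSketch is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not htsjdk code): an @SQ line with SN, LN and an M5 checksum
 * computed as in the SAM specification.
 */
final class DictLineSketch {
    private DictLineSketch() {
    }

    static String md5Hex(byte[] bases) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        for (byte b : bases) {
            if (b >= 33 && b <= 126) { // skip whitespace, control and non-ASCII bytes
                md5.update((byte) Character.toUpperCase((char) b));
            }
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte d : md5.digest()) {
            hex.append(String.format("%02x", d & 0xff));
        }
        return hex.toString();
    }

    static String sqLine(String name, byte[] bases) {
        return "@SQ\tSN:" + name + "\tLN:" + bases.length + "\tM5:" + md5Hex(bases);
    }
}
```

Because of the normalization, acgt and ACGT (and ACGT split across lines) share the same M5 value.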