Class ReferenceSequenceFileFactory


  • public class ReferenceSequenceFileFactory
    extends Object
    Factory class for creating ReferenceSequenceFile instances for reading reference sequences stored in various formats.
    • Constructor Detail

      • ReferenceSequenceFileFactory

        public ReferenceSequenceFileFactory()
    • Method Detail

      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(File file)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace, if any.
        Parameters:
        file - the reference sequence file on disk
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(File file,
                                                                     boolean truncateNamesAtWhitespace)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
        Parameters:
        file - the reference sequence file on disk
        truncateNamesAtWhitespace - if true, only include the first word of the sequence name
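        To illustrate what truncating a name at the first whitespace means, here is a minimal standalone sketch. The class and method names (SequenceNameSketch, truncateAtWhitespace) are hypothetical and are not part of the library; the factory's own implementation may differ in detail.

```java
final class SequenceNameSketch {
    /**
     * Hypothetical helper: returns the part of the name that precedes
     * the first whitespace character, or the whole name if there is none.
     */
    static String truncateAtWhitespace(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                return name.substring(0, i);
            }
        }
        return name; // no whitespace: the name is returned unchanged
    }

    public static void main(String[] args) {
        // A FASTA header line often carries a description after the name.
        System.out.println(truncateAtWhitespace("chr1 Homo sapiens chromosome 1")); // prints "chr1"
    }
}
```

        With truncateNamesAtWhitespace set to false, the full header text would presumably be kept as the sequence name instead.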
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(File file,
                                                                     boolean truncateNamesAtWhitespace,
                                                                     boolean preferIndexed)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
        Parameters:
        file - the reference sequence file on disk
        truncateNamesAtWhitespace - if true, only include the first word of the sequence name
        preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise return the non-indexed reader
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(Path path)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace, if any.
        Parameters:
        path - the reference sequence file on disk
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(Path path,
                                                                     boolean truncateNamesAtWhitespace)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
        Parameters:
        path - the reference sequence file on disk
        truncateNamesAtWhitespace - if true, only include the first word of the sequence name
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(Path path,
                                                                     boolean truncateNamesAtWhitespace,
                                                                     boolean preferIndexed)
        Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
        Parameters:
        path - the reference sequence file path
        truncateNamesAtWhitespace - if true, only include the first word of the sequence name
        preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise return the non-indexed reader
      • canCreateIndexedFastaReader

        public static boolean canCreateIndexedFastaReader​(Path fastaFile)
        Checks if the provided FASTA file can be opened as indexed.

        For a FASTA file to be opened as indexed, it must have an associated .fai index and, if it is block-compressed, an associated .gzi index.

        Parameters:
        fastaFile - the reference sequence file path.
        Returns:
        true if the file can be opened as indexed; false otherwise.
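        The kind of check described here can be approximated with plain java.nio calls. The sketch below is not the library's implementation: the names IndexedFastaCheckSketch, withSuffix and looksIndexable are hypothetical, it assumes the index files are siblings named by appending ".fai" and ".gzi" to the FASTA file name, and it treats a ".gz" suffix as a stand-in for real block-compression detection.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndexedFastaCheckSketch {
    /** Sibling path whose name is the full file name plus the given suffix. */
    static Path withSuffix(Path file, String suffix) {
        return file.resolveSibling(file.getFileName().toString() + suffix);
    }

    /**
     * Hypothetical check: true when a ".fai" file sits next to the FASTA file
     * and, for names ending in ".gz" (assumed to mean block-compressed),
     * a ".gzi" file sits next to it as well.
     */
    static boolean looksIndexable(Path fasta) {
        if (!Files.exists(withSuffix(fasta, ".fai"))) {
            return false;
        }
        boolean compressed = fasta.getFileName().toString().endsWith(".gz");
        return !compressed || Files.exists(withSuffix(fasta, ".gzi"));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fasta-sketch");
        Path fasta = Files.createFile(dir.resolve("ref.fasta"));
        System.out.println(looksIndexable(fasta)); // false: no .fai file yet
        Files.createFile(dir.resolve("ref.fasta.fai"));
        System.out.println(looksIndexable(fasta)); // true: .fai file now exists
    }
}
```

        A caller could use a check like this to decide whether requesting an indexed reader is worthwhile before falling back to sequential reading.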
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(String source,
                                                                     SeekableStream in,
                                                                     FastaSequenceIndex index)
        Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and no sequence dictionary.
        Parameters:
        source - The named source of the reference file (used in error messages).
        in - The input stream to read the fasta file from.
        index - The index, or null to return a non-indexed reader.
      • getReferenceSequenceFile

        public static ReferenceSequenceFile getReferenceSequenceFile​(String source,
                                                                     SeekableStream in,
                                                                     FastaSequenceIndex index,
                                                                     SAMSequenceDictionary dictionary,
                                                                     boolean truncateNamesAtWhitespace)
        Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and an optional sequence dictionary.
        Parameters:
        source - The named source of the reference file (used in error messages).
        in - The input stream to read the fasta file from.
        index - The index, or null to return a non-indexed reader.
        dictionary - The sequence dictionary, or null if there isn't one.
        truncateNamesAtWhitespace - if true, only include the first word of the sequence name
      • getDefaultDictionaryForReferenceSequence

        public static File getDefaultDictionaryForReferenceSequence​(File file)
        Returns the default dictionary name for a FASTA file.
        Parameters:
        file - the reference sequence file on disk.
      • getDefaultDictionaryForReferenceSequence

        public static Path getDefaultDictionaryForReferenceSequence​(Path path)
        Returns the default dictionary name for a FASTA file.
        Parameters:
        path - the reference sequence file path.
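        As an illustration of how a default dictionary name might be derived, the sketch below assumes the convention that the FASTA extension (including a trailing ".gz", if present) is replaced by ".dict". Both that convention and the names DictionaryNameSketch and defaultDictionaryFor are assumptions made for this example; the method above is the authoritative source of the name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DictionaryNameSketch {
    /**
     * Hypothetical helper: drops an optional ".gz" suffix, then the last
     * remaining extension, and appends ".dict" to what is left.
     */
    static Path defaultDictionaryFor(Path fasta) {
        String name = fasta.getFileName().toString();
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - ".gz".length());
        }
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        // Keep the dictionary in the same directory as the FASTA file.
        return fasta.resolveSibling(stem + ".dict");
    }

    public static void main(String[] args) {
        System.out.println(defaultDictionaryFor(Paths.get("refs", "hg38.fasta")));
        System.out.println(defaultDictionaryFor(Paths.get("refs", "hg38.fasta.gz")));
    }
}
```

        Under the stated assumption, both calls in main resolve to a file named hg38.dict in the refs directory.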
      • loadDictionary

        public static SAMSequenceDictionary loadDictionary​(InputStream in)
        Loads the sequence dictionary from a FASTA file input stream.
        Parameters:
        in - the FASTA file input stream.
        Returns:
        the sequence dictionary, or null if the header has no dictionary or it was empty.
      • getFastaExtension

        public static String getFastaExtension​(Path path)
        Returns the FASTA extension for the path.
        Parameters:
        path - the reference sequence file path.
        Throws:
        IllegalArgumentException - if the file is not a supported reference file.
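        The lookup-or-throw behaviour described above can be sketched as follows. The extension list is an assumed, illustrative subset and may not match the set the library supports; FastaExtensionSketch and fastaExtension are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class FastaExtensionSketch {
    // Assumed, illustrative list; the library's supported set may differ.
    static final List<String> EXTENSIONS =
            Arrays.asList(".fasta", ".fasta.gz", ".fa", ".fa.gz");

    /**
     * Hypothetical helper: returns the first listed extension that the file
     * name ends with, or throws if none of them matches.
     */
    static String fastaExtension(Path path) {
        String name = path.getFileName().toString();
        for (String ext : EXTENSIONS) {
            if (name.endsWith(ext)) {
                return ext;
            }
        }
        throw new IllegalArgumentException("Not a supported reference file: " + path);
    }

    public static void main(String[] args) {
        System.out.println(fastaExtension(Paths.get("ref.fasta.gz"))); // prints ".fasta.gz"
        try {
            fastaExtension(Paths.get("reads.bam"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

        Callers that accept arbitrary user input may want to catch the IllegalArgumentException and report the unsupported file name rather than let it propagate.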
      • getFastaIndexFileName

        public static Path getFastaIndexFileName​(Path fastaFile)
        Returns the index name for a FASTA file.
        Parameters:
        fastaFile - the reference sequence file path.