SeqAn3  3.2.0-rc.1
The Modern C++ library for sequence analysis.
seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > Class Template Reference

The SeqAn FM Index.

#include <seqan3/search/fm_index/fm_index.hpp>

Friends

template<typename bi_fm_index_t >
class bi_fm_index_cursor
 
template<typename fm_index_t >
class fm_index_cursor
 

Detailed Description

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
class seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >

The SeqAn FM Index.

Template Parameters
alphabet_t          The alphabet type; must model seqan3::semialphabet.
text_layout_mode_   Indicates whether this index works on a text collection or a single text. See seqan3::text_layout.
sdsl_index_type_    The type of the underlying SDSL index; must model seqan3::detail::sdsl_index.

The seqan3::fm_index is a fast and space-efficient string index to search strings and collections of strings.
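
To give an intuition for how an FM index answers count queries, the following is a minimal sketch of the textbook backward search over a Burrows-Wheeler transform. It is an illustration only and not SeqAn3's implementation: the type toy_fm_index and its members are hypothetical, the suffix array is built by naive sorting, and occurrence counts are recomputed by scanning instead of using the compressed rank structures a real index relies on.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Toy FM index over a std::string. Hypothetical helper, not part of SeqAn3.
struct toy_fm_index
{
    std::string bwt;                  // Burrows-Wheeler transform of text + '\0'
    std::array<std::size_t, 256> c{}; // c[x] = number of symbols smaller than x

    explicit toy_fm_index(std::string text)
    {
        // '\0' is the unique, smallest sentinel, so it must not occur in the text.
        assert(text.find('\0') == std::string::npos);
        text.push_back('\0');

        // Naive suffix array: sort all suffix start positions lexicographically.
        std::vector<std::size_t> sa(text.size());
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b)
                  { return text.compare(a, std::string::npos, text, b, std::string::npos) < 0; });

        // bwt[i] is the symbol that cyclically precedes the i-th smallest suffix.
        bwt.resize(text.size());
        for (std::size_t i = 0; i < sa.size(); ++i)
            bwt[i] = text[(sa[i] + text.size() - 1) % text.size()];

        // Cumulative symbol frequencies.
        std::array<std::size_t, 256> freq{};
        for (unsigned char ch : text)
            ++freq[ch];
        std::size_t sum = 0;
        for (std::size_t x = 0; x < 256; ++x)
        {
            c[x] = sum;
            sum += freq[x];
        }
    }

    // Number of occurrences of ch in bwt[0, end); a real index answers this in O(1).
    std::size_t occ(unsigned char ch, std::size_t end) const
    {
        auto const last = bwt.begin() + static_cast<std::ptrdiff_t>(end);
        return static_cast<std::size_t>(std::count(bwt.begin(), last, static_cast<char>(ch)));
    }

    // Backward search: processes the pattern right to left and narrows the
    // half-open suffix array interval [lo, hi) of suffixes starting with it.
    std::size_t count(std::string const & pattern) const
    {
        std::size_t lo = 0;
        std::size_t hi = bwt.size();
        for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it)
        {
            auto const ch = static_cast<unsigned char>(*it);
            lo = c[ch] + occ(ch, lo);
            hi = c[ch] + occ(ch, hi);
        }
        return hi - lo;
    }
};
```

Each pattern symbol costs one interval update, so with constant-time rank queries the search time depends on the pattern length rather than on the text length. That is the property that makes this family of indices fast.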

General information

Here is a short example of how to build an index and search for a pattern using a cursor. Note that the search module also provides a high-level interface, seqan3::search, that encapsulates the use of cursors.

#include <vector>

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/fm_index/all.hpp>

int main()
{
    using namespace seqan3::literals;

    std::vector<seqan3::dna4> genome{"ATCGATCGAAGGCTAGCTAGCTAAGGGA"_dna4};
    seqan3::fm_index index{genome}; // build the index

    auto cur = index.cursor();     // create a cursor
    cur.extend_right("AAGG"_dna4); // search the pattern "AAGG"
    seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 2

    seqan3::debug_stream << "Positions in the genome: ";
    for (auto && pos : cur.locate()) // outputs: (0, 8), (0, 22)
        seqan3::debug_stream << pos << ' ';

    return 0;
}
Attention
When building an index for a single text over any alphabet, the symbol with rank 255 is reserved and may not occur in the text.

Here is an example using a collection of strings (e.g. a genome with multiple chromosomes or a protein database):

#include <vector>

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/fm_index/all.hpp>

int main()
{
    using namespace seqan3::literals;

    std::vector<std::vector<seqan3::dna4>> genomes{"ATCTGACGAAGGCTAGCTAGCTAAGGGA"_dna4,
                                                   "TAGCTGAAGCCATTGGCATCTGATCGGACT"_dna4,
                                                   "ACTGAGCTCGTC"_dna4,
                                                   "TGCATGCACCCATCGACTGACTG"_dna4,
                                                   "GTACGTACGTTACG"_dna4};
    seqan3::fm_index index{genomes}; // build the index

    auto cur = index.cursor();     // create a cursor
    cur.extend_right("CTGA"_dna4); // search the pattern "CTGA"
    seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 5

    seqan3::debug_stream << "Positions in the genomes: ";
    for (auto && pos : cur.locate()) // outputs: (3,16) (2,1) (1,3) (0,2) (1,19)
        seqan3::debug_stream << pos << ' ';

    return 0;
}
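
The positions reported for a collection are pairs of (text index, start position within that text). As a way to cross-check such output, here is a small sketch that finds the same occurrences by scanning every text directly. The function naive_locate is a hypothetical reference helper and not part of SeqAn3; it uses std::string instead of seqan3::dna4 so that it builds with the standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reference helper, not part of SeqAn3: reports every occurrence of
// `pattern` as a (text index, start position) pair by scanning each text directly.
std::vector<std::pair<std::size_t, std::size_t>> naive_locate(std::vector<std::string> const & texts,
                                                              std::string const & pattern)
{
    assert(!pattern.empty());

    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t text_id = 0; text_id < texts.size(); ++text_id)
    {
        std::string const & text = texts[text_id];
        // Advance by one position after each hit so overlapping occurrences are found too.
        for (std::size_t pos = text.find(pattern); pos != std::string::npos; pos = text.find(pattern, pos + 1))
            hits.emplace_back(text_id, pos);
    }
    return hits;
}
```

For the five genomes above and the pattern "CTGA", this scan yields the same five pairs as the index, sorted by text index and position. The index reports them in a different order (see the output comment in the example), so sort both results before comparing them.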
Attention
When building an index for a text collection over any alphabet, the symbols with rank 254 and 255 are reserved and may not be used in the text.
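
The two reserved-rank rules (rank 255 for a single text, ranks 254 and 255 for a collection) can be checked before building an index. The sketch below assumes that symbol ranks have already been extracted into 8-bit integers; all function names are hypothetical and not part of SeqAn3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not part of SeqAn3. Ranks are assumed to fit into 8 bits.

// Smallest reserved rank: 255 for a single text, 254 for a text collection.
constexpr std::uint8_t first_reserved_rank(bool is_collection) noexcept
{
    return is_collection ? 254 : 255;
}

// True if no symbol rank in `ranks` collides with a reserved rank.
bool ranks_are_indexable(std::vector<std::uint8_t> const & ranks, bool is_collection)
{
    std::uint8_t const limit = first_reserved_rank(is_collection);
    return std::all_of(ranks.begin(), ranks.end(), [limit](std::uint8_t rank) { return rank < limit; });
}

// Debug-build guard that could run before constructing an index.
void check_before_indexing([[maybe_unused]] std::vector<std::uint8_t> const & ranks,
                           [[maybe_unused]] bool is_collection)
{
    assert(ranks_are_indexable(ranks, is_collection));
}
```

Small alphabets such as seqan3::dna4 (ranks 0 to 3) never reach the reserved values. The restriction only matters for alphabets that use close to the full 8-bit rank range.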

The documentation for this class was generated from the following file: