vg
tools for working with variation graphs
vg::GaplessExtender Class Reference

#include <gapless_extender.hpp>

Public Types

typedef GaplessExtension::seed_type seed_type
 
typedef pair_hash_set< seed_type > cluster_type
 

Public Member Functions

 GaplessExtender ()
 Create an empty GaplessExtender.
 
 GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner)
 Create a GaplessExtender using the given GBWTGraph and Aligner objects.
 
std::vector< GaplessExtension > extend (cluster_type &cluster, const std::string &sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const
 
void unfold_haplotypes (const std::unordered_set< nid_t > &subgraph, std::vector< std::vector< handle_t >> &haplotype_paths, bdsg::HashGraph &unfolded, const gbwtgraph::CachedGBWTGraph *cache=nullptr) const
 
void transform_alignment (Alignment &aln, const std::vector< std::vector< handle_t >> &haplotype_paths) const
 

Static Public Member Functions

static seed_type to_seed (pos_t pos, size_t read_offset)
 Convert (graph position, read offset) to a seed.
 
static pos_t get_pos (seed_type seed)
 Get the graph position from a seed.
 
static handle_t get_handle (seed_type seed)
 Get the handle from a seed.
 
static size_t get_node_offset (seed_type seed)
 Get the node offset from a seed.
 
static size_t get_read_offset (seed_type seed)
 Get the read offset from a seed.
 

Public Attributes

const gbwtgraph::GBWTGraph * graph
 
const Aligner * aligner
 

Static Public Attributes

constexpr static size_t MAX_MISMATCHES = 4
 The default value for the maximum number of mismatches.
 
constexpr static double OVERLAP_THRESHOLD = 0.8
 

Detailed Description

A class that supports haplotype-consistent seed extension using GBWTGraph. Each seed is a pair of matching read/graph positions and each extension is a gapless alignment of an interval of the read to a haplotype. A cluster is an unordered set of distinct seeds. Seeds in the same node with the same (read_offset - node_offset) difference are considered equivalent. All seeds in a cluster should correspond to the same alignment or positions near it. GaplessExtender also needs an Aligner object for scoring the extension candidates.
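The equivalence rule above can be illustrated with a small sketch. `ToySeed`, `diagonal_key`, `equivalent`, and `count_distinct` are hypothetical names introduced here for illustration. They are not part of vg and do not reflect how `seed_type` is actually stored; the node is reduced to a plain integer standing in for a handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a seed: a node identifier plus a pair of
// matching offsets. This is NOT the library's seed_type.
struct ToySeed {
    std::uint64_t node;       // stand-in for a graph handle
    std::size_t node_offset;  // offset within the node
    std::size_t read_offset;  // offset within the read
};

// Key shared by all equivalent seeds: (node, read_offset - node_offset).
inline std::pair<std::uint64_t, std::int64_t> diagonal_key(const ToySeed& seed) {
    std::int64_t diagonal = static_cast<std::int64_t>(seed.read_offset)
                          - static_cast<std::int64_t>(seed.node_offset);
    return { seed.node, diagonal };
}

// Two seeds are equivalent if they are in the same node and have the
// same (read_offset - node_offset) difference.
inline bool equivalent(const ToySeed& a, const ToySeed& b) {
    return diagonal_key(a) == diagonal_key(b);
}

// Count the seeds that remain after collapsing equivalent ones.
inline std::size_t count_distinct(const std::vector<ToySeed>& seeds) {
    std::set<std::pair<std::uint64_t, std::int64_t>> keys;
    for (const ToySeed& seed : seeds) {
        keys.insert(diagonal_key(seed));
    }
    return keys.size();
}
```

Under this rule, seeds at (node offset 2, read offset 10) and (node offset 5, read offset 13) in the same node collapse into one, because a gapless extension started from either of them covers the same read/graph position pairs.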

Member Typedef Documentation

◆ cluster_type

◆ seed_type

Constructor & Destructor Documentation

◆ GaplessExtender() [1/2]

vg::GaplessExtender::GaplessExtender ( )

Create an empty GaplessExtender.

◆ GaplessExtender() [2/2]

vg::GaplessExtender::GaplessExtender ( const gbwtgraph::GBWTGraph &  graph,
const Aligner &  aligner 
)
explicit

Create a GaplessExtender using the given GBWTGraph and Aligner objects.

Member Function Documentation

◆ extend()

std::vector< GaplessExtension > vg::GaplessExtender::extend ( cluster_type &  cluster,
const std::string &  sequence,
const gbwtgraph::CachedGBWTGraph *  cache = nullptr,
size_t  max_mismatches = MAX_MISMATCHES,
double  overlap_threshold = OVERLAP_THRESHOLD 
) const

Find the highest-scoring extension for each seed in the cluster. If there is a full-length extension with at most max_mismatches mismatches, return the (up to two) best full-length extensions with less than overlap_threshold overlap, sorted by score in descending order. If that is not possible, trim the extensions to maximize score, sort them by read interval, and remove duplicates. Allow any number of mismatches in the initial node, at least max_mismatches mismatches in the entire extension, and at least max_mismatches / 2 mismatches on each flank. Use the provided CachedGBWTGraph or allocate a new one.
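The trimming step can be pictured as finding the maximum-scoring contiguous interval of a gapless alignment. The following is a minimal sketch of that idea, not the implementation used by `extend()`. `TrimResult` and `trim_to_max_score` are hypothetical names, the match and mismatch scores are assumed illustrative values rather than values read from an `Aligner`, and any bonus for full-length alignments is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: the kept read interval [begin, end) and its score.
struct TrimResult {
    std::size_t begin;
    std::size_t end;
    std::int32_t score;
};

// Find the maximum-scoring contiguous interval of a gapless alignment
// between a read and an equally long target sequence.
// The default scores are assumptions made for this sketch.
inline TrimResult trim_to_max_score(const std::string& read,
                                    const std::string& target,
                                    std::int32_t match = 1,
                                    std::int32_t mismatch = 4) {
    assert(read.size() == target.size());
    TrimResult best{0, 0, 0};
    std::size_t start = 0;     // start of the current candidate interval
    std::int32_t running = 0;  // score of the current candidate interval
    for (std::size_t i = 0; i < read.size(); i++) {
        running += (read[i] == target[i]) ? match : -mismatch;
        if (running <= 0) {
            // A non-positive prefix never helps; restart after it.
            running = 0;
            start = i + 1;
        } else if (running > best.score) {
            best = TrimResult{start, i + 1, running};
        }
    }
    return best;
}
```

With these assumed scores, a mismatch near either end of the read is trimmed away, while an isolated mismatch in the middle is kept once more than four matching bases follow it.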

◆ get_handle()

static handle_t vg::GaplessExtender::get_handle ( seed_type  seed)
inline static

Get the handle from a seed.

◆ get_node_offset()

static size_t vg::GaplessExtender::get_node_offset ( seed_type  seed)
inline static

Get the node offset from a seed.

◆ get_pos()

static pos_t vg::GaplessExtender::get_pos ( seed_type  seed)
inline static

Get the graph position from a seed.

◆ get_read_offset()

static size_t vg::GaplessExtender::get_read_offset ( seed_type  seed)
inline static

Get the read offset from a seed.

◆ to_seed()

static seed_type vg::GaplessExtender::to_seed ( pos_t  pos,
size_t  read_offset 
)
inline static

Convert (graph position, read offset) to a seed.

◆ transform_alignment()

void vg::GaplessExtender::transform_alignment ( Alignment &  aln,
const std::vector< std::vector< handle_t >> &  haplotype_paths 
) const

Transform an alignment to a single node in the unfold_haplotypes() graph to an alignment to the corresponding path in the original graph.
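The core of such a transformation is a coordinate translation: an offset within the single unfolded node has to be mapped back to a step on the original path and an offset within that step's node. The sketch below shows only that translation, under the assumption that the unfolded node's sequence is the concatenation of the path's node sequences. `PathPosition` and `locate_on_path` are hypothetical names, and the node lengths are passed in directly instead of being queried from a graph.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: which step of the path, and where in its node.
struct PathPosition {
    std::size_t step;    // index into the haplotype path
    std::size_t offset;  // offset within the node at that step
};

// Map an offset in the concatenated path sequence to (step, node offset).
// node_lengths[i] is the sequence length of the i-th node on the path.
// Returns {node_lengths.size(), 0} if the offset is past the end.
inline PathPosition locate_on_path(const std::vector<std::size_t>& node_lengths,
                                   std::size_t offset) {
    for (std::size_t step = 0; step < node_lengths.size(); step++) {
        if (offset < node_lengths[step]) {
            return PathPosition{step, offset};
        }
        offset -= node_lengths[step];
    }
    return PathPosition{node_lengths.size(), 0};
}
```

For a path whose nodes have lengths 3, 5, and 2, offset 7 in the unfolded node falls on step 1 at node offset 4.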

◆ unfold_haplotypes()

void vg::GaplessExtender::unfold_haplotypes ( const std::unordered_set< nid_t > &  subgraph,
std::vector< std::vector< handle_t >> &  haplotype_paths,
bdsg::HashGraph &  unfolded,
const gbwtgraph::CachedGBWTGraph *  cache = nullptr 
) const

Find the distinct local haplotypes in the given subgraph and return the corresponding paths. For each path haplotype_paths[i], the output graph will contain node 2i + 1 with sequence corresponding to the path and node 2i + 2 with the reverse complement of the sequence. Use the provided CachedGBWTGraph or allocate a new one.
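The node numbering described above can be written down as a few helper functions. These helpers are hypothetical and not part of the class; they only restate the 2i + 1 / 2i + 2 convention and show how a node id in the unfolded graph maps back to an index into haplotype_paths. The `reverse_complement` helper is likewise a simplified stand-in that maps any character other than A, C, G, or T to N.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Node holding the forward sequence of haplotype_paths[i].
inline std::uint64_t forward_node(std::size_t i) { return 2 * i + 1; }

// Node holding the reverse complement of haplotype_paths[i].
inline std::uint64_t reverse_node(std::size_t i) { return 2 * i + 2; }

// Index into haplotype_paths for a node id in the unfolded graph.
inline std::size_t path_index(std::uint64_t node_id) {
    assert(node_id > 0);  // node ids start at 1 in this numbering
    return static_cast<std::size_t>((node_id - 1) / 2);
}

// True if the node stores the reverse complement of the path sequence.
inline bool is_reverse(std::uint64_t node_id) { return node_id % 2 == 0; }

// Simplified reverse complement for upper-case DNA.
inline std::string reverse_complement(const std::string& seq) {
    std::string result;
    result.reserve(seq.size());
    for (auto it = seq.rbegin(); it != seq.rend(); ++it) {
        switch (*it) {
            case 'A': result.push_back('T'); break;
            case 'C': result.push_back('G'); break;
            case 'G': result.push_back('C'); break;
            case 'T': result.push_back('A'); break;
            default:  result.push_back('N'); break;
        }
    }
    return result;
}
```

For example, the third path (index 2) corresponds to nodes 5 and 6, and both of those ids map back to index 2.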

Member Data Documentation

◆ aligner

const Aligner* vg::GaplessExtender::aligner

◆ graph

const gbwtgraph::GBWTGraph* vg::GaplessExtender::graph

◆ MAX_MISMATCHES

constexpr size_t vg::GaplessExtender::MAX_MISMATCHES = 4
static constexpr

The default value for the maximum number of mismatches.

◆ OVERLAP_THRESHOLD

constexpr double vg::GaplessExtender::OVERLAP_THRESHOLD = 0.8
static constexpr

Two full-length alignments are distinct if the fraction of overlapping position pairs is at most this.
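One way to read this criterion is sketched below. It assumes that each full-length alignment is gapless, so every read offset maps to exactly one graph position, and that a position pair overlaps when both alignments place the same read offset at the same graph position. `GraphPos`, `overlap_fraction`, and `are_distinct` are hypothetical names, and the library may count overlaps differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical graph position: (node identifier, offset within the node).
using GraphPos = std::pair<std::uint64_t, std::size_t>;

// Fraction of read offsets that both alignments map to the same graph
// position. a[i] and b[i] are the graph positions aligned to read offset i.
inline double overlap_fraction(const std::vector<GraphPos>& a,
                               const std::vector<GraphPos>& b) {
    assert(a.size() == b.size());
    if (a.empty()) {
        return 0.0;
    }
    std::size_t shared = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] == b[i]) {
            shared++;
        }
    }
    return static_cast<double>(shared) / static_cast<double>(a.size());
}

// Distinct if the overlap fraction is at most the threshold.
// The default mirrors OVERLAP_THRESHOLD = 0.8.
inline bool are_distinct(const std::vector<GraphPos>& a,
                         const std::vector<GraphPos>& b,
                         double threshold = 0.8) {
    return overlap_fraction(a, b) <= threshold;
}
```

Under this reading, two alignments of a 10 bp read that share 7 position pairs count as distinct, while two that share 9 do not.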


The documentation for this class was generated from the following files: