Package org.snpeff.snpEffect.factory
Class SnpEffPredictorFactory
- java.lang.Object
  - org.snpeff.snpEffect.factory.SnpEffPredictorFactory
- Direct Known Subclasses: SnpEffPredictorFactoryFeatures, SnpEffPredictorFactoryGenesFile, SnpEffPredictorFactoryGff, SnpEffPredictorFactoryKnownGene, SnpEffPredictorFactoryRefSeq
public abstract class SnpEffPredictorFactory extends java.lang.Object
This class creates a SnpEffectPredictor from a file (or a set of files) and a configuration.
- Author: pcingola
-
-
Field Summary
Modifier and Type Field Description
static int
MARK
static int
MIN_TOTAL_FRAME_COUNT
-
Constructor Summary
Constructor Description
SnpEffPredictorFactory(Config config, int inOffset)
-
Method Summary
Modifier and Type Method Description
protected void
add(Cds cds)
protected void
add(Chromosome chromo)
protected void
add(Exon exon)
protected void
add(Gene gene)
Add a Gene
protected void
add(Marker marker)
Add a generic Marker
protected void
add(Transcript tr)
Add a transcript
protected void
addMarker(Marker marker, boolean unique)
Add a marker to the collection
protected void
addSequences(java.lang.String chr, java.lang.String chrSeq)
Add genomic reference sequences
protected void
adjustChromosomes()
Adjust chromosome length using gene information. This is used when the sequence is not available (which makes sense only for test cases and debugging).
protected void
adjustTranscripts()
Adjust transcripts: recalculate start, end, strand, etc.
protected void
beforeExonSequences()
Perform some actions before reading sequences
protected void
codingFromCds()
Only coding transcripts have CDS: make sure that transcripts having CDS are protein coding. It might not always be "precise", though: $ grep CDS genes.gtf | cut -f 2 | ~/snpEff/scripts/uniqCount.pl gives 113 IG_C_gene, 64 IG_D_gene, 24 IG_J_gene, 366 IG_V_gene, 21 TR_C_gene, 3 TR_D_gene, 82 TR_J_gene, 296 TR_V_gene, 461 non_stop_decay, 63322 nonsense_mediated_decay, 905 polymorphic_pseudogene, 34 processed_transcript, 1340112 protein_coding.
protected void
collapseZeroLenIntrons()
Collapse exons having zero size introns between them
abstract SnpEffectPredictor
create()
protected void
createRandSequences()
Create random sequences for exons. Note: This is only used for test cases!
protected void
deleteRedundant()
Consolidate transcripts: if two exons are right next to each other, join them.
protected void
exonsFromCds()
Create exons from CDS info
protected void
exonsFromCds(Transcript tr)
Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process.
protected Gene
findGene(java.lang.String id)
protected Gene
findGene(java.lang.String geneId, java.lang.String id)
protected Marker
findMarker(java.lang.String id)
protected Transcript
findTranscript(java.lang.String id)
protected Transcript
findTranscript(java.lang.String trId, java.lang.String id)
protected Chromosome
getOrCreateChromosome(java.lang.String chromoName)
Get a chromosome.
java.util.Map<java.lang.String,java.lang.String>
getProteinByTrId()
protected int
parsePosition(java.lang.String posStr)
Parse a string as a 'position'.
protected void
readExonSequences()
Read exon sequences from a FASTA file
protected void
replaceTranscript(Transcript trOld, Transcript trNew)
void
setCreateRandSequences(boolean createRandSequences)
void
setDebug(boolean debug)
void
setFastaFile(java.lang.String fastaFile)
void
setFileName(java.lang.String fileName)
void
setRandom(java.util.Random random)
void
setReadSequences(boolean readSequences)
Read sequences? Note: This is only used for debugging and testing
void
setStoreSequences(boolean storeSequences)
void
setVerbose(boolean verbose)
protected java.lang.String
showChromoNamesDifferences()
Show differences in chromosome names
-
-
-
Field Detail
-
MARK
public static final int MARK
- See Also:
- Constant Field Values
-
MIN_TOTAL_FRAME_COUNT
public static int MIN_TOTAL_FRAME_COUNT
-
-
Constructor Detail
-
SnpEffPredictorFactory
public SnpEffPredictorFactory(Config config, int inOffset)
-
-
Method Detail
-
add
protected void add(Cds cds)
-
add
protected void add(Chromosome chromo)
-
add
protected void add(Exon exon)
-
add
protected void add(Gene gene)
Add a Gene
-
add
protected void add(Marker marker)
Add a generic Marker
-
add
protected void add(Transcript tr)
Add a transcript
-
addMarker
protected void addMarker(Marker marker, boolean unique)
Add a marker to the collection
-
addSequences
protected void addSequences(java.lang.String chr, java.lang.String chrSeq)
Add genomic reference sequences
-
adjustChromosomes
protected void adjustChromosomes()
Adjust chromosome length using gene information. This is used when the sequence is not available (which makes sense only for test cases and debugging).
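The description above does not say how the length is derived from the genes. A minimal sketch, assuming the length is taken as the largest gene end plus one (with zero-based, inclusive coordinates), could look like this. `ChromosomeLengthSketch`, `GeneSpan` and `estimateLengths` are hypothetical names used only for illustration; they are not part of SnpEff.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChromosomeLengthSketch {
    /** Hypothetical stand-in for a gene: chromosome name plus zero-based, inclusive coordinates. */
    static final class GeneSpan {
        final String chromo;
        final int start;
        final int end;

        GeneSpan(String chromo, int start, int end) {
            this.chromo = chromo;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumption: a chromosome is at least as long as (largest gene end) + 1. */
    static Map<String, Integer> estimateLengths(List<GeneSpan> genes) {
        Map<String, Integer> lengths = new HashMap<>();
        for (GeneSpan gene : genes) {
            // Keep the maximum candidate length seen so far for this chromosome
            lengths.merge(gene.chromo, gene.end + 1, Integer::max);
        }
        return lengths;
    }

    public static void main(String[] args) {
        List<GeneSpan> genes = List.of(
                new GeneSpan("chr1", 100, 499),
                new GeneSpan("chr1", 1000, 1999),
                new GeneSpan("chr2", 0, 49));
        Map<String, Integer> lengths = estimateLengths(genes);
        System.out.println(lengths.get("chr1")); // prints 2000
        System.out.println(lengths.get("chr2")); // prints 50
    }
}
```

An estimate of this kind is only a lower bound on the real chromosome length, which is consistent with the note that it makes sense only for test cases and debugging.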
-
adjustTranscripts
protected void adjustTranscripts()
Adjust transcripts: recalculate start, end, strand, etc.
-
beforeExonSequences
protected void beforeExonSequences()
Perform some actions before reading sequences
-
codingFromCds
protected void codingFromCds()
Only coding transcripts have CDS: make sure that transcripts having CDS are protein coding. It might not always be "precise", though:
$ grep CDS genes.gtf | cut -f 2 | ~/snpEff/scripts/uniqCount.pl
113 IG_C_gene
64 IG_D_gene
24 IG_J_gene
366 IG_V_gene
21 TR_C_gene
3 TR_D_gene
82 TR_J_gene
296 TR_V_gene
461 non_stop_decay
63322 nonsense_mediated_decay
905 polymorphic_pseudogene
34 processed_transcript
1340112 protein_coding
-
collapseZeroLenIntrons
protected void collapseZeroLenIntrons()
Collapse exons having zero size introns between them
-
create
public abstract SnpEffectPredictor create()
-
createRandSequences
protected void createRandSequences()
Create random sequences for exons. Note: This is only used for test cases!
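Since the class also exposes setRandom(java.util.Random), a test can presumably make such sequences reproducible by supplying a seeded generator. The following is only a sketch of that idea, assuming a four-letter DNA alphabet; `RandomSequenceSketch` and `randomSequence` are hypothetical names and do not reflect the actual implementation.

```java
import java.util.Random;

final class RandomSequenceSketch {
    // Assumption: sequences are drawn uniformly from the four DNA bases
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Builds a random base sequence of the requested length using the supplied generator. */
    static String randomSequence(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASES[random.nextInt(BASES.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same sequence
        String first = randomSequence(new Random(20100629L), 30);
        String second = randomSequence(new Random(20100629L), 30);
        System.out.println(first.equals(second)); // prints true
    }
}
```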
-
deleteRedundant
protected void deleteRedundant()
Consolidate transcripts: if two exons are right next to each other, join them. E.g. exon1:1234-2345, exon2:2346-2400 => exon:1234-2400. This happens mostly in GTF files, where the stop codon is specified separately from the exon info.
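The joining step described above can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: `AdjacentExonMerger` and its nested `Exon` class are hypothetical stand-ins, coordinates are treated as closed intervals, and only strictly adjacent exons (next start equals previous end plus one) are joined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AdjacentExonMerger {
    /** Hypothetical stand-in for an exon: a closed interval [start, end]. */
    static final class Exon {
        final int start;
        final int end;

        Exon(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /** Joins exons where the next one starts right after the previous one ends. */
    static List<Exon> mergeAdjacent(List<Exon> exons) {
        List<Exon> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((Exon e) -> e.start));

        List<Exon> merged = new ArrayList<>();
        for (Exon exon : sorted) {
            if (!merged.isEmpty()) {
                Exon last = merged.get(merged.size() - 1);
                if (exon.start == last.end + 1) {
                    // Adjacent: replace the last exon with the joined interval
                    merged.set(merged.size() - 1, new Exon(last.start, exon.end));
                    continue;
                }
            }
            merged.add(exon);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Exon> exons = List.of(new Exon(2346, 2400), new Exon(1234, 2345));
        System.out.println(mergeAdjacent(exons)); // prints [1234-2400]
    }
}
```

Sorting by start first means the input order does not matter, which helps when a stop codon record appears before or after its exon in the file.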
-
exonsFromCds
protected void exonsFromCds()
Create exons from CDS info
-
exonsFromCds
protected void exonsFromCds(Transcript tr)
Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process.
- Parameters:
tr - Transcript with CDS info, but no exons
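To illustrate the idea and the redundancy warning, here is a minimal sketch that creates one exon per CDS interval and skips any CDS whose coordinates already match an existing exon. The skip is a choice made for this sketch, not something the description above states about the real method. `ExonsFromCdsSketch`, `Interval` and `TranscriptModel` are hypothetical stand-ins, not SnpEff classes.

```java
import java.util.ArrayList;
import java.util.List;

final class ExonsFromCdsSketch {
    /** Hypothetical closed interval [start, end]. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean sameCoordinates(Interval other) {
            return start == other.start && end == other.end;
        }
    }

    /** Hypothetical transcript model holding CDS and exon intervals. */
    static final class TranscriptModel {
        final List<Interval> cds = new ArrayList<>();
        final List<Interval> exons = new ArrayList<>();
    }

    /** Adds one exon per CDS, unless an exon with the same coordinates already exists. */
    static void exonsFromCds(TranscriptModel tr) {
        for (Interval c : tr.cds) {
            boolean exists = false;
            for (Interval e : tr.exons) {
                if (e.sameCoordinates(c)) {
                    exists = true;
                    break;
                }
            }
            if (!exists) {
                tr.exons.add(new Interval(c.start, c.end));
            }
        }
    }

    public static void main(String[] args) {
        TranscriptModel tr = new TranscriptModel();
        tr.cds.add(new Interval(100, 199));
        tr.cds.add(new Interval(300, 399));
        exonsFromCds(tr);
        System.out.println(tr.exons.size()); // prints 2
    }
}
```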
-
findGene
protected Gene findGene(java.lang.String id)
-
findGene
protected Gene findGene(java.lang.String geneId, java.lang.String id)
-
findMarker
protected Marker findMarker(java.lang.String id)
-
findTranscript
protected Transcript findTranscript(java.lang.String id)
-
findTranscript
protected Transcript findTranscript(java.lang.String trId, java.lang.String id)
-
getOrCreateChromosome
protected Chromosome getOrCreateChromosome(java.lang.String chromoName)
Get a chromosome. If it doesn't exist, create it
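The get-or-create behavior can be sketched with a map lookup that inserts on a miss. This is only an illustration of the pattern; `ChromosomeRegistry` and `Chromo` are hypothetical names, and the real method may do more (for example, register the chromosome with the genome).

```java
import java.util.HashMap;
import java.util.Map;

final class ChromosomeRegistry {
    /** Hypothetical stand-in for a chromosome, identified by name only. */
    static final class Chromo {
        final String name;

        Chromo(String name) {
            this.name = name;
        }
    }

    private final Map<String, Chromo> byName = new HashMap<>();

    /** Returns the chromosome with this name, creating and storing it on first use. */
    Chromo getOrCreate(String name) {
        return byName.computeIfAbsent(name, Chromo::new);
    }

    int size() {
        return byName.size();
    }

    public static void main(String[] args) {
        ChromosomeRegistry registry = new ChromosomeRegistry();
        Chromo first = registry.getOrCreate("chr1");
        Chromo second = registry.getOrCreate("chr1");
        System.out.println(first == second); // prints true
        System.out.println(registry.size()); // prints 1
    }
}
```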
-
getProteinByTrId
public java.util.Map<java.lang.String,java.lang.String> getProteinByTrId()
-
parsePosition
protected int parsePosition(java.lang.String posStr)
Parse a string as a 'position'. Note: It subtracts 'inOffset' so that all coordinates are zero-based
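As a small worked example of the offset subtraction: with inOffset = 1 (one-based input), the string "1234" becomes the zero-based position 1233. The sketch below assumes plain integer input; `PositionParser` is a hypothetical class, and the real method may handle malformed input differently.

```java
final class PositionParser {
    private final int inOffset;

    PositionParser(int inOffset) {
        this.inOffset = inOffset;
    }

    /** Parses a coordinate string and subtracts inOffset to get a zero-based position. */
    int parsePosition(String posStr) {
        return Integer.parseInt(posStr.trim()) - inOffset;
    }

    public static void main(String[] args) {
        PositionParser oneBased = new PositionParser(1);
        System.out.println(oneBased.parsePosition("1234")); // prints 1233

        PositionParser zeroBased = new PositionParser(0);
        System.out.println(zeroBased.parsePosition("1234")); // prints 1234
    }
}
```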
-
readExonSequences
protected void readExonSequences()
Read exon sequences from a FASTA file
-
replaceTranscript
protected void replaceTranscript(Transcript trOld, Transcript trNew)
-
setCreateRandSequences
public void setCreateRandSequences(boolean createRandSequences)
-
setDebug
public void setDebug(boolean debug)
-
setFastaFile
public void setFastaFile(java.lang.String fastaFile)
-
setFileName
public void setFileName(java.lang.String fileName)
-
setRandom
public void setRandom(java.util.Random random)
-
setReadSequences
public void setReadSequences(boolean readSequences)
Read sequences? Note: This is only used for debugging and testing
-
setStoreSequences
public void setStoreSequences(boolean storeSequences)
-
setVerbose
public void setVerbose(boolean verbose)
-
showChromoNamesDifferences
protected java.lang.String showChromoNamesDifferences()
Show differences in chromosome names
-
-