Package org.snpeff.geneSets
Class GeneSets
- java.lang.Object
-
- org.snpeff.geneSets.GeneSets
-
- All Implemented Interfaces:
java.io.Serializable, java.lang.Iterable<GeneSet>
- Direct Known Subclasses:
GeneSetsRanked
public class GeneSets extends java.lang.Object implements java.lang.Iterable<GeneSet>, java.io.Serializable
A collection of GeneSets. Genes have associated "experimental values".
- Author:
- Pablo Cingolani
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type    Field    Description
static boolean    debug
static double    LOG2
static long    PRINT_SOMETHING_TIME
-
Method Summary
Modifier and Type    Method    Description
boolean    add(java.lang.String gene)    Add a gene and aliases
boolean    add(java.lang.String gene, GeneSet geneSet)    Add a gene and its corresponding gene set
void    add(GeneSet geneSet)    Add a gene set
boolean    addInteresting(java.lang.String gene)    Add a symbol as 'interesting' gene (to every corresponding GeneSet in this collection)
void    checkInterestingGenes(java.util.Set<java.lang.String> intGenes)    Checks that every symbol ID is in the set (as 'interesting' genes)
protected void    copy(GeneSets geneSets)    Copy all data from geneSets
GeneSet    disjointSet(java.util.List<GeneSet> geneSetList, int activeSets)    Produce a GeneSet based on a list of GeneSets and a 'mask'
static GeneSets    factory(GoTerms goTerms)    Create gene sets from GoTerms
java.util.List<GeneSet>    geneSetsSorted()    Iterate through each GeneSet in this GeneSets
java.util.List<GeneSet>    geneSetsSortedSize(boolean reverse)    Gene sets sorted by size (if same size, sort by name).
int    getGeneCount()    How many genes do we have?
java.util.Set<java.lang.String>    getGenes()    Get all genes in this set
GeneSet    getGeneSet(java.lang.String geneSetName)    Get a gene set named 'geneSetName'
int    getGeneSetCount()    Get number of gene sets
java.util.HashSet<GeneSet>    getGeneSetsByGene(java.lang.String gene)    All gene sets that this gene belongs to
java.util.HashMap<java.lang.String,GeneSet>    getGeneSetsByName()
java.util.HashSet<java.lang.String>    getInterestingGenes()
int    getInterestingGenesCount()
java.lang.String    getLabel()
double    getValue(java.lang.String gene)    Get experimental value
java.util.HashMap<java.lang.String,java.lang.Double>    getValueByGene()
boolean    hasGene(java.lang.String geneId)
boolean    hasValue(java.lang.String gene)
boolean    isInteresting(java.lang.String geneName)
boolean    isRanked()
protected boolean    isUsed(java.lang.String geneName)
protected boolean    isUsed(GeneSet gs)    Is this gene set used? I.e. is there at least one gene 'used'? (e.g. interesting or ranked)
java.util.Iterator<GeneSet>    iterator()    Iterate through each GeneSet in this GeneSets
java.util.Iterator<GeneSet>    iteratorSorted()    Iterate through each GeneSet in this GeneSets
java.util.Set<java.lang.String>    keySet()
java.util.List<GeneSet>    listTopTerms(int numberToSelect)    Select a number of GeneSets
java.util.List<java.lang.String>    loadExperimentalValues(java.lang.String fileName, boolean maskException)    Reads a file with a list of genes and experimental values.
boolean    loadMSigDb(java.lang.String gmtFile, boolean maskException)    Read an MSigDB file and add every gene set (do not add relationships between nodes in DAG)
void    remove(GeneSet geneSet)
void    removeGeneSet(java.lang.String geneSetName)    Remove a GeneSet
void    removeUnusedSets()    Remove unused gene sets
void    reset()    Reset every 'interesting' gene or ranked gene (on every single GeneSet in this GeneSets)
void    saveGseaGeneSets(java.lang.String fileName)    Save gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
void    setDoNotAddIfNotInGeneSet(boolean doNotAddIfNotInGeneSet)
void    setGeneSetByName(java.util.HashMap<java.lang.String,GeneSet> geneSets)
void    setInterestingGenes(java.util.HashSet<java.lang.String> interestingGenesIdSet)
void    setValue(java.lang.String geneId, double value)    Set experimental value for this gene
void    setVerbose(boolean verbose)
java.lang.String    toString()
java.util.Collection<GeneSet>    values()
-
-
-
Constructor Detail
-
GeneSets
public GeneSets()
Default constructor
-
GeneSets
public GeneSets(GeneSets geneSets)
-
GeneSets
public GeneSets(java.lang.String msigDb)
-
-
Method Detail
-
factory
public static GeneSets factory(GoTerms goTerms)
Create gene sets from GoTerms
- Parameters:
goTerms - GoTerms to use
-
add
public void add(GeneSet geneSet)
Add a gene set
- Parameters:
geneSet
-
add
public boolean add(java.lang.String gene)
Add a gene and aliases
-
add
public boolean add(java.lang.String gene, GeneSet geneSet)
Add a gene and its corresponding gene set
- Parameters:
gene
geneSet
-
addInteresting
public boolean addInteresting(java.lang.String gene)
Add a symbol as 'interesting' gene (to every corresponding GeneSet in this collection)
-
checkInterestingGenes
public void checkInterestingGenes(java.util.Set<java.lang.String> intGenes)
Checks that every symbol ID is in the set (as 'interesting' genes). Throws an exception on error.
- Parameters:
intGenes - A set of interesting genes
-
copy
protected void copy(GeneSets geneSets)
Copy all data from geneSets
- Parameters:
geneSets
-
disjointSet
public GeneSet disjointSet(java.util.List<GeneSet> geneSetList, int activeSets)
Produce a GeneSet based on a list of GeneSets and a 'mask'
- Parameters:
geneSetList - A list of GeneSets
activeSets - An integer (binary mask) that specifies whether each set in the list should be taken into account. The operation performed is: Intersection{ GeneSets where mask_bit == 1 } - Union{ GeneSets where mask_bit == 0 }, where the minus sign '-' denotes set difference. The operation is applied to both sets in each GeneSet (i.e. genes and interestingGenes).
- Returns:
A GeneSet
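The mask operation described above can be sketched on plain sets of gene names. This is a minimal illustration and not the snpEff implementation: the class `DisjointSetSketch` and its `disjoint` method are hypothetical, it uses `Set<String>` in place of `GeneSet`, and it assumes that bit `i` of the mask corresponds to the `i`-th set in the list and that an all-zero mask yields an empty result.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of snpEff: operates on Set<String> instead of GeneSet.
class DisjointSetSketch {

    /**
     * Intersection of the sets whose mask bit is 1, minus the union of the
     * sets whose mask bit is 0.
     * Assumption: bit i of activeSets corresponds to sets.get(i), so the
     * list should hold at most 32 sets.
     */
    static Set<String> disjoint(List<Set<String>> sets, int activeSets) {
        Set<String> intersection = null; // stays null until the first active set is seen
        Set<String> union = new HashSet<>();

        for (int i = 0; i < sets.size(); i++) {
            boolean active = ((activeSets >> i) & 1) == 1;
            if (active) {
                if (intersection == null) intersection = new HashSet<>(sets.get(i));
                else intersection.retainAll(sets.get(i));
            } else {
                union.addAll(sets.get(i));
            }
        }

        // Assumption: with no active set, the result is empty.
        Set<String> result = (intersection == null) ? new HashSet<>() : intersection;
        result.removeAll(union); // the 'set minus' step
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> sets = new ArrayList<>();
        sets.add(Set.of("A", "B", "C"));
        sets.add(Set.of("B", "C", "D"));
        sets.add(Set.of("C", "E"));
        // Mask 0b011: sets 0 and 1 are active, set 2 is inactive.
        System.out.println(disjoint(sets, 0b011)); // prints [B]
    }
}
```

In the documented method the same operation is applied twice, once to the genes and once to the interesting genes of each GeneSet.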
-
geneSetsSorted
public java.util.List<GeneSet> geneSetsSorted()
Iterate through each GeneSet in this GeneSets
-
geneSetsSortedSize
public java.util.List<GeneSet> geneSetsSortedSize(boolean reverse)
Gene sets sorted by size (if same size, sort by name).
- Parameters:
reverse - Reverse size sorting (does not affect name sorting)
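The ordering rule (size first, name as tie-breaker, with `reverse` flipping only the size comparison) can be sketched with standard comparators. The class `SizeSortSketch` and the map of set names to sizes are hypothetical stand-ins for GeneSet objects, and ascending size order for `reverse == false` is an assumption, since the description does not state the default direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: gene sets are represented by name, with sizes in a map.
class SizeSortSketch {

    static List<String> sortBySize(Map<String, Integer> sizeByName, boolean reverse) {
        // Primary key: set size (assumed ascending unless 'reverse' is set)
        Comparator<String> bySize = Comparator.comparingInt((String name) -> sizeByName.get(name));
        if (reverse) bySize = bySize.reversed();

        // Secondary key: name, always ascending, so 'reverse' does not affect it
        Comparator<String> bySizeThenName = bySize.thenComparing(Comparator.<String>naturalOrder());

        List<String> names = new ArrayList<>(sizeByName.keySet());
        names.sort(bySizeThenName);
        return names;
    }
}
```

Reversing the size comparator before chaining the name comparator is what keeps the tie-break order unchanged in both directions.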
-
getGeneCount
public int getGeneCount()
How many genes do we have?
-
getGenes
public java.util.Set<java.lang.String> getGenes()
Get all genes in this set
-
getGeneSet
public GeneSet getGeneSet(java.lang.String geneSetName)
Get a gene set named 'geneSetName'
- Parameters:
geneSetName
-
getGeneSetCount
public int getGeneSetCount()
Get number of gene sets
-
getGeneSetsByGene
public java.util.HashSet<GeneSet> getGeneSetsByGene(java.lang.String gene)
All gene sets that this gene belongs to
- Parameters:
gene
-
getGeneSetsByName
public java.util.HashMap<java.lang.String,GeneSet> getGeneSetsByName()
-
getInterestingGenes
public java.util.HashSet<java.lang.String> getInterestingGenes()
-
getInterestingGenesCount
public int getInterestingGenesCount()
-
getLabel
public java.lang.String getLabel()
-
getValue
public double getValue(java.lang.String gene)
Get experimental value
- Parameters:
gene
-
getValueByGene
public java.util.HashMap<java.lang.String,java.lang.Double> getValueByGene()
-
hasGene
public boolean hasGene(java.lang.String geneId)
-
hasValue
public boolean hasValue(java.lang.String gene)
-
isInteresting
public boolean isInteresting(java.lang.String geneName)
-
isRanked
public boolean isRanked()
-
isUsed
protected boolean isUsed(GeneSet gs)
Is this gene set used? I.e. is there at least one gene 'used'? (e.g. interesting or ranked)
- Parameters:
gs
-
isUsed
protected boolean isUsed(java.lang.String geneName)
-
iterator
public java.util.Iterator<GeneSet> iterator()
Iterate through each GeneSet in this GeneSets
- Specified by:
iterator in interface java.lang.Iterable<GeneSet>
-
iteratorSorted
public java.util.Iterator<GeneSet> iteratorSorted()
Iterate through each GeneSet in this GeneSets
-
keySet
public java.util.Set<java.lang.String> keySet()
-
listTopTerms
public java.util.List<GeneSet> listTopTerms(int numberToSelect)
Select a number of GeneSets
- Parameters:
numberToSelect
-
loadExperimentalValues
public java.util.List<java.lang.String> loadExperimentalValues(java.lang.String fileName, boolean maskException)
Reads a file with a list of genes and experimental values. Format: "gene \t value \n"
- Parameters:
fileName
- Returns:
A list of genes not found
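The "gene \t value \n" format can be illustrated with a small parser. This is a sketch under stated assumptions and not the snpEff code: the class `ExperimentalValuesSketch` is hypothetical, it parses an in-memory string rather than a file, and it assumes that blank lines are skipped and that each value parses as a double.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the documented format: one "gene<TAB>value" pair per line.
class ExperimentalValuesSketch {

    static Map<String, Double> parse(String text) {
        Map<String, Double> valueByGene = new LinkedHashMap<>();
        for (String line : text.split("\\R")) { // split on any line terminator
            if (line.trim().isEmpty()) continue; // assumption: blank lines are ignored
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Expected 'gene<TAB>value', got: " + line);
            }
            valueByGene.put(fields[0].trim(), Double.parseDouble(fields[1].trim()));
        }
        return valueByGene;
    }
}
```

For example, `ExperimentalValuesSketch.parse("TP53\t1.5\nBRCA1\t-0.25\n")` yields a map with two entries, in file order.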
-
loadMSigDb
public boolean loadMSigDb(java.lang.String gmtFile, boolean maskException)
Read an MSigDB file and add every gene set (do not add relationships between nodes in DAG)
- Parameters:
gmtFile
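MSigDB gene sets are distributed in the GMT format, where each tab-separated line holds a set name, a description, and then the member genes. The following sketch shows how such lines could be read into plain collections. It is not the snpEff loader: the class `GmtParseSketch` is hypothetical, it reads from a string rather than from `gmtFile`, and it discards the description field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical GMT reader: maps each gene set name to its member genes.
class GmtParseSketch {

    static Map<String, Set<String>> parse(String text) {
        Map<String, Set<String>> genesBySetName = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) continue; // assumption: blank lines are ignored
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Expected 'name<TAB>description<TAB>genes...', got: " + line);
            }
            // fields[0] = set name, fields[1] = description, remaining fields = genes
            Set<String> genes = new LinkedHashSet<>(Arrays.asList(fields).subList(2, fields.length));
            genesBySetName.put(fields[0], genes);
        }
        return genesBySetName;
    }
}
```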
-
remove
public void remove(GeneSet geneSet)
-
removeGeneSet
public void removeGeneSet(java.lang.String geneSetName)
Remove a GeneSet
-
removeUnusedSets
public void removeUnusedSets()
Remove unused gene sets
-
reset
public void reset()
Reset every 'interesting' gene or ranked gene (on every single GeneSet in this GeneSets)
-
saveGseaGeneSets
public void saveGseaGeneSets(java.lang.String fileName)
Save gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- Parameters:
fileName
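A GMT line consists of the gene set name, a description, and the genes, all separated by tabs. The sketch below builds one such line. The class `GmtWriteSketch` and its `toGmtLine` method are hypothetical and only illustrate the file format; how snpEff fills the description column is not stated here, so it is passed in as a parameter.

```java
import java.util.List;

// Hypothetical formatter for a single GMT line.
class GmtWriteSketch {

    static String toGmtLine(String name, String description, List<String> genes) {
        StringBuilder sb = new StringBuilder();
        sb.append(name).append('\t').append(description);
        for (String gene : genes) {
            sb.append('\t').append(gene); // one tab-separated column per gene
        }
        return sb.toString();
    }
}
```

Writing a full file would then amount to emitting one such line per gene set, each followed by a newline.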
-
setDoNotAddIfNotInGeneSet
public void setDoNotAddIfNotInGeneSet(boolean doNotAddIfNotInGeneSet)
-
setGeneSetByName
public void setGeneSetByName(java.util.HashMap<java.lang.String,GeneSet> geneSets)
-
setInterestingGenes
public void setInterestingGenes(java.util.HashSet<java.lang.String> interestingGenesIdSet)
-
setValue
public void setValue(java.lang.String geneId, double value)
Set experimental value for this gene
- Parameters:
geneId
value
-
setVerbose
public void setVerbose(boolean verbose)
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
values
public java.util.Collection<GeneSet> values()
-
-