Package org.snpeff.geneOntology
Class GoTerms
- java.lang.Object
-
- org.snpeff.geneOntology.GoTerms
-
- All Implemented Interfaces:
java.io.Serializable, java.lang.Iterable<GoTerm>
public class GoTerms extends java.lang.Object implements java.lang.Iterable<GoTerm>, java.io.Serializable
A collection of GO terms
- Author:
- Pablo Cingolani
- See Also:
- Serialized Form
-
-
Method Summary
Modifier and Type
Method
Description
GoTerm
add(GoTerm goTerm)
Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds.
void
addInterestingSymbol(java.lang.String symbolId, int rank, java.util.HashSet<java.lang.String> noGoTermFound)
Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set)
boolean
addSymbolId(GoTerm goTerm, java.lang.String symbolId)
Add a symbolId (as well as all needed mappings)
void
addSymbolsFromChilds()
Use symbols from children in the DAG. For every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interestingSymbol.
java.util.Set<java.lang.String>
allSymbols()
Create a set with all the symbols
void
checkInterestingSymbolIds(java.util.Set<java.lang.String> interestingSymbolIds)
Checks that every symbolId is in the set (as 'interesting' symbols)
GoTerm
disjointSet(java.util.List<GoTerm> goTermList, int activeSets)
Produce a GOTerm based on a list of GOTerms and a 'mask'
GoTerm
getGoTerm(java.lang.String goTermAcc)
java.util.HashMap<java.lang.String,GoTerm>
getGoTermsByGoTermAcc()
java.util.HashMap<java.lang.String,java.util.Set<GoTerm>>
getGoTermsBySymbolId()
java.util.Set<GoTerm>
getGoTermsBySymbolId(java.lang.String symbolId)
java.util.HashSet<java.lang.String>
getInterestingSymbolIdsSet()
int
getInterestingSymbolIdsSize()
java.lang.String
getLabel()
int
getMaxRank()
java.lang.String
getNameSpace()
int
getRank(java.lang.String symbolId)
Get symbol's rank
java.util.HashMap<java.lang.String,java.lang.Integer>
getRankSymbolId()
java.util.Iterator<GoTerm>
iterator()
Iterate through each GOTerm in this GOTerms
java.util.Set<java.lang.String>
keySet()
int
levels()
Calculate each node's level (in DAG)
java.util.List<GoTerm>
listTopTerms(int numberToSelect)
Select a number of GOTerms
int
numberOfInterestingSymbols()
Calculate how many interesting symbol IDs there are in all these GOTerms
int
numberOfNodes()
Number of nodes in this DAG
int
numberOfNodesWithOneInterestingSymbol()
Calculate the number of nodes that have at least one interesting symbol
int
numberOfNodesWithOneSymbol()
Calculate the number of nodes that have at least one annotated symbol
int
numberOfSymbols()
Calculate how many symbol IDs there are in all these GOTerms
void
readGeneAssocFile(java.lang.String goGenesFile, boolean useGeneId)
Reads a file containing every gene (names and IDs) and its associated GO terms
void
readInterestingSymbolIdsFile(java.lang.String fileName)
Reads a file with a list of 'interesting' genes (one per line)
void
readOboFile(java.lang.String oboFile, boolean removeObsolete)
Read an OBO file
void
removeGOTerm(java.lang.String goTermAcc)
Remove a GOTerm
void
resetInterestingSymbolIds()
Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms)
java.util.Set<GoTerm>
rootNodes()
void
saveGseaGeneSets(java.lang.String fileName)
Save gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
void
setLabel(java.lang.String label)
java.lang.String
toString()
java.util.Collection<GoTerm>
values()
-
-
-
Constructor Detail
-
GoTerms
public GoTerms()
Default constructor
-
GoTerms
public GoTerms(java.lang.String oboFile, java.lang.String nameSpace, java.lang.String interestingGenesFile, java.lang.String geneAssocFile, boolean removeObsolete, boolean useGeneId)
Constructor
- Parameters:
oboFile - Path to OBO description file
nameSpace - Can be 'null' for "all namespaces"
interestingGenesFile - Path to a file containing a list of 'interesting' genes (one geneName per line)
geneAssocFile - A file containing lines like: "GOterm \t gene_product_id \t gene_name \n"
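The tab-separated line layout described for geneAssocFile can be illustrated with a small stand-alone parser. This is only a sketch under stated assumptions, not the snpEff implementation: the class GeneAssocSketch and its parse method are hypothetical names, the accessions and gene names are made-up sample data, and each line is assumed to carry exactly the three fields listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of org.snpeff: groups gene names by GO term accession.
class GeneAssocSketch {

    // Each input line is assumed to look like "GOterm \t gene_product_id \t gene_name".
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> genesByGoTerm = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) continue; // skip lines that do not have three fields
            String goTermAcc = fields[0].trim();
            String geneName = fields[2].trim();
            genesByGoTerm.computeIfAbsent(goTermAcc, k -> new LinkedHashSet<>()).add(geneName);
        }
        return genesByGoTerm;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("GO:0008150\tP12345\tGENE_A"); // made-up sample rows
        lines.add("GO:0008150\tP67890\tGENE_B");
        lines.add("GO:0003674\tP12345\tGENE_A");
        System.out.println(parse(lines));
    }
}
```

The sketch keys on the gene name (third field); whether the real class keys on the name or on the gene product ID presumably depends on the useGeneId flag, which is not described here.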
-
-
Method Detail
-
add
public GoTerm add(GoTerm goTerm)
Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds. This method is used mostly for testing / debugging.
-
addInterestingSymbol
public void addInterestingSymbol(java.lang.String symbolId, int rank, java.util.HashSet<java.lang.String> noGoTermFound)
Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set)
- Parameters:
symbolId - Symbol's ID
rank - Symbol's rank
noGoTermFound - The symbol is added to this set if there are no GOTerms associated with it
-
addSymbolId
public boolean addSymbolId(GoTerm goTerm, java.lang.String symbolId)
Add a symbolId (as well as all needed mappings)
- Parameters:
goTerm - GOTerm to which the symbolId is added
symbolId - Symbol ID to add
- Returns:
- true if OK, false on error (GOTerm not found)
-
addSymbolsFromChilds
public void addSymbolsFromChilds()
Use symbols from children in the DAG. For every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interestingSymbol.
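The upward propagation described above can be sketched on a toy DAG. The code below is an illustration under assumptions, not the snpEff implementation: PropagateSymbolsSketch and its nested Node class are hypothetical stand-ins for GoTerms and GoTerm, and a single symbol set is propagated (the real class is documented as handling both symbols and interesting symbols).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: pushes symbols from children up to their ancestors in a DAG.
class PropagateSymbolsSketch {

    // Stand-in for a GO term; not the org.snpeff GoTerm class.
    static class Node {
        final String acc;
        final Set<String> symbols = new HashSet<>();
        final List<Node> children = new ArrayList<>();

        Node(String acc, String... ownSymbols) {
            this.acc = acc;
            for (String s : ownSymbols) symbols.add(s);
        }
    }

    // Adds every descendant's symbols to 'node'. 'done' records finished nodes so that
    // a node with several parents is expanded only once.
    static void addSymbolsFromChildren(Node node, Set<Node> done) {
        if (!done.add(node)) return; // already complete
        for (Node child : node.children) {
            addSymbolsFromChildren(child, done); // complete the child first
            node.symbols.addAll(child.symbols);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a", "GENE_1");
        Node b = new Node("b", "GENE_2");
        Node shared = new Node("shared", "GENE_3");
        root.children.add(a);
        root.children.add(b);
        a.children.add(shared);
        b.children.add(shared); // two parents: a DAG, not a tree
        addSymbolsFromChildren(root, new HashSet<>());
        System.out.println(root.symbols); // root now holds all three genes
    }
}
```

Processing children before merging them means each node's set is complete by the time its parents read it, which is why a single pass from the root is enough for an acyclic graph.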
-
allSymbols
public java.util.Set<java.lang.String> allSymbols()
Create a set with all the symbols
-
checkInterestingSymbolIds
public void checkInterestingSymbolIds(java.util.Set<java.lang.String> interestingSymbolIds)
Checks that every symbolId is in the set (as 'interesting' symbols). Throws an exception on error.
- Parameters:
interestingSymbolIds - A set of interesting symbols
-
disjointSet
public GoTerm disjointSet(java.util.List<GoTerm> goTermList, int activeSets)
Produce a GOTerm based on a list of GOTerms and a 'mask'
- Parameters:
goTermList - A list of GOTerms
activeSets - An integer (binary mask) that specifies whether each set in the list should be taken into account or not. The operation performed is: Intersection{ GOTerms where mask_bit == 1 } - Union{ GOTerms where mask_bit == 0 }, where the minus sign '-' is actually a 'set minus' operation. This operation is done for both sets in GOTerm (i.e. symbolIds and interestingSymbolIds)
- Returns:
- A GOTerm
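The mask operation can be sketched with plain string sets. This is a minimal illustration, not the snpEff code: DisjointSetSketch is a hypothetical class, it works on one set per term rather than on GoTerm objects, and it assumes that bit i of the mask corresponds to element i of the list and that an all-zero mask yields an empty result. Neither assumption is confirmed by the documentation above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of: Intersection{ sets with mask bit 1 } minus Union{ sets with mask bit 0 }.
class DisjointSetSketch {

    // Assumption: bit i of 'activeSets' refers to sets.get(i); lists longer than 31 are not handled.
    static Set<String> disjointSet(List<Set<String>> sets, int activeSets) {
        Set<String> intersection = null; // stays null until the first active set is seen
        Set<String> union = new HashSet<>();
        for (int i = 0; i < sets.size(); i++) {
            boolean active = ((activeSets >> i) & 1) == 1;
            if (active) {
                if (intersection == null) intersection = new HashSet<>(sets.get(i));
                else intersection.retainAll(sets.get(i));
            } else {
                union.addAll(sets.get(i));
            }
        }
        if (intersection == null) return new HashSet<>(); // no active set: empty by assumption
        intersection.removeAll(union); // the 'set minus' step
        return intersection;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("g1", "g2", "g3"));
        Set<String> b = new HashSet<>(Arrays.asList("g2", "g3", "g4"));
        Set<String> c = new HashSet<>(Arrays.asList("g3", "g5"));
        // Mask 0b011: a and b are active, c is not, so the result is (a intersect b) minus c.
        System.out.println(disjointSet(Arrays.asList(a, b, c), 0b011)); // prints [g2]
    }
}
```

Iterating the mask over all values from 1 to 2^n - 1 would give the regions of an n-set Venn diagram, which is one plausible use of such a method.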
-
getGoTerm
public GoTerm getGoTerm(java.lang.String goTermAcc)
-
getGoTermsByGoTermAcc
public java.util.HashMap<java.lang.String,GoTerm> getGoTermsByGoTermAcc()
-
getGoTermsBySymbolId
public java.util.HashMap<java.lang.String,java.util.Set<GoTerm>> getGoTermsBySymbolId()
-
getGoTermsBySymbolId
public java.util.Set<GoTerm> getGoTermsBySymbolId(java.lang.String symbolId)
-
getInterestingSymbolIdsSet
public java.util.HashSet<java.lang.String> getInterestingSymbolIdsSet()
-
getInterestingSymbolIdsSize
public int getInterestingSymbolIdsSize()
-
getLabel
public java.lang.String getLabel()
-
getMaxRank
public int getMaxRank()
-
getNameSpace
public java.lang.String getNameSpace()
-
getRank
public int getRank(java.lang.String symbolId)
Get symbol's rank
- Parameters:
symbolId - Symbol ID whose rank is requested
- Returns:
- The symbol's rank
-
getRankSymbolId
public java.util.HashMap<java.lang.String,java.lang.Integer> getRankSymbolId()
-
iterator
public java.util.Iterator<GoTerm> iterator()
Iterate through each GOTerm in this GOTerms
- Specified by:
iterator in interface java.lang.Iterable<GoTerm>
-
keySet
public java.util.Set<java.lang.String> keySet()
-
levels
public int levels()
Calculate each node's level (in DAG)
- Returns:
- maximum level
-
listTopTerms
public java.util.List<GoTerm> listTopTerms(int numberToSelect)
Select a number of GOTerms
- Parameters:
numberToSelect - Number of GOTerms to select
- Returns:
- A list of the selected GOTerms
-
numberOfInterestingSymbols
public int numberOfInterestingSymbols()
Calculate how many interesting symbol IDs there are in all these GOTerms
- Returns:
- Number of interesting symbols
-
numberOfNodes
public int numberOfNodes()
Number of nodes in this DAG
- Returns:
- Number of nodes
-
numberOfNodesWithOneInterestingSymbol
public int numberOfNodesWithOneInterestingSymbol()
Calculate the number of nodes that have at least one interesting symbol
- Returns:
- Number of nodes with at least one interesting symbol
-
numberOfNodesWithOneSymbol
public int numberOfNodesWithOneSymbol()
Calculate the number of nodes that have at least one annotated symbol
- Returns:
- Number of nodes with at least one annotated symbol
-
numberOfSymbols
public int numberOfSymbols()
Calculate how many symbol IDs there are in all these GOTerms
- Returns:
- Number of symbols
-
readGeneAssocFile
public void readGeneAssocFile(java.lang.String goGenesFile, boolean useGeneId)
Reads a file containing every gene (names and IDs) and its associated GO terms
- Parameters:
goGenesFile - A file containing gene associations to GO terms
-
readInterestingSymbolIdsFile
public void readInterestingSymbolIdsFile(java.lang.String fileName)
Reads a file with a list of 'interesting' genes (one per line)
- Parameters:
fileName - Can be "-" for no-file
-
readOboFile
public void readOboFile(java.lang.String oboFile, boolean removeObsolete)
Read an OBO file
- Parameters:
oboFile - Path to OBO description file
-
removeGOTerm
public void removeGOTerm(java.lang.String goTermAcc)
Remove a GOTerm
-
resetInterestingSymbolIds
public void resetInterestingSymbolIds()
Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms)
-
rootNodes
public java.util.Set<GoTerm> rootNodes()
-
saveGseaGeneSets
public void saveGseaGeneSets(java.lang.String fileName)
Save gene sets file for GSEA analysis.
Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- Parameters:
fileName - Name of the file to write
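For orientation, a GMT (Gene Matrix Transposed) file holds one gene set per row: the set name, a description, and then the member genes, all separated by tabs. The sketch below builds one such row. It is not the snpEff writer: GmtLineSketch and gmtLine are hypothetical names, and which GoTerm fields the real method uses as name and description is an assumption left open here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: formats a single gene set as one GMT row.
class GmtLineSketch {

    // Row layout: set name, description, then one column per gene, tab-separated.
    static String gmtLine(String setName, String description, List<String> genes) {
        StringBuilder sb = new StringBuilder();
        sb.append(setName).append('\t').append(description);
        for (String gene : genes) sb.append('\t').append(gene);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values for illustration only.
        System.out.println(gmtLine("GO:0008150", "biological_process", Arrays.asList("GENE_A", "GENE_B")));
    }
}
```

Rows may have different lengths, since each gene set contributes as many gene columns as it has members.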
-
setLabel
public void setLabel(java.lang.String label)
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
values
public java.util.Collection<GoTerm> values()
-
-