OpenMS  2.6.0
PeptideIndexer

Refreshes the protein references for all peptide hits from an idXML file and adds target/decoy information.

Potential predecessor tools: IDFilter or any protein/peptide processing tool
Potential successor tools: FalseDiscoveryRate

PeptideIndexer refreshes target/decoy information and mapping of peptides to proteins. The target/decoy information is crucial for the FalseDiscoveryRate tool. (For FDR calculations, "target+decoy" peptide hits count as target hits.)

PeptideIndexer allows for ambiguous amino acids (B|J|Z|X) in the protein database, but not in the peptide sequences. For the latter only I/L can be treated as equivalent (see 'IL_equivalent' flag), but 'J' is not allowed.
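The matching rules above can be illustrated with a small sketch. This is not the OpenMS::PeptideIndexing implementation; it is a naive scan written for illustration only. The function names, the `max_ambiguous` limit and the `il_equivalent` argument are hypothetical stand-ins, and the sketch assumes the usual meaning of the ambiguity codes (B = D or N, Z = E or Q, J = I or L, X = any residue).

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not the OpenMS API.

# Residues that each ambiguous database code may stand for.
AMBIGUOUS = {
    "B": set("DN"),  # aspartate or asparagine
    "Z": set("EQ"),  # glutamate or glutamine
    "J": set("IL"),  # isoleucine or leucine
    "X": None,       # any residue
}


def residue_matches(pep_aa, prot_aa, il_equivalent):
    """Return (matches, used_ambiguity) for one peptide/protein residue pair."""
    if prot_aa in AMBIGUOUS:
        allowed = AMBIGUOUS[prot_aa]
        return (allowed is None or pep_aa in allowed), True
    if pep_aa == prot_aa:
        return True, False
    if il_equivalent and pep_aa in "IL" and prot_aa in "IL":
        return True, False
    return False, False


def find_matches(peptide, protein, max_ambiguous=3, il_equivalent=False):
    """Return all start positions where the peptide matches the protein.

    Ambiguous codes are accepted in the protein only; a peptide that
    contains one is rejected, mirroring the restriction described above.
    """
    if any(aa in AMBIGUOUS for aa in peptide):
        raise ValueError("ambiguous amino acids are not allowed in peptides")
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        used = 0
        for pep_aa, prot_aa in zip(peptide, protein[start:]):
            ok, ambiguous = residue_matches(pep_aa, prot_aa, il_equivalent)
            if not ok:
                break
            used += ambiguous
            if used > max_ambiguous:
                break
        else:
            # Inner loop finished without a mismatch or budget overrun.
            hits.append(start)
    return hits
```

With this sketch, `find_matches("PEPTLDE", "MKPEPTIDEK")` finds nothing unless `il_equivalent=True` is passed, while a database entry such as `"MKPEPTXBEK"` matches `"PEPTIDE"` only if at least two ambiguous residues are allowed.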

Enzyme cutting rules and partial specificity can be specified (derived from input idXML automatically by default).

Resulting protein hits appear in the order of the FASTA file, except for orphaned proteins, which will appear first with an empty target_decoy metavalue. Duplicate protein accessions & sequences will not raise a warning, but create multiple hits (PeptideIndexer scans over the FASTA file once for efficiency reasons, and thus might not see all accessions & sequences at once).

All peptide and protein hits are annotated with target/decoy information, using the meta value "target_decoy". For proteins the possible values are "target" and "decoy": a protein is labeled "decoy" if its accession contains the decoy pattern (parameter decoy_string) at the expected position, either as a prefix or as a suffix (see parameter prefix), and "target" otherwise.

Peptide hits are annotated with metavalue 'protein_references', and if matched to at least one protein also with metavalue 'target_decoy'. The possible values for 'target_decoy' are "target", "decoy" and "target+decoy", depending on whether the peptide sequence is found only in target proteins, only in decoy proteins, or in both. The 'target_decoy' metavalue is absent if the peptide is unmatched.
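The annotation logic described above can be summarized in a short sketch. The function names, the default decoy string "DECOY_" and the `position` argument are assumptions made for illustration; they are not the tool's parameters or the OpenMS API.

```python
# Illustrative sketch only; names and defaults are hypothetical.

def protein_label(accession, decoy_string="DECOY_", position="prefix"):
    """Label a protein 'decoy' if its accession carries the decoy pattern."""
    if position == "prefix":
        is_decoy = accession.startswith(decoy_string)
    elif position == "suffix":
        is_decoy = accession.endswith(decoy_string)
    else:
        raise ValueError("position must be 'prefix' or 'suffix'")
    return "decoy" if is_decoy else "target"


def peptide_label(accessions, decoy_string="DECOY_", position="prefix"):
    """Derive the peptide's 'target_decoy' value from its matched proteins.

    Returns None for an unmatched peptide, standing in for the absent
    metavalue.
    """
    labels = {protein_label(a, decoy_string, position) for a in accessions}
    if not labels:
        return None
    if labels == {"target", "decoy"}:
        return "target+decoy"
    return labels.pop()


def counts_as_target(label):
    """For FDR purposes, 'target+decoy' peptide hits count as target hits."""
    return label in ("target", "target+decoy")
```

For example, a peptide matched to both `"P12345"` and `"DECOY_P12345"` would receive `"target+decoy"` and would count as a target hit in the FDR calculation.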

Runtime: PeptideIndexer is usually very fast (loading and storing the data takes the most time), and search speed can be further improved (linearly) by using more threads. Avoid allowing too many (>=4) ambiguous amino acids if your database contains long stretches of 'X' (exponential search space).

PeptideIndexer supports relative database filenames, which (when not found in the current working directory) are looked up in the directories specified by OpenMS.ini:id_db_dir (see TOPP for Advanced Users). The database is by default derived from the input idXML's metainformation ('auto' setting), but can be specified explicitly.
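The lookup order for relative database filenames can be pictured as follows. This is a minimal sketch under stated assumptions, not the tool's code: `resolve_database` and its arguments are hypothetical, and `search_dirs` stands in for the directories configured under OpenMS.ini:id_db_dir.

```python
# Illustrative sketch only; function and argument names are hypothetical.
from pathlib import Path


def resolve_database(filename, search_dirs, cwd=None):
    """Return the first existing path for `filename`, or None if not found.

    A relative name is tried in the working directory first and then in
    each configured search directory, in the order given.
    """
    path = Path(filename)
    if path.is_absolute():
        return path if path.is_file() else None
    base = Path(cwd) if cwd is not None else Path.cwd()
    for directory in [base, *map(Path, search_dirs)]:
        candidate = directory / path
        if candidate.is_file():
            return candidate
    return None
```

A file present in both the working directory and a search directory resolves to the copy in the working directory, since that location is checked first.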

Further details can be found in the underlying OpenMS::PeptideIndexing implementation.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

INI file documentation of this tool: