Home

Overview

Cross-linking and processing biological information is of central importance in biological research, and it is particularly difficult where extensive protein datasets are concerned. Mass spectrometry is the most common method for protein identification in proteomic studies, and the increasingly powerful mass spectrometers in use produce rapidly growing amounts of result data. The typical procedure for mass spectrometric protein identification begins with the proteolytic digestion of a protein sample (Figure 1). The resulting peptides are separated by liquid chromatography and then analyzed by a mass spectrometer. A search algorithm assigns the experimental fragmentation spectra to theoretical fragmentation spectra generated in silico from primary sequence databases (Aebersold et al., 2003), e.g. NCBInr at the NCBI. A widely used search algorithm for identifying proteins analyzed by MS/MS is Mascot, which uses a score to discriminate between true and false positive search results (Perkins et al., 1999).

To visualize, filter, and sort the protein identification results by several criteria, the scientist requires a summary of the results in tabular form. Mascot can export its search results to an XML file that includes detailed information about the measured ions. However, Mascot includes no algorithm for further automated processing of the peptide data. The evaluation of Mascot result files is therefore often done by tedious manual arrangement of the filtered results in tabular form, e.g. in a Microsoft Excel® spreadsheet. In addition, most scientists are interested in more detailed information about the proteins identified by mass spectrometry, so the analysis of the result files is often followed by extensive manual bioinformatic analysis of the identified proteins, e.g. with regard to physico-chemical parameters, protein domains, or the bioinformatic prediction of certain protein properties. The time required for this procedure can be enormous, especially in large-scale proteome studies.
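
To make the tabular summary described above concrete, the following is a minimal Python sketch of filtering and sorting protein hits from an XML export. It is not the MFA implementation. The element and attribute names (`results`, `protein`, `peptide`, `accession`, `score`) and the sample entries are hypothetical and simplified; a real Mascot XML export uses its own schema, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified export layout used only for illustration.
# It is NOT the real Mascot XML schema.
SAMPLE_XML = """
<results>
  <protein accession="P1" description="Example protein A" score="120.5">
    <peptide sequence="AAGK" score="45.0"/>
    <peptide sequence="LLER" score="75.5"/>
  </protein>
  <protein accession="P2" description="Example protein B" score="18.0">
    <peptide sequence="GGSK" score="18.0"/>
  </protein>
  <protein accession="P3" description="Example protein C" score="64.2">
    <peptide sequence="VVTR" score="64.2"/>
  </protein>
</results>
"""

COLUMNS = ["accession", "description", "score", "peptides"]


def summarize_hits(xml_text, min_score):
    """Return one row per protein hit with score >= min_score, best first."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for protein in root.iter("protein"):
        score = float(protein.get("score"))
        if score < min_score:
            continue  # drop hits below the chosen score threshold
        rows.append({
            "accession": protein.get("accession"),
            "description": protein.get("description"),
            "score": score,
            "peptides": len(protein.findall("peptide")),
        })
    # Sort by protein score, highest first.
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text that a spreadsheet can open."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        lines.append("\t".join(str(row[col]) for col in COLUMNS))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_tsv(summarize_hits(SAMPLE_XML, min_score=30.0)))
```

The score threshold of 30.0 is an arbitrary value chosen for the example; in practice the cutoff depends on the search and the significance level the scientist applies.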

The software presented on this website, MFA (Mascot File Analyzer) and PIC (Protein Information Crawler) - "Mayer's Spider", facilitates fast and easy analysis of Mascot HTML and XML result files and the automatic bulk collection of the corresponding bioinformatic data.
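
As an illustration of one physico-chemical parameter that is often looked up for identified proteins, the sketch below computes the average molecular mass of a protein from its amino acid sequence. It is an assumption-laden example, not a description of how PIC obtains its data: the function name `average_mass` is hypothetical, only the 20 standard amino acids are handled, and post-translational modifications are ignored.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free termini


def average_mass(sequence):
    """Return the average molecular mass (Da) of an unmodified protein."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical short peptide used only as example input.
    print(round(average_mass("GASPVTK"), 2))
```

Applied to every accession in a result table, a function of this kind replaces one of the manual lookup steps described above.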

Workflow

 

 

last modified 2008/02/07, 05:01