Overview
Cross-linking and processing of biological information are
of central importance in biological research. This is particularly difficult where extensive protein datasets are
concerned. In mass spectrometry, the most common method for protein identification in proteomic
studies, increasingly powerful mass spectrometers are used,
which leads to rapidly growing amounts of result data. The typical
procedure for mass spectrometric protein identification
begins with the proteolytic digestion of a protein sample
(Figure 1). The emerging peptides, separated by liquid
chromatography, are subsequently analyzed by a mass
spectrometer. A search algorithm then assigns
the resulting experimental fragmentation spectra to theoretical fragmentation spectra generated in
silico from the protein sequences of primary sequence databases (Aebersold et al.,
2003), e.g. NCBInr at the NCBI. A widely used search
algorithm for the identification of proteins analyzed by MS/MS is
Mascot, which discriminates between true and
false positive search results by means of a score (Perkins et
al., 1999). To allow visualization, filtering, and sorting of the
protein-identification results by several criteria, the scientist
requires a summary of the results in tabular form. Mascot provides a function to export its search results to an
XML file, which includes detailed information about the
measured ions. However, Mascot includes no algorithm
for further automated processing of the peptide data. Therefore, the evaluation of Mascot result files
is often done by tedious manual arrangement of the filtered
results in tabular form, e.g. in a Microsoft Excel® spreadsheet.
Additionally, most scientists are interested in more detailed
information about the proteins identified by mass
spectrometry. Therefore, the detailed analysis of mass
spectrometry result files is often followed by extensive
manual bioinformatic analysis of the identified proteins, e.g.
with regard to physico-chemical parameters, protein domains
or bioinformatic prediction of certain protein properties. The
time necessary for this tedious procedure can be enormous,
especially in cases of large-scale proteome studies.
The software presented on this website, MFA (Mascot File Analyzer) and PIC (Protein Information Crawler) - "Mayer's Spider", facilitates the fast and
easy analysis of Mascot HTML and XML result files and the
automatic bulk collection of the respective bioinformatic
data.
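
To illustrate the kind of tabular summary described above, the following is a minimal Python sketch and not the MFA implementation. It assumes a hypothetical, heavily simplified XML layout (a `results` element containing `hit` elements with `accession`, `description`, `score` and `mass` attributes); the real Mascot XML export has its own schema, which is not reproduced here. The function names, the sample data and the score threshold of 50 are likewise illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an exported result file.
# The real Mascot XML export uses a different, richer schema.
SAMPLE_XML = """
<results>
  <hit accession="P00001" description="Example protein A" score="112.4" mass="45210"/>
  <hit accession="P00002" description="Example protein B" score="31.0" mass="18800"/>
  <hit accession="P00003" description="Example protein C" score="76.9" mass="60125"/>
</results>
"""

COLUMNS = ["accession", "description", "score", "mass"]


def summarize_hits(xml_text, min_score):
    """Return hits with score >= min_score, sorted by descending score."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("hit"):
        score = float(hit.get("score"))
        if score >= min_score:
            rows.append({
                "accession": hit.get("accession"),
                "description": hit.get("description"),
                "score": score,
                "mass": int(hit.get("mass")),
            })
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows


def to_tsv(rows):
    """Render the rows as a tab-separated table with a header line."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        lines.append("\t".join(str(row[col]) for col in COLUMNS))
    return "\n".join(lines)


if __name__ == "__main__":
    # The threshold of 50 is an arbitrary value chosen for this example.
    print(to_tsv(summarize_hits(SAMPLE_XML, min_score=50.0)))
```

The tab-separated output can be opened directly in a spreadsheet program, which replaces the manual arrangement step for this simplified case.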
