Last update on 25. Nov 2018.
Protein Alignments






Protein gene sequences and (predicted) protein primary structures (= amino acid sequences) as well as protein secondary structures can be stored in the ARB database, and protein alignments can be created from them. Using import filters, amino acid sequences and/or protein secondary structures can be imported from DSSP files. See ´Import Foreign Data(bases)´, and especially ´NOTES: dssp´, for information on how this is done. A description of the DSSP code and format, as well as an example file, can be found there, too.
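To illustrate what such an import has to extract, here is a minimal Python sketch that reads the amino acid sequence and the secondary structure summary from the residue records of a DSSP file. It is not ARB's import filter. The function name read_dssp_residues is hypothetical, and the sketch assumes the classic fixed-column DSSP layout: residue records follow the line starting with "  #  RESIDUE", with the amino acid in column 14 and the structure code in column 17. Using '-' for residues without an assigned structure is an arbitrary choice made for this example.

```python
def read_dssp_residues(lines):
    """Return (amino_acids, sec_struct) strings from the lines of a DSSP file.

    Assumes the classic fixed-column DSSP format (an assumption of this sketch):
    amino acid at index 13, secondary structure code at index 16.
    """
    amino_acids, sec_struct = [], []
    in_residues = False
    for line in lines:
        if line.startswith("  #  RESIDUE"):
            in_residues = True          # residue records start after this header
            continue
        if not in_residues or len(line) < 17:
            continue                    # skip file header and truncated lines
        residue = line[13]
        if residue == "!":              # '!' marks a chain break, not a residue
            continue
        amino_acids.append(residue)
        code = line[16]
        # A blank structure column means no structure was assigned;
        # '-' is used here as a placeholder (choice made for this example).
        sec_struct.append(code if code != " " else "-")
    return "".join(amino_acids), "".join(sec_struct)
```

Both returned strings have the same length, so each structure code corresponds to the residue at the same position.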

Once a protein secondary structure is present as a species in the database, it can be converted to an SAI (see ´Convert Species to SAI´) and used as a reference for comparing other protein secondary structures or amino acid sequences. SAIs can also be created from the protein secondary structure information stored in the special field 'sec_struct' (see ´Create SAI from protein secondary structure´). This is useful if a protein secondary structure has been aligned along with the amino acid sequence.

An approach for visualizing matches between protein structures has been incorporated in ARB. The match computation for sequences and secondary structures is based on the Chou-Fasman algorithm (see below) or adaptations of it, depending on the selected match method. The match methods are described in detail in ´Protein Match Settings´, along with all other related settings that can be configured via the 'Properties' menu.


Overview of the Chou-Fasman Algorithm

The Chou-Fasman algorithm is a statistical method for predicting a protein secondary structure from its amino acid sequence. It is based on the observation that certain amino acids tend to form or break alpha-helices ('H'), beta-sheets ('E') and beta-turns ('T'). The experimentally obtained Chou-Fasman parameters (former and breaker values) are used to predict the possible occurrence of each structure type; these individual predictions can then be merged into a secondary structure summary. Further information on how this approach is used for protein structure match computation can be found in ´Protein Match Settings´ in section 'Description of Match Methods'.
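The general idea can be sketched in a few lines of Python. This is a strongly simplified, propensity-window illustration, not the full Chou-Fasman procedure (which uses nucleation, extension and overlap rules) and not ARB's implementation. The tables HELIX and SHEET contain illustrative placeholder values only, not the published parameters; the function names, window size and threshold are likewise assumptions made for this example. Turns ('T') are omitted.

```python
# Illustrative propensities only; NOT the published Chou-Fasman parameters.
# Values above 1.0 stand for "formers", values below 1.0 for "breakers".
HELIX = {"E": 1.5, "A": 1.4, "M": 1.4, "L": 1.2, "P": 0.6, "G": 0.6}
SHEET = {"V": 1.7, "I": 1.6, "Y": 1.5, "P": 0.6, "E": 0.4}


def window_mean(seq, table, start, size):
    """Mean propensity of seq[start:start+size]; unknown residues count as 1.0."""
    window = seq[start:start + size]
    return sum(table.get(res, 1.0) for res in window) / len(window)


def predict(seq, size=4, threshold=1.05):
    """Label residues 'H' (helix), 'E' (sheet) or '-' (no prediction)."""
    prediction = ["-"] * len(seq)
    for start in range(len(seq) - size + 1):
        helix = window_mean(seq, HELIX, start, size)
        sheet = window_mean(seq, SHEET, start, size)
        if max(helix, sheet) < threshold:
            continue                      # neither structure type is favoured
        label = "H" if helix >= sheet else "E"
        # Later windows overwrite earlier ones; a deliberate simplification.
        for i in range(start, start + size):
            prediction[i] = label
    return "".join(prediction)
```

With these placeholder tables, a run of alanines is labelled as helix and a run of valines as sheet, while a run of glycines receives no prediction.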



[1] Chou-Fasman Algorithm

Details on the Chou-Fasman algorithm can be found in the original paper: "Chou, P. and Fasman, G. (1978). Prediction of the secondary structure of proteins from their amino acid sequence. Advances in Enzymology, 47, 45-148."

[2] DSSP

The DSSP program was developed to standardize secondary structure assignment. It assigns protein secondary structures to amino acid sequences from the amino acids' crystallographic atom coordinates as specified by protein entries in the Protein Data Bank (PDB). The program can be found on the web at "". Details on the algorithm can be found in "Kabsch, W. and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22 (12), 2577-2637. PMID: 6667333; UI: 84128824."


The method used for protein secondary structure prediction, i.e. the Chou-Fasman algorithm, is fast, which was the main reason for choosing it: performance matters when a large number of sequences is loaded in the editor. However, it is not very accurate and should only be used as a rough estimate. The match computation can therefore only give an approximate indication of whether a given amino acid sequence matches a certain secondary structure.






Protein secondary structure in the field 'sec_struct' is not (yet) aligned automatically with the sequence. It has to be aligned manually!
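What such an alignment amounts to can be sketched outside of ARB as follows. This is a hypothetical helper, not an ARB function: it assumes that the unaligned structure string holds exactly one code per residue, and that '-' and '.' are the gap characters of the aligned sequence.

```python
GAP_CHARS = "-."  # assumed gap characters of the aligned sequence


def project_structure(aligned_seq, sec_struct):
    """Insert the gaps of aligned_seq into the ungapped sec_struct string."""
    residues = sum(1 for ch in aligned_seq if ch not in GAP_CHARS)
    if residues != len(sec_struct):
        raise ValueError("structure length does not match number of residues")
    codes = iter(sec_struct)
    # Copy gap characters unchanged; replace each residue by its structure code.
    return "".join(ch if ch in GAP_CHARS else next(codes) for ch in aligned_seq)
```

For example, project_structure("M-KV..L", "HHEE") yields "H-HE..E", which has the same length as the aligned sequence.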



The editor may become unstable and crash if sequences are not formatted.