Last update on 04 Mar 2022.

Protein Match Settings


ARB_EDIT4/Properties/Protein Match Settings



The 'Protein Match Settings' window configures the protein structure match computation. The individual settings are described below.



Show protein structure match: Toggles the display of the protein match symbols.

Selected Protein Structure SAI: The protein secondary structure SAI used as reference for match computation. The default is 'PFOLD'.

Filter SAI names for: Narrows the SAIs offered in the option menu down to those whose names contain the specified string. With a large number of SAIs, this helps to quickly find the one to use. Default is 'pfold'.

Match Method: The method used for the protein structure match computation. Default is 'Secondary Structure <-> Sequence', which is most probably the method of choice. Details on the different methods can be found below in section 'Description of Match Methods'.

Match Symbols (only relevant for the match method 'Secondary Structure <-> Sequence'): Ten symbols representing the match quality from 0 to 100% in steps of 10%. Make sure to enter exactly ten symbols; note that spaces (' ') count as symbols, too.

Pair definitions (only relevant for the match methods 'Secondary Structure <-> Secondary Structure' and 'Secondary Structure <-> Sequence (Full Prediction)'): Each line contains two text fields:

  • The left text field contains one or more amino acid pairs. Each pair consists of two characters (amino acids, gap characters, ...). Pairs are separated by spaces (' ').
  • The right text field contains the match symbol used for each of the specified pairs.
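As a rough illustration of how the two text fields of each line could be turned into a lookup table, here is a minimal Python sketch. It assumes the lines are passed in their displayed order, from best to worst match, and that an earlier line takes precedence if a pair is defined twice. The function name `parse_pair_definitions` is hypothetical and not part of ARB.

```python
from typing import Dict, List, Tuple


def parse_pair_definitions(rows: List[Tuple[str, str]]) -> Dict[str, str]:
    """Build a pair -> match symbol table from (left field, right field) rows.

    Hypothetical helper: rows are assumed to be ordered best match first,
    so the first row defining a pair wins.
    """
    table: Dict[str, str] = {}
    for pairs_field, symbol in rows:
        if len(symbol) != 1:
            raise ValueError(f"expected exactly one match symbol, got {symbol!r}")
        for pair in pairs_field.split():  # pairs are separated by spaces
            if len(pair) != 2:
                raise ValueError(f"a pair must consist of two characters: {pair!r}")
            table.setdefault(pair, symbol)  # keep the earlier (better) definition
    return table


if __name__ == "__main__":
    print(parse_pair_definitions([("HH EE TT", "#"), ("HE EH", "-")]))
```

For the two example lines, every pair of the first line maps to '#' and every pair of the second line maps to '-'.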


Description of Match Methods  

Match Method 'Secondary Structure <-> Secondary Structure'

Use this method if you want to compare protein secondary structures only. The characters representing the species' secondary structures are compared one by one with those of the selected secondary structure SAI, using the pair definitions and the defined match symbols. If an undefined pair is encountered, the 'Unknown_match' symbol is displayed.
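The one-by-one comparison can be sketched as follows. This is not ARB's implementation: it assumes a lookup table mapping each two-character pair to its symbol (for example as built from the pair definitions), assumes a pair is written as SAI character followed by species character, and uses '?' only as a hypothetical stand-in for the configured 'Unknown_match' symbol.

```python
from typing import Dict


def match_structures(reference: str, species: str, table: Dict[str, str],
                     unknown_symbol: str = "?") -> str:
    """Compare two aligned secondary structure strings column by column."""
    if len(reference) != len(species):
        raise ValueError("reference and species must be aligned to the same length")
    # Assumption: a pair is written as <SAI character><species character>.
    # Pairs missing from the table get the (hypothetical) unknown symbol.
    return "".join(table.get(r + s, unknown_symbol)
                   for r, s in zip(reference, species))


if __name__ == "__main__":
    table = {"HH": "#", "EE": "#", "HE": "-"}
    print(match_structures("HHEET", "HEEEX", table))  # prints "#-##?"
```

The last column ('T' against 'X') has no pair definition and therefore shows the unknown symbol.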

Match Method 'Secondary Structure <-> Sequence'

Species amino acid sequences are compared with the selected secondary structure SAI by taking contiguous parts of the structure (gaps in the alignment are skipped) and computing a match quality from 0 to 100% (in steps of 10%), which is mapped to the defined match symbols. The whole part is marked with that symbol. Note that bends ('S') are assumed to fit everywhere (resulting in the best match symbol), and that if a structure is encountered without a corresponding amino acid, the worst match symbol is displayed.
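The following Python sketch illustrates the part-wise marking described above. It does not reproduce ARB's Chou-Fasman-based scoring: `score` is a hypothetical caller-supplied function returning a quality between 0 and 1. It further assumes that '-' and '.' are gap characters, that a space in the SAI means "no structure", that the first of the ten symbols marks the worst and the last the best match, and that the quality is rounded to the nearest 10% step.

```python
from typing import Callable, List, Tuple

GAP_CHARS = set("-.")  # assumption: characters treated as alignment gaps
ScoreFn = Callable[[str, str], float]  # hypothetical: (structure char, residues) -> [0, 1]


def quality_to_symbol(quality: float, symbols: str) -> str:
    """Map a quality in [0, 1] to one of ten symbols (symbols[0] = worst, assumed)."""
    if len(symbols) != 10:
        raise ValueError("exactly ten match symbols are required")
    step = round(max(0.0, min(1.0, quality)) * 10)  # 0..10 in 10% steps
    return symbols[min(step, 9)]  # 100% shares the best symbol with 90%


def split_parts(structure: str) -> List[Tuple[str, List[int]]]:
    """Group SAI columns into contiguous parts of one structure character.

    Gap columns inside a part are skipped instead of ending it (assumption).
    """
    parts: List[Tuple[str, List[int]]] = []
    for col, ch in enumerate(structure):
        if ch in GAP_CHARS:
            continue
        if parts and parts[-1][0] == ch:
            parts[-1][1].append(col)
        else:
            parts.append((ch, [col]))
    return parts


def match_structure_to_sequence(structure: str, sequence: str,
                                symbols: str, score: ScoreFn) -> str:
    """Mark every column of each structure part with one match symbol."""
    if len(symbols) != 10:
        raise ValueError("exactly ten match symbols are required")
    if len(structure) != len(sequence):
        raise ValueError("structure and sequence must be aligned to the same length")
    out = [" "] * len(structure)
    for kind, cols in split_parts(structure):
        if kind == " ":
            continue  # no secondary structure here: leave blank (assumption)
        residues = "".join(sequence[c] for c in cols if sequence[c] not in GAP_CHARS)
        if kind == "S":
            symbol = symbols[9]  # bends are assumed to fit everywhere
        elif not residues:
            symbol = symbols[0]  # structure without amino acids: worst symbol
        else:
            symbol = quality_to_symbol(score(kind, residues), symbols)
        for c in cols:
            out[c] = symbol
    return "".join(out)


def toy_score(kind: str, residues: str) -> float:
    """Made-up score for the demo only: share of 'AELM' residues in helices."""
    if kind != "H":
        return 0.5
    return sum(r in "AELM" for r in residues) / len(residues)


if __name__ == "__main__":
    print(match_structure_to_sequence("HHH-HH  SS", "AEL-GG  ..", "0123456789", toy_score))
```

With the made-up `toy_score`, the example prints `666 66  99`: the helix (with its gap column skipped) scores 3 of 5 residues, i.e. 60%, and the bend gets the best symbol even though the species has no amino acids there.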

Match Method 'Secondary Structure <-> Sequence (Full Prediction)'

Species amino acid sequences are compared with the selected secondary structure SAI by fully predicting secondary structures from their sequences (via the Chou-Fasman algorithm) and comparing the predicted characters one by one with the reference structure SAI. Note that the comparison does not use the structure summaries but the individually predicted alpha-helices ('H'), beta-sheets ('E') and beta-turns ('T'). The pair definitions are searched in ascending order, i.e. good matches first, then worse ones. If a matching pair is found, the corresponding match symbol is displayed. If a structure is encountered without a corresponding amino acid, the worst match symbol is displayed.
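The ascending search through the pair definitions can be sketched as below. The Chou-Fasman prediction itself is not shown: `predictions` is assumed to hold one aligned track per predicted structure type (helix, sheet, turn), carrying the structure character where it is predicted and a space elsewhere. The function name, the gap characters, the use of the last definition line as the worst match symbol and the blank `no_match` default are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

GAP_CHARS = set("-.")  # assumption: characters treated as alignment gaps


def match_full_prediction(reference: str, sequence: str,
                          predictions: Sequence[str],
                          rows: List[Tuple[str, str]],
                          no_match: str = " ") -> str:
    """Compare predicted structure tracks with the reference SAI column by column.

    rows: (pairs field, symbol) lines ordered from best to worst match.
    """
    if not rows:
        raise ValueError("at least one pair definition line is required")
    if any(len(t) != len(reference) for t in [sequence, *predictions]):
        raise ValueError("all strings must be aligned to the same length")
    worst = rows[-1][1]  # assumption: the last line holds the worst match symbol
    out = []
    for col, ref in enumerate(reference):
        if ref in GAP_CHARS or ref == " ":
            out.append(" ")  # no reference structure in this column
            continue
        if sequence[col] in GAP_CHARS:
            out.append(worst)  # structure without a corresponding amino acid
            continue
        predicted = {track[col] for track in predictions}
        symbol = no_match
        for pairs_field, row_symbol in rows:  # ascending: good matches first
            if any(p[0] == ref and p[1] in predicted for p in pairs_field.split()):
                symbol = row_symbol
                break
        out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    rows = [("HH EE TT", "#"), ("HE EH", "+"), ("HT TH ET TE", "-")]
    tracks = ["HHH  ", " E   ", "     "]  # hypothetical helix, sheet, turn tracks
    print(match_full_prediction("HHE-T", "ACD-.", tracks, rows))  # prints "##+ -"
```

Because each column may carry several predictions at once, the first (best) definition line containing any matching pair decides the symbol.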


  • The menu entry 'Properties -> Protein Match Settings' is only shown for protein alignments ('Alignment Information -> <Type of Sequences>: pro', see 'Alignment Administration').
  • The match computation for sequences and secondary structures is based on the Chou-Fasman algorithm or adaptations of it. See 'Protein Alignments' for an explanation and a reference.






!!! The match computation can only give a rough indication of whether a given amino acid sequence matches a certain secondary structure. Do not rely on it fully; use it as a hint when aligning your amino acid sequences. !!!

!!! The match method 'Secondary Structure <-> Sequence (Full Prediction)' is experimental. It is probably not very reliable and is computationally expensive, so it should not be used when a large number of species is loaded in the editor. !!!



The editor may become unstable and crash if the sequences are not formatted.